© ACM, (1980). This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Computer Graphics 14, 3, August 1980, pp110-116.
Hierarchical representations of 3-dimensional objects are both time and space efficient. They typically consist of trees whose branches represent bounding volumes and whose terminal nodes represent primitive object elements (usually polygons). The hidden surface elimination algorithm for such a representation is twice as complex as it needs to be and suffers other inefficiencies due to its lack of generality. This paper describes a method whereby the object space is represented entirely by a hierarchical data structure, with no other form of representation. This homogeneity allows the visible surface rendering to be performed simply and efficiently.
The bounding volumes selected for this algorithm are parallelepipeds oriented to minimize their size. With this representation, any surface can be rendered since in the limit the bounding volumes make up a point representation of the object. The advantage is that the visibility calculations consist only of a search through the data structure to determine the correspondence between terminal level bounding volumes and the current pixel. At each stage of the search the position vector corresponding to the current pixel must undergo the inverse of the transformation that has previously been applied to each bounding volume. For ray tracing algorithms, this means that a simplified operation will produce the point of intersection of each ray with the bounding volumes.
This representation is reasonable for surfaces which are neither polygonal nor smoothly curved yet are highly complex. Memory requirements are minimized by expanding or fetching the lower levels of the hierarchy only when required. In addition, the data structure can be constructed as a graph instead of a tree so that lower levels of the hierarchy can share common sub-descriptions of objects. Because the viewing process has a single operation and primitive type, the software or hardware chosen to implement the search can be highly optimized for very fast execution. Also, the representation has enough generality to allow views at arbitrary magnifications. As a final advantage, the clipping problem dissapears since the hidden surface elimination algorithm is done completely by ray transformations.
With increases in display resolution and processing power, computer generated images have become more complex. The state of the art has progressed far beyond "stick figure" representations of scenes to the point where details of realism are sought. In addition, the objects to be displayed are often complex, making the display process both time and space intensive.
The most popular approach to complex image generation has been to approximate surfaces with collections of polygons. Generally, a good approximation requires an enormous number of polygons and a corresponding amount of display time. To help address the problem of complex object descriptions, alternative representations have been used. Complex surface representations such as quadric and cubic patches [8,9] are well suited to curved objects but are not general enough to model an arbitrary scene. Another alternative, procedural representation [10], is completely general and relatively compact but still requires a mechanism for generating and displaying non-procedural primitive surface elements. Three dimensional "point" representations [4,6] are also completely general, relatively easy to display, but incredibly wasteful of memory. Hierarchical representations which decompose the object space into repeatedly simpler subspaces [1,2,3], are a promising form which allow arbitrary scene descriptions in an easily usable data structure. This paper will present a new twist on hierarchical representations of scenes that has many computational advantages.
Clark [1] proposed the use of hierarchical geometric representations to speed both clipping and visibility calculations. Each level of the hierarchy consists of bounding volumes which enclose the lower levels. At the bottom of the tree, object representations are encoded in some conventional form such as polygons. He introduced the notions of "resolution clipping" and "graphical working set". According to these notions, levels of the data structure which describe detail at a greater resolution than can be resolved in the image are clipped from the current object description along with those portions of the scene which lie outside the viewing area.
In its "multi-stage combinatorial geometry model", MAGI [3] utilizes a tree structured object description. The branches of the hierarchy are arbitrarily oriented rectangular parallelepipeds that enclose subvolumes of the object (see Figure 1). It is again necessary to switch to an alternative form of representation at the bottom level in order to describe the object. The visible surface algorithm is a ray tracing technique in which the inverse of the coordinate transformation of each bounding box is applied to the ray at each node in the tree. The hierarchical structure reduces the number of candidate objects against which the ray must be tested for intersection, and the successive transformations insure that the intersection calculations will be simple ones. Appropriately, the objects being displayed are trees (e.g. deciduous or coniferous instead of binary). Because of the enormous complexity of these objects, image generation times are several hours.
![]() |
Reddy and Rubin [2] present three forms of hierarchical decomposition which, individually or in a combination of two, are sufficient to completely describe an object. At the initial levels of the hierarchy, rectangular parallelepipeds (like the MAGI system) partition the object space. At lower levels of the hierarchy, one of two other representations can be used. The first low-level representation divides this sub-volume of the object space into eight equal sized subspaces by placing a partition in the middle of each axis. These eight subspaces are either empty, full, or further subdivided in the same binary manner (see Figure 2). The second low-level representation also divides the object subspace with partitions perpendicular to the axes. In this scheme, however, there can be multiple partitions along each axis at arbitrary locations (see Figure 3). Although the point accessing algorithm is more expensive in the second representation, less space is needed to represent an object due to the flexibility of the partitioning.
![]() | ![]() |
The representation proposed in this paper constructs the object space completely out of hierarchically structured subspaces. The terminal nodes do not contain another type of primitive, but are themselves displayable. Some of the advantages of this uniformity are an ability to examine the object at arbitrary magnification and the flexibility of combining common subspaces that share micro-descriptions.
The subspaces that we have been using are rectangular parallelepipeds. This simple unit can be described and traversed with a single transformation matrix, thus lending the process well to easy and efficient hardware implementation. The next section discusses this representation in detail.
Conventional visibility calculations suffer a combinatorial explosion when confronted with complex object descriptions [7]. Typical algorithms for hidden surface elimination require that each surface in the scene be sorted into a computationally effective order and then compared with a neighborhood of other surfaces. As the scene becomes complex, the neighborhood size becomes combinatorially unmanageable. A ray tracing algorithm, on the other hand, executes in a time that increases linearly with the number of objects in the scene. This method determines visibility by extending lines (rays) from the image plane into the object space. Whenever a ray intersects one or more objects, the nearest point of intersection is the visible one. In addition, it needs no explicit clipping operation and is the only algorithm suited to the global illumination models of Kay [6] and Whitted [5].
A structured object space makes hidden surface elimination more efficient. By dividing the object into a small number of subspaces (perhaps ten) the ray traced from the screen can easily be checked for object intersection at a macro level by comparing it with the simple subspaces. Each of these subspaces is an arbitrarily rotated, translated, and scaled rectangular parallelepiped (see Figure 4a). It is described with a four-by-four transformation matrix which transforms the ray in the object space into a ray within this subspace. All that is required to determine intersection is a vector transformation and a comparison against the limits of the subspace boundary. If the ray misses all of the parallelepipeds at the top level, then the corresponding screen pixel is "empty" (as is often the case in simple scenes). If it sucessfully penetrates a subspace, then the search procedes at the lower level where all of the sub-subspaces of that subspace are examined in the same manner (see Figure 4b). When the ray enters a parallelepiped that is not further subdivided, then it has reached a solid surface whose characteristics can be displayed.
![]() | ![]() |
Coupled with the standard benefits of ray tracing are a new set of features that this hierarchy provides. Logarithmic access time (instead of polynomial time) stands out as the best feature. It typically takes only five or six levels of subdivision to represent complex scenes because each level contains about order of magnitude more detail. Thus, each linear increase in access time caused by an additional level of subdivision yields an exponential increase in resolution at that level. Object spaces are often shallow trees with a high branching factor.
Even with a shallow tree such as this it is possible to conserve storage at lower levels when there are common object space features. Any subspaces in the entire tree that have the same detail, regardless of the environment, can share their descriptions since all environmental information (location, orientation, and scale) is contained in the higher levels of the hierarchy. Most objects can take advantage of this feature because they have some common properties (usually surface details such as tree bark, windows in buildings, etc.) Thus, the hierarchy of subspaces is actually constructed as a graph instead of a tree. No efficiency is lost and much space is saved.
Another advantage of this scheme is its total generality. Any object can be represented since, in the limit, the parallelepipeds can form a point representation. In the following sections we will show how planar polygons can be represented in terms of their bounding volumes, and how bi-parametric curved surfaces can be reduced to a point representation. Thus rectangular parallelepipeds are very simple objects with which to work. Since they provide uniformity of the object description they allow arbitrary magnification of the view to be done with no adjustment to the description or viewing algorithm.
Creation of hierarchical databases is a non-trivial operation. The problem is that the partitioning at each level requires an understanding of the entire object space. When working with dynamically moving objects, an understanding of the motion is also needed to partition the object correctly. Even static objects must be intelligently divided to make the viewing algorithm effective. This section will discuss a semi-automatic scheme for hierarchical decomposition and will propose some fully-automatic possibilities.
All data for an object starts as a point or surface representation. This information comes from digitizers or algorithms and is rarely aggregated in any hierarchical form. Therefore, the first form of data is a two-level hierarchy where the top level represents the entire object space and the lower level is the complete scene description with thousands of parallelepipeds corresponding to the data points. In order to build a proper hierarchy, we have a structure editor which is able to introduce intermediate level spaces in the object. This editor allows random traversal of the object hierarchy and arbitrary creation, deletion, and transformation of the object components. In addition, the editor can overlay a non-hierarchical description of an object (the typical initial form) with the hierarchy that is being edited to visually assist the human operator. Early experience shows that the editor is easy to use: a hierarchical description of the city of Pittsburgh, with hundreds of terminal nodes in the hierarchy, took only a day or two to enter. The human who assists the editor in the hierarchy creation is looking for object coherence, tightness of fit within the domain of rectangular parallelepipeds, and dynamic motion consistency. In addition, the human must decide when to instruct the editor to combine similar low-level spaces that are to share descriptions.
Automating of the hierarchy creation process, although not currenly implemented, could be done by a number of methods. The simplest technique would look for clusters within the object space. It would view the object space as a three-dimensional histogram and select peaks which represent object coherence. There are a number of ways to extract multi-dimensional histogram peaks which could be used [11]. One possibility is to reduce the resolution of the object space and find clusters in that. Lower resolution spaces are easier to deal with and accurately represent the original data [12,13].
For dynamic objects, an initial intermediate level of hierarchy could be built around the known degrees of freedom. The initial object space would then contain a top level which is "the world", a middle level for the moving units, and a low level for the scene detail. The automatic hierarchy routines would then work from there in the same manner.
Automatic combining of common subspaces would be a pre-processing step that finds common detail in the initial data and merges the descriptions. This sort of pattern matching is a difficult problem which can suffer a combinatorial explosion. Reasonable solutions must use some pruning of the alternatives to arrive at an answer.
It can be seen that the creation of a correct hierarchy requires careful consideration. Sloppy hierarchies will cost dearly in the scene rendering stage but good hierarchies are hard to create automatically. In fact, it is possible to understand the benefits of rendering hierarchical scenes in terms of the expense of creating them. Simpler object representations need less work to create an object space but need more work to render a scene. Thus, this scheme derives many of its benefits from being able to off-load all of the combinatorial problems to the model creation stage. For many problems of computer graphics this is a desirable tradeoff because the display must be as fast as possible.
Display consists of two operations: visibility determination and shading. The shader used with this algorithm is described elsewhere [5]. For ray tracing algorithms, visibility determination is essentially a process of intersecting each ray with a set of objects and chosing the nearest point of intersection (Figure 5).
![]() |
The structured database improves the efficiency of the operation by minimizing the object set to be tested against each ray. Equally important to the performance of the algorithm is the speed of the intersection mechanism. It is for this reason that rectangular parallelpipeds (RP's) are chosen as the only data elements in the object description. Each RP is centered in its own coordinate system with faces defined by
as illustrated in Figure 6. A ray is defined parametrically by
![]() |
After a ray is transformed into the coordinate system of the RP, to test a ray against the faces y = ±yb one solves the equations
The smaller of the two t values defines the nearer point of intesection. Then if
and
the face is pierced by the ray. The tests to determine if the ray pierces the x = ±xb and z = ±zb faces are identical.
As the ray progresses deeper into the hierarchy, the dimensions of the RP's become progressively smaller until at the terminal level each box is essentially a point in three space. Associated with each terminal level box is a normal vector corresponding to the surface normal of the object which generated the box. The surface normal is unrelated to the orientation of the box and is computed by the data generation procedure instead of the display procedure.
It would be wasteful to represent planar polygons by collections of infinitessimally small bounding boxes. For the special case of rectangles, the terminal level bounding box can represent the polygon exactly if one of its three dimensions is equal to zero and the other two coincide with the dimensions of the rectangle. By imposing an additional set of bounds in the two non-zero dimensions of the face intersection test, arbitrary planar polygons can be represented. Figure 7 shows a triangle represented by a bounding box and two additional bounds.
![]() |
Simplicity of the display procedure, which is the key to the speed of this algorithm, is gained by transferring much of the processing load to the data generation stage. In some cases the data generation can be performed off line either automatically or through the use of the structure editor. In many cases, however, data generation and display must occur concurrently. One example is the automatic expansion of curved surfaces into a hierachical point representation. Subdivision algorithms for bi-parametric surfaces [14,15] are natural candidates for the task. We have augmented straightforward subdivison with routines for generating bounding boxes for each subpatch and for maintaining the hierarchy at each step. Unnecessary processing is minimized by subdividing a patch or subpatch only if its bounding box is pierced. When the hierarchy is extended via subdivision, the new branches are retained from one pixel to the next to gain the benefits of object space coherence. In a similar fashion procedurally defined objects can be expanded into a point representation, although we have not implemented such an expansion procedure.
Naturally, unlimited expansion of the object description may fill the memory available to the display process. When this occurs a "least recently used" garbage collector is activated.
The choice of representation presented here and the attempts to simplify the display process are motivated by our desire to produce a technique that can be easily realized in hardware. However, two initial versions of the display algorithm have been implemented in software.
The original program, running on a PDP-11/40, incorporates the structure editor as an adjunct to the display routines. Using the editor, a detailed polygonal description of the city of Pittsburgh containing over 38,000 terminal nodes was created. By using instances of such items as windows on buildings, the number of terminal nodes actually stored is less than 600. This database was used to generate the images in Figures 8 and 9 on a VAX-11/780. Figure 8 required less than an hour processor time to generate, but Figure 9 took nearly six hours due to increased detail.
![]() | ![]() |
A second version of the display algorithm has been incorporated into a set of VAX programs previously described [5]. Although the programs utilize floating point arithmetic and execute very slowly, they allow us to use a very good shader. In addition, the VAX's one megabyte memory makes it possible to experiment with automatic expansion of a hierarchical object description via patch subdivision. Figures 10 and 11 were produced with this program. The bi-cubic patch shown in Figure 10 would require more than 300 million sample points to represent the surface with the same amount of resolution produced here. Figure 10 required seven hours of computing time to display, and Figure 11, which contains 36 bicubic patches and a collection of assorted polygons, required three hours.
![]() | ![]() |
For ray tracing, we have found that the object space coherence feature of the hierarchical representation yields an improvement for bi-cubic surface display of about one hundred to one. As expected, the hierarchical structure produces an execution time proportional to the logarithm of the number of terminal nodes in each scene. For the case of bicubic patches, about 40 percent of the time is used searching the object description tree and another 40 percent is spent subdividing patches to update the object description. Since the display process is oriented to a hardware implementation, we have made no effort to optimize the software. We are just beginning to study ways of extending this representation to display methods other than ray tracing.
Hierarchical object descriptions provide a substantial performance advantage for the display of complex scenes because the initial stages of the display process consist of a logarithmic search through the object description. We have described a method by which the entire visibility calculation is reduced to this logarithmic search, both simplifying and improving the performance of the display processor. This simplicity is gained through the use of a single form of representation that is sufficiently general to accommodate a wide variety of object types.