EFFICIENT OCCLUSION CULLING FOR LARGE MODEL VISUALIZATION

Dirk Bartz, WSI/GRIS, University of Tübingen, [email protected]
Michael Meißner, WSI/GRIS, University of Tübingen, [email protected]
Gordon Müller, Institute of Computer Graphics, Braunschweig University of Technology, [email protected]

Abstract

Occlusion and visibility culling is one of the major techniques for reducing the geometric complexity of large polygonal models. Since the introduction of hardware-assisted occlusion culling in OpenGL (as an extension), purely software-based approaches have been losing relevance rapidly for applications which cannot exploit specific knowledge of the scene geometry. However, several issues remain open on the software side which allow significant performance improvements. In this paper, we discuss several of these techniques.

Keywords: Large model visualization, occlusion culling, visibility, hierarchical scene organization, efficient bounding primitives

1. Introduction

In the past few years, the size of polygonal datasets used in scientific visualization has increased rapidly. Typical model sizes range from several hundred thousand polygons in architectural design applications, several million polygons in medical applications, and multi-million polygons in mechanical engineering, to several hundred million polygons in scientific computing. In the meantime, the raw polygonal rendering performance of state-of-the-art computer graphics subsystems has also been increasing quickly. However, it is not increasing as fast as the extraordinarily high demands driven by scientific visualization, resulting in a growing divide between rendering requirements and raw rendering performance. Numerous approaches address this growing divide, such as mesh reduction, geometry compression, parallel processing, volume rendering, and visibility and occlusion culling.

Several visibility and occlusion culling approaches were developed in the past decade to reduce the polygonal complexity of a wide variety of applications. Object-space oriented visibility culling algorithms mostly address problems from global illumination or computational geometry. However, they are usually either limited to specific visibility problems, or do not provide enough performance for rapid, or even interactive (more than 10 frames per second (fps)), exploration of large geometric objects. Here, we focus on image-space occlusion culling algorithms, which show good performance for general polygonal models or dynamic scenes, while object-space approaches can only deal with a limited scene complexity. With the introduction of graphics hardware support for occlusion culling by Hewlett-Packard (Scott et al., 1998), interactive exploration (more than 10 fps) became feasible for large polygonal models, virtually eliminating the need for purely or mostly software-based approaches for applications with general or dynamic scenes. Therefore, we will in particular discuss methods which exploit this specific functionality.

2. Occlusion Culling Approaches

Most occlusion or visibility culling algorithms rely on a hierarchical organization of the scene to check for visibility at different levels of detail. Another common feature is an initial view-frustum check – usually in object space – to determine the scene elements which do not intersect the view-frustum and are hence not visible. Garlick et al. (1990) presented an approach which combines these techniques for a walk-through system.
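To make this combination concrete, the following minimal sketch performs hierarchical view-frustum culling against axis-aligned bounding boxes. It is not taken from any of the cited systems; all types and names are illustrative, and the frustum is assumed to be given as six inward-facing planes.

#include <vector>

struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };            // inward-facing: n.p + d >= 0 is inside
struct AABB  { Vec3 lo, hi; };

struct Node {
    AABB bb;
    std::vector<Node*> children;              // empty for leaf nodes
    bool potentiallyVisible = false;
};

// Conservative test: is the AABB completely outside one frustum plane?
static bool outside(const AABB& b, const Plane& p) {
    // pick the corner of the box that lies farthest along the plane normal
    Vec3 c = { p.n.x >= 0 ? b.hi.x : b.lo.x,
               p.n.y >= 0 ? b.hi.y : b.lo.y,
               p.n.z >= 0 ? b.hi.z : b.lo.z };
    return p.n.x * c.x + p.n.y * c.y + p.n.z * c.z + p.d < 0;
}

// Top-down culling: a node outside any plane is discarded with its whole
// subtree; intersecting nodes are refined at their children.
void cullFrustum(Node* n, const Plane frustum[6]) {
    for (int i = 0; i < 6; ++i)
        if (outside(n->bb, frustum[i])) { n->potentiallyVisible = false; return; }
    n->potentiallyVisible = true;
    for (Node* c : n->children) cullFrustum(c, frustum);
}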
Most other image-space approaches apply an additional occlusion test to determine which parts of the geometry are occluded by scene elements rendered previously. Most of these approaches (Greene et al., 1993; Zhang et al., 1997; Bartz et al., 1999) use different software-based techniques to trace visual contributions of the scene elements in order to determine whether they are potentially visible (see Note 1). Greene et al. (1993) traced the rasterized silhouette through a z-pyramid to determine changes in the z-buffer, which indicate a not occluded object. Zhang et al. (1997) used a hierarchical screen-projected map of pre-selected occluders to check if the scene elements are occluded. In Bartz et al. (1999), the OpenGL stencil buffer is used as a virtual occlusion buffer, which is sampled to trace for visual contributions. We will explain this in more detail in Section 2.3.

In contrast to these software-based approaches, Hewlett-Packard released the VISUALIZE fx series of graphics subsystems, which provide hardware support for occlusion culling queries using the HP occlusion culling flag (Scott et al., 1998). The visibility status of a scene element can be queried by rendering this element (or its bounding volume) in an occlusion mode which does not contribute to the framebuffer. However, potential changes to the depth buffer are detected, indicating visibility of that element. Other hardware-based approaches include the Z-query of the no longer available Denali graphics subsystem of the Kubota Pacific workstation (Greene et al., 1993) and the instrument extension on SGI's Visual PC (SGI, 1999). In another recent proposal, Bartz et al. (1998) suggest hardware modifications within the rasterizer by adding an Occlusion Culling Unit to provide detailed visibility information, or to cull individual triangles or pixel groups beyond "traditional" HP-flag style culling (Meißner et al., 2001).

In the following, we briefly discuss efficient scene tree traversal, which determines the sequence in which the hierarchical scene organization is tested for occlusion. Afterwards, we present results obtained using OpenGL for efficient occlusion culling of complex polygonal scenes (Bartz et al., 1999). In particular, we present two methods: using the OpenGL selection mode to perform view-frustum culling, and using OpenGL's stencil buffer as a virtual occlusion buffer.

2.1 Efficient Scene Tree Traversal

The efficiency of image-space occlusion culling approaches depends on the sequence in which occlusion is determined, since scene objects in the back of the scene – seen from the current view point – are very unlikely to occlude other objects. Greene et al. (1993) used a front-to-back sorted scene to ensure efficiency. Zhang et al. (1997) select "good occluders" from a pre-selected occluder database based on similar heuristics. In our previous stencil buffer-based approach (Bartz et al., 1999; also presented in Section 2.3), we perform front-to-back sorting during view-frustum culling, which involves the transformation of the scene objects (or their bounding volumes) into the view coordinate system. Furthermore, we also proposed an interleaved scheme where the occlusion test is also performed on the inner nodes of the scene hierarchy tree, once view-frustum culling has determined those as potentially visible – and sorted these nodes front-to-back. This scheme allows the assignment of an occlusion budget to balance rendering and culling costs.
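A minimal sketch of such an interleaved front-to-back traversal with an occlusion budget follows. The occlusion test and rendering hooks are hypothetical placeholders for any concrete BB query (e.g., the stencil-buffer test of Section 2.3 or the HP flag); the depth field is assumed to be filled in during view-frustum culling.

#include <algorithm>
#include <vector>

struct Node {
    float depth = 0.0f;                        // view-space distance of the BB
    bool potentiallyVisible = false;           // set by view-frustum culling
    std::vector<Node*> children;               // empty for leaf nodes
};

void traverse(Node* n, int& budget,
              bool (*occluded)(const Node*),   // BB occlusion query (assumed)
              void (*render)(const Node*)) {   // draws leaf geometry (assumed)
    if (!n->potentiallyVisible) return;        // culled by the view frustum
    const bool isLeaf = n->children.empty();
    if (isLeaf || budget > 0) {                // leaves are always tested; inner
        if (!isLeaf) --budget;                 // node tests consume the budget
        if (occluded(n)) return;               // the whole subtree is occluded
    }
    if (isLeaf) { render(n); return; }
    std::sort(n->children.begin(), n->children.end(),
              [](const Node* a, const Node* b) { return a->depth < b->depth; });
    for (Node* c : n->children) traverse(c, budget, occluded, render);
}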
In early HP occlusion culling flag-based approaches, brute-force testing of scene elements was used (Scott et al., 1998), which we modified to view-frustum culling of the complete scene hierarchy tree, sorting of the remaining leaf elements, and subsequent front-to-back occlusion testing of the potentially visible leaf elements (Bartz and Skalej, 1999). Further measurements on the visibility of leaf elements indicated additional potential for a more efficient utilization of occlusion culling in scenes with a high depth complexity. The approximately 10% front-most leaf elements in these scenes are not occluded in most of the cases; hence they can be rendered without the costly determination of their visibility. Furthermore, the 40% farthest scene elements do not significantly change the visibility information stored in the depth buffer of the graphics system, because they are already occluded. Therefore, their occlusion status can be established without taking their own occluder potential into account, which usually is a cheaper occlusion query. However, the distribution of the objects in a scene is very application dependent and needs to be established individually.

2.2 View-Frustum Culling

In contrast to other published approaches, we use OpenGL to perform the view-frustum culling step. In detail, we use the OpenGL selection mode to detect whether a bounding volume intersects the view-frustum. This OpenGL mode is designed to identify geometric objects rendered into a specific screen area (Woo et al., 1997), which in our case is the whole screen. The polygonal representation of the bounding volume (as a convex hull) is transformed, clipped against the view-frustum, and finally rendered without contributing to the actual framebuffer (Woo et al., 1997). Once a bounding volume intersects the view-frustum – i.e., the hit buffer of OpenGL's selection mode has a contribution from this bounding volume object – we test whether the polygonal bounding volume resides entirely within the view-frustum. In this case, all subtrees of the bounding volume are marked potentially visible. Otherwise, we recursively continue testing the child nodes of the bounding volume hierarchy. In rare cases, a bounding volume can completely contain the view-frustum – resulting in no contributions to the hit buffer of the selection mode, because no part of the bounding volume representation is visible. This can be prevented by testing whether the view point lies within the bounding volume, or whether the bounding volume lies between the near plane of the view-frustum and the view point. As a result of the view-frustum culling step, leaves are tagged either potentially visible (if they were not culled) or definitely not visible.
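The selection-mode test described above can be sketched as follows, using the standard OpenGL 1.x selection API. Rendering the BB in GL_SELECT mode produces a hit record if and only if some part of the BB falls inside the view frustum; drawBoundingBox() is an assumed placeholder for rendering the BB triangles.

#include <GL/gl.h>

struct Node;                               // scene node type, as before
void drawBoundingBox(const Node* n);       // renders the BB triangles (assumed)

bool intersectsFrustum(const Node* n) {
    GLuint hitBuffer[64];
    glSelectBuffer(64, hitBuffer);         // hit records go here, not to the framebuffer
    glRenderMode(GL_SELECT);
    glInitNames();
    glPushName(1);                         // any name; we only care whether a hit occurs
    drawBoundingBox(n);                    // transformed and clipped, but not drawn
    glPopName();
    GLint hits = glRenderMode(GL_RENDER);  // leaving GL_SELECT returns the hit count
    return hits > 0;                       // caveat: fails if the BB contains the frustum
}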
2.3 Virtual Occlusion Buffer-based Culling

The task of an occlusion culling algorithm is to determine the occlusion of objects in a model. We use a virtual occlusion buffer (VOB), mapped onto the OpenGL framebuffer, to detect possible contributions of any object to the framebuffer. In our implementation of the algorithm on an SGI O2 and an SGI Octane/MXE, we used the stencil buffer for this purpose (see Note 2). Normally, the stencil buffer is intended for advanced rendering techniques, like multi-pass rendering. To test for occlusion of a node, we send the triangles of its bounding box (BB) to the OpenGL pipeline, use the z-buffer test while scan-converting the triangles, and redirect the output into the virtual occlusion (stencil) buffer. Occluded bounding volumes will not contribute to the z-buffer and hence will not generate a footprint in the virtual occlusion buffer.

Although reading the virtual occlusion buffer is fairly fast, it is the most costly single operation of our algorithm, accounting for approximately 90% of the total costs of the occlusion culling stage. This is mainly due to the setup time for getting the buffer out of the OpenGL pipeline. For models subdivided into thousands of bounding volumes, this can lead to a less efficient operation. Furthermore, large BBs require many read operations. Therefore, we implemented a progressive occlusion test, which reads spans of pixels from the virtual occlusion buffer using a double interleaving scheme (see Note 3). Although the setup time on the O2 for sampling small spans of the virtual occlusion buffer increases the time per sample, spans of ten pixels achieved almost the same speed-up as sampling entire lines of the virtual occlusion buffer. For the Octane/MXE, we need to read larger chunks from the framebuffer in order to achieve a sufficient speed-up. Note that sampling introduces a non-conservative heuristic which trades off quality against rendering performance. However, the experiments in the next section showed no apparent artifacts (even in animations), although small image differences are present.

2.4 Analysis

We ran several experiments to evaluate the performance of the OpenGL-assisted occlusion culling approach. For a complete listing of the results, please refer to Bartz et al. (1999). All experiments were run on four different scenes: an architectural scene of eight gothic cathedrals arranged on a 3D array, a city scene, a forest scene to demonstrate quantitative culling, and – similar to Zhang et al. (1997) – the contents of a virtual garbage can of rather small objects (see also Fig. 1). A hierarchical scene organization for each dataset was generated by manual tuning of a scene hierarchy generated by SGI's OpenGL Optimizer (see Section 3). Frame-rate and culling rate were measured over a sequence of about 100 frames on an SGI O2 workstation (256 MB, R10000 @ 175 MHz) and on an SGI Octane/MXE (896 MB, R10000 @ 250 MHz). Please note that the datasets used on the SGI O2 were using triangle strips, while this was not possible for technical reasons on the Octane.

Figure 1. Scenes: (a,d) city, (b,e) garbage, (c,f) cathedrals; upper row: views with occlusion culling; lower row: views with BBs of occluded scene elements.

The polygonal complexity and speed-up numbers are listed in Table 1.

scene        #triangles   culling O2   culling MXE   frame-rate MXE [fps]   speed-up O2 / MXE
cathedrals   3,334,104    91.3%        92.5%         5.3                    4.2 / 12.6
city         1,056,280    99.8%        87.7%         7.7                    4.8 / 9.8
forest         452,981    84.7%        80.5%         4.7                    2.6 / 6.1
garbage      5,331,146    96.0%        38.2%         0.3                    7.0 / 5.0

Table 1. Average performance of OpenGL-assisted occlusion culling compared to view-frustum culling only. Different culling rates are due to different sampling parameters.

The costs for view-frustum and occlusion culling vary with the different scene organization granularities; e.g., for the cathedrals dataset, view-frustum culling accounts for approximately 5% of the costs, occlusion culling (mostly dominated by reading the stencil buffer) for approximately 20%, and rendering of the scene elements classified as visible for about 75% of the total frame time. The distribution and total amount of the costs basically depend on the architecture of the graphics subsystem. A highly interleaved graphics system like SGI's InfiniteReality is likely to perform worse than the O2 for this algorithm, due to the high set-up costs of reading from the framebuffer, while single-pipeline low- and mid-end graphics systems are mostly limited by the total amount read from the framebuffer. Therefore, the sampling parameters of the occlusion culling approach – the sampling frequency and how much is read from the framebuffer per sample – trade off visual quality against minimized set-up time, and need to be parametrized for each graphics system.
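A minimal sketch of the stencil-buffer VOB test of Sections 2.3 and 2.4 follows. The BB is rendered with the depth test enabled but with color and depth writes disabled, so that only depth-passing fragments mark the stencil buffer; the footprint is then sampled with interleaved spans. The span length, stride, and screen rectangle are illustrative (the paper tunes these per graphics system), and drawBoundingBox() is an assumed placeholder; state restore is omitted for brevity.

#include <GL/gl.h>

struct Node;                               // scene node type, as before
void drawBoundingBox(const Node* n);       // renders the BB triangles (assumed)

bool contributesToFramebuffer(const Node* n, int x, int y, int w, int h) {
    glClear(GL_STENCIL_BUFFER_BIT);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);  // no color writes
    glDepthMask(GL_FALSE);                                // keep depth untouched
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, 1, 1);
    glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);            // mark depth-passing pixels
    drawBoundingBox(n);

    // progressive read-back: sample interleaved spans instead of the full region
    GLubyte span[10];
    for (int row = y; row < y + h; row += 2)              // every other line
        for (int col = x; col + 10 <= x + w; col += 2 * 10) {
            glReadPixels(col, row, 10, 1, GL_STENCIL_INDEX,
                         GL_UNSIGNED_BYTE, span);
            for (int i = 0; i < 10; ++i)
                if (span[i]) return true;                 // visible contribution
        }
    return false;                                         // no sampled footprint
}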
3. Hierarchical Scene Organization

In the previous section, we pointed out the need for a hierarchical scene organization, which is difficult to derive for general polygonal models. Several previous papers on visibility and occlusion culling touch the topic of scene organization. While some approaches require the scene designer to provide the scene organization (Snyder and Lengyel, 1998; Zhang et al., 1997), others employ decomposition methods which are application specific, such as a decomposition along the skeleton of a volumetric object (Hong et al., 1997), or the floor plan of a building (Airey et al., 1990; Teller and Sequin, 1991). However, these schemes cannot be applied efficiently to general models. Models built in Computer-Aided Design (CAD) systems already include appropriate scene organization information in a product data management system, due to a design process which uses hierarchical notions like grouping and replication.

A more general approach is to organize a polygonal model using regular spatial decomposition schemes, such as BSP-trees (Fuchs et al., 1980) or octrees (Greene et al., 1993). While these decomposition schemes produce good results on polygonal models extracted by the Marching Cubes algorithm from uniform grid volume datasets – which provide a "natural" decomposition on a Marching Cubes cell basis – they run into numerous problems on general models. If a polygon of the model lies across a decomposition boundary, it must either be split into several parts, in order to produce a disjoint representation of the bounding entities, or be handled in another special way. Splitting polygons, however, can increase the number of small and narrow polygons tremendously.

Significant work on model organization has been published in the field of collision detection. Methods based on oriented bounding boxes (OBBs) were explored by Gottschalk et al. (1996). A bottom-up approach for the construction of a model hierarchy is suggested in Barequet et al. (1996), in which nodes representing small parts of the geometry are "merged" into higher hierarchy nodes. In Bartz et al. (1999), the spatialization functionality of SGI's OpenGL Optimizer package (SGI, 1997) was used to generate scene hierarchies automatically. However, our experience from the previous section showed that these scene hierarchies need to be tuned manually in order to achieve sufficient performance, which motivated the work described in this section. More detailed information on the analysis and results using different decomposition techniques can be found in Meißner et al. (1999).

3.1 Scene Organization and Occlusion Culling

Generally, a polygonal scene can be decomposed into smaller parts, where this scene organization can be either hierarchical or non-hierarchical. We call each part of this decomposition a scene entity. If information at different multiresolution levels is required, usually a hierarchical organization is chosen, where different scene entities are combined into one parent entity which contains either the whole information of the associated scene entities, or only information with less detail (a lower level-of-detail). This decomposition can be represented as a tree, which is referred to as the scene tree or scene graph. This tree contains two different kinds of nodes: scene nodes and leaf nodes. Only the leaf nodes contain the geometry of the actual model and the BB with respect to the used decomposition method. In contrast, a scene node does not contain any geometry of the actual dataset; it only contains the spatial boundaries of the associated geometry nodes; thus the scene node is the implementation of the abstract scene entity.
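A minimal sketch of this scene tree follows; the types are illustrative, not the authors' data structures. Scene nodes carry only the spatial bounds of their subtree, while leaf nodes additionally own geometry.

#include <vector>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 lo, hi; };
struct Triangle { Vec3 v[3]; };

struct SceneNode {
    AABB bb;                                  // bounds of all associated geometry
    std::vector<SceneNode*> children;         // empty for leaf nodes
    std::vector<Triangle> geometry;           // filled only in leaf nodes
    bool isLeaf() const { return children.empty(); }
};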
In the following, three different approaches to generate a scene hierarchy starting from given models are presented. To measure the decomposition quality, we need to reduce the degrees of freedom of the used occlusion culling approach. Therefore, we employ OpenGL-assisted view-frustum culling (Bartz et al., 1999) and the Hewlett-Packard occlusion culling flag (Scott et al., 1998). First, view-frustum culling is performed hierarchically on the scene tree, top-down; the scene nodes of the tree are tested for intersection with the view-frustum. All nodes which have a non-empty intersection are located within the view-frustum and may contain visible substructures. Consequently, we proceed with the view-frustum test until we reach the leaf levels of the scene tree.

In the second step, all remaining leaf nodes are depth-sorted according to their associated BBs and tested for occlusion using the HP occlusion culling flag. The actual geometry of the front-most leaf node is rendered into the empty framebuffer without testing. All further rendering is performed in an interlocked fashion. First, we render the BB of the next-closest leaf node in the HP occlusion mode, which does not contribute to the framebuffer. If this BB would have a visible contribution to the framebuffer, the HP occlusion culling flag is set to TRUE by the graphics hardware, and we render the associated geometry in the standard render mode. If the BB does not have any contribution (the flag is set to FALSE), this BB and all the associated geometry are not visible and are therefore culled.
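The interlocked loop can be sketched as follows, using the enums of the HP_occlusion_test OpenGL extension; the drawing hooks are assumed placeholders, and the leaves are assumed to be depth-sorted already.

#include <GL/gl.h>
#include <vector>

#ifndef GL_OCCLUSION_TEST_HP               // extension enums, if the header lacks them
#define GL_OCCLUSION_TEST_HP        0x8165
#define GL_OCCLUSION_TEST_RESULT_HP 0x8166
#endif

struct SceneNode;                          // as in the sketch above
void drawBoundingBox(const SceneNode* n);  // renders the BB triangles (assumed)
void drawGeometry(const SceneNode* n);     // renders the leaf geometry (assumed)

void renderFrontToBack(const std::vector<SceneNode*>& sortedLeaves) {
    bool first = true;
    for (const SceneNode* leaf : sortedLeaves) {
        if (!first) {
            glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
            glDepthMask(GL_FALSE);
            glEnable(GL_OCCLUSION_TEST_HP);    // occlusion mode: no buffer writes
            drawBoundingBox(leaf);
            glDisable(GL_OCCLUSION_TEST_HP);
            glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
            glDepthMask(GL_TRUE);

            GLboolean visible = GL_FALSE;
            glGetBooleanv(GL_OCCLUSION_TEST_RESULT_HP, &visible);
            if (!visible) continue;            // BB occluded: cull the leaf
        }
        drawGeometry(leaf);                    // front-most leaf is drawn untested
        first = false;
    }
}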
Figure 2. Algorithm p-HBVO: First, polygons are sorted along every coordinate axis (a), then polygons are partitioned into two sub-sets of neighboring polygons. Every decomposition is evaluated by a cost function estimating the rendering costs (b). This evaluation is repeated for every coordinate axis. The partition with minimal costs defines the decomposition of the scene (c). Finally, this process is continued recursively.

3.2 Polygon-based Hierarchical Bounding Volume Optimization (p-HBVO)

Our polygon-oriented Hierarchical Bounding Volume Optimization (p-HBVO) method recursively decomposes a set of polygons into two smaller sets of polygons. The optimal scene decomposition is selected by minimizing a cost function approximating the expected rendering costs over all potential decompositions. At each decomposition level, the individual polygons are assigned to exactly one scene entity of that level. Consequently, no polygons are split, and hence no new polygons are generated by this method.

First, the polygons are sorted along all coordinate axes, where the barycenter of each polygon serves as the sorting key. Based on these three ordered lists, we evaluate the potential binary decomposition sets along each axis for each entry in the respective list by splitting the sorted list of polygons into a left and a right part. While most other decomposition schemes – e.g., the median-cut scheme (Kay and Kajiya, 1986) – use a pre-defined decomposition position, we evaluate for each possible decomposition a cost function which approximates the costs of rendering the tree node that holds the decomposition sets as direct child nodes, including the costs of visibility tests using their respective bounding volumes (Fig. 2). The decomposition process is repeated recursively on the two new scene entities, one containing all left polygons, the other containing all right polygons. The recursion terminates when either the number of polygons of an entity no longer exceeds the maximum number of triangles per entity, or the tree depth reaches the maximum tree depth; both parameters are pre-defined by the user and supplied at the start of the decomposition process.

This cost function is identical to one which has already been successfully applied in ray tracing environments (Müller and Fellner, 1999), since the objective is the same; both algorithms traverse the scene graph in a similar way to determine visibility. The costs C(S) of a scene entity S, with children S_L and S_R, are given by:

    C(S) = A(S_L)/A(S) * n(S_L) + A(S_R)/A(S) * n(S_R)

where n(S_i) is the number of polygons within hierarchy S_i, and A(S_i) is the surface area of the BB associated to sub-scene S_i, with S_i in {S, S_L, S_R}.

By combining two (three) binary tree levels into a quad-tree (octree) node, we can reduce the tree depth, which in turn decreases the number of inner-node occlusion tests. Overall, this algorithm generates well-balanced scene trees with respect to their polygon load. Furthermore, polygons of individual objects are detected and clustered together. Optimal culling performance was achieved with finer decompositions, which however usually lead to higher culling costs, and hence lower rendering rates.
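To make the split search concrete, the sketch below scores every split position of a barycenter-sorted polygon list with the surface-area cost function above, for a single axis (p-HBVO repeats this per axis and keeps the cheapest). It is a simplified illustration, not the authors' implementation; at least two polygons are assumed.

#include <algorithm>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct AABB {
    Vec3 lo { 1e30f, 1e30f, 1e30f }, hi { -1e30f, -1e30f, -1e30f };
    void extend(const Vec3& p) {
        lo = { std::min(lo.x, p.x), std::min(lo.y, p.y), std::min(lo.z, p.z) };
        hi = { std::max(hi.x, p.x), std::max(hi.y, p.y), std::max(hi.z, p.z) };
    }
    float area() const {                       // surface area of the box
        float dx = hi.x - lo.x, dy = hi.y - lo.y, dz = hi.z - lo.z;
        return 2.f * (dx * dy + dy * dz + dz * dx);
    }
};

static AABB merge(const AABB& a, const AABB& b) {
    AABB m = a; m.extend(b.lo); m.extend(b.hi); return m;
}

// Returns the index i splitting the sorted list into [0..i-1] | [i..n-1]
// at minimal estimated cost C = A(L)/A(S)*n(L) + A(R)/A(S)*n(R).
std::size_t bestSplit(const std::vector<AABB>& poly) {   // sorted by barycenter
    const std::size_t n = poly.size();
    std::vector<AABB> right(n);                // right[i] bounds poly[i..n-1]
    right[n - 1] = poly[n - 1];
    for (std::size_t i = n - 1; i-- > 0; ) right[i] = merge(poly[i], right[i + 1]);

    const float total = right[0].area();       // area of the parent BB
    float best = 1e30f; std::size_t bestIdx = n / 2;
    AABB left = poly[0];                       // prefix sweep: bounds of poly[0..i-1]
    for (std::size_t i = 1; i < n; ++i) {
        float cost = left.area() / total * i + right[i].area() / total * (n - i);
        if (cost < best) { best = cost; bestIdx = i; }
        left = merge(left, poly[i]);
    }
    return bestIdx;
}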
3.3 Octree-based Regular Space Decomposition (ORSD)

While the previous approach is able to handle arbitrary sets of polygons, some data sources inherently provide regular decompositions which can be exploited at much lower cost. Uniform grid datasets from MRI or CT scanners (e.g., the ventricular system dataset), from which Marching Cubes isosurfaces are extracted, consist of a set of sample values (voxels) arranged on a uniform grid. A cube of eight neighboring voxels is called a cell; cells with a non-empty intersection with the selected isosurface are called relevant cells.

The Octree-based Regular Space Decomposition (ORSD) method uses a cell-based (voxel-based) evaluation criterion, where the number of relevant cells (the relevant cell load, or RCL) controls the decomposition process. This criterion is only a rough approximation of the actual number of extracted polygons, considering that each relevant cell represents between one and five triangles of the isosurface. In our experience, however, the RCL turned out to be sufficiently accurate (Figure 3c visualizes one of the generated scene trees, where the drawn BBs bound the actual geometry, not the respective octant volume). After the construction of the entire octree, the RCL of each octant is already calculated. Subsequently, the octree is traversed recursively, starting with the superblock. If the RCL is above a user-specified threshold, the block is classified as a scene node and is further decomposed into its child blocks. Otherwise, the block is considered a leaf node. The associated relevant cells – at the bottom level of the octree – contain the polygons of the isosurface, which are assigned to that leaf node. This decomposition process results in a hierarchy of blocks, where the leaf nodes contain the actual geometry (see Section 3.1).

Overall, ORSD is a simple but efficient decomposition scheme which generates an adequate polygon load balance and adequate BB sizes. As shown in the results, the indirect evaluation method (RCL instead of the number of polygons) does not adversely affect the occlusion culling performance. However, ORSD is limited to uniform grid datasets.
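The RCL-driven classification can be sketched as follows; the types and field names are illustrative, and the RCL values are assumed to have been accumulated during octree construction.

#include <cstddef>

struct OctreeBlock {
    int rcl = 0;                               // number of relevant cells in the block
    OctreeBlock* child[8] = { nullptr };       // all null at the bottom level
    bool isSceneNode = false;                  // otherwise: leaf of the scene tree
};

void classify(OctreeBlock* block, int rclThreshold) {
    if (block->rcl > rclThreshold && block->child[0] != nullptr) {
        block->isSceneNode = true;             // decompose into the child blocks
        for (OctreeBlock* c : block->child)
            if (c) classify(c, rclThreshold);
        return;
    }
    // otherwise: leaf node; the polygons of its relevant cells attach here
}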
3.4 SGI's OpenGL Optimizer (OPT)

SGI's OpenGL Optimizer (OPT) is a C++ toolkit for CAD applications that provides scene graph functionality for the handling and visualization of large polygonal scenes. The decomposition method realized in OPT is similar to the construction of an octree; each scene entity is split into eight equally sized scene entities. This process is repeated recursively until a certain threshold criterion for the iterated decomposition is reached. Octree-based spatial decomposition is a simple and efficient scheme. However, the OPT scene organization mechanism decomposes space not by simply bisecting the edges of a cube, as in an octree, but by choosing decomposition planes so that the rendering loads of the resulting parts are similar. As a result, the amount of geometry in each scene entity on each side of the cutting plane is approximately the same. Polygons which are split due to the decomposition are distributed to the respective scene entities. The main parameters that can be used to control the decomposition are hints for the lowest and highest number of triangles in each scene entity at the leaf level of the scene hierarchy. However, the decomposition algorithm only tries to meet these criteria and is not bound to them.

In general, OPT generates scene hierarchies with a well-balanced polygon load. However, the BBs of the scene entities are less suited for occlusion culling applications, because the cost function determining the scene entities is obviously not optimized with respect to the volume or the screen-space area of the BBs. We observed that the right-most branch of the scene tree frequently contained large subsets (in BB volume size) of the model, even in the lower tree levels.

3.5 Evaluating the Scene Organization Quality

In this section, we discuss the efficiency of the decomposition algorithms with respect to their occlusion culling-based rendering performance. All measurements are performed on an HP B180/fx4 graphics workstation. The different polygonal datasets (see Table 2) represent typical scenarios from different application areas. Their scene trees only contain individual polygons (triangles) in order to evaluate comparable scenes. Two of the datasets are examined in more detail.

                        Ventricular System   Cathedral      City
Grid Type               Uniform              Unstructured   Unstructured
Source                  MRI                  CAD            Modeler
#Triangles              270,882              416,763        1,408,152
Frame-rates [fps]
  No Culling            4.6                  3.8            0.9
  p-HBVO                12.3                 12.4           14.0
  OPT                   13.6                 7.8            11.8
  ORSD                  15.3                 n.a.           n.a.
Render-rates [%]
  p-HBVO                16.0                 30.0           0.1
  OPT                   19.7                 30.1           3.6
  ORSD                  22.4                 n.a.           n.a.
Occlusion Time [ms]
  p-HBVO                30                   4              27
  OPT                   16                   54             33
  ORSD                  11                   n.a.           n.a.

Table 2. Scene overviews; as the gold standard for the speed-up due to occlusion culling, we show the frame-rates and render-rates for the datasets with and without culling (view-frustum and occlusion). ORSD requires Marching Cubes-generated scenes, which are available only for the ventricular system. Occlusion time lists the costs required for occlusion culling (no view-frustum culling).

Scene trees of different decomposition granularities are evaluated for the respective costs of view-frustum culling, occlusion culling, and rendering (as frame-rate) of the not occluded geometry. Here, we present only the culling and rendering performance of the scene trees with the best rendering performance. Culling performance is expressed as the render-rate, which gives the percentage of the geometry determined not occluded relative to the total scene geometry. More detailed information on the evaluation can be found in Meißner et al. (1999).

Cathedral Dataset: This dataset represents the interior of a single gothic cathedral (in contrast to Section 2.3), designed with a CAD system (see Table 2). Occlusion is limited to small parts of the model, because a large share of the polygons is visible from most view points within the model. Figure 3a shows a very fine granular decomposition of the cathedral model using the p-HBVO approach, which adapts very nicely to the structures of the model, such as pillars and arcs. In contrast, the decomposition generated by OPT (Fig. 3b) introduces very large BBs, which do not adapt properly to the actual geometry.

Figure 3. Cathedral and ventricular system BB hierarchies generated by (a,d) p-HBVO, (b) OPT, (c) ORSD/OPT; the arcs and pillars of the cathedral are well detected by p-HBVO (a); OPT only used a regular spatial decomposition (b). ORSD and OPT, however, generated identical results for the ventricular system dataset (c).

The p-HBVO approach performed best on this dataset (see gnuplots on CD-ROM). This is due to its low culling costs compared to OPT (see Table 2 and also Meißner et al., 1999). The BBs of OPT require a significantly higher time for occlusion culling compared to p-HBVO, which reduces the frame-rate severely. This effect is lessened, however, by the camera path being located completely within the cathedral, which limits the potential occlusion anyway.

Ventricular System Dataset: This dataset is a polygonal model of the isosurface representing the ventricular system of the human brain, extracted from an MRI scan. We explore the dataset by moving through the lower part (Cisterna Magna) of the polygonal model. Most of the model structures throughout the walk-through are located within the view-frustum, while the structures with the largest number of polygons (located in the upper part, or lateral ventricles) were not visible due to occlusion. All polygons of this model are aligned on the uniform cell grid and are of approximately the same size. All three adapted algorithms were able to detect these "natural" decomposition boundaries; only OPT generated approximately 15% additional polygons due to splitting operations between the grid points. Table 2 (and gnuplots on CD-ROM) shows the frame-rate and render-rate of the evaluated algorithms. The most interesting detail is the low amount of time consumed by occlusion culling with ORSD, due to its coarse decomposition.
The render-rate of p-HBVO was approximately 25% better than the render-rate of the ORSD approach. However, the finer decomposition (see Fig. 3) introduced significantly higher culling costs than ORSD (30 ms vs. 11 ms), resulting in a lower frame-rate (see Table 2).

Summary

Overall, the adapted scene organization approaches were able to generate decompositions with faster rendering due to higher culling performance. This was achieved either by reducing the culling costs or by reducing the render-rate of the dataset. On uniform grid datasets, the basic ORSD approach produced the model subdivision which performed best, mostly due to the low time spent to establish occlusion or non-occlusion. Generally, we observed that models with high occlusion do not require a very fine decomposition (ventricular system dataset, p-HBVO vs. ORSD). On the other hand, a fine decomposition pays off if interior (thus completely occluded) objects are clustered in a scene entity (city dataset, p-HBVO vs. OPT). In contrast, models with low occlusion (cathedral dataset) can benefit from finer decompositions if the additional culling costs are only a fraction of the rendering cost savings (see city dataset, p-HBVO).

Note that the p-HBVO approach builds a binary scene tree. This usually results in deeper trees, hence more intermediate scene entities, which increases the time spent for occlusion culling significantly. Once this binary tree was re-built into a quad-tree representation, we achieved a frame-rate increase of approximately 20%.

To summarize, we always achieved a speed-up with occlusion culling-based rendering. Especially on the highly occluded city dataset, we achieved a speed-up of 15.6 after culling 99.9% of the model geometry with the p-HBVO algorithm. On the ventricular system dataset, the ORSD approach accomplished the best results; 77.6% of the geometry was culled due to view-frustum and occlusion culling. This culling performance resulted in a frame-rate speed-up of 3.3.

4. Conclusions and Future Work

In this chapter, we presented OpenGL-assisted occlusion culling techniques and compared several methods for the generation of a hierarchical scene organization, which can significantly increase the rendering performance (in fps). Other techniques include efficient scene traversal strategies, which can severely impact the occlusion culling efficiency. However, there are still many other visibility-related techniques to improve the interactive visualization of large models. Most occlusion culling approaches use a simple axis-aligned bounding box to approximate the geometry of a scene entity. Unfortunately, this approximation frequently has a much larger screen-space area than the actual geometry, resulting in more reported visibility hits than necessary. Employing better convex hull techniques has the potential to reduce these false positive visibility hits. Furthermore, the quantization of visibility (how many pixels of a scene entity are visible) plays an important role in dealing with such "slightly visible" entities; lower levels of detail of such an entity could be used instead.

Notes

1. The occlusion tests are performed for each scene hierarchy element separately within one frame. Therefore, an element can be occluded by an element tested/rendered later.
2. Other buffers could be used as well, but the stencil buffer, as an integer buffer, is often the least used buffer and has, on some graphics systems, an empirically measured better read performance than the other buffers.

3. Basically, this scheme implements a sampling of the virtual occlusion buffer, where a fixed fraction (one n-th) of each BB is read in each iteration. In other words, the algorithm needs n sampling iterations to fully read the entire BB.

References

Airey, J., Rohlf, J., and Brooks, F. (1990). Towards Image Realism with Interactive Update Rates in Complex Virtual Building Environments. In Proc. of ACM Symposium on Interactive 3D Graphics.
Barequet, G., Chazelle, B., Guibas, L., Mitchell, J., and Tal, A. (1996). BOXTREE: A Hierarchical Representation for Surfaces in 3D. In Proc. of Eurographics.
Bartz, D., Meißner, M., and Hüttner, T. (1998). Extending Graphics Hardware for Occlusion Queries in OpenGL. In Proc. of Eurographics/SIGGRAPH Workshop on Graphics Hardware, pages 97–104, 158.
Bartz, D., Meißner, M., and Hüttner, T. (1999). OpenGL-assisted Occlusion Culling of Large Polygonal Models. Computers and Graphics – Special Issue on Visibility – Techniques and Applications, 23(5):667–679.
Bartz, D. and Skalej, M. (1999). VIVENDI – A Virtual Ventricle Endoscopy System for Virtual Medicine. In Proc. of EG/IEEE Symposium on Visualization (to appear).
Fuchs, H., Kedem, Z., and Naylor, B. (1980). On Visible Surface Generation by A Priori Tree Structures. In Proc. of ACM SIGGRAPH.
Garlick, B., Baum, D., and Winget, J. (1990). Interactive Viewing of Large Geometric Databases Using Multiprocessor Graphics Workstations. In SIGGRAPH '90 course notes: Parallel Algorithms and Architectures for 3D Image Generation.
Gottschalk, S., Lin, M., and Manocha, D. (1996). OBBTree: A Hierarchical Structure for Rapid Interference Detection. In Proc. of ACM SIGGRAPH.
Greene, N., Kass, M., and Miller, G. (1993). Hierarchical Z-Buffer Visibility. In Proc. of ACM SIGGRAPH.
Hong, L., Muraki, S., Kaufman, A., Bartz, D., and He, T. (1997). Virtual Voyage: Interactive Navigation in the Human Colon. In Proc. of ACM SIGGRAPH.
Kay, T. and Kajiya, J. (1986). Ray Tracing Complex Scenes. In Proc. of ACM SIGGRAPH.
Meißner, M., Bartz, D., Günther, R., and Straßer, W. (2001). Visibility Driven Rasterization. Computer Graphics Forum, 20(4):283–294.
Meißner, M., Bartz, D., Hüttner, T., Müller, G., and Einighammer, J. (1999). Generation of Subdivision Hierarchies for Efficient Occlusion Culling of Large Polygonal Models. Technical Report WSI-99-13, ISSN 0946-3852, Dept. of Computer Science (WSI), University of Tübingen.
Müller, G. and Fellner, D. (1999). Hybrid Scene Structuring with Application to Ray Tracing. In Proc. of ICVC '99 (to appear).
Scott, N., Olsen, D., and Gannett, E. (1998). An Overview of the VISUALIZE fx Graphics Accelerator Hardware. The Hewlett-Packard Journal, (May).
SGI (1997). Optimizer Manual. Technical report.
SGI (1999). Silicon Graphics 320, Visual Workstation. Specification document, available from http://visual.sgi.com/products/320/index.html.
Snyder, J. and Lengyel, J. (1998). Visibility Sorting and Compositing without Splitting for Image Layer Decompositions. In Proc. of ACM SIGGRAPH.
Teller, S. and Sequin, C. (1991). Visibility Pre-processing for Interactive Walkthroughs. In Proc. of ACM SIGGRAPH.
Woo, M., Neider, J., and Davis, T. (1997). OpenGL Programming Guide. Addison-Wesley, Reading, Mass., 2nd edition.
Zhang, H., Manocha, D., Hudson, T., and Hoff, K. E. (1997). Visibility Culling Using Hierarchical Occlusion Maps. In Proc. of ACM SIGGRAPH.