
EFFICIENT OCCLUSION CULLING FOR LARGE
MODEL VISUALIZATION
Dirk Bartz
WSI/GRIS
University of Tübingen
[email protected]
Michael Meißner
WSI/GRIS
University of Tübingen
[email protected]
Gordon Müller
Institute of Computer Graphics
Braunschweig University of Technology
[email protected]
Abstract
Occlusion and visibility culling is one of the major techniques to reduce the geometric complexity of large polygonal models. Since the introduction of hardware-assisted occlusion culling in OpenGL (as an extension), purely software-based
approaches are losing relevance rapidly for applications which cannot exploit
specific knowledge of the scene geometry. However, there are still several issues
open on the software side for significant performance improvements. In this
paper, we discuss several of these techniques.
Keywords:
Large model visualization, occlusion culling, visibility, hierarchical scene organization, efficient bounding primitives
1.
Introduction
In the past few years, the size of polygonal datasets used in scientific visualization has increased rapidly. Typical model sizes range from several hundred
thousands of polygons in architectural design applications, several millions of
polygons in medical applications, multi-million polygons in mechanical engineering, to several hundred million polygons in scientific computing. In the meantime, the raw polygonal rendering performance of state-of-the-art computer graphics subsystems has also been increasing fast. However, it is not increasing as fast as the extraordinarily high demands driven by scientific visualization, resulting in a growing divide between rendering requirements and raw rendering performance.
Numerous approaches address this growing divide, such as mesh reduction,
geometry compression, parallel processing, volume rendering, and visibility
and occlusion culling. Several visibility and occlusion culling approaches were
developed in the past decade to reduce the polygonal complexity of a wide
variety of applications. Object-space oriented visibility culling algorithms address mostly problems from global illumination or computational geometry.
However, they are usually either limited to specific visibility problems, or do
not provide enough performance for rapid, or even interactive (more than 10 frames per second (fps)), exploration of large geometric objects. Here, we focus on image-space occlusion culling algorithms, which show good performance for general polygonal models or dynamic scenes, whereas object-space approaches can only deal with limited scene complexity.
With the introduction of graphics hardware support for occlusion culling
by Hewlett-Packard Scott et al., 1998, interactive exploration (more than 10
fps) became feasible for large polygonal models, virtually eliminating the need
for purely or mostly software-based approaches for applications with general
or dynamic scenes. Therefore, we will in particular discuss methods which
exploit this specific functionality.
2.
Occlusion Culling Approaches
Most occlusion or visibility culling algorithms rely on a hierarchical organization of the scene to check for visibility at different levels of detail. Another
common feature is an initial view-frustum check – usually in object space – to
determine the scene elements which are not intersecting with the view-frustum
and hence, not visible. Garlick Garlick et al., 1990 presented an approach which
combines these techniques for a walk-through system. Most other image space
approaches apply an additional occlusion test, to determine which parts of the
geometry are occluded by scene elements rendered previously. Most of these
approaches Greene et al., 1993; Zhang et al., 1997; Bartz et al., 1999 use
different software-based techniques to trace visual contributions of the scene
elements to determine if they are potentially visible. Greene et al. Greene
et al., 1993 traced the rasterized silhouette through a z-pyramid to determine
changes in the z-buffer, indicating a not occluded object. Zhang et al. Zhang
et al., 1997 used a hierarchical screen projected map of pre-selected occluders
to check if the scene elements are occluded. In Bartz et al., 1999, the OpenGL
stencil buffer is used as virtual occlusion buffer, which is sampled to trace for
visual contribution. We will explain this in more detail later in Section 2.3.
In contrast to these software-based approaches, Hewlett-Packard released the
VISUALIZE fx series of graphics subsystems which provide hardware support
for occlusion culling queries using the HP occlusion culling flag Scott et al.,
1998. The visibility status of a scene element can be queried by rendering
this element (or its bounding volume) in an occlusion mode which does not
contribute to the framebuffer. However, potential changes to the depth buffer
are detected, indicating visibility of that element.
Other hardware-based approaches include the Z-query in the no longer available Denali graphics subsystem on the Kubota Pacific workstation Greene et al., 1993 and the instrument extension on SGI’s Visual PC SGI, 1999. In another recent
proposal, Bartz et al. Bartz et al., 1998 suggest hardware modifications within
the rasterizer by adding an Occlusion Culling Unit to provide detailed visibility
information, or culling individual triangles or pixel groups beyond “traditional”
HP-flag style culling Meißner et al., 2001.
In the following, we briefly discuss efficient scene tree traversal which determines in which sequence the hierarchical scene organization is tested for
occlusion. Afterwards, we present results obtained using OpenGL for efficient
occlusion culling of complex polygonal scenes Bartz et al., 1999. In particular,
we present two methods: using the OpenGL selection mode to perform view-frustum culling, and using OpenGL’s stencil buffer as a virtual occlusion buffer.
2.1
Efficient Scene Tree Traversal
The efficiency of image-space occlusion culling approaches depends on the
sequence by which occlusion is determined, since scene objects in the back of
the scene from the current view point are very unlikely to occlude other objects.
Greene et al. Greene et al., 1993 used a front-to-back sorted scene to ensure
efficiency. Zhang et al. Zhang et al., 1997 select “good occluders” from a preselected occluder database based on similar heuristics. In our previous stencil
buffer-based approach Bartz et al., 1999 (also presented later on in Section 2.3),
we perform front-to-back sorting during view-frustum culling, which involves
the transformation of the scene objects (or their bounding volumes) into the view
coordinate system. Furthermore, we also proposed an interleaved scheme where
the occlusion test is performed also on the inner nodes of the scene hierarchy
tree, once view-frustum culling has determined those as potentially visible –
and sorted these nodes front-to-back. This scheme allows the assignment of an
occlusion budget to balance rendering and culling costs.
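The interleaved traversal described above can be sketched as follows. `Node`, `is_occluded`, and the budget handling are illustrative stand-ins (the paper does not specify an API), assuming each node's bounding volume has already been transformed so that a view-space depth is available:

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class Node:
    depth: float                      # view-space depth of the bounding volume
    children: list = field(default_factory=list)
    name: str = ""

def traverse_front_to_back(root, is_occluded, occlusion_budget):
    """Traverse a scene tree front-to-back, spending at most
    `occlusion_budget` occlusion queries on nodes (hypothetical API)."""
    visible_leaves = []
    tests_used = 0
    counter = 0                       # tie-breaker so Node is never compared
    heap = [(root.depth, counter, root)]
    while heap:
        _, _, node = heapq.heappop(heap)
        # Test nodes only while the occlusion budget lasts.
        if tests_used < occlusion_budget:
            tests_used += 1
            if is_occluded(node):
                continue              # node and all its children are culled
        if node.children:
            for child in node.children:
                counter += 1
                heapq.heappush(heap, (child.depth, counter, child))
        else:
            visible_leaves.append(node)
    return visible_leaves
```

With a budget of zero, no queries are issued and the leaves simply come back in front-to-back order, which is the sorting behavior the scheme builds on.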
In early HP occlusion culling flag-based approaches, a brute-force testing
on scene elements was used Scott et al., 1998, which we modified to view-
frustum culling of the complete scene hierarchy tree, sorting of the remaining
leaf elements, and subsequent front-to-back occlusion testing of the potentially
visible leaf elements Bartz and Skalej, 1999.
Further measurements on the visibility of leaf elements indicated additional
potential for more efficient utilization of occlusion culling for scenes with a high
depth complexity. The approximately 10% front-most leaf elements in these
scenes are not occluded in most of the cases, hence they can be rendered without
the costly determination of their visibility. Furthermore, the 40% farthest scene
elements do not significantly change the visibility information stored in the
depth buffer of the graphics system because they are already occluded. Therefore, their occlusion status can be established without taking their own occluder
potential into account, which usually is a cheaper occlusion query. However,
the distribution of the objects in a scene is very application dependent and needs
to be established individually.
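As a sketch, such a depth-based partition of the front-to-back sorted leaves might look like this; the 10%/40% fractions are the rough, scene-dependent estimates from the measurements above, and the function name is hypothetical:

```python
def partition_sorted_leaves(leaves, front_frac=0.10, back_frac=0.40):
    """Split front-to-back sorted leaves into three groups:
    render-without-test, full occlusion test, and cheap occlusion test.
    The default fractions are rough, scene-dependent estimates."""
    n = len(leaves)
    n_front = int(n * front_frac)
    n_back = int(n * back_frac)
    front = leaves[:n_front]              # almost never occluded: render directly
    middle = leaves[n_front:n - n_back]   # full test: query plus occluder update
    back = leaves[n - n_back:]            # usually occluded: status-only query
    return front, middle, back
```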
2.2
View-Frustum Culling
In contrast to other published approaches, we use OpenGL to perform the
view-frustum culling step. In detail, we use the OpenGL selection mode to detect whether a bounding volume interferes with the view-frustum. This OpenGL
mode is designed to identify geometric objects rendered into a specific screen
area Woo et al., 1997, which in our case is the whole screen. The polygonal
representation of the bounding volume (as convex hull) is transformed, clipped
against the view-frustum, and finally rendered without contributing to the actual
framebuffer Woo et al., 1997. Once a bounding volume intersects the view-frustum – the hit buffer of OpenGL’s selection mode has a contribution from
this bounding volume object – we test whether the polygonal bounding volume
resides entirely within the view-frustum. In this case, all subtrees of the bounding volume are marked potentially visible. Otherwise, we recursively continue
testing the child nodes of the bounding volume hierarchy.
In rare cases, a bounding volume can completely contain the view-frustum – resulting in no contributions to the hit buffer of the selection mode, because no part of the bounding volume representation falls inside the view-frustum. This can be prevented by testing
if the view point lies within the bounding volume, or if the bounding volume
lies in between the near plane of the view-frustum and the view point.
As a result of the view-frustum culling step, leaves are tagged either potentially visible (if they were not culled) or definitely not visible.
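The paper performs this test through the OpenGL selection mode; the equivalent geometric test, done in software, is the standard plane-based classification of a box against the frustum. A minimal sketch (the plane representation and function name are our own):

```python
def classify_aabb(box_min, box_max, planes):
    """Classify an axis-aligned box against frustum planes given as
    (a, b, c, d) with inward-pointing normals, i.e. a*x + b*y + c*z + d >= 0
    means 'inside' that plane.  Returns 'outside', 'inside', or 'intersect'."""
    fully_inside = True
    for a, b, c, d in planes:
        # p-vertex: the box corner farthest along the plane normal.
        px = box_max[0] if a >= 0 else box_min[0]
        py = box_max[1] if b >= 0 else box_min[1]
        pz = box_max[2] if c >= 0 else box_min[2]
        if a * px + b * py + c * pz + d < 0:
            return "outside"           # even the farthest corner is behind
        # n-vertex: the box corner nearest along the plane normal.
        nx = box_min[0] if a >= 0 else box_max[0]
        ny = box_min[1] if b >= 0 else box_max[1]
        nz = box_min[2] if c >= 0 else box_max[2]
        if a * nx + b * ny + c * nz + d < 0:
            fully_inside = False       # box straddles this plane
    return "inside" if fully_inside else "intersect"
```

An "inside" result marks all subtrees potentially visible without further tests; "intersect" triggers the recursive descent into the child nodes, matching the traversal described above.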
2.3
Virtual Occlusion Buffer-based Culling
The task of an occlusion culling algorithm is to determine occlusion of objects
in a model. We use a virtual occlusion buffer (VOB), being mapped onto
the OpenGL framebuffer to detect possible contribution of any object to the
framebuffer. In our implementation of the algorithm on an SGI O2 and an
SGI Octane/MXE, we used the stencil buffer for this purpose. Normally, the stencil buffer is intended for advanced rendering techniques, like multi-pass rendering.
To test for occlusion of a node, we send the triangles of its bounding box (BB)
to the OpenGL pipeline, use the z-buffer test while scan-converting the triangles,
and redirect the output into the virtual occlusion (stencil) buffer. Occluded
bounding volumes will not contribute to the z-buffer and hence, will not generate
a footprint in the virtual occlusion buffer.
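A software analogue of this footprint test might look as follows; the fragment list stands in for the scan-converted bounding-box triangles, and the explicit VOB array stands in for the stencil buffer (both are simplifications of the actual OpenGL path):

```python
def bb_footprint_visible(bb_fragments, depth_buffer):
    """Run the bounding-box fragments (pixel index, depth) against the
    depth buffer; mark a virtual occlusion buffer wherever a fragment
    passes the z-test.  Returns the VOB and whether any fragment passed.
    Illustrative stand-in for the stencil-buffer redirection in the paper."""
    vob = [0] * len(depth_buffer)
    for pixel, depth in bb_fragments:
        if depth < depth_buffer[pixel]:   # closer than current scene contents
            vob[pixel] = 1                # footprint in the occlusion buffer
    return vob, any(vob)
```

A bounding volume that is entirely behind the rendered geometry leaves the VOB empty, which is exactly the "no footprint, hence occluded" condition described above.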
Although reading the virtual occlusion buffer is fairly fast, it is the most
costly single operation of our algorithm, which accounts for approximately
90% of the total costs of the occlusion culling stage. This is mainly due to
the time consumed for the setup getting the buffer out of the OpenGL pipeline.
For models subdivided into thousands of bounding volumes, this can lead to a
less efficient operation. Furthermore, large BBs require many read operations.
Therefore, we implemented a progressive occlusion test, which reads spans of
pixels from the virtual occlusion buffer using a double interleaving scheme.
Although the setup time on the O2 for sampling small spans of the virtual occlusion buffer increases the time per sample, spans of ten pixels achieved almost the same speed-up as sampling entire lines of the virtual occlusion buffer.
For the Octane/MXE, we need to read larger chunks from the framebuffer in
order to achieve a sufficient speed-up. Note that sampling introduces a non-conservative heuristic which trades off image quality against rendering performance.
However, the experiments in the next section showed no apparent artifacts (even
in animations), although small image differences are present.
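One possible reading of such an interleaved span-sampling scheme is sketched below; the exact "double interleaving" used in the paper is not fully specified, so this is an assumption-laden illustration that reads alternating spans per row in two passes and stops at the first footprint found:

```python
def sample_spans(vob, x0, x1, y0, y1, width, span=10):
    """Sample the virtual occlusion buffer over a screen rectangle using an
    interleaved span scheme: read every other span on a first pass, then the
    remaining ones, stopping as soon as a set pixel (a footprint) is found.
    The VOB is a flat row-major array of width `width`."""
    for phase in (0, 1):                  # interleave: even spans, then odd
        for y in range(y0, y1):
            row = y * width
            span_index = 0
            for x in range(x0, x1, span):
                if span_index % 2 == phase:
                    if any(vob[row + x : row + min(x + span, x1)]):
                        return True       # BB is (potentially) visible
                span_index += 1
    return False                          # no footprint found: occluded
```

Early-out on the first footprint is what makes visible bounding volumes cheap to confirm; only fully occluded ones pay for the complete scan.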
2.4
Analysis
We ran several experiments to evaluate the performance of the OpenGL-assisted occlusion culling approach. For a complete listing of the results, please
refer to Bartz et al., 1999.
All experiments were run on four different scenes; an architectural scene of
eight gothic cathedrals – arranged on a 3D array, a city scene, a forest scene
to demonstrate quantitative culling, and – similar to Zhang et al., 1997 – the
content of a virtual garbage can of rather small objects (see also Fig. 1). A hierarchical scene organization for each dataset was generated by manually tuning a scene hierarchy generated by SGI’s OpenGL Optimizer (see Section 3).
Frame-rate and culling rate are measured over a sequence of about 100 frames
on an SGI O2 workstation (256 MB, R10000 @ 175 MHz), and on an SGI
Octane/MXE (896 MB, R10000 @ 250 MHz CPU). Please note that the datasets
used for the SGI O2 were using triangle strips, while this was not possible for
technical reasons on the Octane.

Figure 1. Scenes: (a,d) city, (b,e) garbage, (c,f) cathedrals; upper row: views with occlusion culling; lower row: views with BBs of occluded scene elements.
scene        #triangles   culling O2   culling MXE   frame-rate MXE/[fps]   speed-up O2/MXE
cathedrals    3,334,104      91.3%        92.5%              5.3               4.2 / 12.6
city          1,056,280      99.8%        87.7%              7.7               4.8 /  9.8
forest          452,981      84.7%        80.5%              4.7               2.6 /  6.1
garbage       5,331,146      96.0%        38.2%              0.3               7.0 /  5.0

Table 1. Average performance of OpenGL-assisted occlusion culling compared to view-frustum culling only. Different culling rates are due to different sampling parameters.

The polygonal complexity and speed-up numbers are listed in Table 1. The costs for view-frustum and occlusion
culling vary with the different scene organization granularities; i.e., for the
cathedrals dataset, view-frustum culling accounts for approximately 5% of the
costs, occlusion culling (mostly dominated by reading the stencil buffer) for
approximately 20%, and rendering of the scene elements classified as visible for about 75% of the total frame time. The distribution and total amount of the costs basically depend on the architecture of the graphics subsystem. A highly
interleaved graphics system like the InfiniteReality of SGI is likely to perform
worse than the O2 for this algorithm due to the high set-up costs of reading from
the framebuffer, while single-pipeline low- and mid-end graphics are mostly
limited by the total amount read from the framebuffer. Therefore, the sampling
parameters – sampling frequency and how much is read from the framebuffer
per sampling – of the occlusion culling approach trade off visual quality and
minimized set-up time and need to be parametrized for each graphics system.
3.
Hierarchical Scene Organization
In the previous section, we pointed out the need for a hierarchical scene
organization, which is difficult to derive for general polygonal models. Several
previous papers on visibility and occlusion culling touch the topic of scene
organization. While some approaches require the scene designer to provide the
scene organization Snyder and Lengyel, 1998; Zhang et al., 1997, others employ
decomposition methods which are application specific, such as a decomposition
along the skeleton of a volumetric object Hong et al., 1997, or a floor plan of a
building Airey et al., 1990; Teller and Sequin, 1991. However, these schemes
cannot be applied efficiently to general models. Models built in Computer-Aided Design (CAD) systems already include appropriate scene organization
information in a product data management system, due to the design process
which uses hierarchical notions like grouping and replication.
A more general approach is to organize a polygonal model into regular spatial
decomposition schemes, such as BSP-trees Fuchs et al., 1980 or Octrees Greene
et al., 1993. While these decomposition schemes produce good results on
polygonal models extracted by the Marching Cubes algorithm from uniform grid
volume datasets — which provide a “natural” decomposition on a Marching
Cubes cell base —, these schemes run into numerous problems on general
models. If a polygon of the model lies across a decomposition boundary, it must
be either split into several parts, in order to produce a disjunct representation
of the bounding entities, or handled in another special way. Splitting polygons
however, can increase the number of small and narrow polygons tremendously.
Significant work on model organization has been published in the field of
collision detection. Methods based on oriented bounding boxes (OBB) were
explored by Gottschalk et al. Gottschalk et al., 1996. A bottom-up approach
for the construction of a model hierarchy is suggested in Barequet et al., 1996 in
which nodes representing small parts of the geometry are “merged” into higher
hierarchy nodes.
In Bartz et al., 1999, the spatialization functionality of SGI’s OpenGL Optimizer package SGI, 1997 was used to generate scene hierarchies automatically.
However, our experience from the previous section showed that these scene hierarchies need to be tuned manually in order to get sufficient performance and
motivated the work described in this section. More detailed information on the
analysis and results using different decomposition techniques can be found in
Meißner et al., 1999.
3.1
Scene Organization and Occlusion Culling
Generally, a polygonal scene can be decomposed into smaller parts, where
this scene organization can be either hierarchical or non-hierarchical. We call
each part of this decomposition a scene entity. If information at different multi-resolution levels is required, a hierarchical organization is usually chosen,
where different scene entities are combined into one parent entity which contains
the whole information of the associated scene entities, or only information with
less detail (a lower level-of-detail). This decomposition can be represented as
a tree which is referred to as scene tree or scene graph. This tree contains
two different kinds of nodes: scene nodes and leaf nodes. Only the leaf nodes contain the geometry of the actual model and the BB with respect to the used
decomposition method. In contrast, a scene node does not contain any geometry
of the actual dataset; it only contains the spatial boundaries of the associated
geometry nodes, thus the scene node is the implementation of the abstract scene
entity. In the following, three different approaches to generate a scene hierarchy
starting from given models are presented.
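The two node kinds described above can be sketched as simple data structures, together with the bounding-box union that defines a scene node's spatial boundaries (all names are our own):

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    """Inner node: spatial boundaries of all associated geometry, no geometry."""
    bb_min: tuple
    bb_max: tuple
    children: list = field(default_factory=list)

@dataclass
class LeafNode:
    """Leaf node: actual geometry plus its bounding box."""
    bb_min: tuple
    bb_max: tuple
    triangles: list = field(default_factory=list)

def union_bounds(nodes):
    """Bounding box of a scene node = union of its children's boxes."""
    mins = [min(n.bb_min[i] for n in nodes) for i in range(3)]
    maxs = [max(n.bb_max[i] for n in nodes) for i in range(3)]
    return tuple(mins), tuple(maxs)
```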
To measure the decomposition quality, we need to reduce the degrees of freedom of the used occlusion culling approach. Therefore, we employ OpenGL-assisted view-frustum culling Bartz et al., 1999 and the Hewlett-Packard occlusion culling flag Scott et al., 1998. First, view-frustum culling is performed
hierarchically on the scene tree top-down; the scene nodes of the tree are tested
for intersection with the view-frustum. All the nodes which have a non-empty
intersection are located within the view-frustum and may contain visible substructures. Consequently, we proceed with the view-frustum test, until we reach
the leaf levels of the scene tree.
In the second step, all the remaining leaf nodes are depth-sorted according
to their associated BBs and tested for occlusion using the HP occlusion culling
flag. The actual geometry of the front-most leaf node is rendered into the
empty framebuffer without testing. All further rendering is performed in an
interlocked fashion. First, we render the BB of the next-closest leaf node
in the HP occlusion mode which does not contribute to the framebuffer. If
this BB would contribute visibly to the framebuffer, the HP occlusion
culling flag is set TRUE by the graphics hardware, and we render the associated
geometry in the standard render mode. If the BB does not have any contribution
(flag is set FALSE), this BB and all the associated geometry is not visible and
therefore it is culled.
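The interlocked rendering loop can be sketched as follows; `bb_occluded` stands in for rendering the leaf's BB in HP occlusion mode and reading back the flag, and `render` for rendering the actual geometry (both are hypothetical callbacks, since the real code issues OpenGL calls):

```python
def render_with_occlusion_flag(leaves, bb_occluded, render):
    """Interlocked rendering of depth-sorted leaves (front to back).
    `bb_occluded(leaf)` is a stand-in for the hardware occlusion query;
    `render(leaf)` is a stand-in for rendering the leaf's geometry."""
    rendered = []
    for i, leaf in enumerate(leaves):
        # The front-most leaf goes into the empty framebuffer untested:
        # with nothing rendered yet, it cannot be occluded.
        if i == 0 or not bb_occluded(leaf):
            render(leaf)
            rendered.append(leaf)
    return rendered
```

Because each rendered leaf updates the depth buffer, later (farther) leaves are queried against progressively better occluder information, which is why the front-to-back order matters.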
Figure 2. Algorithm p-HBVO: First, polygons are sorted along every coordinate axis (a), then
polygons are partitioned into two sub-sets of neighboring polygons. Every decomposition is
evaluated by a cost function estimating the rendering costs (b). This evaluation is repeated for
every coordinate axis. The partition with minimal costs defines the decomposition of the scene
(c). Finally, this process is continued recursively.
3.2
Polygon-based Hierarchical Bounding Volume
Optimization (p-HBVO)
Our polygon-oriented Hierarchical Bounding Volume Optimization (p-HBVO) method recursively decomposes a set of polygons into two smaller sets of polygons. The optimal scene decomposition is selected by minimizing a cost function approximating the expected rendering costs over all potential decompositions. At each decomposition level, the individual polygons are assigned to exactly one scene entity of that level. Consequently, no polygons are split, and hence no new polygons are generated by this method.
First, the polygons are sorted along all coordinate axes where the barycenter
of each polygon serves as sorting key. Based on these three ordered lists, we
evaluate the potential binary decomposition sets along each axis for each entry
in the respective list by splitting the sorted list of polygons into a left and right
part. While most other decomposition schemes – e.g., the median-cut scheme Kay and Kajiya, 1986 – use a pre-defined decomposition position, we evaluate for
each possible decomposition set a cost function which approximates the costs of
rendering the tree node which holds the decomposition sets as direct child nodes,
including the costs of visibility tests using their respective bounding volumes
(Fig. 2). The decomposition process is repeated recursively on the two new
scene entities, one containing all left polygons, the other containing all right polygons. The recursion terminates when either the number of polygons or the tree depth exceeds one of the two pre-defined parameters: the maximum number of triangles per entity, or the maximum tree depth. These parameters
are specified by the user and supplied at the start of the decomposition process.
This cost function is identical to one which has already been successfully
applied in ray tracing environments Müller and Fellner, 1999, since the objective is the same; both algorithms traverse the scene graph in a similar way to
determine visibility. The costs C(E) of a scene entity E, with children E_l and E_r, are given by:

    C(E) = ( A(E_l) / A(E) ) · |P(E_l)| + ( A(E_r) / A(E) ) · |P(E_r)|

where |P(H)| is the number of polygons within hierarchy H, and A(S) is the surface area of the BB associated to the sub-scene S ∈ {E, E_l, E_r}.
By combining two (three) binary tree levels into a quad-tree (octree) node, we can reduce the tree depth, which in turn decreases the number of inner-node occlusion tests. Overall, this algorithm generates well-balanced scene trees with
respect to their polygon load. Furthermore, polygons of individual objects are
detected and clustered together. Optimal culling performance was achieved
with finer decompositions (implemented by merging neighboring tree levels),
which usually leads to higher culling costs, and hence lower rendering rates.
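A much-simplified sketch of the split search: polygons are reduced to their barycenters, and the cost of a candidate split is the area-weighted polygon count of the two children, as described in the text. A real p-HBVO implementation would use full polygon bounding boxes rather than points:

```python
def surface_area(bb_min, bb_max):
    dx, dy, dz = (bb_max[i] - bb_min[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def bounds(points):
    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    return mins, maxs

def split_cost(left, right, parent_area):
    # Area-weighted polygon counts of both children (cost model from the text).
    return (surface_area(*bounds(left)) / parent_area * len(left) +
            surface_area(*bounds(right)) / parent_area * len(right))

def p_hbvo_split(points):
    """Evaluate every binary split along each axis (points sorted by
    barycenter) and return the cheapest (axis, left set, right set).
    Assumes at least two points; polygons are reduced to barycenters."""
    parent_area = surface_area(*bounds(points)) or 1.0
    best = None
    for axis in range(3):
        order = sorted(points, key=lambda p: p[axis])
        for i in range(1, len(order)):
            c = split_cost(order[:i], order[i:], parent_area)
            if best is None or c < best[0]:
                best = (c, axis, order[:i], order[i:])
    return best[1], best[2], best[3]
```

Applied recursively to the two returned subsets, this reproduces the decomposition process of Fig. 2 on point data.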
3.3
Octree-based Regular Space Decomposition (ORSD)
While the previous approach is able to handle arbitrary sets of polygons,
some data sources inherently generate regular decompositions which can be
exploited at much lower cost. Uniform grid datasets of Marching Cubes isosurfaces from MRI or CT scanners (e.g., the ventricular system dataset) consist of a set of sample values (voxels), arranged on a uniform grid. A cube of eight
neighboring voxels is called a cell, where cells with a non-empty intersection
with the selected isosurface are called relevant cells.
The Octree-based Regular Space Decomposition method (ORSD) uses a cell-based (voxel-based) evaluation criterion, where the number of relevant cells
(relevant cell load or RCL) controls the decomposition process. This criterion
is only a rough approximation of the actual number of extracted polygons,
considering that each relevant cell represents between one and five triangles of
the isosurface. In our experiences however, RCL turned out to be sufficiently
accurate (Figure 3c visualizes one of the generated scene trees, where the drawn
BBs are bounding the actual geometry, not the respective octant volume).
After the construction of the entire octree, the RCL of each octant is already
calculated. Subsequently, the octree is traversed recursively, starting with the
superblock. If the RCL is above a user-specified threshold, the block is classified
as a scene node, thus being further decomposed into its child blocks. In the
other case, the block is considered as a leaf node. The associated relevant cells
— at the bottom level of the octree — contain the polygons of the isosurface,
which are assigned to that leaf node. This decomposition process results in
a hierarchy of blocks, where the leaf nodes contain the actual geometry (see
Section 3.1).
Overall, ORSD is a simple but efficient decomposition scheme which generates an adequate polygon load balance and BB sizes. As shown in the results,
the indirect evaluation method (RCL instead of number of polygons) does not
adversely affect the occlusion culling performance. However, ORSD is limited
to uniform grid datasets.
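The ORSD recursion can be sketched as follows; `rcl` stands in for the precomputed relevant-cell load per octant, and blocks are represented as (origin, edge length) on the voxel grid (the representation is our own):

```python
def orsd_decompose(block, rcl, threshold, leaves=None):
    """Recursively split an octree block while its relevant-cell load (RCL)
    exceeds the threshold.  `block` is ((x0, y0, z0), size); `rcl(block)`
    counts relevant cells inside it (a stand-in for the per-octant RCL
    computed during octree construction)."""
    if leaves is None:
        leaves = []
    (x0, y0, z0), size = block
    load = rcl(block)
    if load <= threshold or size == 1:
        if load > 0:
            leaves.append(block)      # leaf node: holds the actual geometry
        return leaves
    half = size // 2                  # scene node: recurse into 8 child blocks
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):
                orsd_decompose(((x0 + dx, y0 + dy, z0 + dz), half),
                               rcl, threshold, leaves)
    return leaves
```

Empty octants (RCL of zero) are dropped entirely, so the resulting hierarchy only spans the blocks that actually intersect the isosurface.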
3.4
SGI’s OpenGL Optimizer (OPT)
SGI’s OpenGL Optimizer (OPT) is a C++ toolkit for CAD applications that
provides scene graph functionality for handling and visualization of large polygonal scenes. The decomposition method realized in OPT is similar to the construction of an octree; each scene entity is split into eight equally sized scene
entities. This process is repeated recursively, until a certain threshold criterion
for the iterated decomposition is reached. Octree-based spatial decomposition
is a simple and efficient scheme. However, the OPT scene organization mechanism decomposes space not by simply bisecting edges of a cube, as in an octree,
but by choosing decomposition planes so that the rendering loads of the resulting parts are similar. As a result, the amount of geometry in each scene entity on
each side of the cutting plane is approximately the same. Polygons which are
split due to the decomposition are distributed to the respective scene entities.
The main parameters that can be used to control the decomposition are hints for the lowest and highest number of triangles in each scene entity at the leaf-level of the scene hierarchy. However, the decomposition algorithm only tries to meet these criteria and is not bound to them.
In general, OPT generates scene hierarchies with a well-balanced polygon
load. However, the BBs of the scene entities are less suited for occlusion
culling applications, because the cost function determining the scene entities is
obviously not optimized with respect to the volume or the screen-space area of
the BBs. We observed that the right-most branch of the scene tree frequently
contained large subsets (BB volume size) of the model, even in the lower tree
levels.
3.5
Evaluating the Scene Organization Quality
In this section, we discuss the efficiency of the decomposition algorithms with
respect to their occlusion culling-based rendering performance. All measurements are performed on an HP B180/fx4 graphics workstation. The different
polygonal datasets (see Table 2) represent typical scenarios of different application areas. Their scene trees only contain individual polygons (triangles) in order to evaluate comparable scenes. Two of the datasets are examined in
more detail.
                      Ventricular System   Cathedral      City
Grid Type             Uniform              Unstructured   Unstructured
Source                MRI                  CAD            Modeler
#Triangles            270,882              416,763        1,408,152
Frame-rates/[fps]
  No Culling          4.6                  3.8            0.9
  p-HBVO              12.3                 12.4           14.0
  OPT                 13.6                 7.8            11.8
  ORSD                15.3                 n.a.           n.a.
Render-rates/[%]
  p-HBVO              16.0                 30.0           0.1
  OPT                 19.7                 30.1           3.6
  ORSD                22.4                 n.a.           n.a.
Occlusion Time/[ms]
  p-HBVO              30                   4              27
  OPT                 16                   54             33
  ORSD                11                   n.a.           n.a.

Table 2. Scene overviews; as gold standard for the speed-up due to occlusion culling, we show the frame-rates and render-rates for the datasets with and without culling (view-frustum and occlusion). ORSD requires Marching Cubes-generated scenes, which are available only for the ventricular system. Occlusion time lists the costs required for occlusion culling (no view-frustum culling).
Scene trees of different decomposition granularities are evaluated for the
respective costs of view-frustum culling, occlusion culling, and rendering (in
frame-rate) of the not-occluded geometry. Here, we present only the culling and rendering performance of the scene trees that achieved the best rendering performance. Culling performance is expressed as render-rate, which gives the percentage of the total scene geometry determined not occluded. More
detailed information on the evaluation can be found in Meißner et al., 1999.
Cathedral Dataset: This dataset represents the interior of a single gothic
cathedral (in contrast to Section 2.3), designed with a CAD system (see Table 2).
Occlusion is limited to small parts of the model, because a large share of the
polygons are visible from most view points within the model. Figure 3a shows
a very fine granular decomposition of the cathedral model using the p-HBVO
approach, which adapts very nicely to the structures of the model, such as pillars and arches. In contrast, the decomposition generated by OPT (b) introduces very
large BBs, which do not adapt properly to the actual geometry.
(a)
(b)
(c)
(d)
Figure 3. Cathedral and ventricular system BB hierarchies generated by (a,d) p-HBVO, (b)
OPT, (c) ORSD/OPT; the arches and pillars of the cathedral are well detected by p-HBVO (a); OPT
only used a regular spatial decomposition (b). ORSD and OPT however, generated identical
results for the ventricular system dataset (c).
The p-HBVO approach performed best on this dataset (see gnuplots on
CDROM). This is due to the low culling costs, compared to OPT (see Table 2
and also Meißner et al., 1999). The BBs of OPT require a significantly higher
time for occlusion culling compared to p-HBVO, which reduced the frame-rate
severely. This effect is reduced, however, because the camera path is located completely within the cathedral, thus limiting potential occlusion anyway.
Ventricular System Dataset: This dataset is a polygonal model of the
isosurface representing the ventricular system of the human brain extracted from
an MRI scan. We explore the dataset by moving through the lower part (Cisterna
Magna) of the polygonal model. Most of the model structures throughout the
walk-through are located within the view-frustum, while the structures with
the largest number of polygons (located in the upper part or lateral ventricles)
were not visible due to occlusion. All polygons of this model are aligned
on the uniform cell grid and are of approximately the same size. All three
adapted algorithms were able to detect these “natural” decomposition boundaries; only OPT generated approximately 15% additional polygons due to splitting operations between the grid points.
Table 2 (and gnuplots on CDROM) shows frame-rate and render-rate of the
evaluated algorithms. The most interesting detail is the low amount of time
consumed by occlusion culling by ORSD, due to its coarse decomposition.
The render-rate of p-HBVO was approximately 25% better than the render-rate of the ORSD approach. However, the finer decomposition (see Fig. 3) introduced additional culling costs twice as high as for ORSD, resulting in a lower frame-rate (see Table 2).
Summary
Overall, the adapted scene organization approaches were able to generate
decompositions that render faster due to higher culling performance. This
was achieved either by reducing the culling costs or by reducing the amount of
geometry to be rendered. On uniform grid datasets, the basic ORSD approach
produced the model subdivision which performed best, mostly due to the low
time spent establishing occlusion or non-occlusion.
Generally, we observed that models with high occlusion do not require a very
fine decomposition (ventricular system dataset, p-HBVO vs. ORSD). On the
other hand, a fine decomposition pays off if interior (and thus completely occluded)
objects are clustered in a scene entity (city dataset, p-HBVO vs. OPT). In
contrast, models with low occlusion (cathedral dataset) can still benefit from finer
decompositions if the additional culling costs are only a fraction of the
rendering costs saved (see city dataset, p-HBVO).
Note that the p-HBVO approach builds a binary scene tree. This usually
results in deeper trees and hence more intermediate scene entities, which
significantly increases the time spent on occlusion culling. Once this binary
tree was re-built into a quad-tree representation, we achieved a frame-rate
increase of approximately 20%.
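One simple way to perform such a re-build is to let every node adopt its grandchildren, which halves the tree depth. The sketch below is our own illustrative code, not the implementation used in the experiments:

```python
# Collapse a binary scene tree into a quad-tree-like structure by
# letting each node adopt its (up to four) grandchildren.

class Node:
    def __init__(self, children=None):
        self.children = children or []

def depth(node):
    return 1 + max((depth(c) for c in node.children), default=0)

def collapse_to_quadtree(node):
    """Halve tree depth: each node adopts its grandchildren."""
    if not node.children:
        return node
    adopted = []
    for child in node.children:
        # Leaves stay direct children; inner nodes are replaced
        # by their own children.
        adopted.extend(child.children or [child])
    node.children = [collapse_to_quadtree(g) for g in adopted]
    return node

# Full binary tree of depth 3 (7 nodes).
tree = Node([Node([Node(), Node()]), Node([Node(), Node()])])
print(depth(tree))                        # 3
print(depth(collapse_to_quadtree(tree)))  # 2
```

Fewer intermediate entities mean fewer occlusion tests per frame, which is consistent with the roughly 20% frame-rate gain reported above.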
To summarize, we always achieved a speed-up due to occlusion-culling-based
rendering. Especially with the highly occluded city dataset, we achieved
a speed-up of 15.6 after culling 99.9% of the model geometry with the
p-HBVO algorithm. On the ventricular system dataset, the ORSD approach
accomplished the best results; 77.6% of the geometry was culled due to
view-frustum and occlusion culling. This culling performance resulted in a
frame-rate speed-up of 3.3.
4. Conclusions and Future Work
In this chapter, we presented OpenGL-assisted occlusion culling techniques
and compared several methods for the generation of a hierarchical scene
organization, which can significantly increase the rendering performance (in fps).
Other techniques include efficient scene traversal strategies, which can
severely impact the occlusion culling efficiency.
However, there are still many other visibility-related techniques to improve
the interactive visualization of large models. Most occlusion culling
approaches use a simple axis-aligned bounding box to approximate the geometry
of a scene entity. Unfortunately, this approximation frequently has a much
larger screen-space area than the actual geometry, resulting in more reported
visibility hits than necessary. Employing better convex hull techniques has the
potential to reduce these false-positive visibility hits.
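The overestimation is easy to demonstrate: projecting the eight corners of an axis-aligned bounding box can yield a much larger screen rectangle than projecting the enclosed geometry itself. The simple pinhole projection and the numbers below are illustrative assumptions, not part of our test setup:

```python
# Compare the screen-space bounding rectangle of some geometry with
# that of its axis-aligned bounding box (AABB), using a simple
# perspective projection (camera at origin, looking along +z).

def project(p, focal=1.0):
    x, y, z = p
    return (focal * x / z, focal * y / z)  # perspective divide

def screen_rect_area(points3d):
    pts = [project(p) for p in points3d]
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

# A thin diagonal sliver of geometry, receding from the camera ...
geometry = [(0.0, 0.0, 2.0), (1.0, 1.0, 4.0)]

# ... and the 8 corners of its AABB.
lo, hi = (0.0, 0.0, 2.0), (1.0, 1.0, 4.0)
aabb = [(x, y, z) for x in (lo[0], hi[0])
                  for y in (lo[1], hi[1])
                  for z in (lo[2], hi[2])]

print(screen_rect_area(geometry))  # 0.0625
print(screen_rect_area(aabb))      # 0.25 -> four times larger, since
                                   # the near AABB corners project big
```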
Furthermore, the quantization of visibility (how many pixels of a scene entity
are visible) plays an important role in dealing with these kinds of "slightly
visible" entities; lower levels of detail of such an entity could be used instead.
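A possible policy for such entities is to map the visible pixel count to a level of detail. The sketch and its thresholds are purely illustrative assumptions, not derived from our measurements:

```python
# Map the number of visible pixels of a scene entity to a level of
# detail (LOD). Thresholds are made-up illustrative values.

def select_lod(visible_pixels, thresholds=(10, 100, 1000)):
    """Return None to skip the entity entirely, otherwise an LOD
    index where 0 is the coarsest representation."""
    if visible_pixels == 0:
        return None                   # fully occluded: cull
    for lod, limit in enumerate(thresholds):
        if visible_pixels < limit:
            return lod                # "slightly visible": coarse LOD
    return len(thresholds)            # large on screen: finest LOD

print(select_lod(0))     # None
print(select_lod(40))    # 1
print(select_lod(5000))  # 3
```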
Notes
1. The occlusion tests are performed for each scene hierarchy element separately within one frame.
Therefore, an element can be occluded by an element tested/rendered later.
2. Other buffers could be used as well, but the stencil buffer, as an integer buffer, is often the least used
buffer and, on some graphics systems, has an empirically measured better read performance than the other
buffers.
3. Basically, this scheme implements a sampling of the virtual occlusion buffer, where 1/n-th of
each BB is read in each iteration. In other words, the algorithm needs n sampling iterations to fully read the
entire BB.
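The sampling scheme of note 3 can be sketched as follows; the code simulates buffer rows with plain indices, whereas a real implementation would read the stencil buffer (e.g., via glReadPixels):

```python
# Interleaved sampling of a bounding box's screen-space region: each
# iteration reads 1/n-th of the BB's rows, so n iterations cover it
# completely. Buffer access is simulated here with row indices only.

def sample_bb_rows(bb_height, n, iteration):
    """Rows of a BB read in the given iteration (0 <= iteration < n)."""
    return list(range(iteration, bb_height, n))

n = 4
print(sample_bb_rows(16, n, 0))  # [0, 4, 8, 12]

# n iterations together cover every row of the BB.
covered = sorted(r for i in range(n) for r in sample_bb_rows(16, n, i))
print(covered == list(range(16)))  # True
```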
References
Airey, J., Rohlf, J., and Brooks, F. (1990). Towards Image Realism with Interactive Update Rates
in Complex Virtual Building Environments. In Proc. of ACM Symposium on Interactive 3D
Graphics.
Barequet, G., Chazelle, B., Guibas, L., Mitchell, J., and Tal, A. (1996). BOXTREE: A Hierarchical Representation for Surfaces in 3D. In Proc. of Eurographics.
Bartz, D., Meißner, M., and Hüttner, T. (1998). Extending Graphics Hardware for Occlusion
Queries in OpenGL. In Proc. of Eurographics/SIGGRAPH Workshop on Graphics Hardware,
pages 97–104,158.
Bartz, D., Meißner, M., and Hüttner, T. (1999). OpenGL-assisted Occlusion Culling of Large
Polygonal Models. Computers and Graphics - Special Issue on Visibility - Techniques and
Applications, 23(5):667–679.
Bartz, D. and Skalej, M. (1999). VIVENDI - A Virtual Ventricle Endoscopy System for Virtual
Medicine. To appear in Proc. of EG/IEEE Symposium on Visualization.
Fuchs, H., Kedem, Z., and Naylor, B. (1980). On Visible Surface Generation by a Priori Tree
Structures. In Proc. of ACM SIGGRAPH.
Garlick, B., Baum, D., and Winget, J. (1990). Interactive Viewing of Large Geometric Databases
Using Multiprocessor Graphics Workstations. In SIGGRAPH’90 course notes: Parallel Algorithms and Architectures for 3D Image Generation.
Gottschalk, S., Lin, M., and Manocha, D. (1996). OBBTree: A Hierarchical Structure for Rapid
Interference Detection. In Proc. of ACM SIGGRAPH.
Greene, N., Kass, M., and Miller, G. (1993). Hierarchical Z-Buffer Visibility. In Proc. of ACM
SIGGRAPH.
Hong, L., Muraki, S., Kaufman, A., Bartz, D., and He, T. (1997). Virtual Voyage: Interactive
Navigation in the Human Colon. In Proc. of ACM SIGGRAPH.
Kay, T. and Kajiya, J. (1986). Ray Tracing Complex Scenes. In Proc. of ACM SIGGRAPH.
Meißner, M., Bartz, D., Günther, R., and Straßer, W. (2001). Visibility Driven Rasterization.
Computer Graphics Forum, 20(4):283–294.
Meißner, M., Bartz, D., Hüttner, T., Müller, G., and Einighammer, J. (1999). Generation of
Subdivision Hierarchies for Efficient Occlusion Culling of Large Polygonal Models. Technical Report WSI-99-13, ISSN 0946-3852, Dept. of Computer Science (WSI), University of
Tübingen.
Müller, G. and Fellner, D. (1999). Hybrid Scene Structuring with Application to Ray Tracing.
In Proc. of ICVC’99 (to appear).
Scott, N., Olsen, D., and Gannett, E. (1998). An Overview of the VISUALIZE fx Graphics
Accelerator Hardware. The Hewlett-Packard Journal, (May).
SGI (1997). Optimizer Manual. Technical report.
SGI (1999). Silicon Graphics 320, Visual Workstation. Specification document, available from
http://visual.sgi.com/products/320/index.html.
Snyder, J. and Lengyel, J. (1998). Visibility Sorting and Compositing without Splitting for Image
Layer Decompositions. In Proc. of ACM SIGGRAPH.
Teller, S. and Sequin, C. (1991). Visibility Pre-processing for Interactive Walkthroughs. In Proc.
of ACM SIGGRAPH.
Woo, M., Neider, J., and Davis, T. (1997). OpenGL Programming Guide. Addison Wesley,
Reading, Mass., 2nd edition.
Zhang, H., Manocha, D., Hudson, T., and Hoff, K. E. (1997). Visibility Culling Using Hierarchical
Occlusion Maps. In Proc. of ACM SIGGRAPH.