Journal of Computer Graphics Techniques Vol. 5, No. 2, 2016
http://jcgt.org
An N-ary BVH Child Node Sorting Technique
for Occlusion Tests
Shinji Ogaki
OLM Digital, Inc.
Alexandre Derouet-Jourdan
OLM Digital, Inc. / JST CREST
Figure 1. The heat maps show the sum of the numbers of node traversal steps and primitive
intersection tests before (middle) and after (right) optimization using our technique. The
warm color shows where more nodes and primitives are tested. We achieve 1.2 times faster
rendering. The scene contains roughly 6 million primitives.
Abstract
The cost of occlusion tests can be reduced by changing traversal order. In this paper, we
introduce a very simple, yet general, cost model and an efficient algorithm to determine the
optimal order of the child nodes of n-ary bounding volume hierarchies (BVH). Our cost model
is derived by extending the shadow ray distribution heuristic from only binary BVH to n-ary
BVH, and our algorithm does not require an extra BVH for shadow rays.
1. Introduction
Occlusion tests are a large part of the process in ray-tracing algorithms. This is especially true when soft shadows are generated by an area light source or a skydome
light. Unlike a radiance ray, shadow ray traversal can be terminated as soon as an
occluder is found, and, therefore, the traversal order has a great impact on shadow
computation.
In the industry, many commercial renderers adopt Embree [Wald et al. 2014],
which is known to be the fastest ray-tracing kernel for CPUs. The n-ary bounding
volume hierarchy (BVH) has become a dominant data structure for renderers, including those using Embree, and CPU ray tracing in general.
ISSN 2331-7418
Most existing algorithms to determine shadow ray traversal order were developed
only for the binary BVH. These include the ray termination surface area heuristic
(RTSAH) [Ize and Hansen 2011], surface area traversal order (SATO) [Nah and
Manocha 2014], and the shadow ray distribution heuristic (SRDH) [Feltman et al.
2012]. Even when such techniques can be used as a cost model for n-ary BVH,
they are either computationally expensive or suboptimal. This is
primarily due to the factorial nature of finding a traversal order for n-ary BVH.
In this paper, we introduce a new cost model for shadow ray traversal of n-ary
BVH. The model is derived by extending SRDH to work for an arbitrary number of
child nodes. Our algorithm starts by casting a small number of representative rays to
approximate the probabilities of the proposed cost model. Then, the child nodes are
sorted in a bottom-up fashion, and we continue rendering with the optimized tree. We
define an ordering of the children that minimizes the new cost model and are able to
prove that the ordering is optimal.
In order to reduce memory consumption and initialization cost, we do not build an
extra BVH optimized for shadow rays. Nevertheless, we found significant improvements in the time required for shadow ray traversal.
1.1. Related Work
Ray-tracing performance is determined by the quality of a BVH and ray traversal
order. The surface area heuristic (SAH) is the most common cost metric to build a
high quality BVH. In SAH, the probability of visiting each node is approximated by
the surface area ratio of a node to that of its parent node.
Generally, rays are traversed in a depth-first order. When a ray hits child nodes
of an n-ary BVH node, they are sorted in a front-to-back order and then pushed onto
a stack [Áfra 2013; Wald et al. 2014]. This gives an ideal traversal order for a BVH
with spatial splits [Stich et al. 2009]. On the other hand, when object splits are used,
the closest node can be found by a strict front-to-back traversal order [Wald et al.
2008]. This can reduce the numbers of node traversal steps and primitive intersection
tests and, therefore, exhibits a better performance for primitives requiring complex
intersection tests. However, for simple primitives such as a triangle, selecting the
closest node from all intersected nodes may adversely affect ray-tracing performance
because of the overhead.
Ray distribution heuristics (RDH) [Bittner and Havran 2009] construct a BVH by
approximating the probability of visiting each node using the distribution of representative rays. However, this method only gives a subtle speedup, because radiance
rays have a tendency to distribute uniformly, as assumed in SAH. This ray distribution is later exploited in another approach, the shadow ray distribution heuristic
(SRDH) [Feltman et al. 2012], to accelerate occlusion tests by limiting the use of representative rays to shadow rays. Using the distribution of representative rays, another
BVH is constructed for occlusion tests. The order of visiting child nodes is chosen to
minimize the SRDH cost function. The performance gain is large, but it comes at the
cost of extra memory and BVH construction time.
Ize and Hansen introduced a cost metric suitable for occlusion tests called ray termination SAH (RTSAH) [Ize and Hansen 2011]. It is derived by extending SAH
using the visibility of each node. The shadow ray traversal order is determined so as to
minimize the RTSAH cost function. The difference compared to SRDH is that the
distribution of shadow rays is assumed to be uniform; therefore, no representative
rays are required. The cost is calculated precisely, based on the fact that the probabilities of hitting child nodes are not independent. However, this way of evaluating
the probabilities makes the method difficult to extend to n-ary BVH. Although the improvements shown in the paper are significant, the method does not always improve performance,
because shadow rays tend to have strong spatial or directional coherence.
The visibility-driven BVH proposed by Vinkler et al. [Vinkler et al. 2012] takes
into account visibility in the cost model. Exploiting approximated visibility of triangles, a BVH optimized for a visibility test is constructed. This idea could be used to
accelerate shadow ray traversal similar to SRDH. To do so, extra memory is required
to create an additional data structure as done in [Feltman et al. 2012].
Surface area traversal order (SATO) [Nah and Manocha 2014] determines the
traversal order based solely on the surface areas of nodes and primitives. Since large
primitives tend to be located close to the root node when a BVH is built using SAH,
this algorithm traverses large nodes and primitives first to quickly identify large occluders. The preprocessing is very fast and dynamic scenes can be handled well;
extending this to n-ary BVH is trivial. However, as reported in [Nah and Manocha
2014], it does not necessarily accelerate occlusion tests because the distribution of
occlusion rays is not taken into account. If there are large primitives that do not cast
shadows, SATO can perform poorly.
Recently, an interesting contraction technique was introduced to accelerate ray
tracing for n-ary BVH [Gu et al. 2015]. This method creates an n-ary BVH by contracting binary BVH nodes that are often visited, resulting in a 30% acceleration for shadow rays. However, the order of child nodes is determined simply by the
number of times each node is visited, without considering the cost of each subtree.
Embree [Wald et al. 2014] uses a unique traversal order. If a ray intersects multiple child nodes, they are pushed onto a stack without sorting to avoid large sorting
cost except when two nodes are intersected. Sorting two nodes is done with a single
swap operation; the closer one is traversed first when only two nodes are intersected.
1.2. Notation
For convenience, a summary of the notation used is given in Table 1. As we
traverse a BVH (binary or n-ary) with a ray, the current node denotes the node
being traversed, before or after testing that the ray intersects its bounding box.
For a given ray, the probability of entering the current node, that is, of intersecting its
bounding box, is denoted I, under the assumption that the ray first entered the parent
node (which is always the case in a standard BVH traversal). In the rest of the paper,
unless stated otherwise, all probabilities of entering a node assume
that the ray previously entered the parent of that node.

Symbol          Meaning
Numbers
  N             child nodes (SIMD width)
  N_p           primitives in leaf node
Recorded Numbers
  N             representative rays that intersect the current node
  L_i           representative rays that intersect any primitive in the ith subtree
  N_i           representative rays that intersect the ith child node
Costs
  C_b           ray-box intersection test
  C_mb          ray-multi box intersection test
  C_p           ray-primitive intersection test
  C_l→r         left node visited first
  C_r→l         right node visited first
Probabilities
  I             probability of entering the current node
  I_l, I_r      probability of entering the left (respectively, right) child node
  I_lr          probability of entering both child nodes
  I_jl, I_jr    probability of entering only the left (respectively, right) child node
  H_l, H_r      probability of intersecting a primitive in the left (respectively, right) subtree
  V_l, V_r      probability of entering the left (respectively, right) child node
                without intersecting any primitive
  I_i           probability of entering the ith child node
  H_i           probability of intersecting a primitive in the ith subtree
Table 1. Notation used in this paper.
2. Cost of Radiance Ray Traversal
The cost of radiance ray traversal is described by using the surface area heuristic. For
n-ary BVH, it is recursively defined as
$$
C_{\mathrm{sah}} =
\begin{cases}
C_{\mathrm{mb}} + \dfrac{1}{A} \sum_{i=1}^{N} A_i C_i & \text{for internal nodes} \\
C_p N_p & \text{for leaf nodes}
\end{cases},
$$
where A is the surface area of the bounding box of the current node and Ai is that of
the ith child node.
3. Cost of Shadow Ray Traversal
The surface area heuristic assumes that ray distribution is uniform. This is a reasonable assumption because rays used for indirect illumination computation are uniformly distributed. However, there is still room for improvement for occlusion tests
for which the traversal can be immediately terminated when a ray finds an occluder.
Thus, the order of child nodes has a large impact on rendering performance. The cost
of occlusion tests is recursively determined by
$$
C_{\mathrm{any}} =
\begin{cases}
C_{\mathrm{mb}} + \sum_{i=1}^{N} I_i C_i \prod_{j=1}^{i-1} (1 - H_j) & \text{for internal nodes} \\
C_p N_p & \text{for leaf nodes}
\end{cases},
\tag{1}
$$
where Ii denotes the probability of a ray intersecting the bounding box of child i, and Hi the probability of a ray intersecting a primitive in the
subtree of child i. Here we assume the probabilities Ii and Hi are independent to simplify the
cost model. Sorting child nodes by decreasing Hi /(Ii Ci ) minimizes the cost (see Appendix A).
This is the key to our algorithm. Note that sorting does not affect the closest hit test,
because that test uses a front-to-back traversal order and the SAH cost remains the same.
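The following sketch shows one way to evaluate the internal-node case of Equation (1) for a given child order and to sort the children by the criterion above. The `Child` struct and the function names are illustrative assumptions, and the comparison assumes Ii Ci > 0 for every child, as in Appendix A.

```cpp
#include <algorithm>
#include <vector>

// Per-child quantities of Equation (1); names are illustrative only.
struct Child {
    double I;  // probability that a ray enters the child's bounding box
    double H;  // probability that a ray hits a primitive in the child's subtree
    double C;  // cost of traversing the child's subtree
};

// Evaluates the internal-node case of Equation (1) for the given order.
double AnyHitCost(const std::vector<Child>& children, double Cmb) {
    double cost = Cmb;
    double notOccluded = 1.0;  // running product of (1 - H_j) for j < i
    for (const Child& c : children) {
        cost += c.I * c.C * notOccluded;
        notOccluded *= 1.0 - c.H;
    }
    return cost;
}

// Orders children by decreasing H / (I * C). The ratios are compared by
// cross-multiplication, which assumes I * C > 0 for every child.
void SortChildren(std::vector<Child>& children) {
    std::stable_sort(children.begin(), children.end(),
                     [](const Child& a, const Child& b) {
                         return a.H * b.I * b.C > b.H * a.I * a.C;
                     });
}
```

With two children (I, H, C) = (1, 0.1, 5) and (0.5, 0.6, 2) and Cmb = 0, visiting the first child first costs 5 + 0.9 · 1 = 5.9, whereas the sorted order costs 1 + 0.4 · 5 = 3.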
3.1. Relationships to Other Models
In the following, we describe the relationship between our cost model and others. In
particular, we show how our model extends the SRDH model and how it differs from
the RTSAH model by a few assumptions on the probabilities of ray intersections.
3.1.1. SRDH
Our cost model can be regarded as a straightforward extension of SRDH. In a binary
BVH, given an internal node and a shadow ray, there are two ways of visiting the
subtree rooted at the node when the shadow ray enters it: we first go to the left child
then the right child, or we go to the right child then to the left child. In the SRDH
model, given an internal node and a shadow ray (that may not intersect the given
node), the two costs associated with visiting the subtree rooted at the node are defined
as
Cl→r = Cb + I(Ck + Cl + (1 − Hl )Cr ),
Cr→l = Cb + I(Ck + Cr + (1 − Hr )Cl ),
where Ck is the cost of kernel execution that determines which node to traverse first,
Cb the cost of ray-box intersection, and I the probability of entering the given internal
node. SRDH builds an additional BVH optimized for the any-hit test based on the
above cost model. The differences in our model are that the kernel execution cost, Ck ,
does not exist, because traversal order is not determined at run time (see Section 4),
and that the cost of the ray-box intersection test, Cb , is shared with all child nodes,
because the bounding box hit-tests are done simultaneously for all child nodes using
SIMD instructions. Thus, the cost model becomes
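$$
\begin{aligned}
C_{l \to r} &= C_{\mathrm{mb}} + I_l C_l + (1 - H_l)\, I_r C_r, \\
C_{r \to l} &= C_{\mathrm{mb}} + I_r C_r + (1 - H_r)\, I_l C_l.
\end{aligned}
\tag{2}
$$
Extending this for N children, we have
$$
C_{\mathrm{srdh}} = C_{\mathrm{mb}} + \sum_{i=1}^{N} I_i C_i \prod_{j=1}^{i-1} (1 - H_j),
$$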
which corresponds to our cost model.
3.1.2. RTSAH
For the internal nodes of binary BVH, the RTSAH cost model is given by
Crtsah = min(Cl→r , Cr→l ),
where
$$
\begin{aligned}
C_{l \to r} &= (1 + I_l V_l + (1 - I_l))\, C_b + I_l C_l + (I_{jr} + I_{lr} V_l)\, C_r, \\
C_{r \to l} &= (1 + I_r V_r + (1 - I_r))\, C_b + I_r C_r + (I_{jl} + I_{lr} V_r)\, C_l,
\end{aligned}
\tag{3}
$$
and Vl (respectively, Vr ) is the probability of a ray entering the left (respectively,
right) child tree but intersecting no primitive inside, Ijl (respectively, Ijr ) is the probability of entering only the left (respectively, right) child, and Ilr is the probability
of entering both children. RTSAH assumes that probabilities of hitting child nodes
are not independent, and Ilr is computed using the radiosity formfactor, which makes
extending it to n-ary BVH complicated. When SIMD is used, the above equations are
rewritten as
Cl→r = Cmb + Il Cl + (Ijr + Ilr Vl )Cr ,
Cr→l = Cmb + Ir Cr + (Ijl + Ilr Vr )Cl .
The visibility probability, V , is defined as
$$
V =
\begin{cases}
I_{jl} V_l + I_{jr} V_r + I_{lr} V_l V_r + I_e & \text{for internal nodes} \\
0 & \text{for leaf nodes}
\end{cases},
$$
where
$$
I_e = 1 - (I_{jl} + I_{jr} + I_{lr}).
$$
By relaxing the non-independence assumption, i.e., approximating Ilr ≈ Il Ir , Ijl ≈
(1 − Ir )Il , and Ijr ≈ (1 − Il )Ir , we obtain
Cl→r ≈ Cmb + Il Cl + (1 − Il (1 − Vl ))Ir Cr ,
Cr→l ≈ Cmb + Ir Cr + (1 − Ir (1 − Vr ))Il Cl .
Replacing Il (1 − Vl ) with Hl and Ir (1 − Vr ) with Hr , this result corresponds to
Equation (2).
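For the left-first case, the intermediate step of this approximation can be written out as follows (the right-first case is symmetric):

```latex
\begin{align*}
I_{jr} + I_{lr} V_l
  &\approx (1 - I_l)\, I_r + I_l I_r V_l \\
  &= I_r \bigl( 1 - I_l + I_l V_l \bigr) \\
  &= \bigl( 1 - I_l (1 - V_l) \bigr)\, I_r .
\end{align*}
```

Substituting $H_l = I_l (1 - V_l)$ turns the last factor into $(1 - H_l)\, I_r$, which is the coefficient of $C_r$ in Equation (2).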
3.1.3. SATO
SATO is quite different from other techniques as it only uses surface area to determine
the traversal order. For internal nodes, the cost model is given by
$$
C_{\mathrm{sato}} =
\begin{cases}
C_{l \to r} & \text{if } A_l > A_r \\
C_{r \to l} & \text{otherwise}
\end{cases},
$$
where Al and Ar are the surface areas of the left and right child
nodes, respectively, and the costs Cl→r and Cr→l are calculated by Equation (3). The goal
of this technique is to quickly find a large occluder allowing SATO to handle dynamic
scenes. However, the probabilities of a ray hitting a node and being occluded by a
primitive are not properly handled. Therefore, there is no guarantee that this method
always leads to a better result.
3.2. Approximating Probabilities
The probabilities Ii and Hi of our cost model can be approximated in a variety of
ways. Here we propose two approaches.
The first approach is to approximate Ii by a constant and Hi by a recorded number
of representative rays as
$$
H_i = L_i \Big/ \sum_{j=1}^{N} L_j, \qquad I_i = 1.0.
\tag{4}
$$
The probability Hi is computed as the number of rays being occluded by a primitive
in the ith subtree, Li , divided by the sum of the Li values over all child nodes. Approximating Ii = 1.0 means that
the optimization aims at improving the worst-case scenario where a ray hits all
child nodes. This approach has the advantage of not requiring the logging of bounding
box intersections with shadow rays, thus reducing the memory cost at the expense of
reduced precision.
The second approach is to approximate both probabilities more precisely by using
the recorded numbers of representative rays as
$$
H_i = L_i / N, \qquad I_i = N_i / N.
\tag{5}
$$
This approach requires more memory but provides a better precision which increases
the performance. Users can trade off between memory consumption and performance
by choosing one of these two approximations.
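The two approximations can be sketched as follows. The `Probabilities` struct, the function names, and the fallback to zero when no ray was recorded are assumptions of this example; the parameter `rays` stands for the recorded number of representative rays that intersect the current node.

```cpp
#include <cstddef>
#include <vector>

struct Probabilities {
    std::vector<double> H;  // H_i per child
    std::vector<double> I;  // I_i per child
};

// First approach, Equation (4): only the leaf-hit counters L_i are needed.
Probabilities EstimateFromLeafHits(const std::vector<int>& L) {
    long long total = 0;
    for (int l : L) total += l;
    Probabilities p;
    for (int l : L) {
        // Assumption of this sketch: with no recorded hit, use H_i = 0.
        p.H.push_back(total > 0 ? static_cast<double>(l) / total : 0.0);
        p.I.push_back(1.0);
    }
    return p;
}

// Second approach, Equation (5): both L_i and N_i are divided by the number
// of representative rays that intersected the current node.
Probabilities EstimateFromAllHits(const std::vector<int>& L,
                                  const std::vector<int>& Nhit, int rays) {
    Probabilities p;
    for (std::size_t i = 0; i < L.size(); ++i) {
        p.H.push_back(rays > 0 ? static_cast<double>(L[i]) / rays : 0.0);
        p.I.push_back(rays > 0 ? static_cast<double>(Nhit[i]) / rays : 0.0);
    }
    return p;
}
```

The first function needs one counter per child, the second needs two, which reflects the memory trade-off described above.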
4. Algorithm and Implementation
We split our algorithm into three stages. In the first stage, representative rays are cast.
Every time a shadow ray hits a node or primitive, we log the intersection by incrementing the associated counter. The second stage consists of assigning a probability
of being intersected to each node; additionally, a probability of occluding a ray is
assigned to each primitive. Finally, we sort all child nodes to minimize the proposed
cost model (in order to eliminate the kernel execution cost, we sort all child nodes only
once). After sorting child nodes, we continue rendering. The overhead of logging is
not negligible; hence, we use two kernels for the occlusion test, one for logging and
the other for the normal occlusion test.
4.1. N-ary BVH Node Data Structure
The n-ary BVH node data structure used in our implementation is given in Listing 1.
We add two counters in the node structure for the sake of simplifying the implementation. However, they can be detached and, for static scenes, they can be released right
after sorting child nodes.
#include <immintrin.h>

static const int N = 8;

// N Floats
typedef __m256 FloatN;

// N Bounding Boxes
struct Bounds3dN
{
    // For Min and Max
    FloatN BoundsX[2];
    FloatN BoundsY[2];
    FloatN BoundsZ[2];
};

// N-ary BVH Node
struct NodeN // 320 Bytes
{
    Bounds3dN Boxes;      // 192
    int Indices[N];       // 32
    int NumPrimitives[N]; // 32
    // We add counters here for simple implementation
    int NodeHit[N]; // 32
    int LeafHit[N]; // 32
};
Listing 1. N-ary BVH node structure.
4.2. Logging
It may be expected that recording the number of shadow rays hitting not only primitives, but also nodes, leads to better performance. This is the case because we can
accurately approximate the probabilities. However, doing so requires more atomic
operations, and the performance of the logging phase will be significantly degraded
especially on a multi-socket system. We tested logging with and without atomic operations to see how this change affects our algorithm. We observed that logging without
atomic operations does not cause any significant difference in practice as shown by
Gu et al. [Gu et al. 2015].
An occlusion query can be performed from a shading point to a light source, or from
a light source to a shading point. However, neither of the approaches finds a potential
frequent occluder enclosed in other primitives. Using an all-hit test might solve this
problem but it is too expensive to perform. We thus employ a simple shuffling technique in the logging phase. We randomly shuffle child nodes intersected by a shadow
ray before they are pushed onto a stack. This introduces an additional overhead in the
logging phase, but can reduce the number of traversal steps in some situations.
Non-opaque materials can be handled easily by stochastically incrementing counters based on their opacity.
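A minimal sketch of the two logging helpers described in this section is given below. The function names, the use of `std::mt19937`, and the representation of intersected children as a list of indices are assumptions of this example rather than details of our implementation.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Shuffles the indices of the child nodes hit by a shadow ray before they
// are pushed onto the traversal stack (logging phase only).
void ShuffleHitChildren(std::vector<int>& hitChildren, std::mt19937& rng) {
    std::shuffle(hitChildren.begin(), hitChildren.end(), rng);
}

// Increments a hit counter with probability equal to the surface opacity:
// fully opaque hits are always counted, fully transparent ones never are.
void LogHit(int& counter, double opacity, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    if (u(rng) < opacity) ++counter;
}
```

In a multithreaded logging kernel, each thread would typically own its random number generator; whether the counter update is atomic is a separate choice, as discussed above.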
4.3. Sorting
Since we do not want to determine the traversal order at run-time, the child nodes of
each node are sorted only once immediately after casting representative rays. Sorting
is done in a bottom-up manner. In other words, when we sort the child nodes of a
specific node, the grandchild nodes have to be already sorted. Note that sorting the
child nodes does not affect the performance of the closest hit test because SAH is not
affected, and closest hit tests use a front-to-back traversal.
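The bottom-up pass can be sketched as a post-order recursion. The `TreeNode` type below is a simplified, hypothetical stand-in for the SIMD node of Listing 1, with the probabilities assumed to be already estimated from the logged counters; the comparison assumes that the product of I and the subtree cost is positive for every child.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical tree node for this sketch (not the NodeN of Listing 1).
struct TreeNode {
    double I = 1.0;         // probability of entering this node from its parent
    double H = 0.0;         // probability of hitting a primitive in this subtree
    int numPrimitives = 0;  // N_p, used for leaf nodes
    double cost = 0.0;      // C_any of this subtree, filled in by SortBottomUp
    std::vector<TreeNode> children;  // empty for leaf nodes
};

// Sorts the children of every node bottom-up and returns the subtree cost.
double SortBottomUp(TreeNode& node, double Cmb, double Cp) {
    if (node.children.empty()) return node.cost = Cp * node.numPrimitives;

    // Grandchildren are sorted first, so each child's cost is already final.
    for (TreeNode& child : node.children) SortBottomUp(child, Cmb, Cp);

    // Decreasing H / (I * C), compared by cross-multiplication.
    std::stable_sort(node.children.begin(), node.children.end(),
                     [](const TreeNode& a, const TreeNode& b) {
                         return a.H * b.I * b.cost > b.H * a.I * a.cost;
                     });

    // Evaluate Equation (1) in the new order.
    double cost = Cmb, notOccluded = 1.0;
    for (const TreeNode& child : node.children) {
        cost += child.I * child.cost * notOccluded;
        notOccluded *= 1.0 - child.H;
    }
    return node.cost = cost;
}
```

Each node is visited once, so the pass is linear in the number of nodes apart from the small per-node sort of at most N children.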
5. Results
We implemented the proposed technique in PBRT version 3. All measurements were
done on a laptop with an Intel® Core™ i7-4850HQ running at 2.3 GHz with Turbo
Boost disabled. We construct a binary BVH with the builder provided in PBRT-v3,
which uses object splitting, and then an n-ary BVH is created by collapsing with a
surface area contraction technique [Wald et al. 2008; Gu et al. 2015].
We compared our algorithm to a simple depth-first traversal order (Simple), front-to-back traversal order (FtoB), and surface area traversal order (SATO). The Simple
traversal routine traverses intersected child nodes in the order in which they are stored in the node.
We use three different methods in the logging and sorting phases:
• the poor approximation using Equation (4) (Sort-),
• the precise approximation using Equation (5) (Sort),
• and the precise approximation plus shuffling in the logging phase (Sort+).
Figure 2. Test scenes: city, room, villa, sponza, sanmiguel, and sibenik. The percentages of
shadow rays over all ray types are 28.0% (city), 11.8% (room), 44.2% (villa), 51.7% (sponza),
93.9% (sanmiguel), and 52.1% (sibenik).
We did not compare our algorithm with SRDH and RTSAH, because our model
is equivalent to SRDH when applied to binary BVH as shown in Section 3.1.1, and
an n-ary BVH extension of RTSAH does not exist. One way of utilizing the SRDH
and RTSAH cost model for n-ary BVH is to create a binary treelet from a given n-ary
BVH node and determine the order of child nodes. However, the overhead could be
large, and there is no guarantee that we can find the best order with this approach for
an arbitrary N .
We tested six scenes as shown in Figure 2. We selected "path" (standard path
tracing) for the integrator and "random" for the sampler in the .pbrt scene files. The
maximum length of a path is set as 2 for the San-Miguel scene and 5 for the rest of
the scenes. The number of representative rays is 1% of the shadow rays used for the
whole rendering process. The cost of the multi-bounding box test Cmb is set as 1.
The results with QBVH (N = 4) and OBVH (N = 8) are shown in Tables 2 and 3,
respectively.
For all the scenes and all algorithms, we measure the number of bounding box
intersection tests per shadow ray (Nb /shadow ray) and the number of primitive intersection tests per shadow ray (Nt /shadow ray), as well as the rendering times. The
number of intersection tests of the San-Miguel scene is smaller than those of the other
scenes. This is because this scene uses multiple n-ary BVHs and the tables only show
the statistics of the top level BVH. For our model, we also separate the pre-sorting
phase and the post-sorting phase. The lowest values in the same row are highlighted
in bold.
The result of the San-Miguel scene shows that our algorithm can dramatically
improve the rendering performance when shadow rays account for a large percentage
of rays. Our method with shuffling runs up to 1.2 times faster compared to the Simple
traversal for OBVH. Note that speedup is small if radiance rays are dominant for the
scenes; however, our method always reduces Nb /shadow ray and Nt /shadow ray
after sorting and consistently performs better than the other traversal techniques.

Scene      Method  Nb/shadow ray    Nt/shadow ray   Logging (s)  Sorting (ms)  Total rendering (s)
city       Simple  17.10            0.81            -            -             311
           FtoB    16.02            0.82            -            -             313
           SATO    15.82            0.80            -            -             305
           Sort-   17.11 → 15.30    0.81 → 0.80     4.1          17            307
           Sort    17.11 → 15.26    0.81 → 0.80     4.5          20            307
           Sort+   17.74 → 15.17    0.85 → 0.80     7.6          23            310
room       Simple  6.90             3.24            -            -             234
           FtoB    6.82             3.25            -            -             238
           SATO    4.97             4.87            -            -             235
           Sort-   6.91 → 4.53      3.25 → 3.39     3.6          0             232
           Sort    6.91 → 4.55      3.25 → 3.32     3.6          0             231
           Sort+   5.87 → 4.53      3.82 → 3.41     3.6          0             233
villa      Simple  11.52            2.09            -            -             805
           FtoB    11.59            2.01            -            -             814
           SATO    10.60            2.28            -            -             790
           Sort-   11.54 → 8.78     2.09 → 1.84     9.6          42            776
           Sort    11.54 → 8.26     2.09 → 1.70     9.6          58            769
           Sort+   9.83 → 7.10      2.14 → 1.70     9.7          59            762
sponza     Simple  11.53            2.31            -            -             913
           FtoB    12.61            2.74            -            -             972
           SATO    10.25            2.06            -            -             899
           Sort-   11.57 → 9.47     2.30 → 1.97     10.6         3             879
           Sort    11.57 → 9.56     2.30 → 1.94     10.6         3             881
           Sort+   11.40 → 8.97     2.31 → 1.76     11.6         2             864
sanmiguel  Simple  9.23             1.98            -            -             3093
           FtoB    8.82             2.10            -            -             3109
           SATO    8.74             1.71            -            -             2901
           Sort-   9.23 → 8.42      1.98 → 1.77     32.8         41            2735
           Sort    9.23 → 8.38      1.98 → 1.74     33.8         59            2725
           Sort+   9.43 → 8.28      2.14 → 1.75     34.9         54            2642
sibenik    Simple  20.62            4.23            -            -             1272
           FtoB    20.81            3.67            -            -             1285
           SATO    22.23            3.49            -            -             1268
           Sort-   20.63 → 19.62    4.24 → 3.87     14.2         3             1210
           Sort    20.63 → 19.71    4.24 → 3.67     25.3         3             1257
           Sort+   21.29 → 19.13    4.09 → 3.38     16.2         4             1205
For Sort-, Sort, and Sort+, the per-ray values are given as before sorting → after sorting.
For Simple, FtoB, and SATO, a single value is listed for each per-ray metric.
Table 2. The results using QBVH.
Scene      Method  Nb/shadow ray    Nt/shadow ray   Logging (s)  Sorting (ms)  Total rendering (s)
city       Simple  10.82            0.81            -            -             280
           FtoB    11.81            0.89            -            -             286
           SATO    9.99             0.80            -            -             276
           Sort-   10.82 → 9.43     0.81 → 0.80     4.1          24            278
           Sort    10.82 → 9.48     0.81 → 0.80     4.1          21            274
           Sort+   11.18 → 9.40     0.85 → 0.80     4.1          29            274
room       Simple  4.06             3.40            -            -             229
           FtoB    4.90             2.96            -            -             229
           SATO    3.44             3.86            -            -             230
           Sort-   4.06 → 3.22      3.96 → 3.70     3.6          0             231
           Sort    4.06 → 3.26      3.96 → 2.39     3.6          0             227
           Sort+   4.01 → 3.23      3.66 → 2.49     3.5          0             228
villa      Simple  5.40             2.18            -            -             760
           FtoB    7.12             2.14            -            -             793
           SATO    6.04             2.24            -            -             757
           Sort-   5.40 → 4.28      2.18 → 1.98     13.2         75            758
           Sort    5.40 → 4.37      2.18 → 1.77     9.1          93            750
           Sort+   6.36 → 4.70      2.11 → 1.72     9.6          91            751
sponza     Simple  8.27             2.45            -            -             881
           FtoB    8.97             2.80            -            -             921
           SATO    6.59             1.93            -            -             921
           Sort-   8.32 → 6.51      2.43 → 2.10     10.2         4             847
           Sort    8.32 → 6.90      2.43 → 2.09     10.1         4             863
           Sort+   7.66 → 5.92      2.30 → 1.86     10.7         4             829
sanmiguel  Simple  6.16             1.98            -            -             2576
           FtoB    6.13             2.11            -            -             2740
           SATO    6.52             2.33            -            -             2678
           Sort-   6.16 → 5.70      1.97 → 1.82     27.3         48            2363
           Sort    6.16 → 5.65      1.97 → 1.77     26.8         72            2230
           Sort+   6.45 → 5.63      2.13 → 1.73     28.8         60            2138
sibenik    Simple  14.19            4.49            -            -             1169
           FtoB    13.81            3.69            -            -             1165
           SATO    15.53            4.28            -            -             1212
           Sort-   14.19 → 13.77    4.50 → 4.03     12.7         4             1120
           Sort    14.19 → 13.66    4.50 → 3.52     13.2         4             1112
           Sort+   14.60 → 12.52    4.09 → 2.99     14.2         4             1059
For Sort-, Sort, and Sort+, the per-ray values are given as before sorting → after sorting.
For Simple, FtoB, and SATO, a single value is listed for each per-ray metric.
Table 3. The results using OBVH.
Sorting is reasonably fast: it takes less than 100 milliseconds without parallelization for all of the scenes we tested. It can easily be parallelized by using
Intel® Threading Building Blocks, for example. Recording both Li and Ni slightly
slows the logging phase.
6. Conclusion and Future Work
In this paper we have proposed a model to describe the cost of occlusion tests for
the n-ary BVH. By optimizing the traversal order, we demonstrated that the new
model outperforms previous techniques, with rendering on the order of 1.2 times faster.
The probabilities used in the cost model can be estimated with only a small number of rays, and, furthermore, the sorting time is mostly negligible for off-line rendering.
Our model builds upon state-of-the-art techniques, and we have shown how employing the proposed cost model enables optimal sorting of child nodes through a
very simple criterion that greatly reduces the number of traversal steps.
We would like to investigate how our algorithm performs for wider n-ary BVHs
using Intel® AVX-512.
A. Minimizing the Occlusion Test Cost for a Single Node
Let us consider a node with N children, where child i is associated with the probabilities
$I_i$ and $H_i$ of a ray intersecting it and with a visiting cost $C_i$. Suppose that the cost of the current
node, $C_{\mathrm{any}}$, defined as in Equation (1), is minimal. Let $C'_{\mathrm{any}}$ be the cost of the
node with the children $l$ and $l + 1$ swapped. This new cost is defined as
$$
C'_{\mathrm{any}} = C_{\mathrm{mb}} + \sum_{i=1}^{N} I'_i C'_i \prod_{j=1}^{i-1} (1 - H'_j),
$$
where
$$
\forall i \in [1, N], \quad
(I'_i, H'_i, C'_i) =
\begin{cases}
(I_{l+1}, H_{l+1}, C_{l+1}) & \text{if } i = l \\
(I_l, H_l, C_l) & \text{if } i = l + 1 \\
(I_i, H_i, C_i) & \text{otherwise.}
\end{cases}
$$
Since we assume $C_{\mathrm{any}}$ to be minimal, $C'_{\mathrm{any}} - C_{\mathrm{any}} \ge 0$. Thus, we have
$$
I_{l+1} C_{l+1} H_l - I_l C_l H_{l+1} \ge 0.
$$
If $0 < I_l I_{l+1}$, this leads to
$$
\frac{H_l}{I_l C_l} \ge \frac{H_{l+1}}{I_{l+1} C_{l+1}}.
$$
Therefore, if $C_{\mathrm{any}}$ is minimal, the sequence $\{ H_i / (I_i C_i) \}$ is decreasing. Since $C_{\mathrm{any}}$ can
only take a finite number of values over the permutations of $(I_i, H_i, C_i)$, we deduce
that the minimal value is obtained when the sequence $\{ H_i / (I_i C_i) \}$ is ordered in decreasing
order.
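The inequality on children $l$ and $l + 1$ can be checked by writing out the difference of the two costs. The terms with $i < l$ are identical in both sums, and so are the terms with $i > l + 1$, because the product $(1 - H_l)(1 - H_{l+1})$ does not depend on the order of the two children. Only the terms for $i = l$ and $i = l + 1$ remain:

```latex
\begin{align*}
C'_{\mathrm{any}} - C_{\mathrm{any}}
  &= \prod_{j=1}^{l-1} (1 - H_j)
     \Bigl[ I_{l+1} C_{l+1} + I_l C_l (1 - H_{l+1})
          - I_l C_l - I_{l+1} C_{l+1} (1 - H_l) \Bigr] \\
  &= \prod_{j=1}^{l-1} (1 - H_j)
     \bigl( I_{l+1} C_{l+1} H_l - I_l C_l H_{l+1} \bigr).
\end{align*}
```

When the leading product is positive, the sign of the difference is the sign of the last factor.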
Acknowledgements
This research is partially supported by the Japan Science and Technology Agency, CREST
project. We would like to thank Ken Anjyo, Gengdai Liu, and Richard Roberts for proofreading.
References
ÁFRA, A. T. 2013. Faster incoherent ray traversal using 8-wide AVX instructions. Tech. rep., Babeş-Bolyai University, Cluj-Napoca, Romania, Aug. URL: http://www.cs.ubbcluj.ro/~afra/publications/afra2013tr_mbvh8.pdf.
BITTNER, J., AND HAVRAN, V. 2009. RDH: Ray Distribution Heuristics for Construction of Spatial Data Structures. In 25th Spring Conference on Computer Graphics (SCCG 2009), ACM, New York, H. Hauser, Ed., 61–67. URL: http://dx.doi.org/10.1145/1980462.1980475.
FELTMAN, N., LEE, M., AND FATAHALIAN, K. 2012. SRDH: Specializing BVH Construction and Traversal Order Using Representative Shadow Ray Sets. In Eurographics/ACM SIGGRAPH Symposium on High Performance Graphics, The Eurographics Association, Aire-la-Ville, Switzerland, C. Dachsbacher, J. Munkberg, and J. Pantaleoni, Eds.
GU, Y., HE, Y., AND BLELLOCH, G. E. 2015. Ray Specialized Contraction on Bounding Volume Hierarchies. Computer Graphics Forum 34, 309–318. URL: http://dx.doi.org/10.1111/cgf.12769.
IZE, T., AND HANSEN, C. 2011. RTSAH Traversal Order for Occlusion Rays. In Computer Graphics Forum (Proceedings of Eurographics 2011), Eurographics Association, Aire-la-Ville, Switzerland, vol. 30, 297–305.
NAH, J.-H., AND MANOCHA, D. 2014. SATO: Surface Area Traversal Order for Shadow Ray Tracing. Computer Graphics Forum 33, 6, 167–177. URL: https://diglib.eg.org/handle/10.1111/v33i6pp167-177.
STICH, M., FRIEDRICH, H., AND DIETRICH, A. 2009. Spatial splits in bounding volume hierarchies. In Proc. High-Performance Graphics 2009, ACM, New York, 7–13. URL: http://doi.acm.org/10.1145/1572769.1572771.
VINKLER, M., HAVRAN, V., AND SOCHOR, J. 2012. Technical section: Visibility driven bvh build up algorithm for ray tracing. Comput. Graph. 36, 4 (June), 283–296. URL: http://dx.doi.org/10.1016/j.cag.2012.02.013.
WALD, I., BENTHIN, C., AND BOULOS, S. 2008. Getting rid of packets - efficient simd single-ray traversal using multi-branching bvhs. In IEEE Symposium on Interactive Ray Tracing, 2008. RT 2008, IEEE, Los Alamitos, CA, 49–57. URL: http://dx.doi.org/10.1109/RT.2008.4634620.
WALD, I., WOOP, S., BENTHIN, C., JOHNSON, G. S., AND ERNST, M. 2014. Embree: A kernel framework for efficient cpu ray tracing. ACM Trans. Graph. 33, 4 (July), 143:1–143:8. URL: http://doi.acm.org/10.1145/2601097.2601199.
Index of Supplemental Materials
Our system is implemented in PBRT-v3; thus, our project (see http://www.jcgt.org/published/0005/02/02/code.zip) has the same folder structure. Our QBVH/OBVH
and sorting technique implementations are given in
• src/accelerators/qbvh.h
• src/accelerators/qbvh.cpp
• src/accelerators/obvh.h
• src/accelerators/obvh.cpp
and the shadow ray traversal routines for QBVH/OBVH are implemented in
• src/accelerators/qbvh shadow functions.h
• src/accelerators/qbvh shadow functions.cpp
• src/accelerators/obvh shadow functions.h
• src/accelerators/obvh shadow functions.cpp
In order to use them, describe Accelerator "qbvh" or Accelerator "obvh"
in a .pbrt scene file.
Our system can be compiled with the cmake tool. Several options are defined to give
access to the different cost models. These options are:
• BUILD_SATO which can be used to use the SATO model. It corresponds to the SATO
algorithm in the results.
• BUILD_FRONTTOBACK to build the front-to-back shadow traversal order. It corresponds to the FtoB algorithm in the results.
• BUILD_OPTIMIZE to build with the cost model proposed in this paper. This option
can take sub-options:
– BUILD_NOBBLOG to build without the bounding box logging. It corresponds to
the Sort- in the results.
– BUILD_SHUFFLING to build with the shuffling. It corresponds to the Sort+ in
the results.
Without any sub-options, this option corresponds to the Sort algorithm in the results.
Author Contact Information
Shinji Ogaki
OLM Digital, Inc.
Mikami Bldg. 2F
1-18-10 Wakabayashi, Setagaya-ku
Tokyo, Japan 154-0023
[email protected]
Alexandre Derouet-Jourdan
OLM Digital, Inc.
Mikami Bldg. 2F
1-18-10 Wakabayashi, Setagaya-ku
Tokyo, Japan 154-0023
[email protected]
Shinji Ogaki, Alexandre Derouet-Jourdan, An N-ary BVH Child Node Sorting Technique for
Occlusion Tests, Journal of Computer Graphics Techniques (JCGT), vol. 5, no. 2, 22–37,
2016
http://jcgt.org/published/0005/02/02/
Received: 2015-10-20
Recommended: 2016-03-15
Published: 2016-06-08
Corresponding Editor: Larry Gritz
Editor-in-Chief: Marc Olano
© 2016 Shinji Ogaki, Alexandre Derouet-Jourdan (the Authors).
The Authors provide this document (the Work) under the Creative Commons CC BY-ND
3.0 license available online at http://creativecommons.org/licenses/by-nd/3.0/. The Authors
further grant permission for reuse of images and text from the first page of the Work, provided
that the reuse is for the purpose of promoting and/or summarizing the Work in scholarly
venues and that any reuse is accompanied by a scientific citation to the Work.