Hierarchical Penumbra Casting

Samuli Laine
Timo Aila
Helsinki University of Technology
Hybrid Graphics, Ltd.
Introduction
• Rendering soft shadows
– As usual, area light sources are sampled with
a number of light samples
– Multiple receiver points to be shaded
• The main problem is solving the visibility
– which light samples are visible to which
receiver points
What’s Happening?
[Figure: a light source with light samples, a shadow caster, and receiver points on the visible surface]
On the Scale of the Problem
• With R receiver points and L light samples
there are RL visibility relations to solve
– For example, 1024×768 resolution and 256
light samples give over 200 million relations
• Ray casting is the usual solution for
solving the visibility relations
– With T triangles, the cost of casting one
shadow ray is O(log T)
– Total cost becomes O(RL log T)
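The figure quoted above can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and it assumes exactly one receiver point per pixel.

```python
def visibility_relations(width: int, height: int, light_samples: int) -> int:
    """Count (receiver point, light sample) pairs, assuming one receiver point per pixel."""
    return width * height * light_samples


relations = visibility_relations(1024, 768, 256)
print(f"{relations:,}")  # 201,326,592
```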
About Ray Casting
• The standard ray casting approach
considers only one ray at a time
– This inevitably leads to a cost that is linear
with respect to RL
• However, this is highly flexible
– We need to generate only one ray at a time
• Sub-linear complexity with respect to T is
achieved by placing the triangles into an
acceleration hierarchy
Transposing the Algorithm
• Goal: sub-linear complexity w.r.t. RL
• Requires rearranging the rendering loop
Ray Tracing
for each receiver point r                        (linear in R)
    for each light sample l                      (linear in L)
        find triangle that blocks ray l → r      (sub-linear in T)
Our Approach
for each triangle T                              (linear in T)
    find all l → r rays blocked by T             (sub-linear in RL)
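The two loop orders can be compared in a small sketch. Everything here is illustrative: `blocks(t, l, r)` is a hypothetical predicate standing in for the geometric test, and the triangle-major version searches all (l, r) pairs by brute force where the actual method would use hierarchies. The point is only that both orders produce the same set of occluded rays.

```python
from itertools import product


def ray_order(receivers, lights, triangles, blocks):
    """Receiver-major loop: one shadow ray at a time."""
    occluded = set()
    for r in receivers:
        for l in lights:
            # A real ray caster would query a triangle hierarchy here.
            if any(blocks(t, l, r) for t in triangles):
                occluded.add((l, r))
    return occluded


def triangle_order(receivers, lights, triangles, blocks):
    """Triangle-major loop: collect every ray blocked by each triangle."""
    occluded = set()
    for t in triangles:
        # Brute-force stand-in for the hierarchical search over (l, r) pairs.
        occluded.update(
            (l, r) for l, r in product(lights, receivers) if blocks(t, l, r)
        )
    return occluded


# Hypothetical toy predicate; it has no geometric meaning.
def toy_blocks(t, l, r):
    return (l + r) % 5 == t


receivers, lights, triangles = range(6), range(4), [0, 3]
same = ray_order(receivers, lights, triangles, toy_blocks) == triangle_order(
    receivers, lights, triangles, toy_blocks
)
print(same)  # True
```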
About Our Algorithm
• Sub-linear complexity with respect to RL is
achieved by placing the receiver points
and light samples into acceleration
hierarchies
– Therefore, all receiver points must be
gathered before computing the shadows
• We process one triangle at a time
– Good: no need for triangle BSP
– Bad: linear complexity with respect to T
About Our Algorithm, part 2
• The full rendering process goes as follows:
1. Ray-trace or rasterize the image without
shadows to get the receiver points
2. Build the acceleration structures for receiver
points and light samples
3. Process all triangles to solve the visibility
relations between light samples and receiver
points
4. Perform shading
The Acceleration Structures
• Fixed three-level bounding volume
hierarchy is used for the light samples
– Assuming a polygonal light source, bounding
“volumes” are actually bounding polygons
• Standard bounding volume hierarchy is
used for the receiver points
– Axis-aligned boxes as bounding volumes
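A bounding volume hierarchy over receiver points might be built roughly as follows. This is a minimal sketch under assumptions not stated on the slides: a median split along the longest axis and a leaf size of 4. The names `Node`, `build`, and `count_points` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class Node:
    lo: Point                             # minimum corner of the axis-aligned box
    hi: Point                             # maximum corner of the axis-aligned box
    points: Optional[List[Point]] = None  # set only in leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bounds(points):
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    return lo, hi


def build(points, leaf_size=4):
    """Median split along the longest box axis (an assumed split rule)."""
    lo, hi = bounds(points)
    if len(points) <= leaf_size:
        return Node(lo, hi, points=list(points))
    axis = max(range(3), key=lambda i: hi[i] - lo[i])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(lo, hi,
                left=build(ordered[:mid], leaf_size),
                right=build(ordered[mid:], leaf_size))


def count_points(node):
    if node.points is not None:
        return len(node.points)
    return count_points(node.left) + count_points(node.right)


receiver_points = [(float(i), float(i % 3), 0.0) for i in range(10)]
root = build(receiver_points)
print(root.lo, root.hi, count_points(root))
```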
Light Sample Hierarchy
• Three levels
• All nodes have a bounding volume
– Root node: entire light source
– Middle nodes: light sample groups
– Leaf nodes: light samples
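The three levels could be laid out as in the sketch below. The slides do not say how samples are grouped, so the regular 16×16 sample grid on a unit-square light, the 4×4 tiles, and the dictionary layout are all assumptions made for illustration. The root box here bounds the sample positions, whereas the slides bound the entire light source.

```python
def rect(points):
    """Axis-aligned bounding rectangle (min_x, min_y, max_x, max_y) in the light plane."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def build_light_hierarchy(n=16, tile=4):
    """Root -> light sample groups -> light samples, with assumed grid and tile sizes."""
    groups = {}
    for j in range(n):
        for i in range(n):
            sample = ((i + 0.5) / n, (j + 0.5) / n)
            groups.setdefault((i // tile, j // tile), []).append(sample)
    middle = [{"bounds": rect(g), "samples": g} for g in groups.values()]
    all_samples = [s for g in groups.values() for s in g]
    return {"bounds": rect(all_samples), "groups": middle}


light_root = build_light_hierarchy()
print(len(light_root["groups"]), len(light_root["groups"][0]["samples"]))  # 16 16
```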
Storing the Visibility Information
• A bit mask with L bits is assigned for every
receiver point
– bit = 0: light sample is visible
– bit = 1: light sample is occluded
• Initially, all bits are zero
• When a triangle is found to occlude a light
sample from a receiver point, the
corresponding bit is set to one
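The bit convention above maps directly onto an integer used as a bit field. The class below is a hypothetical sketch, not the authors' data structure; `visible_fraction` is an added convenience showing how a shading step might read the mask.

```python
class VisibilityMask:
    """L-bit mask for one receiver point: bit = 0 visible, bit = 1 occluded."""

    def __init__(self, num_light_samples: int) -> None:
        self.num_light_samples = num_light_samples
        self.bits = 0  # all light samples start out visible

    def occlude(self, sample: int) -> None:
        self.bits |= 1 << sample

    def is_visible(self, sample: int) -> bool:
        return ((self.bits >> sample) & 1) == 0

    def visible_fraction(self) -> float:
        return 1.0 - bin(self.bits).count("1") / self.num_light_samples


mask = VisibilityMask(256)
for sample in (3, 17, 17, 255):  # occluding the same sample twice is harmless
    mask.occlude(sample)
print(mask.is_visible(17), mask.is_visible(18))  # False True
print(mask.visible_fraction())  # 0.98828125
```

Because setting a bit is idempotent, triangles can be processed in any order without changing the final mask.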
Penumbra Volumes
• All points where a triangle may block a ray
from a bounding volume are inside the
corresponding penumbra volume
[Figure: a triangle, a bounding volume in the light hierarchy, and the resulting penumbra volume]
Processing a Triangle
• First build penumbra volumes for all nodes
in the light sample hierarchy
• For individual light samples (leaf nodes)
these become hard shadow volumes
Processing a Triangle
• Traverse down the receiver point hierarchy
• Step 1: Test intersection between main
penumbra volume and bounding volume of
receiver node
[Figure: triangle, bounding volume of the entire light source, main penumbra volume, and a receiver node]
Processing a Triangle
• Step 2: Update the list of active light
sample groups
– At beginning of traversal, all groups are active
[Figure: triangle, bounding volumes of light sample groups, and a receiver node; some groups are removed from the active group list]
Processing a Triangle
• Step 3: Recurse into child nodes in
receiver hierarchy
– With pruned list of active light sample groups
[Figure: triangle, bounding volume of the entire light source, main penumbra volume, and child nodes]
Processing a Triangle
• Step 4: In leaf node, test receiver points
vs. hard shadow volumes of light samples
– Update the visibility relation bits
[Figure: triangle, light samples in active groups, and receiver points]
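One way to test a receiver point against the hard shadow volume of a single light sample is with four plane tests, sketched below. This formulation is an assumption rather than the slides' stated method: the volume is taken to be bounded by the triangle's plane and by three planes through the light sample and each triangle edge. The function and helper names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def in_hard_shadow(p, light, tri):
    """True if point p lies in the shadow cast by triangle tri from a point light."""
    a, b, c = tri
    n = cross(sub(b, a), sub(c, a))
    # p must be strictly on the opposite side of the triangle plane from the light.
    if dot(n, sub(p, a)) * dot(n, sub(light, a)) >= 0:
        return False
    # p must be on the inner side of each plane spanned by the light and an edge.
    for v0, v1, v2 in ((a, b, c), (b, c, a), (c, a, b)):
        m = cross(sub(v0, light), sub(v1, light))
        if dot(m, sub(p, light)) * dot(m, sub(v2, light)) < 0:
            return False
    return True


light = (0.0, 0.0, 10.0)
tri = ((-1.0, -1.0, 5.0), (1.0, -1.0, 5.0), (0.0, 1.0, 5.0))
print(in_hard_shadow((0.0, 0.0, 0.0), light, tri))  # True
print(in_hard_shadow((5.0, 0.0, 0.0), light, tri))  # False
```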
Summary of Recursion
• Traverse down the receiver point hierarchy
– Maintain list of active light sample groups
• Initially all groups are active
– First ensure that receiver node intersects the
main penumbra volume, terminate otherwise
– Then prune the active light sample group list
by intersecting receiver node vs. penumbra
volumes of active light sample groups
– In leaf node, test receiver points against hard
shadow volumes of remaining light samples
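The recursion can be exercised end to end in a deliberately simplified "flatland" setting. This is a toy stand-in, not the 3D algorithm: light samples sit on the line y = 2, blockers are segments on y = 1, receivers sit on y = 0, so a ray from receiver x_r to light sample x_l crosses the blocker line at (x_r + x_l) / 2 and penumbra volumes reduce to intervals. All names are hypothetical, and integer coordinates keep the comparisons exact.

```python
def penumbra_interval(blocker, light_lo, light_hi):
    """Receiver-line interval where `blocker` may hide part of [light_lo, light_hi]."""
    a, b = blocker
    return 2 * a - light_hi, 2 * b - light_lo


def overlaps(lo1, hi1, lo2, hi2):
    return lo1 <= hi2 and lo2 <= hi1


def build(ids, xs, leaf_size=2):
    """Receiver hierarchy over positions xs (assumed sorted); ids index into xs."""
    node = {"lo": xs[ids[0]], "hi": xs[ids[-1]]}
    if len(ids) <= leaf_size:
        node["ids"] = ids
    else:
        mid = len(ids) // 2
        node["children"] = [build(ids[:mid], xs, leaf_size),
                            build(ids[mid:], xs, leaf_size)]
    return node


def process_blocker(node, active, blocker, light_bounds, xs, masks):
    box = (node["lo"], node["hi"])
    # Terminate unless the receiver node intersects the main penumbra volume.
    if not overlaps(*box, *penumbra_interval(blocker, *light_bounds)):
        return
    # Prune groups whose penumbra volume misses the node (groups sorted by position).
    active = [g for g in active
              if overlaps(*box, *penumbra_interval(blocker, g[0][1], g[-1][1]))]
    if "children" in node:
        for child in node["children"]:
            process_blocker(child, active, blocker, light_bounds, xs, masks)
        return
    a, b = blocker
    for r in node["ids"]:
        for group in active:
            for l, x_l in group:
                # Hard shadow test for one light sample; set its bit if occluded.
                if 2 * a <= xs[r] + x_l <= 2 * b:
                    masks[r] |= 1 << l


xs = list(range(-10, 30))             # receiver positions
samples = [(l, l) for l in range(8)]  # (bit index, position on the light)
groups = [samples[0:4], samples[4:8]]
blockers = [(3, 5), (8, 9)]
masks = [0] * len(xs)
root = build(list(range(len(xs))), xs)
for blocker in blockers:
    process_blocker(root, groups, blocker, (0, 7), xs, masks)
print(sum(bin(m).count("1") for m in masks), "occluded relations")
```

Because every pruning test is conservative, the masks match what a brute-force loop over all receiver and light sample pairs would produce.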
Optimizations
• Umbra bits for early traversal termination
– With receiver hierarchy rebuilding to ensure
balance
• Active plane sets
• Lazy penumbra volume and hard shadow
volume construction
• On-demand bit mask allocation
• Coarse blocker sorting
Extensions
• Multiple light sample sets
– To remove banding artifacts
• Alpha matte textures
– Often used in e.g. vegetation textures
• Adaptive antialiasing
• Volumetric light sources
Results
• Compared against Mental Ray 3.2
• Benchmarked only the solving of the
visibility relations
– For Mental Ray, computed both with and
without shadows and took the difference
• More detailed results in the paper
Grids
32K triangles, 256 light samples
Resolution        1280×960    2560×1920
Peak mem usage    58M         228M
Speedup factor    13.5        16.7
Flowers
903K triangles, 256 light samples
Resolution        1280×960    2560×1920
Peak mem usage    39M         154M
Speedup factor    3.5         7.8
Sponza
1.27M triangles, 256 light samples
Resolution        1280×960    2560×1920
Peak mem usage    62M         244M
Speedup factor    8.2         11.4
Results: Analysis
• Sub-linearity with respect to R
– Increasing output resolution gives better
relative performance
– Due to hierarchical processing of receiver
points
• Sub-linearity with respect to L
– Using more light samples gives better relative
performance (results in the paper)
– Due to using analytic penumbra volumes that
represent many light samples at once
Results: More Analysis
• Somewhat high memory usage
– Depends on the output resolution
– Depends on the complexity of the shadows
– Does not depend on the number of triangles
in the scene
• New problem: dependence on the spatial
size of light source
– Penumbra volumes become larger
– Leads to lower performance
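The dependence on output resolution can be illustrated with a back-of-the-envelope estimate of bit mask storage alone. The sketch assumes one receiver point per pixel and a fully allocated L-bit mask for every receiver point, so it is not a model of the measured peak memory, which also depends on the other data structures and on on-demand mask allocation.

```python
def mask_bytes(width: int, height: int, light_samples: int) -> int:
    """Bytes for one L-bit mask per receiver point, assuming one receiver point per pixel."""
    return width * height * light_samples // 8


for width, height in [(1024, 768), (2560, 1920)]:
    mib = mask_bytes(width, height, 256) / 2**20
    print(f"{width}x{height}: {mib:.1f} MiB")
# 1024x768: 24.0 MiB
# 2560x1920: 150.0 MiB
```

The estimate contains no term for the triangle count, in line with the observation that memory usage does not depend on the number of triangles in the scene.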
Conclusions
• Nice properties
+ Exactly the same result as with ray casting
+ No need to store all triangles at any point
+ Sub-linear dependence on output resolution
and number of light samples
• Not so nice properties
– Linear dependence on triangle count
– Memory usage can be high
– Dependence on the spatial size of light source
Future Work
• Process multiple triangles at a time?
• Could experiment with full light sample
hierarchy, which should (in theory) have
better performance
Thank You
• Questions?
Funding: National Technology Agency of Finland, Bitboys,
Hybrid Graphics, Remedy Entertainment, Nokia, ATI