Slides

A Hierarchical Shadow Volume Algorithm
Timo Aila (1,2)
Tomas Akenine-Möller (3)
(1) Helsinki University of Technology
(2) Hybrid Graphics
(3) Lund University

Outline
Brief intro to shadow volumes
    fillrate problem, existing solutions
Our solution
    idea
    implementation
Results
Q&A

Shadow volumes [Crow77]
Shadow volumes define closed volumes of space that are in shadow
[Illustration labels: infinitesimal light source; shadow caster = light cap; dark cap; extruded side quads]

Is point inside shadow volume?
1. Pick reference point R outside shadow volume
    any such point is OK
2. Span line from R to point to be classified
3. Compute sum of enter (+1) and exit (-1) events
[2D illustration: shadow volume, reference point R, points P1, P2, P3]

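The three steps above can be sketched in 2D, where the shadow volume is a closed polygon. This is a minimal illustration, not the authors' code: the names `cross` and `crossing_sum` are hypothetical, the polygon is assumed to be listed in counter-clockwise order, and degenerate cases (segments parallel to an edge or passing exactly through a vertex) are ignored. A non-zero sum is taken to mean the point is inside the volume.

```python
def cross(ax, ay, bx, by):
    """2D cross product (z component)."""
    return ax * by - ay * bx


def crossing_sum(polygon, r, p):
    """Sum of enter (+1) and exit (-1) events along the segment r -> p.

    polygon: list of (x, y) vertices, counter-clockwise, implicitly closed.
    r: reference point assumed to lie outside the polygon.
    p: point to classify.
    """
    total = 0
    dx, dy = p[0] - r[0], p[1] - r[1]
    n = len(polygon)
    for i in range(n):
        a = polygon[i]
        b = polygon[(i + 1) % n]
        ex, ey = b[0] - a[0], b[1] - a[1]
        denom = cross(dx, dy, ex, ey)
        if denom == 0:
            continue  # segment parallel to this edge; ignored in this sketch
        # Solve r + t*d = a + u*e for t (along the segment) and u (along the edge).
        t = cross(a[0] - r[0], a[1] - r[1], ex, ey) / denom
        u = cross(a[0] - r[0], a[1] - r[1], dx, dy) / denom
        if 0 < t < 1 and 0 <= u < 1:
            # For a counter-clockwise polygon the interior lies to the left of
            # each edge, so denom < 0 means the segment crosses into the volume.
            total += 1 if denom < 0 else -1
    return total


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
inside = crossing_sum(square, (-1.0, 0.5), (0.5, 0.5)) != 0   # one enter event
outside = crossing_sum(square, (-1.0, 0.5), (2.0, 0.5)) != 0  # enter + exit cancel
```
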
Using graphics hardware
R at ∞ behind pixel (z-fail) [Bilodeau&Songy, Carmack]
    infinity always outside SVs – robust
    must not clip to far plane of view frustum
    sum hidden events to stencil buffer, sign from backface culling
[2D illustration: camera, view frustum, shadow volume, R, visible samples (or pixels), + and - events]

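The z-fail counting described above can be sketched for a single pixel as follows. This is a simplified software model rather than actual graphics API code: the function name `zfail_stencil` and the `(depth, is_front_facing)` face representation are assumptions made for illustration. Only shadow volume faces hidden behind the visible surface are counted, with back faces adding +1 and front faces adding -1.

```python
def zfail_stencil(visible_depth, sv_faces):
    """Stencil value for one pixel using z-fail counting.

    visible_depth: depth of the visible surface at this pixel.
    sv_faces: list of (depth, is_front_facing) pairs, one per shadow volume
              face covering this pixel (hypothetical representation).
    """
    stencil = 0
    for depth, is_front_facing in sv_faces:
        if depth > visible_depth:  # depth test fails: the face is hidden
            stencil += -1 if is_front_facing else 1
    return stencil


# One closed shadow volume whose front face is at depth 3 and back face at depth 8.
faces = [(3.0, True), (8.0, False)]
in_shadow = zfail_stencil(5.0, faces) != 0    # surface inside the volume
in_front = zfail_stencil(2.0, faces) != 0     # surface in front of the volume
behind = zfail_stencil(10.0, faces) != 0      # surface behind the volume
```
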
Amount of pixel processing
Adapted from [Chan and Durand 2004]
Fillrate problem
50+ fps without shadows on ATI Radeon 9800XT at 1280x1024, 1 sample/pixel
1 fps when shadow volumes rasterized
2.2 billion pixels per frame

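To put the figures above in perspective, the short calculation below estimates how many times each screen pixel is touched on average. It assumes that the 2.2 billion pixels are all processed at the stated 1280x1024 resolution with 1 sample per pixel; the variable names are illustrative.

```python
width, height = 1280, 1024
sv_pixels_per_frame = 2.2e9  # figure quoted on the slide

screen_pixels = width * height
average_overdraw = sv_pixels_per_frame / screen_pixels

print(f"{screen_pixels} screen pixels, about {average_overdraw:.0f} "
      "shadow volume pixels processed per screen pixel")
```
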
Existing solutions (1/2)
CC shadow volumes [Lloyd et al. 2004]
    draw SVs only where receivers exist
    good when lots of empty space
Hybrid shadow maps and volumes [Chan&Durand 2004]
    use SVs only at shadow boundaries
    boundary pixels determined using shadow map
    artifacts due to limited shadow map resolution

Existing solutions (2/2)
Depth bounds [Nvidia 2003]
    application supplies min & max depth values separately for each shadow volume
    rasterize shadow volume only when visible geometry between [min,max]
    optimal bounds hard to compute
[2D illustration: camera, shadow volume, visible pixels, min, max]

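The depth bounds idea can be sketched as a per-pixel filter: a shadow volume pixel is processed only if the depth already stored for the visible geometry lies inside the supplied range. The function names and the flat list used as a depth buffer are assumptions for illustration, not an actual graphics API.

```python
def passes_depth_bounds(stored_depth, zmin, zmax):
    """True if the visible geometry at this pixel lies within [zmin, zmax]."""
    return zmin <= stored_depth <= zmax


def pixels_to_rasterize(depth_buffer, zmin, zmax):
    """Indices of pixels where the shadow volume still has to be rasterized."""
    return [i for i, z in enumerate(depth_buffer)
            if passes_depth_bounds(z, zmin, zmax)]


# Hypothetical depth values for five pixels; bounds supplied by the application.
depth_buffer = [0.10, 0.35, 0.50, 0.80, 0.95]
survivors = pixels_to_rasterize(depth_buffer, 0.3, 0.6)
```

Loose bounds let more pixels through, which is one way to read the remark that optimal bounds are hard to compute.
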
Reference image

Shadow volume algorithm executed once per 8x8 pixel tile

Green tiles may contain shadow boundary; other tiles were correct

Low-res (gray) + per-pixel computed boundaries (dark)

How to detect shadow boundaries?
Two facts about shadow volumes
1. always closed
2. SV triangles mark potential shadow boundaries
If 3D volume in scene not intersected by shadow volume triangles
    fully lit or fully in shadow
    single sample classifies entire volume

Detecting boundary tiles
Bound tile with axis-aligned bounding box
    8x8 pixel region
    Zmin, Zmax
[Illustration: box of 8 x 8 pixels spanning Zmin to Zmax]
Triangle vs. AA Box intersection test
1. low-resolution rasterization
2. Zmin and Zmax tests

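A conservative version of the triangle vs. box test can be sketched as below. This is a simplification and not the authors' test: instead of true low-resolution rasterization it compares the triangle's screen-space bounding rectangle with the tile, and it uses the triangle's full depth range for the Zmin/Zmax check. It can therefore flag extra tiles as boundary tiles but should not miss any. The name `is_boundary_tile` and the vertex layout are assumptions.

```python
TILE = 8  # tile size in pixels, as on the slide


def is_boundary_tile(tri, tile_x, tile_y, tile_zmin, tile_zmax):
    """Conservative shadow volume triangle vs. tile box overlap test.

    tri: three (x, y, z) vertices in screen space (hypothetical layout).
    tile_x, tile_y: tile coordinates; the tile covers TILE x TILE pixels.
    tile_zmin, tile_zmax: depth range of visible geometry inside the tile.
    """
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    zs = [v[2] for v in tri]
    x0, y0 = tile_x * TILE, tile_y * TILE
    x1, y1 = x0 + TILE, y0 + TILE
    # Step 1 (stand-in for low-resolution rasterization): overlap in x and y.
    overlaps_xy = min(xs) < x1 and max(xs) > x0 and min(ys) < y1 and max(ys) > y0
    # Step 2: the triangle's depth range must overlap [Zmin, Zmax] of the tile.
    overlaps_z = min(zs) <= tile_zmax and max(zs) >= tile_zmin
    return overlaps_xy and overlaps_z


tri = [(2.0, 2.0, 0.5), (6.0, 2.0, 0.5), (4.0, 6.0, 0.5)]
hit = is_boundary_tile(tri, 0, 0, 0.4, 0.6)         # triangle cuts through the box
miss_depth = is_boundary_tile(tri, 0, 0, 0.7, 0.9)  # triangle in front of the box
```
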
Fast update of non-boundary tiles
Copy low-res shadows to stencil buffer
    writing 64 per-pixel values would be slow
Two-level stencil buffer saves the day
    maintain [Smin, Smax] per tile
    always test the higher level first
    often no need to validate per-pixel values
    stencil values of non-boundary tiles are constant

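One way to picture the two-level stencil buffer is the software model below. It is a sketch under assumptions, not a description of the proposed hardware: the class name `TwoLevelStencil`, its methods, and the lazy allocation of per-pixel storage are all illustrative. A tile whose Smin equals Smax is treated as constant, so updating or reading it never touches the 64 per-pixel values.

```python
TILE = 8


class TwoLevelStencil:
    """Per-tile [Smin, Smax] on top of optional per-pixel stencil values."""

    def __init__(self, tiles_x, tiles_y):
        n = tiles_x * tiles_y
        self.tiles_x = tiles_x
        self.smin = [0] * n
        self.smax = [0] * n
        self.pixels = [None] * n  # per-pixel values, allocated only when needed

    def _tile_of(self, x, y):
        return (y // TILE) * self.tiles_x + (x // TILE)

    def add_to_tile(self, tx, ty, delta):
        """Non-boundary tile: apply the same delta to the whole tile."""
        t = ty * self.tiles_x + tx
        self.smin[t] += delta
        self.smax[t] += delta
        if self.pixels[t] is not None:
            self.pixels[t] = [v + delta for v in self.pixels[t]]

    def add_to_pixel(self, x, y, delta):
        """Boundary tile: per-pixel update, then refresh the tile bounds."""
        t = self._tile_of(x, y)
        if self.pixels[t] is None:
            self.pixels[t] = [self.smin[t]] * (TILE * TILE)
        self.pixels[t][(y % TILE) * TILE + (x % TILE)] += delta
        self.smin[t] = min(self.pixels[t])
        self.smax[t] = max(self.pixels[t])

    def read(self, x, y):
        """Test the higher level first; fall back to per-pixel data if needed."""
        t = self._tile_of(x, y)
        if self.smin[t] == self.smax[t]:
            return self.smin[t]
        return self.pixels[t][(y % TILE) * TILE + (x % TILE)]


stencil = TwoLevelStencil(2, 2)
stencil.add_to_tile(0, 0, 1)   # whole tile in shadow, no per-pixel writes
stencil.add_to_pixel(9, 1, 1)  # one pixel of a boundary tile
```
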
Implementation – Stage 1
[Diagram: SV triangles -> Low-resolution rasterizer -> Per-tile operations -> buffers "Low-res shadows" and "Boundary?"]
Buffers built separately for each shadow volume
Classifications ready when entire SV processed
    application marks begin/end of shadow volumes

Implementation – Stage 2
[Diagram: SV triangles -> Low-resolution rasterizer -> boundary tile? (reads the "Low-res shadows" and "Boundary?" buffers)
    No: Copy to 2-level stencil
    Yes: Per-pixel rasterizer -> Stencil ops -> Update 2-level stencil]

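The Stage 2 decision can be summarized with a small dispatch loop. This is only a schematic of the control flow shown in the diagram: the function `stage2`, its arguments, and the callback standing in for the per-pixel rasterizer are hypothetical, and the buffers are plain Python lists.

```python
def stage2(boundary, low_res_shadow, rasterize_per_pixel):
    """Resolve one shadow volume tile by tile.

    boundary: per-tile flags produced in Stage 1.
    low_res_shadow: per-tile stencil contribution produced in Stage 1.
    rasterize_per_pixel: callback standing in for the per-pixel rasterizer;
                         takes a tile index, returns that tile's per-pixel values.
    Returns the per-tile results and the number of tiles needing per-pixel work.
    """
    results = []
    per_pixel_tiles = 0
    for tile, is_boundary in enumerate(boundary):
        if is_boundary:
            # Yes branch: per-pixel rasterization and stencil operations.
            results.append(rasterize_per_pixel(tile))
            per_pixel_tiles += 1
        else:
            # No branch: copy the single low-res value for the whole tile.
            results.append(low_res_shadow[tile])
    return results, per_pixel_tiles


boundary = [False, True, False, False]
low_res = [0, 0, 1, 1]
results, expensive = stage2(boundary, low_res, lambda tile: [0] * 32 + [1] * 32)
```

In this toy input only one of four tiles reaches the per-pixel path, which is where the fillrate savings would come from.
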
Alternative implementations
Two pass
    Pass 1 = Stage 1
    Pass 2 = Stage 2
    How to keep pixel units busy during Stage 1?
        maybe assign per-tile operations to pixel shaders?
Single pass
    Separate stages using delay stream [Aila et al. 2003]
    Stage 2 of current SV executes simultaneously with next SV’s Stage 1

Hardware resources
Two-level stencil buffer
Per-tile operations
Optionally
    delay stream *
    duplicate low-res rasterizer & Zmin/Zmax units *
    cache for per shadow volume buffers
        multiple buffers for pipelined operation
        allocate from external memory
* If not already there for occlusion culling purposes

Results – Simple scene (1280x1024)
                      Depth bounds   Hierarchical   Improvement
Ratio in #pixels      1.1            12.7           11.5
Ratio in bandwidth    1.03           17.6           17.2

Results – Knights (1280x1024)
                      Depth bounds   Hierarchical   Improvement
Ratio in #pixels      2.6            7.4            2.8
Ratio in bandwidth    2.4            5.6            2.4

Results – Powerplant (1280x1024)
                      Depth bounds   Hierarchical   Improvement
Ratio in #pixels      2.4            22.9           9.5
Ratio in bandwidth    2.3            16.0           6.9

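The slides do not define the Improvement column, but the numbers appear consistent with the Hierarchical ratio divided by the Depth bounds ratio, up to rounding of the displayed values. The check below recomputes that quotient from the figures in the three result tables; the interpretation is an assumption, and the dictionary layout is illustrative.

```python
# (depth_bounds, hierarchical, stated_improvement) as shown in the tables.
results = {
    ("Simple scene", "#pixels"): (1.1, 12.7, 11.5),
    ("Simple scene", "bandwidth"): (1.03, 17.6, 17.2),
    ("Knights", "#pixels"): (2.6, 7.4, 2.8),
    ("Knights", "bandwidth"): (2.4, 5.6, 2.4),
    ("Powerplant", "#pixels"): (2.4, 22.9, 9.5),
    ("Powerplant", "bandwidth"): (2.3, 16.0, 6.9),
}


def recomputed_improvement(depth_bounds, hierarchical):
    """Quotient assumed to underlie the Improvement column."""
    return hierarchical / depth_bounds


for (scene, metric), (db, hier, stated) in results.items():
    print(f"{scene:12s} {metric:10s} stated {stated:5.1f}  "
          f"recomputed {recomputed_improvement(db, hier):5.2f}")
```
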
Summary
Hierarchical rendering method for shadow volumes
    significant fillrate savings compared to other hardware methods
    also works for soft shadow volumes
Future work
    would it make sense to extend programmability to per-tile operations?
    how many pipeline bubbles are created?
        requires chip-level simulations

Thank you!
Questions?
Acknowledgements
    Ville Miettinen, Jacob Ström, Eric Haines, Ulf Assarsson, Lauri Savioja, Jonas Svensson, Ulf Borgenstam, Karl Schultz, 3DR group at Helsinki University of Technology
    The National Technology Agency of Finland, Hybrid Graphics, Bitboys, Nokia and Remedy Entertainment
    ATI for granting fellowship to Timo (2004-2005)