Slide 1: A Hierarchical Shadow Volume Algorithm
Timo Aila (1,2), Tomas Akenine-Möller (3)
(1) Helsinki University of Technology
(2) Hybrid Graphics
(3) Lund University

Slide 2: Outline
- Brief intro to shadow volumes
  - fillrate problem, existing solutions
- Our solution
  - idea
  - implementation
- Results
- Q&A

Slide 3: Shadow volumes [Crow77]
- Shadow volumes define closed volumes of space that are in shadow
[Illustration: infinitesimal light source, shadow caster = light cap, dark cap, extruded side quads]

Slide 4: Is point inside shadow volume?
1. Pick reference point R outside shadow volume
   - any such point is OK
2. Span line from R to point to be classified
3. Compute sum of enter (+1) and exit (-1) events
[2D illustration: shadow volume, reference point R, points P1, P2, P3]

Slide 5: Using graphics hardware
- R at ∞ behind pixel (z-fail) [Bilodeau & Songy, Carmack]
  - infinity is always outside SVs, hence robust
  - must not clip to far plane of view frustum
- Sum hidden events to stencil buffer, sign from backface culling
[2D illustration: camera, view frustum, shadow volume, visible samples (or pixels), R, +/- events]

Slide 6: Amount of pixel processing
[Figure adapted from [Chan and Durand 2004]]

Slide 7: Fillrate problem
- 50+ fps without shadows on ATI Radeon 9800XT at 1280x1024, 1 sample/pixel
- 1 fps when shadow volumes rasterized
- 2.2 billion pixels per frame

Slide 8: Existing solutions (1/2)
- CC shadow volumes [Lloyd et al. 2004]
  - draw SVs only where receivers exist
  - good when lots of empty space
- Hybrid shadow maps and volumes [Chan & Durand 2004]
  - use SVs only at shadow boundaries
  - boundary pixels determined using shadow map
  - artifacts due to limited shadow map resolution

Slide 9: Existing solutions (2/2)
- Depth bounds [Nvidia 2003]
  - application supplies min & max depth values separately for each shadow volume
  - rasterize shadow volume only when visible geometry between [min, max]
  - optimal bounds hard to compute
[2D illustration: camera, shadow volume, min, max, visible pixels]

Slide 10: Outline (repeated)

Slide 11: Reference image

Slide 12: Shadow volume algorithm executed once per 8x8 pixel tile

Slide 13: Green tiles may contain shadow boundary - other tiles were correct

Slide 14: Low-res (gray) + per-pixel computed boundaries (dark)

Slide 15: How to detect shadow boundaries?
- Two facts about shadow volumes
  1. always closed
  2. SV triangles mark potential shadow boundaries
- If 3D volume in scene not intersected by shadow volume triangles
  - fully lit or fully in shadow
  - single sample classifies entire volume

Slide 16: Outline (repeated)

Slide 17: Detecting boundary tiles
- Bound tile with axis-aligned bounding box
  - 8x8 pixel region
  - Zmin, Zmax
- Triangle vs. AA box intersection test
  1. low-resolution rasterization
  2. Zmin and Zmax tests
[Illustration: box of 8 x 8 pixels spanning Zmin to Zmax]

Slide 18: Fast update of non-boundary tiles
- Copy low-res shadows to stencil buffer
  - writing 64 per-pixel values would be slow
- Two-level stencil buffer saves the day
  - maintain [Smin, Smax] per tile
  - always test the higher level first
  - often no need to validate per-pixel values
  - stencil values of non-boundary tiles are constant

Slide 19: Implementation: Stage 1
[Diagram: SV triangles -> Low-resolution rasterizer -> Per-tile operations -> buffers "Low-res shadows" and "Boundary?"]
- Buffers built separately for each shadow volume
- Classifications ready when entire SV processed
  - application marks begin/end of shadow volumes

Slide 20: Implementation: Stage 2
[Diagram: SV triangles -> Low-resolution rasterizer -> boundary tile? (read from buffers "Low-res shadows" and "Boundary?")
  No: Copy to 2-level stencil
  Yes: Per-pixel rasterizer -> Stencil ops -> Update 2-level stencil]

Slide 21: Alternative implementations
- Two pass
  - Pass 1 = Stage 1
  - Pass 2 = Stage 2
  - How to keep pixel units busy during Stage 1?
    - maybe assign per-tile operations to pixel shaders?
- Single pass
  - Separate stages using delay stream [Aila et al. 2003]
  - Stage 2 of current SV executes simultaneously with next SV's Stage 1

Slide 22: Hardware resources
- Two-level stencil buffer
- Per-tile operations
- Optionally delay stream *
- Duplicate low-res rasterizer & Zmin/Zmax units *
- Cache for per shadow volume buffers
  - multiple buffers for pipelined operation
  - allocate from external memory
* If not already there for occlusion culling purposes

Slide 23: Outline (repeated)

Slide 24: Results: Simple scene (1280x1024)
                      Depth bounds   Hierarchical   Improvement
Ratio in #pixels      1.1            12.7           11.5
Ratio in bandwidth    1.03           17.6           17.2

Slide 25: Results: Knights (1280x1024)
                      Depth bounds   Hierarchical   Improvement
Ratio in #pixels      2.6            7.4            2.8
Ratio in bandwidth    2.4            5.6            2.4

Slide 26: Results: Powerplant (1280x1024)
                      Depth bounds   Hierarchical   Improvement
Ratio in #pixels      2.4            22.9           9.5
Ratio in bandwidth    2.3            16.0           6.9

Slide 27: Summary
- Hierarchical rendering method for shadow volumes
  - significant fillrate savings compared to other hardware methods
  - also works for soft shadow volumes
- Future work
  - would it make sense to extend programmability to per-tile operations?
  - how many pipeline bubbles are created?
    - requires chip-level simulations

Slide 28: Thank you! Questions?
Acknowledgements
- Ville Miettinen, Jacob Ström, Eric Haines, Ulf Assarsson, Lauri Savioja, Jonas Svensson, Ulf Borgenstam, Karl Schultz, 3DR group at Helsinki University of Technology
- The National Technology Agency of Finland, Hybrid Graphics, Bitboys, Nokia and Remedy Entertainment
- ATI for granting fellowship to Timo (2004-2005)
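
Supplementary sketch (not part of the original slides)

The boundary-tile test ("Detecting boundary tiles") and the two-level stencil ("Fast update of non-boundary tiles") can be illustrated in software roughly as follows. This is a minimal Python sketch under stated assumptions, not the hardware design from the talk: the names `Tile`, `may_intersect`, `is_boundary_tile` and `TwoLevelStencil` are hypothetical, the screen-space bounding-box overlap is only a coarse stand-in for the low-resolution rasterizer, and the convention "stencil value != 0 means in shadow" is assumed. The test is conservative: it may flag extra boundary tiles, which costs work but not correctness.

```python
from dataclasses import dataclass

TILE = 8  # tile edge in pixels, as on the slides


@dataclass
class Tile:
    x0: int      # left pixel column of the tile
    y0: int      # top pixel row of the tile
    zmin: float  # nearest visible depth inside the tile
    zmax: float  # farthest visible depth inside the tile


def may_intersect(tri, tile):
    """Conservative triangle vs. tile-box test; tri = three (x, y, z) vertices."""
    xs, ys, zs = zip(*tri)
    # Step 1 (assumption: bounding-box overlap replaces low-res rasterization).
    if max(xs) < tile.x0 or min(xs) > tile.x0 + TILE:
        return False
    if max(ys) < tile.y0 or min(ys) > tile.y0 + TILE:
        return False
    # Step 2: Zmin/Zmax test against the triangle's depth range.
    return not (max(zs) < tile.zmin or min(zs) > tile.zmax)


def is_boundary_tile(sv_triangles, tile):
    """A tile is a potential boundary tile if any SV triangle may cut its box."""
    return any(may_intersect(t, tile) for t in sv_triangles)


class TwoLevelStencil:
    """Per-tile (smin, smax); per-pixel values are kept only for mixed tiles."""

    def __init__(self):
        self.bounds = {}  # (tx, ty) -> (smin, smax); missing means (0, 0)
        self.pixels = {}  # (tx, ty) -> 64 values, only when smin != smax

    def add_uniform(self, key, delta):
        # Non-boundary tile: one low-res result updates the whole tile.
        lo, hi = self.bounds.get(key, (0, 0))
        self.bounds[key] = (lo + delta, hi + delta)
        if key in self.pixels:  # mixed tile: per-pixel values must follow
            self.pixels[key] = [v + delta for v in self.pixels[key]]

    def write_pixels(self, key, values):
        # Boundary tile: store per-pixel results and refresh the tile bounds.
        lo, hi = min(values), max(values)
        self.bounds[key] = (lo, hi)
        if lo == hi:
            self.pixels.pop(key, None)
        else:
            self.pixels[key] = list(values)

    def in_shadow(self, key, pixel_index):
        # Test the higher level first; read per-pixel data only if undecided.
        lo, hi = self.bounds.get(key, (0, 0))
        if lo > 0 or hi < 0:
            return True
        if lo == hi:  # both are zero here
            return False
        return self.pixels[key][pixel_index] != 0
```

In this sketch a uniform tile never stores 64 per-pixel values, which mirrors the point on the slide that writing them for every non-boundary tile would be slow.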