Vector Units and Quaternions
Jim Van Verth
Red Storm Entertainment
[email protected]
make better games

About This Talk
• Will discuss how to do quaternion math on PS2
• Assume that you already know and want to use quaternions
• Assume that you already know something about how the VU works

About Me
• Lead engineer at Red Storm Entertainment
• Not a quaternion god
• Not a vector unit god
• Not really familiar with VCL
• Just a 3D guy trying to get by…

About the code
• Most examples written in macro mode (VU0)
• Easy to translate to micro mode
• Examples that would be faster in micro mode are discussed separately

Matrices on PS2
• PS2 is really well set up to do matrices
• Multiplies are highly parallel
• Not so good for quaternions

Matrix Multiply
    vmulax     ACC, vf2, vf1x
    vmadday    ACC, vf3, vf1y
    vmaddaz    ACC, vf4, vf1z
    vmaddw     vf6, vf5, vf1w
• This is what we’re up against
• Takes 4/7 cycles to transform a point
• Takes 16/19 cycles to concat matrices (9/12 cycles for 3x3 matrix)

Why Quaternions?
• Quaternions take up less space: 4 floats vs. 9 (best case)
• Quaternions interpolate well
• Avoid floating point drift (normalize vs.
Gram-Schmidt orthogonalization)

Quaternions on VU
• Fit very well
• Four floats, aligned to 16-byte boundary
• Work just like homogeneous point
• Make sure stored (x,y,z,w) not (w,x,y,z)

Quaternion Multiplication
• If quaternion is (x, y, z, w) or (v, w) then
    $$q_1 q_2 = (w_1 v_2 + w_2 v_1 + v_1 \times v_2,\; w_1 w_2 - v_1 \cdot v_2)$$
• All standard vector operations
  > Add, scale, dot product, cross product

Quaternion Mult on PS2
    vmul          vf3, vf1, vf2
    vopmula.xyz   acc, vf1, vf2
    vmaddaw.xyz   acc, vf2, vf1w
    vmaddaw.xyz   acc, vf1, vf2w
    vopmsub.xyz   vf3, vf2, vf1
    vsubaz.w      acc, vf3, vf3z
    vmsubax.w     acc, vf0, vf3x
    vmsuby.w      vf3, vf0, vf3y
    $$w = w_1 w_2 - v_1 \cdot v_2$$
    $$v = w_1 v_2 + w_2 v_1 + v_1 \times v_2$$
• Interleaves dot product and rest via accumulator
• Takes advantage of linearity of cross product
• Cycle count: 8/11
• Less than matrix!

Vector Rotation
• Formula for vector rotation:
    $$p' = q\,p\,q^{-1}, \quad p_w = 0$$
• Two mults take 16 cycles, plus the inverse
• Can do better

Vector Rotation, Take Two
• If q is normalized, then can do:
    $$p' = (v \cdot p)\,v + w^2\,p + 2w\,(v \times p) + v \times (v \times p)$$
• This is faster than two straight multiplies on serial processor
• Faster on vector processor, too!
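
The expanded rotation formula can be checked against the two-multiply form in scalar code. The following is a minimal Python sketch, not the VU routine from the talk. It assumes the (x, y, z, w) storage order used in these slides and a unit quaternion, and the helper names (`quat_mul`, `rotate_sandwich`, `rotate_expanded`) are hypothetical ones chosen for illustration.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def quat_mul(q1, q2):
    """Product of (x, y, z, w) quaternions:
    (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2)."""
    v1, w1 = q1[:3], q1[3]
    v2, w2 = q2[:3], q2[3]
    c = cross(v1, v2)
    v = tuple(w1 * v2[i] + w2 * v1[i] + c[i] for i in range(3))
    return v + (w1 * w2 - dot(v1, v2),)

def rotate_sandwich(q, p):
    """p' = q p q^-1 with p_w = 0; for unit q the inverse is the conjugate."""
    conj = (-q[0], -q[1], -q[2], q[3])
    r = quat_mul(quat_mul(q, (p[0], p[1], p[2], 0.0)), conj)
    return r[:3]

def rotate_expanded(q, p):
    """p' = (v.p) v + w^2 p + 2w (v x p) + v x (v x p), unit q assumed."""
    v, w = q[:3], q[3]
    vxp = cross(v, p)
    vxvxp = cross(v, vxp)
    d = dot(v, p)
    return tuple(d * v[i] + w * w * p[i] + 2.0 * w * vxp[i] + vxvxp[i]
                 for i in range(3))

if __name__ == "__main__":
    s = math.sqrt(0.5)
    q = (0.0, 0.0, s, s)          # 90 degrees about the z axis
    # Both forms give approximately (0, 1, 0) for the x axis.
    print(rotate_sandwich(q, (1.0, 0.0, 0.0)))
    print(rotate_expanded(q, (1.0, 0.0, 0.0)))
```

The expanded form needs two cross products and one dot product instead of two full quaternion multiplies, which is the saving the slide refers to.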

Vector Rotation on VU
    vmul          vf11, vf1, vf2
    vopmula.xyz   acc, vf2, vf1
    vopmsub.xyz   vf5, vf1, vf2
    vmul.w        vf6w, vf2w, vf2w
    vadd.w        vf7w, vf2w, vf2w
    vmulax.w      accw, vf0w, vf11x
    vmadday.w     accw, vf0w, vf11y
    vmaddz.w      vf11w, vf0w, vf11z
    vopmula.xyz   acc, vf2, vf5
    vmaddaw.xyz   acc, vf5, vf7w
    vmaddaw.xyz   acc, vf1, vf6w
    vmaddaw.xyz   acc, vf2, vf11w
    vopmsub.xyz   vf3, vf5, vf2
• p in vf1, q in vf2
    $$p' = (v \cdot p)\,v + w^2\,p + 2w\,(v \times p) + v \times (v \times p)$$
• First part builds all the pieces
• Second part adds ‘em all together
• Cycles: 13/16
• Better than straight multiply
• Worse than matrix

Full Transforms
• Combination of translation vector t, quat r, 3 scale factors s
• Once again, want to transform point
• Basic formula:
    $$p' = t + r\,(s\,p)\,r^{-1}$$

Point Transformation
    vmul          vf1, vf1, vf3
    vmul          vf11, vf1, vf2
    vopmula.xyz   acc, vf2, vf1
    vopmsub.xyz   vf5, vf1, vf2
    vmul.w        vf6w, vf2w, vf2w
    vadd.w        vf7w, vf2w, vf2w
    vmulax.w      accw, vf0w, vf11x
    vmadday.w     accw, vf0w, vf11y
    vmaddz.w      vf11w, vf0w, vf11z
    vopmula.xyz   acc, vf2, vf5
    vmaddaw.xyz   acc, vf5, vf7w
    vmaddaw.xyz   acc, vf1, vf6w
    vmaddaw.xyz   acc, vf2, vf11w
    vmaddaw.xyz   acc, vf4, vf0w
    vopmsub.xyz   vf3, vf5, vf2
• p in vf1, q in vf2
• scale in vf3
• translation in vf4
• Takes four extra cycles for scale (including stalls), one extra for xlate
• Cycle count: 18/21

Transform Concatenation
• Look at formula:
    $$s = s_1 s_2$$
    $$r = r_2 r_1$$
    $$t = t_2 + r_2\,(s_2 t_1)\,r_2^{-1}$$
• Have to transform point and multiply two quaternions and multiply scales
• Takes 8 cycles for quat multiply, 18 for transform, 1 for scale
• Have three
stall cycles available
• Bottom line: 24/27 cycles
• Much slower than matrix multiplication
• Not recommended

Matrix Conversion
• Quat-vector transformation not as efficient as matrix-vector transformation (13 cycles vs. 4)
• To do multiple points, want to convert quaternion to a 4x4 matrix
• Corresponding 4x4 matrix to normalized quat q = (x,y,z,w) is:
    $$M_q = \begin{bmatrix} 1-2y^2-2z^2 & 2xy+2wz & 2xz-2wy & 0 \\ 2xy-2wz & 1-2x^2-2z^2 & 2yz+2wx & 0 \\ 2xz+2wy & 2yz-2wx & 1-2x^2-2y^2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
• Not obvious how to do this efficiently
• Two approaches
• One works well in macro mode
• One in micro mode
  > uses Lower instructions to achieve better parallelism

Matrix Conversion (macro)
• Idea: matrix is built from two other matrices
    $$q = (x, y, z, w)$$
    $$M_q = \begin{bmatrix} w & z & -y & x \\ -z & w & x & y \\ y & -x & w & z \\ -x & -y & -z & w \end{bmatrix} \begin{bmatrix} w & z & -y & -x \\ -z & w & x & -y \\ y & -x & w & -z \\ x & y & z & w \end{bmatrix}$$
• Simplification: matrix multiply is series of row vector multiplies
• Create right matrix, generate left matrix via accumulator tricks
    $$R_q = \begin{bmatrix} w & z & -y & -x \\ -z & w & x & -y \\ y & -x & w & -z \\ x & y & z & w \end{bmatrix}$$
• Look at one row in matrix multiply:
    vmulax     ACC, vf5, vf1x
    vmadday    ACC, vf6, vf1y
    vmaddaz    ACC, vf7, vf1z
    vmaddw     vf9, vf8, vf1w
• Or could just do:
    vmulaw     ACC, vf8, vf1w
    vmadday    ACC, vf6, vf1y
    vmaddaz    ACC, vf7, vf1z
    vmaddx     vf9, vf5, vf1x
• Is linear, so order doesn’t matter
• Idea: all values we need for left matrix are in quaternion
• Load accumulator with mula by w value (always positive)
• vmadd or vmsub to multiply by positive or negative value and accumulate
    vmulaw.xyz    acc, vf2, vf5w
    vmaddax.xyz   acc, vf3, vf5x
    vmadday.xyz   acc, vf4, vf5y
    vmsubz.xyz    vf13, vf1, vf5z
    $$vf5 = (x, y, z, w)$$
    $$vf13 = \begin{bmatrix} -z & w & x & y \end{bmatrix} \begin{bmatrix} vf1 \\ vf2 \\ vf3 \\ vf4 \end{bmatrix}$$
• More simplification:
  > Last row of Mq always (0,0,0,1), don’t compute!
  > Last column always 0 too, don’t compute!
  > Last row of Rq just the quat in VU format
• Just build:
    $$R_q = \begin{bmatrix} w & z & -y & \sim \\ -z & w & x & \sim \\ y & -x & w & \sim \\ x & y & z & \sim \end{bmatrix}$$

Matrix Conversion (macro)
    vaddw.x    vf1, vf0, vf4
    vaddz.y    vf1, vf0, vf4
    vsuby.z    vf1, vf0, vf4
    vsubz.x    vf2, vf0, vf4
    vaddw.y    vf2, vf0, vf4
    vaddx.z    vf2, vf0, vf4
    vaddy.x    vf3, vf0, vf4
    vsubx.y    vf3, vf0, vf4
    vaddw.z    vf3, vf0, vf4
    vmr32.w    vf12, vf0
    vmr32.w    vf13, vf0
    vmr32.w    vf14, vf0
• Stage one:
  > Load quat in vf4
  > Build right matrix
  > Clear right column of result
    vf1=(w,z,-y,~)
    vf2=(-z,w,x,~)
    vf3=(y,-x,w,~)
    vf4=(x,y,z,w)

Matrix Conversion (macro)
    vmulaw.xyz    acc, vf1, vf4w
    vmaddaz.xyz   acc, vf2, vf4z
    vmsubay.xyz   acc, vf3, vf4y
    vmaddx.xyz    vf12, vf4, vf4x
    vmulaw.xyz    acc, vf2, vf4w
    vmaddax.xyz   acc, vf3, vf4x
    vmadday.xyz   acc, vf4, vf4y
    vmsubz.xyz    vf13, vf1, vf4z
    vmulaw.xyz    acc, vf3, vf4w
    vmaddaz.xyz   acc, vf4, vf4z
    vmadday.xyz   acc, vf1, vf4y
    vmsubx.xyz    vf14, vf2, vf4x
    vmove.xyzw    vf15, vf0
• Stage two:
  > Matrix multiply to get first three rows
  > Clear bottom row
• Note: accumulate only on xyz (w already cleared)
• Cycles: 25/28

Matrix Conversion (micro)
• Lots of duplicate calculations in matrix
    $$M_q = \begin{bmatrix} 1-2y^2-2z^2 & 2xy+2wz & 2xz-2wy & 0 \\ 2xy-2wz & 1-2x^2-2z^2 & 2yz+2wx & 0 \\ 2xz+2wy & 2yz-2wx & 1-2x^2-2y^2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
• Idea: calculate only what we need, use shifting and accumulator tricks to parallelize efficiently
• Devised by Colin Hughes of SCEE

Matrix Conversion (micro)
    Upper                          Lower
    mula        acc, vf1, vf1      loi     SQRT_2
    muli        vf3, vf1, I        mr32.w  vf24, vf0
    madd        vf2, vf1, vf1      nop
    addw        vf4, vf0, vf0w     nop
    opmula      acc, vf3, vf3      move    vf27, vf0
    msubw       vf5, vf3, vf3w     mr32.w  vf26, vf0
    maddw       vf6, vf3, vf3w     mr32.w  vf25, vf0
    addaw.xyz   acc, vf0, vf0w     nop
    msubax.yz   acc, vf4, vf2x     nop
    msuby.z     vf26, vf4, vf2y    mr32    vf3, vf5
    msubay.xz   acc, vf4, vf2y     mr32    vf7, vf6
    msubz.y     vf25, vf4, vf2z    mr32.y  vf24, vf5
    msubz.x     vf24, vf4, vf2z    mr32.x  vf26, vf5
    addy.z      vf24, vf0, vf6y    mr32.z  vf25, vf3
    addx.y      vf26, vf0, vf6x    mr32.x  vf25, vf7
• Three parts
•
Calculate elements
• Clear matrix
• Shift, add and copy into place
• 16/19 cycles

Matrix Conversion
• If you’re converting a quaternion and going to use it immediately, can make some assumptions
• Don’t create bottom row (just use vf0)
• Don’t clear right column (just use xyz)
• Saves four cycles in macro mode case

Transform to Matrix
• Use one of the quaternion matrix techniques
• Scale first three rows by each scale factor
• Replace last row with translation
• Results:
  > 29/32 for macro mode
  > 20/23 for micro mode

Normalization
• Need to normalize quaternion to keep it useful for rotation
  > (Also avoids floating point drift)
• Fortunately PS2 has reciprocal square root instruction
• Unfortunately it takes a while

Normalization
    vmul        vf2, vf1, vf1
    vaddaz.w    acc, vf2, vf2
    vmaddax.w   acc, vf0, vf2
    vmaddy.w    vf2, vf0, vf2
    vrsqrt      Q, vf0w, vf2w
    vwaitq
    vmulq       vf1, vf1, Q
• Compute dot product
• Compute 1/length
• Scale quaternion
• With stalls, takes 24/27 cycles

Normalization
• Another approach
  > From “The Inner Product”, March 2002 Game Developer by Jonathan Blow
  > Approximate 1/x via Newton-Raphson iteration
  > First iteration takes (looks like) 4/7 cycles on VU0
  > Second iteration takes as long as RSQRT
  > Recommend: if x > 0.91521198, use approx
  > Otherwise use RSQRT

Interpolation
• This is where it’s at
• It would be great if it was fast
• Um, well…

Interpolation
• First look at spherical linear interp
    $$\mathrm{slerp}(q, r, t) = \frac{\sin(\theta(1-t))\,q + \sin(\theta t)\,r}{\sin\theta}$$
• That’s a lot of sines
• Could precompute θ, 1/sin θ
• But at least 28 cycles for one of the other sines
• We (RSE) don’t use slerp anyway

Interpolation
• Lerp, then
    $$\mathrm{lerp}(q, r, t) = (1-t)\,q + t\,r = q - t\,q + t\,r$$
  is simply (q in vf1, r in vf2, t in vf3w)
    vaddax     acc, vf1, vf0x
    vmsubaw    acc, vf1, vf3w
    vmaddw     vf1, vf2, vf3w
• Need to normalize afterwards
• Makes 30/33 cycles

Interpolation
• Not
quite that simple
• Problem: if q•r < 0, interpolation will take long way around sphere
• Need to negate one quat
• Gives the same orientation, but the interpolation takes the short route

Linear Interpolation
            vmul        vf4, vf1, vf2
            vaddaz.w    acc, vf04, vf4
            vmaddax.w   acc, vf00, vf4
            vmaddy.w    vf4, vf00, vf4
            vnop
            vnop
            vnop
            cfc2        t0,$16
            and         t0,t0,0x0002
            vaddax      acc, vf1, vf0x
            beq         t0,zero,Add
            vmsubaw     acc, vf2,vf3w
            b           Finish
    Add:    vmaddaw     acc, vf2,vf3w
    Finish: vmsubw      vf1, vf1, vf3w
• Compute dot product
• Check for negative
• Interpolate
• Follow up with normalization
• Takes 43/46 cycles

Linear Interpolation
• There’s more we can do
• Jonathan Blow’s article, again
• Use spline to correct error in lerp
• More investigation needed
• Initial results: takes about 24-26 more cycles
• Looks faster than slerp, more accurate than lerp

How We’re Using All This
• A bit research-y at the moment
• VU0-based math library
• Optimization in specific routines
• In particular, concatenation and interpolation for bones animation
• More memory savings: store quat as 4.12 fixed-point shorts

Conclusions
• Quaternions useful on PS2
• Cheaper to concatenate (alone)
• Convert to matrix to transform
• Use linear interpolation
• Check out Jonathan Blow’s article

References
• Shoemake, Ken, “Animating Rotation with Quaternion Curves,” Computer Graphics, Vol. 19, No. 3 (July 1985).
• EE Core Instruction Set Manual
• VU User’s Manual
• Sony newsgroups
• Blow, Jonathan, “Hacking Quaternions,” Game Developer, Vol. 9, No. 3 (March 2002). [get updated source from www.gdmag.com/code.htm]

Questions?
• Please hand in comment sheets
• Slides available at: http://obiwan.redstorm.com/~jimvv
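
For readers who want to check the Linear Interpolation steps off the hardware, the dot-product test, sign flip, lerp and follow-up normalization can be sketched in scalar Python. This is only an illustration under stated assumptions: quaternions are (x, y, z, w) tuples, the function names `quat_dot` and `nlerp` are hypothetical, and `math.sqrt` stands in for the VU's reciprocal square root.

```python
import math

def quat_dot(q, r):
    """Four-component dot product q . r."""
    return sum(a * b for a, b in zip(q, r))

def nlerp(q, r, t):
    """Lerp from q to r along the short route, then renormalize.

    Assumes q and r are unit (x, y, z, w) quaternions that are not
    exact opposites once the sign has been corrected.
    """
    # If q . r < 0, negate one quat: same orientation, shorter path.
    if quat_dot(q, r) < 0.0:
        r = tuple(-c for c in r)
    # lerp(q, r, t) = q - t q + t r, the same form the VU code accumulates.
    blended = tuple(a - t * a + t * b for a, b in zip(q, r))
    # Normalize: scale by 1 / length.
    inv_len = 1.0 / math.sqrt(quat_dot(blended, blended))
    return tuple(c * inv_len for c in blended)

if __name__ == "__main__":
    q = (0.0, 0.0, 0.0, 1.0)
    r = (0.0, 0.0, math.sin(0.4), math.cos(0.4))
    neg_r = tuple(-c for c in r)
    # r and -r describe the same orientation, so both calls agree.
    print(nlerp(q, r, 0.5))
    print(nlerp(q, neg_r, 0.5))
```

Negating r instead of q is an arbitrary choice in this sketch; either quaternion can be flipped since both signs represent the same rotation.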