GDC 02 Template - Essential Math for Games Programmers

Vector Units and Quaternions
Jim Van Verth
Red Storm Entertainment
[email protected]
make
better
games
About This Talk
• Will discuss how to do quaternion
math on PS2
• Assume that you already know and
want to use quaternions
• Assume that you already know
something about how the VU works
About Me
• Lead engineer at Red Storm
Entertainment
• Not a quaternion god
• Not a vector unit god
• Not really familiar with VCL
• Just a 3D guy trying to get by…
About the code
• Most examples written in macro mode
(VU0)
• Easy to translate to micro mode
• Examples that would be faster in
micro mode are discussed separately
Matrices on PS2
• PS2 is really well set up to do
matrices
• Multiplies are highly parallel
• Not so good for quaternions
Matrix Multiply
vmulax  ACC, vf2, vf1x
vmadday ACC, vf3, vf1y
vmaddaz ACC, vf4, vf1z
vmaddw  vf6, vf5, vf1w
• This is what we’re
up against
• Takes 4/7 cycles to transform a point (cycles to issue / cycles until the result is ready)
• Takes 16/19 cycles to concat matrices (9/12 cycles for a 3x3 matrix)
Why Quaternions?
• Quaternions take up less space: 4
floats vs. 9 (best case)
• Quaternions interpolate well
• Avoid floating point drift (normalize vs.
Gram-Schmidt orthogonalization)
Quaternions on VU
• Fit very well
• Four floats, aligned to a 16-byte (quadword) boundary
• Work just like a homogeneous point
• Make sure they are stored (x,y,z,w), not (w,x,y,z)
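To make the layout requirement concrete, here is a minimal C sketch of a quaternion type that matches a VU register: four floats in (x, y, z, w) order, 16-byte aligned. The name `Quat` is hypothetical, and C11 `alignas` and `_Static_assert` stand in for whatever compiler-specific alignment attribute a PS2 toolchain would use.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical quaternion type laid out like a VU vf register:
   (x, y, z, w) order, x at the lowest address, 16-byte aligned. */
typedef struct {
    alignas(16) float x;
    float y;
    float z;
    float w;              /* scalar part last, not first */
} Quat;

/* Compile-time layout checks: one quadword, w at byte offset 12. */
_Static_assert(sizeof(Quat) == 16, "Quat must fill exactly one quadword");
_Static_assert(offsetof(Quat, w) == 12, "w must be the last component");
```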
Quaternion Multiplication
• If quaternion is (x, y, z, w) or (v, w), then
$$q_1 q_2 = (w_1 \mathbf{v}_2 + w_2 \mathbf{v}_1 + \mathbf{v}_1 \times \mathbf{v}_2,\; w_1 w_2 - \mathbf{v}_1 \cdot \mathbf{v}_2)$$
• All standard vector operations
> Add, scale, dot product, cross product
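Because the product is built only from standard vector operations, a scalar reference version is easy to write for checking VU output on the host. A sketch in C, with hypothetical `Vec3`/`Quat` types stored (x, y, z, w):

```c
/* Hypothetical scalar types for checking VU results on the host. */
typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Quat;   /* stored (x, y, z, w) */

static float v3_dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Vec3 v3_cross(Vec3 a, Vec3 b) {
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* q1 q2 = (w1 v2 + w2 v1 + v1 x v2,  w1 w2 - v1 . v2) */
Quat quat_mul(Quat q1, Quat q2) {
    Vec3 v1 = { q1.x, q1.y, q1.z };
    Vec3 v2 = { q2.x, q2.y, q2.z };
    Vec3 c  = v3_cross(v1, v2);
    Quat r;
    r.x = q1.w * v2.x + q2.w * v1.x + c.x;
    r.y = q1.w * v2.y + q2.w * v1.y + c.y;
    r.z = q1.w * v2.z + q2.w * v1.z + c.z;
    r.w = q1.w * q2.w - v3_dot(v1, v2);
    return r;
}
```

For example, i·j = k while j·i = -k: swapping the operands flips only the cross-product term, which is why operand order matters in the `vopmula`/`vopmsub` pair.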
Quaternion Mult on PS2
vmul        vf3, vf1, vf2
vopmula.xyz acc, vf1, vf2
vmaddaw.xyz acc, vf2, vf1w
vmaddaw.xyz acc, vf1, vf2w
vopmsub.xyz vf3, vf2, vf1
vsubaz.w    acc, vf3, vf3z
vmsubax.w   acc, vf0, vf3x
vmsuby.w    vf3, vf0, vf3y
$$w = w_1 w_2 - \mathbf{v}_1 \cdot \mathbf{v}_2$$
$$\mathbf{v} = w_1 \mathbf{v}_2 + w_2 \mathbf{v}_1 + \mathbf{v}_1 \times \mathbf{v}_2$$
• Interleaves dot
product and rest
via accumulator
• Takes advantage of
linearity of cross
product
• Cycle count: 8/11
• Less than matrix!
Vector Rotation
• Formula for vector rotation:
$$p' = q\,p\,q^{-1}, \qquad p_w = 0$$
• Two multiplies take 16 cycles, plus the cost of the inverse
• Can do better
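For reference, the two-multiply form p' = q p q⁻¹ in plain C. The point is promoted to a pure quaternion (pw = 0), and `quat_inverse` uses conjugate / |q|², which reduces to the plain conjugate when q is normalized. All names are hypothetical; this is the slow path the slide is trying to beat.

```c
typedef struct { float x, y, z, w; } Quat;   /* hypothetical, stored (x, y, z, w) */

/* Hamilton product: (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2). */
static Quat quat_mul(Quat a, Quat b) {
    Quat r;
    r.x = a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y);
    r.y = a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z);
    r.z = a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x);
    r.w = a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z);
    return r;
}

/* q^-1 = conjugate / |q|^2 (just the conjugate when q is normalized). */
static Quat quat_inverse(Quat q) {
    float n = q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w;
    Quat r = { -q.x / n, -q.y / n, -q.z / n, q.w / n };
    return r;
}

/* p' = q p q^-1, with p stored as a pure quaternion (pw = 0). */
Quat quat_rotate_sandwich(Quat q, Quat p) {
    return quat_mul(quat_mul(q, p), quat_inverse(q));
}
```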
Vector Rotation, Take Two
• If q is normalized, then we can use:
$$p' = (\mathbf{v}\cdot p)\,\mathbf{v} + w^2 p + 2w\,(\mathbf{v}\times p) + \mathbf{v}\times(\mathbf{v}\times p)$$
• This is faster than two straight multiplies on a serial processor
• Faster on a vector processor, too!
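A scalar C sketch of the faster formula, assuming q is normalized. The comments note which register holds each intermediate in the VU listing for this formula; the function and type names are hypothetical.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Quat;   /* hypothetical, stored (x, y, z, w) */

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Vec3 cross3(Vec3 a, Vec3 b) {
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* p' = (v.p) v + w^2 p + 2w (v x p) + v x (v x p).  Assumes q is normalized. */
Vec3 quat_rotate(Quat q, Vec3 p) {
    Vec3 v = { q.x, q.y, q.z };
    float vp    = dot3(v, p);       /* v . p         (vf11w on the VU) */
    float w2    = q.w * q.w;        /* w^2           (vf6w)            */
    float two_w = q.w + q.w;        /* 2w            (vf7w)            */
    Vec3  c     = cross3(v, p);     /* v x p         (vf5)             */
    Vec3  cc    = cross3(v, c);     /* v x (v x p)                     */
    Vec3 r = { vp * v.x + w2 * p.x + two_w * c.x + cc.x,
               vp * v.y + w2 * p.y + two_w * c.y + cc.y,
               vp * v.z + w2 * p.z + two_w * c.z + cc.z };
    return r;
}
```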
Vector Rotation on VU
vmul vf11, vf1, vf2
vopmula.xyz acc, vf2, vf1
vopmsub.xyz vf5, vf1, vf2
vmul.w vf6w, vf2w, vf2w
vadd.w vf7w, vf2w, vf2w
vmulax.w accw, vf0w, vf11x
vmadday.w accw, vf0w, vf11y
vmaddz.w vf11w, vf0w, vf11z
vopmula.xyz acc, vf2, vf5
vmaddaw.xyz acc, vf5, vf7w
vmaddaw.xyz acc, vf1, vf6w
vmaddaw.xyz acc, vf2, vf11w
vopmsub.xyz vf3, vf5, vf2
• p in vf1, q in vf2
$$p' = (\mathbf{v}\cdot p)\,\mathbf{v} + w^2 p + 2w\,(\mathbf{v}\times p) + \mathbf{v}\times(\mathbf{v}\times p)$$
• First part builds all
the pieces
• Second part adds
‘em all together
• Cycles: 13/16
• Better than straight
multiply
• Worse than matrix
Full Transforms
• Combination of translation vector t,
quat r, 3 scale factors s
• Once again, want to transform point
• Basic formula:
1

p  t  r  (s  p)  r
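A minimal C sketch of the full transform, assuming a hypothetical `Xform` struct holding t, a normalized r, and per-axis s. Scale is applied first, then rotation using the normalized-quaternion formula, then translation, which is the same order as the VU code.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Quat;        /* stored (x, y, z, w), normalized */
typedef struct { Vec3 t; Quat r; Vec3 s; } Xform; /* hypothetical: translation, rotation, scale */

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Vec3 cross3(Vec3 a, Vec3 b) {
    Vec3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

/* (v.p) v + w^2 p + 2w (v x p) + v x (v x p), for normalized q. */
static Vec3 quat_rotate(Quat q, Vec3 p) {
    Vec3 v = { q.x, q.y, q.z };
    float vp = dot3(v, p), w2 = q.w * q.w, two_w = q.w + q.w;
    Vec3 c = cross3(v, p), cc = cross3(v, c);
    Vec3 r = { vp * v.x + w2 * p.x + two_w * c.x + cc.x,
               vp * v.y + w2 * p.y + two_w * c.y + cc.y,
               vp * v.z + w2 * p.z + two_w * c.z + cc.z };
    return r;
}

/* p' = t + r (s p) r^-1: scale per axis, rotate, then translate. */
Vec3 xform_point(const Xform *T, Vec3 p) {
    Vec3 sp  = { T->s.x * p.x, T->s.y * p.y, T->s.z * p.z };     /* like vmul vf1, vf1, vf3 */
    Vec3 rp  = quat_rotate(T->r, sp);
    Vec3 out = { rp.x + T->t.x, rp.y + T->t.y, rp.z + T->t.z };  /* like the extra vmaddaw with vf4 */
    return out;
}
```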
Point Transformation
vmul vf1, vf1, vf3
vmul vf11, vf1, vf2
vopmula.xyz acc, vf2, vf1
vopmsub.xyz vf5, vf1, vf2
vmul.w vf6w, vf2w, vf2w
vadd.w vf7w, vf2w, vf2w
vmulax.w accw, vf0w, vf11x
vmadday.w accw, vf0w, vf11y
vmaddz.w vf11w, vf0w, vf11z
vopmula.xyz acc, vf2, vf5
vmaddaw.xyz acc, vf5, vf7w
vmaddaw.xyz acc, vf1, vf6w
vmaddaw.xyz acc, vf2, vf11w
vmaddaw.xyz acc, vf4, vf0w
vopmsub.xyz vf3, vf5, vf2
• p in vf1, q in vf2
• scale in vf3
• translation in vf4
• Takes four extra cycles for scale (including stalls), one extra for xlate
• Cycle count: 18/21
Transform Concatenation
• Look at formula:
$$s = s_1 s_2, \qquad r = r_2 r_1, \qquad t = t_2 + r_2\,(s_2 t_1)\,r_2^{-1}$$
• Have to transform a point (t1), multiply two quaternions, and multiply the scales
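The concatenation rule as a C sketch, with T1 applied first and T2 second. It assumes uniform scale factors: that is the case where folding s2 through the rotation is exact, since a uniform scale commutes with any rotation. `Xform`, `xform_point`, and `xform_concat` are hypothetical names.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Quat;        /* stored (x, y, z, w), normalized */
typedef struct { Vec3 t; Quat r; Vec3 s; } Xform; /* hypothetical: translation, rotation, scale */

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Vec3 cross3(Vec3 a, Vec3 b) {
    Vec3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

/* (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2) */
static Quat quat_mul(Quat a, Quat b) {
    Vec3 va = { a.x, a.y, a.z }, vb = { b.x, b.y, b.z };
    Vec3 c = cross3(va, vb);
    Quat r = { a.w * vb.x + b.w * va.x + c.x,
               a.w * vb.y + b.w * va.y + c.y,
               a.w * vb.z + b.w * va.z + c.z,
               a.w * b.w - dot3(va, vb) };
    return r;
}

/* (v.p) v + w^2 p + 2w (v x p) + v x (v x p), for normalized q. */
static Vec3 quat_rotate(Quat q, Vec3 p) {
    Vec3 v = { q.x, q.y, q.z };
    float vp = dot3(v, p), w2 = q.w * q.w, two_w = q.w + q.w;
    Vec3 c = cross3(v, p), cc = cross3(v, c);
    Vec3 r = { vp * v.x + w2 * p.x + two_w * c.x + cc.x,
               vp * v.y + w2 * p.y + two_w * c.y + cc.y,
               vp * v.z + w2 * p.z + two_w * c.z + cc.z };
    return r;
}

/* p' = t + r (s p) r^-1 */
Vec3 xform_point(const Xform *T, Vec3 p) {
    Vec3 sp = { T->s.x * p.x, T->s.y * p.y, T->s.z * p.z };
    Vec3 rp = quat_rotate(T->r, sp);
    Vec3 out = { rp.x + T->t.x, rp.y + T->t.y, rp.z + T->t.z };
    return out;
}

/* s = s1 s2, r = r2 r1, t = t2 + r2 (s2 t1) r2^-1.
   Exact only for uniform scale (s.x == s.y == s.z). */
Xform xform_concat(const Xform *T2, const Xform *T1) {
    Xform out;
    out.s = (Vec3){ T1->s.x * T2->s.x, T1->s.y * T2->s.y, T1->s.z * T2->s.z };
    out.r = quat_mul(T2->r, T1->r);                       /* the quaternion multiply */
    Vec3 st1 = { T2->s.x * T1->t.x, T2->s.y * T1->t.y, T2->s.z * T1->t.z };
    Vec3 rt  = quat_rotate(T2->r, st1);                   /* the point transform */
    out.t = (Vec3){ T2->t.x + rt.x, T2->t.y + rt.y, T2->t.z + rt.z };
    return out;
}
```

Applying the concatenated transform to a point should match applying T1 and then T2, up to rounding.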
Transform Concatenation
• Takes 8 cycles for quat multiply, 18 for
transform, 1 for scale
• Have three stall cycles available
• Bottom line: 24/27 cycles
• Much slower than matrix multiplication
• Not recommended
Matrix Conversion
• Quat-vector transformation not as
efficient as matrix-vector
transformation (13 cycles vs. 4)
• To do multiple points, want to convert
quaternion to a 4x4 matrix
Matrix Conversion
• Corresponding 4x4 matrix to
normalized quat q = (x,y,z,w) is:
1  2 y 2  2 z 2
2 xy  2 wz
2 xz  2 wy

2 yz  2 wx
 2 xy  2 wz 1  2 x 2  2 z 2
Mq  
2
2
2
xz

2
wy
2
yz

2
wx
1

2
x

2
y


0
0
0

0

0

0
1 
• Not obvious how to do this efficiently
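One way to see which convention the matrix uses is to build it directly in C. This sketch assumes row vectors (p' = p M), matching the VU matrix-multiply sequence earlier in the talk, where each matrix row is scaled by one component of the point; the names are hypothetical.

```c
typedef struct { float x, y, z, w; } Quat;   /* hypothetical, stored (x, y, z, w) */
typedef struct { float m[4][4]; } Mat4;

/* M_q for a normalized q, row-vector convention: row i is the image of axis i. */
Mat4 quat_to_matrix(Quat q) {
    float x = q.x, y = q.y, z = q.z, w = q.w;
    Mat4 M = {{
        { 1 - 2*y*y - 2*z*z, 2*x*y + 2*w*z,     2*x*z - 2*w*y,     0 },
        { 2*x*y - 2*w*z,     1 - 2*x*x - 2*z*z, 2*y*z + 2*w*x,     0 },
        { 2*x*z + 2*w*y,     2*y*z - 2*w*x,     1 - 2*x*x - 2*y*y, 0 },
        { 0,                 0,                 0,                 1 }
    }};
    return M;
}

/* out = p[0]*row0 + p[1]*row1 + p[2]*row2 + p[3]*row3, like vmulax/vmadday/vmaddaz/vmaddw. */
void mat_transform(const Mat4 *M, const float p[4], float out[4]) {
    for (int j = 0; j < 4; ++j)
        out[j] = p[0] * M->m[0][j] + p[1] * M->m[1][j]
               + p[2] * M->m[2][j] + p[3] * M->m[3][j];
}
```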
Matrix Conversion
• Two approaches
• One works well in macro mode
• One in micro mode
> uses Lower instructions to achieve better
parallelism
Matrix Conversion (macro)
• Idea: matrix is built from two other
matrices
q = (x, y, z, w)
$$
M_q =
\begin{bmatrix}
w & z & -y & x \\
-z & w & x & y \\
y & -x & w & z \\
-x & -y & -z & w
\end{bmatrix}
\begin{bmatrix}
w & z & -y & -x \\
-z & w & x & -y \\
y & -x & w & -z \\
x & y & z & w
\end{bmatrix}
$$
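A quick numerical check of the factorization in C: build the two 4x4 factors from the quaternion and multiply them. The row layouts are read off the macro-mode code (the right factor's rows are what stage one builds in vf1 to vf4; the left factor's rows are the coefficients stage two applies). Function names are hypothetical.

```c
typedef struct { float x, y, z, w; } Quat;   /* hypothetical, stored (x, y, z, w) */
typedef struct { float m[4][4]; } Mat4;

/* Left factor: coefficients applied to the right factor's rows. */
Mat4 quat_left(Quat q) {
    Mat4 L = {{ {  q.w,  q.z, -q.y,  q.x },
                { -q.z,  q.w,  q.x,  q.y },
                {  q.y, -q.x,  q.w,  q.z },
                { -q.x, -q.y, -q.z,  q.w } }};
    return L;
}

/* Right factor R_q: its rows are what stage one builds in vf1..vf4. */
Mat4 quat_right(Quat q) {
    Mat4 R = {{ {  q.w,  q.z, -q.y, -q.x },
                { -q.z,  q.w,  q.x, -q.y },
                {  q.y, -q.x,  q.w, -q.z },
                {  q.x,  q.y,  q.z,  q.w } }};
    return R;
}

/* Each output row is a row of A times B: a series of row-vector multiplies. */
Mat4 mat_mul(const Mat4 *A, const Mat4 *B) {
    Mat4 C;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            C.m[i][j] = A->m[i][0] * B->m[0][j] + A->m[i][1] * B->m[1][j]
                      + A->m[i][2] * B->m[2][j] + A->m[i][3] * B->m[3][j];
    return C;
}
```

For a normalized q the product reproduces M_q; with q = (0.5, 0.5, 0.5, 0.5) it comes out as a pure axis permutation.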
Matrix Conversion (macro)
• Simplification: matrix multiply is series
of row vector multiplies
• Create right matrix, generate left
matrix via accumulator tricks
$$
R_q = \begin{bmatrix}
w & z & -y & -x \\
-z & w & x & -y \\
y & -x & w & -z \\
x & y & z & w
\end{bmatrix}
$$
Matrix Conversion (macro)
• Look at one row in matrix multiply:
vmulax ACC, vf5, vf1x
vmadday ACC, vf6, vf1y
vmaddaz ACC, vf7, vf1z
vmaddw vf9, vf8, vf1w
• Or could just do:
vmulaw ACC, vf8, vf1w
vmadday ACC, vf6, vf1y
vmaddaz ACC, vf7, vf1z
vmaddx vf9, vf5, vf1x
• The sum is linear, so the order of the terms doesn't matter
Matrix Conversion (macro)
• Idea: all values we need for left matrix are
in quaternion
• Load accumulator with mula by w value
(always positive)
• vmadd or vmsub to multiply by positive or
negative value and accumulate
$$vf_5 = (x, y, z, w), \qquad vf_{13} = \begin{bmatrix} -z & w & x & y \end{bmatrix} \begin{bmatrix} vf_1 \\ vf_2 \\ vf_3 \\ vf_4 \end{bmatrix}$$
vmulaw.xyz  acc, vf2, vf5w
vmaddax.xyz acc, vf3, vf5x
vmadday.xyz acc, vf4, vf5y
vmsubz.xyz  vf13, vf1, vf5z
Matrix Conversion (macro)
• More simplification:
> Last row of Mq always (0,0,0,1), don’t compute!
> Last column always 0 too, don’t compute!
> Last row of Rq just the quat in VU format
• Just build:
$$
\tilde{R}_q = \begin{bmatrix}
w & z & -y & \sim \\
-z & w & x & \sim \\
y & -x & w & \sim \\
x & y & z & w
\end{bmatrix}
$$
(~ = don't care)
Matrix Conversion (macro)
vaddw.x vf1, vf0, vf4
vaddz.y vf1, vf0, vf4
vsuby.z vf1, vf0, vf4
vsubz.x vf2, vf0, vf4
vaddw.y vf2, vf0, vf4
vaddx.z vf2, vf0, vf4
vaddy.x vf3, vf0, vf4
vsubx.y vf3, vf0, vf4
vaddw.z vf3, vf0, vf4
vmr32.w vf12, vf0
vmr32.w vf13, vf0
vmr32.w vf14, vf0
• Stage one:
> Load quat in vf4
> Build right matrix
> Clear right column of
result
vf1=(w,z,-y,~)
vf2=(-z,w,x,~)
vf3=(y,-x,w,~)
vf4=(x,y,z,w)
Matrix Conversion (macro)
vmulaw.xyz acc, vf1, vf4w
vmaddaz.xyz acc, vf2, vf4z
vmsubay.xyz acc, vf3, vf4y
vmaddx.xyz vf12, vf4, vf4x
vmulaw.xyz acc, vf2, vf4w
vmaddax.xyz acc, vf3, vf4x
vmadday.xyz acc, vf4, vf4y
vmsubz.xyz vf13, vf1, vf4z
vmulaw.xyz acc, vf3, vf4w
vmaddaz.xyz acc, vf4, vf4z
vmadday.xyz acc, vf1, vf4y
vmsubx.xyz vf14, vf2, vf4x
vmove.xyzw vf15, vf0
• Stage two:
> Matrix multiply to get first
three rows
> Clear bottom row
• Note: accumulate only
on xyz (w already
cleared)
• Cycles: 25/28
Matrix Conversion (micro)
• Lots of duplicate calculations in matrix
1  2 y 2  2 z 2
2 xy  2 wz
2 xz  2 wy

2 yz  2 wx
 2 xy  2 wz 1  2 x 2  2 z 2
Mq  
2
2
2
xz

2
wy
2
yz

2
wx
1

2
x

2
y


0
0
0

0

0

0
1 
• Idea: calculate only what we need,
use shifting and accumulator tricks to
parallelize efficiently
• Devised by Colin Hughes of SCEE
Matrix Conversion (micro)
mula      acc, vf1, vf1      loi    SQRT_2
muli      vf3, vf1, I        mr32.w vf24, vf0
madd      vf2, vf1, vf1      nop
addw      vf4, vf0, vf0w     nop
opmula    acc, vf3, vf3      move   vf27, vf0
msubw     vf5, vf3, vf3w     mr32.w vf26, vf0
maddw     vf6, vf3, vf3w     mr32.w vf25, vf0
addaw.xyz acc, vf0, vf0w     nop
msubax.yz acc, vf4, vf2x     nop
msuby.z   vf26, vf4, vf2y    mr32   vf3, vf5
msubay.xz acc, vf4, vf2y     mr32   vf7, vf6
msubz.y   vf25, vf4, vf2z    mr32.y vf24, vf5
msubz.x   vf24, vf4, vf2z    mr32.x vf26, vf5
addy.z    vf24, vf0, vf6y    mr32.z vf25, vf3
addx.y    vf26, vf0, vf6x    mr32.x vf25, vf7
• Three parts:
> Calculate elements
> Clear matrix
> Shift, add, and copy into place
• 16/19 cycles
Matrix Conversion
• If you’re converting a quaternion and
going to use it immediately, can make
some assumptions
• Don’t create bottom row (just use vf0)
• Don’t clear right column (just use xyz)
• Saves four cycles in the macro-mode case
Transform to Matrix
• Use one of the quaternion matrix
techniques
• Scale first three rows by each scale
factor
• Replace last row with translation
• Results:
> 29/32 for macro mode
> 20/23 for micro mode
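A C sketch of the same recipe: the rotation rows of M_q scaled by s.x, s.y, s.z, and the translation written into the last row (row-vector convention, as in the VU code). `xform_to_matrix` and `mat_transform` are hypothetical names.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Quat;   /* stored (x, y, z, w), normalized */
typedef struct { float m[4][4]; } Mat4;

/* Rows 0..2: rotation rows scaled by each scale factor. Row 3: translation. */
Mat4 xform_to_matrix(Vec3 t, Quat r, Vec3 s) {
    float x = r.x, y = r.y, z = r.z, w = r.w;
    Mat4 M = {{
        { s.x * (1 - 2*y*y - 2*z*z), s.x * (2*x*y + 2*w*z),     s.x * (2*x*z - 2*w*y),     0 },
        { s.y * (2*x*y - 2*w*z),     s.y * (1 - 2*x*x - 2*z*z), s.y * (2*y*z + 2*w*x),     0 },
        { s.z * (2*x*z + 2*w*y),     s.z * (2*y*z - 2*w*x),     s.z * (1 - 2*x*x - 2*y*y), 0 },
        { t.x,                       t.y,                       t.z,                       1 }
    }};
    return M;
}

/* Row vector times matrix: out = p[0]*row0 + p[1]*row1 + p[2]*row2 + p[3]*row3. */
void mat_transform(const Mat4 *M, const float p[4], float out[4]) {
    for (int j = 0; j < 4; ++j)
        out[j] = p[0] * M->m[0][j] + p[1] * M->m[1][j]
               + p[2] * M->m[2][j] + p[3] * M->m[3][j];
}
```

Transforming (1, 0, 0, 1) with s = (2, 2, 2), a 90° rotation about z, and t = (10, 0, 0) gives approximately (10, 2, 0, 1), the same answer as scaling, rotating, then translating.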
Normalization
• Need to normalize quaternion to keep
it useful for rotation
> (Also avoids floating point drift)
• Fortunately PS2 has reciprocal
square root instruction
• Unfortunately it takes a while
Normalization
vmul      vf2, vf1, vf1
vaddaz.w  acc, vf2, vf2
vmaddax.w acc, vf0, vf2
vmaddy.w  vf2, vf0, vf2
vrsqrt    Q, vf0w, vf2w
vwaitq
vmulq     vf1, vf1, Q
• Compute dot
product
• Compute 1/length
• Scale quaternion
• With stalls, takes
24/27 cycles
Normalization
• Another approach
> From “The Inner Product”, March 2002 Game
Developer by Jonathan Blow
> Approximate 1/√x via Newton-Raphson iteration
> First iteration takes (looks like) 4/7 cycles on
VU0
> Second iteration takes as long as RSQRT
> Recommend: if x > 0.91521198, use approx
> Otherwise use RSQRT
Interpolation
• This is where it’s at
• It would be great if it was fast
• Um, well…
Interpolation
• First look at spherical linear interpolation, where cos θ = q·r:
$$\mathrm{slerp}(q, r, t) = \frac{\sin(\theta(1-t))\,q + \sin(\theta t)\,r}{\sin\theta}$$
• That's a lot of sines
• Could precompute θ and 1/sin θ
• But at least 28 cycles for one of the other sines
• We (RSE) don’t use slerp anyway
Interpolation
• Lerp, then
$$\mathrm{lerp}(q, r, t) = (1-t)\,q + t\,r = q - t\,q + t\,r$$
is simply (q in vf1, r in vf2, t in vf3w):
vaddax  acc, vf1, vf0x
vmsubaw acc, vf1, vf3w
vmaddw  vf1, vf2, vf3w
• Need to normalize afterwards
• Total with normalization: 30/33 cycles
Interpolation
• Not quite that simple
• Problem: if q•r < 0, interpolation will
take long way around sphere
• Need to negate one quat
• Gives the same orientation, but the
interpolation takes the short route
Linear Interpolation
vmul      vf4, vf1, vf2
vaddaz.w  acc, vf4, vf4
vmaddax.w acc, vf0, vf4
vmaddy.w  vf4, vf0, vf4
vnop
vnop
vnop
cfc2      t0, $16
and       t0, t0, 0x0002
vaddax    acc, vf1, vf0x
beq       t0, zero, Add
vmsubaw   acc, vf2, vf3w
b         Finish
Add:    vmaddaw acc, vf2, vf3w
Finish: vmsubw  vf1, vf1, vf3w
• Compute dot
product
• Check for negative
• Interpolate
• Follow up with
normalization
• Takes 43/46 cycles
Linear Interpolation
• There's more we can do
• Jonathan Blow's article, again
• Use spline to correct error in lerp
• More investigation needed
• Initial results: takes about 24-26 more cycles
• Looks faster than slerp, more accurate than lerp
How We’re Using All This
• A bit research-y at the moment
• VU0-based math library
• Optimization in specific routines
• In particular, concatenation and interpolation for bone animation
• More memory savings: store quat as 4.12 fixed-point shorts
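A sketch of the 4.12 packing in C, assuming 4.12 means a signed 16-bit value with 12 fractional bits (scale 4096), which covers [-8, 8) and comfortably holds unit-quaternion components. The packed type halves storage from 16 bytes to 8. Type and function names are hypothetical, and rounding is half away from zero.

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } Quat;           /* 16 bytes */
typedef struct { int16_t x, y, z, w; } QuatFix412;   /* 8 bytes, hypothetical storage type */

#define FIX412_ONE 4096.0f   /* 2^12: 12 fractional bits, 4 integer bits incl. sign */

/* Round half away from zero; components in [-1, 1] stay well inside int16. */
static int16_t to_fix412(float f) {
    float s = f * FIX412_ONE;
    return (int16_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);
}

static float from_fix412(int16_t v) { return (float)v / FIX412_ONE; }

QuatFix412 quat_pack(Quat q) {
    QuatFix412 p = { to_fix412(q.x), to_fix412(q.y), to_fix412(q.z), to_fix412(q.w) };
    return p;
}

Quat quat_unpack(QuatFix412 p) {
    Quat q = { from_fix412(p.x), from_fix412(p.y), from_fix412(p.z), from_fix412(p.w) };
    return q;
}
```

Round-trip error per component is at most half a step, 1/8192 ≈ 1.2e-4, so unpacked quaternions will usually want renormalizing before use.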
Conclusions
• Quaternions useful on PS2
• Cheaper to concatenate (alone)
• Convert to matrix to transform
• Use linear interpolation
• Check out Jonathan Blow's article
References
• Shoemake, Ken, “Animating Rotation with
Quaternion Curves,” Computer Graphics, Vol. 19,
No. 3 (July 1985).
• EE Core Instruction Set Manual
• VU User’s Manual
• Sony newsgroups
• Blow, Jonathan, “Hacking Quaternions,” Game
Developer, Vol. 9, No. 3 (March 2002). [get
updated source from www.gdmag.com/code.htm]
Questions?
• Please hand in comment sheets
• Slides available at:
http://obiwan.redstorm.com/~jimvv