Lecture notes #18 - Stanford University

EE 387, Notes 18, Handout #32
GF($2^m$) arithmetic: summary

Addition/subtraction:
• bitwise XOR (m gates/ops)

Multiplication:
• bit serial (shift and add)
• bit parallel (combinational)
• subfield representation
• log and anti-log tables
• software

Division (reciprocal):
• Euclidean algorithm
• lookup tables
• sequential search
• time-memory tradeoff
• exponentiation
• subfield representation
• log and anti-log tables
• software

◮ Division is performed as multiplication by reciprocal: $a/b = a \cdot b^{-1}$.
◮ Single-cycle division is not needed because decoders use few divides.
◮ Most multiplication and division methods take $O(m^2)$ bit operations.
EE 387, November 6, 2015
Notes 18, Page 1
Multiplication by constants
GF($2^m$) is a vector space over GF(2) of dimension $m$.
We choose a basis $\{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$ for some $\alpha$ (usually primitive).
Multiplication by constant $b = b_0 + b_1\alpha + \cdots + b_{m-1}\alpha^{m-1}$ is a linear
transformation. It can be described by an $m \times m$ matrix $B$ over GF(2).
To derive the components of $B$, use the distributive law:
\[
\begin{aligned}
a \cdot b &= (a_0 + a_1\alpha + \cdots + a_{m-1}\alpha^{m-1}) \cdot b \\
          &= a_0 b + a_1(\alpha b) + \cdots + a_{m-1}(\alpha^{m-1} b) .
\end{aligned}
\]
The products $\mathbf{b}_i = \alpha^i b$ for $0 \le i < m$ can be precomputed.
The components of these vectors can be stored in a binary matrix
\[
B = \begin{bmatrix} \mathbf{b}_0 \\ \mathbf{b}_1 \\ \vdots \\ \mathbf{b}_{m-1} \end{bmatrix}
  = \begin{bmatrix}
    b_{0,0} & b_{0,1} & \cdots & b_{0,m-1} \\
    b_{1,0} & b_{1,1} & \cdots & b_{1,m-1} \\
    \vdots & \vdots & \ddots & \vdots \\
    b_{m-1,0} & b_{m-1,1} & \cdots & b_{m-1,m-1}
    \end{bmatrix} .
\]
Multiplication by constants (cont.)
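The product $a \cdot b$ is the vector-matrix product
\[
y = a \cdot b = aB = [\, a_0 \ \ a_1 \ \cdots \ a_{m-1} \,]
\begin{bmatrix}
b_{0,0} & b_{0,1} & \cdots & b_{0,m-1} \\
b_{1,0} & b_{1,1} & \cdots & b_{1,m-1} \\
\vdots & \vdots & \ddots & \vdots \\
b_{m-1,0} & b_{m-1,1} & \cdots & b_{m-1,m-1}
\end{bmatrix} .
\]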
Each bit $y_j$ of product $y$ is the inner product of row $a$ with column $j$ of $B$:
\[
y_j = \sum_{i=0}^{m-1} a_i b_{i,j} = \sum_{i:\, b_{i,j}=1} a_i
\]
In the above formula, the sum is XOR and the product is logical AND.
Any GF($2^m$) scaler can be built using $\le m(m-1)$ 2-input XOR gates.
The typical scaler uses $\approx \frac{1}{2} m^2$ XOR gates.
These gate count estimates do not include the use of common subexpressions. Finding the
minimum circuit is an NP-complete problem.
Example: scaler in GF($2^6$)
GF($2^6$) = binary polynomials modulo $x^6 + x + 1$ (a primitive polynomial).
Multiplication by $b = [\,1\ 1\ 0\ 0\ 0\ 1\,]$ is defined by the matrix $B$ shown below.
The rows of $B$ are $x^i b$ ($i = 0, \ldots, 5$), msb on right.
Successive rows of $B$ are obtained by shifting previous row right using the
feedback pattern $[\,1\ 1\ 0\ 0\ 0\ 0\,]$ corresponding to $1 + x + x^6$.
\[
B = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 1 & 0
\end{bmatrix}
\implies
\begin{aligned}
y_0 &= a_0 \oplus a_1 \oplus a_5 \\
y_1 &= a_0 \oplus a_2 \oplus a_5 \\
y_2 &= a_1 \oplus a_3 \\
y_3 &= a_2 \oplus a_4 \\
y_4 &= a_3 \oplus a_5 \\
y_5 &= a_0 \oplus a_4
\end{aligned}
\]
The coefficients $y_0, y_1, \ldots, y_5$ of the product $y = a \cdot b = aB$ can be read
from columns of $B$.
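The construction of $B$ by shifts with feedback, and the product $y = aB$, can be sketched in Python as follows. This is a minimal illustration, not a hardware description; the function names `scaler_matrix` and `scale` and the list-of-bits representation (index 0 is the coefficient of 1) are assumptions made for this sketch.

```python
def scaler_matrix(b, p_low, m):
    """Rows x^i * b mod p(x) for i = 0..m-1, as bit lists (index 0 = lsb).

    p_low is the feedback pattern, i.e. the bits of x^m mod p(x).
    """
    rows = []
    row = list(b)
    for _ in range(m):
        rows.append(row)
        carry = row[-1]                      # coefficient of x^(m-1)
        shifted = [0] + row[:-1]             # multiply by x
        row = [s ^ (carry & f) for s, f in zip(shifted, p_low)]
    return rows


def scale(a, B):
    """Vector-matrix product aB over GF(2): XOR of the rows of B selected by a."""
    y = [0] * len(B)
    for a_i, row in zip(a, B):
        if a_i:
            y = [y_j ^ b_ij for y_j, b_ij in zip(y, row)]
    return y


# GF(2^6) modulo x^6 + x + 1, constant b = [1 1 0 0 0 1]
P_LOW = [1, 1, 0, 0, 0, 0]                   # x^6 = 1 + x
B = scaler_matrix([1, 1, 0, 0, 0, 1], P_LOW, 6)
```

Calling `scale(a, B)` for a 6-bit list `a` evaluates the same XOR equations that are read off the columns of $B$ above.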
Matrix of multiplier matrices
The matrix $M_b$ corresponding to multiplier $b = x^j \bmod p(x)$ consists of $m$ consecutive rows
\[
x^j \bmod p(x),\ \ldots,\ x^{j+m-1} \bmod p(x)
\]
from the matrix of powers of primitive element $x$.
Since $p(x)$ is primitive, the matrix of powers of $x \bmod p(x)$ is the same as the matrix of powers of
primitive element $\alpha$.
For GF($2^4$), the $(15 + 3) \times 4$ matrix below (lsb first) contains all multiplier matrices.

   i    $\alpha^i$
   0    1 0 0 0
   1    0 1 0 0
   2    0 0 1 0
   3    0 0 0 1
   4    1 1 0 0
   5    0 1 1 0
   6    0 0 1 1
   7    1 1 0 1
   8    1 0 1 0
   9    0 1 0 1
  10    1 1 1 0
  11    0 1 1 1
  12    1 1 1 1
  13    1 0 1 1
  14    1 0 0 1
   0    1 0 0 0
   1    0 1 0 0
   2    0 0 1 0

For example, $M_{\alpha^7}$ consists of rows 7 to 10 from this table.
\[
M_{\alpha^7} = \begin{bmatrix}
1 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 1 & 1 & 0
\end{bmatrix}
\]
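A short Python sketch of the table of powers and the row-slicing that yields a multiplier matrix. It assumes $p(x) = x^4 + x + 1$, which is what row 4 of the GF($2^4$) table implies; the names `power_table` and `multiplier_matrix` are hypothetical.

```python
def power_table(p_low, m, count):
    """Rows x^i mod p(x) for i = 0..count-1, lsb first.

    p_low holds the bits of x^m mod p(x) (the feedback pattern).
    """
    rows = []
    row = [1] + [0] * (m - 1)                # x^0 = 1
    for _ in range(count):
        rows.append(row)
        carry = row[-1]
        row = [s ^ (carry & f) for s, f in zip([0] + row[:-1], p_low)]
    return rows


def multiplier_matrix(table, j, m):
    """M_b for b = alpha^j: the m consecutive rows j .. j+m-1 of the table."""
    return table[j:j + m]


# Assumed polynomial: p(x) = x^4 + x + 1, so x^4 = 1 + x
TABLE = power_table([1, 1, 0, 0], 4, 15 + 3)
M7 = multiplier_matrix(TABLE, 7, 4)
```

The three extra rows repeat rows 0 to 2, so that every window of 4 consecutive rows starting at $j = 0, \ldots, 14$ is available without wrapping the index.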
General multiplication
Let GF($2^m$) be polynomials over GF(2) modulo prime $p(x)$ of degree $m$.
Let $t_i = \sum_{j=0}^{i} a_j b_{i-j}$ be the coefficient of $x^i$ in $a(x)b(x)$.
\[
\begin{aligned}
a(x)b(x) &= (t_0 + t_1 x + \cdots + t_{m-1} x^{m-1} + t_m x^m + \cdots + t_{2m-2} x^{2m-2}) \bmod p(x) \\
         &= t_0 + t_1 x + \cdots + t_{m-1} x^{m-1} + t_m (x^m \bmod p(x)) + \cdots + t_{2m-2} (x^{2m-2} \bmod p(x))
\end{aligned}
\]
We can precompute $x^m \bmod p(x), \ldots, x^{2m-2} \bmod p(x)$ and store them as
rows of an $(m-1) \times m$ binary matrix:
\[
T = \begin{bmatrix} x^m \bmod p(x) \\ x^{m+1} \bmod p(x) \\ \vdots \\ x^{2m-2} \bmod p(x) \end{bmatrix}
  = \begin{bmatrix} \alpha^m \\ \alpha^{m+1} \\ \vdots \\ \alpha^{2m-2} \end{bmatrix}
\]
Rows of $T$ are obtained by shifts with feedback corresponding to $p(x)$.
General multiplication (cont.)
The product $y = a \cdot b$ can be expressed in matrix notation:
\[
\begin{aligned}
y &= (a_0, \ldots, a_{m-1}) \cdot (b_0, \ldots, b_{m-1}) \\
  &= [\, t_0, t_1, \ldots, t_{m-1} \,] + [\, t_m, \ldots, t_{2m-2} \,]\, T \\
  &= [\, t_0, t_1, \ldots, t_{2m-2} \,] \begin{bmatrix} I \\ T \end{bmatrix}
   = [\, t_0, t_1, \ldots, t_{2m-2} \,] \begin{bmatrix} 1 \\ \alpha \\ \vdots \\ \alpha^{2m-2} \end{bmatrix}
\end{aligned}
\]
where $I$ is the $m \times m$ identity matrix and $T$ is $(m-1) \times m$.
Low-level computational formula for product bits $y_j$ for $j = 0, \ldots, m-1$:
\[
y_j = t_j + \sum_{i=0}^{m-2} t_{m+i} T_{ij}
    = \sum_{l=0}^{j} a_l b_{j-l} + \sum_{i=0}^{m-2} T_{ij} \sum_{l=i+1}^{m-1} a_l b_{m+i-l} .
\]
Each bit of the product vector is a sum of a subset of the product terms $a_i b_j$.
The product vector consists of $m$ bilinear functions of $a$ and $b$.
General multiplication example
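GF($2^4$) can be defined by any of three prime polynomials over GF(2):
\[
p_1(x) = x^4 + x + 1, \quad p_2(x) = x^4 + x^3 + 1, \quad p_3(x) = x^4 + x^3 + x^2 + x + 1
\]
The respective $T$ matrices are
\[
T_1 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad
T_2 = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \quad
T_3 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
\]
The equations for the product defined using $T_3$:
\[
\begin{aligned}
[\, y_0, y_1, y_2, y_3 \,] &= [\, t_0, t_1, t_2, t_3 \,] + [\, t_4, t_5, t_6 \,]
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \\
&= [\, t_0 + t_4 + t_5,\ t_1 + t_4 + t_6,\ t_2 + t_4,\ t_3 + t_4 \,] .
\end{aligned}
\]
Even though $p_3(x)$ is not primitive, it is prime and can be used to define GF($2^4$).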
General multiplication example (cont.)
Expanding the matrix form yields these Boolean equations:
\[
\begin{aligned}
t_0 &= a_0 b_0 & y_0 &= t_0 + t_4 + t_5 \\
t_1 &= a_0 b_1 + a_1 b_0 & y_1 &= t_1 + t_4 + t_6 \\
t_2 &= a_0 b_2 + a_1 b_1 + a_2 b_0 & y_2 &= t_2 + t_4 \\
t_3 &= a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0 & y_3 &= t_3 + t_4 \\
t_4 &= a_1 b_3 + a_2 b_2 + a_3 b_1 \\
t_5 &= a_2 b_3 + a_3 b_2 \\
t_6 &= a_3 b_3
\end{aligned}
\]
Every product $a_i b_j$ appears in at least one equation, hence $m^2$ AND gates.
Also $(m-1)^2$ XOR gates are needed to compute $\{t_0, t_1, \ldots, t_{2m-2}\}$.
The number of 1s in $T$ counts the XOR gates needed to compute
$\{y_0, y_1, \ldots, y_{m-1}\}$ from $\{t_0, t_1, \ldots, t_{2m-2}\}$.
Polynomials like $x^6 + x + 1$ result in $T$ matrices with few 1s.
We ignore common subexpression simplification.
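The two-stage computation (first the $t_i$, then the reduction through $T$) can be sketched in Python. This is an illustrative bit-list model rather than a gate-level design; `t_matrix` and `gf_mul` are names chosen for this sketch, and `p_low` denotes the bits of $x^m \bmod p(x)$, lsb first.

```python
def t_matrix(p_low, m):
    """Rows x^m, ..., x^(2m-2) mod p(x), lsb first (an (m-1) x m matrix)."""
    rows = []
    row = list(p_low)                        # x^m mod p(x)
    for _ in range(m - 1):
        rows.append(row)
        carry = row[-1]
        row = [s ^ (carry & f) for s, f in zip([0] + row[:-1], p_low)]
    return rows


def gf_mul(a, b, T):
    """Product of bit lists a and b using y = [t_0..t_(m-1)] + [t_m..t_(2m-2)] T."""
    m = len(a)
    # Stage 1: t_k = XOR of a_i AND b_j over i + j = k  (m^2 ANDs, (m-1)^2 XORs)
    t = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            t[i + j] ^= a[i] & b[j]
    # Stage 2: fold the high coefficients back through the rows of T
    y = t[:m]
    for i in range(m - 1):
        if t[m + i]:
            y = [y_j ^ t_ij for y_j, t_ij in zip(y, T[i])]
    return y


T1 = t_matrix([1, 1, 0, 0], 4)               # p1(x) = x^4 + x + 1
T2 = t_matrix([1, 0, 0, 1], 4)               # p2(x) = x^4 + x^3 + 1
T3 = t_matrix([1, 1, 1, 1], 4)               # p3(x) = x^4 + x^3 + x^2 + x + 1
ones_in_T = [sum(map(sum, T)) for T in (T1, T2, T3)]
```

Here `ones_in_T` gives the count of 1s in each $T$, the quantity that the gate estimate above refers to.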
Multiplication using subfield representation
Galois fields can be represented using subfields larger than the field integers.
A common case: GF($2^{2m}$) = pairs from GF($2^m$) mod degree 2 prime
polynomial over GF($2^m$).
Good choice: prime quadratic of form $x^2 + \alpha x + 1$, where $\alpha$ is in GF($2^m$).
\[
\begin{aligned}
a \cdot b &= (a_0 + a_1 x)(b_0 + b_1 x) \\
          &= a_0 b_0 + (a_0 b_1 + a_1 b_0) x + a_1 b_1 x^2 \\
          &= a_0 b_0 + (a_0 b_1 + a_1 b_0) x + a_1 b_1 (\alpha x + 1) \\
          &= (a_0 b_0 + a_1 b_1) + (a_0 b_1 + a_1 b_0 + \alpha a_1 b_1) x
\end{aligned}
\]
Final answer: components of $(y_0, y_1) = (a_0, a_1) \cdot (b_0, b_1)$ are
\[
\begin{aligned}
y_0 &= a_0 b_0 + a_1 b_1 \\
y_1 &= a_0 b_1 + a_1 b_0 + \alpha a_1 b_1
\end{aligned}
\]
The only common subexpression is $a_1 b_1$, which affects both $y_0$ and $y_1$.
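As a small concrete instance, the formulas for $(y_0, y_1)$ can be tried on GF($2^4$) built from pairs over GF($2^2$). The choices below are assumptions for this sketch: GF(4) is taken as GF(2)[w]/($w^2 + w + 1$) with elements stored as 2-bit integers, and $\alpha = w$, for which $x^2 + \alpha x + 1$ has no root in GF(4).

```python
def gf4_mul(a, b):
    """Multiply in GF(4) = GF(2)[w]/(w^2 + w + 1); elements are ints 0..3."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:                            # reduce w^2 -> w + 1
        r ^= 0b111
    return r


ALPHA = 0b10                                 # alpha = w (assumed choice)


def gf16_mul(a, b, alpha=ALPHA):
    """Multiply pairs (a0, a1), (b0, b1) modulo x^2 + alpha*x + 1 over GF(4)."""
    a0, a1 = a
    b0, b1 = b
    a1b1 = gf4_mul(a1, b1)                   # the shared subexpression
    y0 = gf4_mul(a0, b0) ^ a1b1
    y1 = gf4_mul(a0, b1) ^ gf4_mul(a1, b0) ^ gf4_mul(alpha, a1b1)
    return (y0, y1)


# x^2 + alpha*x + 1 is prime over GF(4) iff it has no root in GF(4)
HAS_ROOT = any(gf4_mul(r, r) ^ gf4_mul(ALPHA, r) ^ 1 == 0 for r in range(4))
```

Each call to `gf16_mul` uses four GF(4) multiplications, one scaling by $\alpha$, and three additions (XORs).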
Multiplication using subfield representation (cont.)
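Multiplication in GF($2^{2m}$) uses operations in the subfield GF($2^m$).
In the circuit below, there are 4 multipliers, one scaler, and 3 adders.
[Figure: multiplier circuit with inputs $a_0$, $a_1$, $b_0$, $b_1$, scaler $\alpha$, and outputs $y_0$, $y_1$.]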
Multiplications can be performed in parallel, sequentially, or two at a time.
Thus we can trade off between time and gates.
Circuits for reciprocal in GF($2^m$)
Since $a/b = a \cdot b^{-1}$, division requires computation of multiplicative inverses.
Methods for calculating reciprocals:
◮ Euclidean algorithm
◮ table lookup
◮ sequential search
◮ time-memory tradeoff
◮ exponentiation
◮ subfield representation
◮ recursive combinational circuit
Single-cycle division is not usually needed.
Reciprocals: Euclidean algorithm
Let GF($Q$) be represented by polynomials modulo prime $p(x)$.
Extended Euclidean algorithm for $\gcd(r(x), p(x))$ finds $a(x), b(x)$ such that
\[
a(x) r(x) + b(x) p(x) = \gcd(r(x), p(x)) = 1
\]
If $\deg p(x) = m$, the algorithm takes $O(m)$ operations on $m$-digit registers.
Implementation is straightforward. Drawback: $O(m)$ clocks.
Remainders $r_i(x)$ and coefficients $a_i(x)$ can share memory.
[Figure: shared register layout for $p(x)$, $r(x)$, and $a(x)$.]
The final remainder $r$ is a constant, and the reciprocal is $r^{-1} a(x)$.
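A software sketch of the reciprocal by the extended Euclidean algorithm, specialized to GF($2^m$) where the final constant remainder is 1. Polynomials over GF(2) are stored as Python integers (bit $i$ is the coefficient of $x^i$); the helper names are assumptions, and the modulus $x^8 + x^4 + x^3 + x + 1$ (0x11B) is only an example of a prime polynomial.

```python
def poly_divmod(n, d):
    """Quotient and remainder of binary polynomials held as ints (d != 0)."""
    q = 0
    while n.bit_length() >= d.bit_length():
        shift = n.bit_length() - d.bit_length()
        q ^= 1 << shift
        n ^= d << shift
    return q, n


def poly_mul(a, b):
    """Carry-less (XOR) product of binary polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def gf_inverse(r, p):
    """Reciprocal of nonzero r(x) modulo prime p(x)."""
    if r == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    r0, r1 = p, r
    a0, a1 = 0, 1            # invariant: r_i = a_i(x) r(x)  (mod p(x))
    while r1 != 1:
        if r1 == 0:
            raise ValueError("p(x) is not prime")
        q, rem = poly_divmod(r0, r1)
        r0, r1 = r1, rem
        a0, a1 = a1, a0 ^ poly_mul(q, a1)
    return a1


P = 0x11B                    # example modulus: x^8 + x^4 + x^3 + x + 1
inv = gf_inverse(0x53, P)
```

The pairs `(r0, r1)` and `(a0, a1)` play the role of the remainder and coefficient registers; in hardware they can share storage because the remainder degrees shrink as the coefficient degrees grow.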
Reciprocals: table lookup
Store precomputed reciprocals in $2^m \times m$ ROM.
For gate arrays, one bit of ROM costs about 1/8 of a 2-input NAND gate.
Reciprocal table for GF($2^8$) costs $\approx 256$ gates.
Combinational multiplier for GF($2^8$) uses 64 ANDs + 66 XORs $\approx 260$ gates.
However, lookup tables are not as feasible for larger fields.
For example, the gate equivalent of a reciprocal table for GF($2^{10}$) is about
\[
\frac{1}{8} \cdot 2^{10} \times 10 = 1280
\]
This is much larger than about 400 gates for a combinational multiplier.
The lookup table size can be reduced by using precomputation to transform the input to a value
whose reciprocal is known, then postcomputation to adjust the inverse. For example, if the
inverse of $\beta\alpha^i$ is $\delta$ (obtained from a table) then the inverse of $\beta$ is $\delta\alpha^i$.
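The ROM contents and the gate-equivalent arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the 1/8 gate-per-bit figure is the estimate quoted in these notes, the small field GF($2^4$) with $p(x) = x^4 + x + 1$ is an assumed example, and the function names are hypothetical.

```python
def gf_mul(a, b, p, m):
    """Multiply in GF(2^m); ints hold polynomial bits, p includes the x^m term."""
    r = 0
    for i in range(m):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * m - 2, m - 1, -1):    # reduce degrees 2m-2 down to m
        if (r >> i) & 1:
            r ^= p << (i - m)
    return r


def reciprocal_table(p, m):
    """2^m entries of m bits each; entry 0 is unused (0 has no reciprocal)."""
    table = [0] * (1 << m)
    for a in range(1, 1 << m):
        for b in range(1, 1 << m):
            if gf_mul(a, b, p, m) == 1:
                table[a] = b
                break
    return table


def rom_gate_equivalent(m, gates_per_bit=1 / 8):
    """Gate equivalent of a 2^m x m ROM at the quoted cost per bit."""
    return (1 << m) * m * gates_per_bit


TABLE = reciprocal_table(0b10011, 4)         # assumed p(x) = x^4 + x + 1
GATES_8 = rom_gate_equivalent(8)
GATES_10 = rom_gate_equivalent(10)
```

With the quoted cost per bit, `GATES_8` and `GATES_10` evaluate to the 256 and 1280 gate equivalents stated above.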
Reciprocals: sequential search
Reciprocal of $a$ can be found by testing $a \cdot b = 1$ for each $b$ in GF($2^m$).
All nonzero values $b$ can be generated using a maximum-length linear
feedback shift register. E.g., when field is defined by $p(x) = x^5 + x^2 + 1$:
[Figure: two linear feedback shift registers. Left register: initial value $a$, final value 1. Right register: initial value 1, final value $a^{-1}$.]
Shifting a register multiplies the contents by the primitive element $\alpha$.
Left shift register is loaded with $a$ while right shift register is loaded with 1.
The registers are shifted simultaneously until the left shift register reaches 1.
After every shift, the ratio of the left register to the right register is $a$.
If $i$ is the number of shifts needed, then $a \cdot \alpha^i = 1$, so the value $\alpha^i$ in the
right shift register is the reciprocal of $a$.
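The two-register search can be simulated directly. The sketch below models each shift as multiplication by $\alpha = x$ modulo $p(x) = x^5 + x^2 + 1$; register contents are integers with bit $i$ holding the coefficient of $x^i$, and the function names are assumptions.

```python
M = 5
P = 0b100101                 # p(x) = x^5 + x^2 + 1


def times_alpha(v):
    """One LFSR shift: multiply v by alpha = x, reducing modulo p(x)."""
    v <<= 1
    if v >> M:
        v ^= P
    return v


def reciprocal_by_search(a):
    """Return (a^-1, number of shifts) by shifting both registers until left == 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    left, right = a, 1       # left / right == a is preserved by every shift
    shifts = 0
    while left != 1:
        left = times_alpha(left)
        right = times_alpha(right)
        shifts += 1
    return right, shifts


inverse, steps = reciprocal_by_search(0b00110)
```

Because $\alpha$ is primitive, the left register visits every nonzero element, so the loop ends after at most $2^m - 2$ shifts.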
Time-memory tradeoff
An associative memory (hash table) can be used to reduce the number of
clocks needed to find the reciprocal without using a complete lookup table.
Suppose we store the reciprocals of $\alpha^{16i}$ for $i = 0, 1, \ldots, 2^m/16$.
The following program fragment finds reciprocal of $a$ in at most 16 steps:
    for (i = 0; i < 16; i++) {
        if (a · α^i is in reciprocal table) {
            return α^i · reciprocal(a · α^i) ;
        }
    }
The search time can be decreased by using a larger associative memory.
The same approach can be used to reduce the storage needed for computing the discrete
logarithm from $2^m$ entries for a direct table lookup to $2^m/c$ entries if $c$ lookups are used.
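A runnable Python version of the fragment, using a dictionary as the associative memory. The field GF($2^8$) with the primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$ (0x11D) is an assumed example, as are all function names.

```python
M = 8
P = 0x11D                    # assumed primitive polynomial for GF(2^8)
STEP = 16


def times_alpha(v):
    """Multiply by alpha = x modulo p(x)."""
    v <<= 1
    if v >> M:
        v ^= P
    return v


def alpha_power(e):
    """alpha^e, with the exponent taken modulo 2^m - 1."""
    v = 1
    for _ in range(e % ((1 << M) - 1)):
        v = times_alpha(v)
    return v


def gf_mul(a, b):
    """Shift-and-add product in GF(2^m)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = times_alpha(a)
        b >>= 1
    return r


# Associative memory: alpha^(16 i) -> alpha^(-16 i) for i = 0 .. 2^m / 16
TABLE = {alpha_power(STEP * i): alpha_power(-STEP * i)
         for i in range((1 << M) // STEP + 1)}


def reciprocal(a):
    shifted, alpha_i = a, 1                  # shifted = a * alpha^i
    for _ in range(STEP):
        if shifted in TABLE:
            return gf_mul(alpha_i, TABLE[shifted])
        shifted = times_alpha(shifted)
        alpha_i = times_alpha(alpha_i)
    raise ZeroDivisionError("0 has no reciprocal")
```

The table holds 17 entries instead of 255, at the price of up to 16 probes and one final multiplication.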
Reciprocals: exponentiation
If $\beta$ is a nonzero element of GF($q$) then $\beta^{q-1} = 1$. Therefore $\beta^{-1} = \beta^{q-2}$.
Powers of $\beta$ can be computed efficiently by squaring and multiplying by $\beta$.
Binary representation of $2^m - 2$ contains $m$ bits: $11\cdots10_2$.
[Figure: squaring circuit with a storage element (initial value 1) and input $\beta$.]
The successive values of the storage element are
\[
1,\ \beta^2,\ \beta^6,\ \beta^{14},\ \ldots,\ \beta^{2^m-2} .
\]
We obtain $\beta^{2^m-2} = \beta^{-1}$ in $m - 1$ clocks, one multiply/squaring per clock.
In GF($2^m$) squaring is linear and represented by an $m \times m$ binary matrix.
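The multiply-then-square recurrence can be sketched as a loop in which each iteration corresponds to one clock. The field GF($2^8$) modulo $x^8 + x^4 + x^3 + x + 1$ (0x11B) is an assumed example; `gf_mul`, `reciprocal_by_power`, and `exponent_sequence` are names chosen for this sketch.

```python
def gf_mul(a, b, p, m):
    """Multiply in GF(2^m); ints hold polynomial bits, p includes the x^m term."""
    r = 0
    for i in range(m):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * m - 2, m - 1, -1):
        if (r >> i) & 1:
            r ^= p << (i - m)
    return r


def reciprocal_by_power(beta, p, m):
    """beta^(2^m - 2) via s <- (s * beta)^2, repeated m - 1 times."""
    s = 1
    for _ in range(m - 1):
        s = gf_mul(s, beta, p, m)            # multiply by beta
        s = gf_mul(s, s, p, m)               # square
    return s


def exponent_sequence(m):
    """Exponents of beta held in the storage element after each clock."""
    e, seq = 0, [0]
    for _ in range(m - 1):
        e = 2 * (e + 1)
        seq.append(e)
    return seq


P, M = 0x11B, 8              # example field (assumption)
inv = reciprocal_by_power(0x02, P, M)
```

For $m = 4$, `exponent_sequence(4)` gives the exponents 0, 2, 6, 14, matching the sequence $1, \beta^2, \beta^6, \beta^{14}$ above.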
Reciprocals: subfield representation
Suppose GF($2^{2m}$) = pairs from GF($2^m$) modulo prime $x^2 + \alpha x + 1$
over GF($2^m$).
Special case: $x + b$. Use Euclidean algorithm to compute $(x + b)^{-1}$:
\[
x^2 + \alpha x + 1 = (x + b)(x + (\alpha + b)) + (b^2 + \alpha b + 1)
\]
\[
(x + b)^{-1} = \frac{x + (\alpha + b)}{b^2 + \alpha b + 1}
             = \frac{\alpha + b}{b^2 + \alpha b + 1} + \frac{1}{b^2 + \alpha b + 1}\, x
\]
In general, the reciprocal of an arbitrary element $ax + b$ with $a \ne 0$:
\[
\begin{aligned}
(ax + b)^{-1} = a^{-1} (x + b/a)^{-1} &= \frac{x + (\alpha + b/a)}{a\,((b/a)^2 + \alpha (b/a) + 1)} \\
&= \frac{ax + (\alpha a + b)}{b^2 + \alpha a b + a^2}
 = \frac{\alpha a + b}{b^2 + \alpha a b + a^2} + \frac{a}{b^2 + \alpha a b + a^2}\, x
\end{aligned}
\]
Denominator $\ne 0$ because $a \ne 0$ and $x^2 + \alpha x + 1$ is prime.
Computation of $(ax + b)^{-1}$ uses operations from the subfield GF($2^m$): one
inverse, one scaler, one squaring, and three multiplications.
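The closed form for $(ax + b)^{-1}$ can be checked on a small tower field. The sketch assumes GF(4) = GF(2)[w]/($w^2 + w + 1$) with 2-bit integer elements and $\alpha = w$; an element $ax + b$ is stored as the pair `(b, a)`. Writing the denominator as $b(\alpha a + b) + a^2$ is one way to reach the stated operation count; all names are hypothetical.

```python
def gf4_mul(a, b):
    """Multiply in GF(4) = GF(2)[w]/(w^2 + w + 1); elements are ints 0..3."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:
        r ^= 0b111
    return r


ALPHA = 0b10                                 # alpha = w (assumed choice)


def gf16_mul(u, v, alpha=ALPHA):
    """Product of pairs (constant, x-coefficient) modulo x^2 + alpha*x + 1."""
    u0, u1 = u
    v0, v1 = v
    u1v1 = gf4_mul(u1, v1)
    return (gf4_mul(u0, v0) ^ u1v1,
            gf4_mul(u0, v1) ^ gf4_mul(u1, v0) ^ gf4_mul(alpha, u1v1))


def gf16_inv(elem, alpha=ALPHA):
    """Reciprocal of a*x + b, with elem = (b, a)."""
    b, a = elem
    s = gf4_mul(alpha, a) ^ b                # scaler: alpha*a + b
    delta = gf4_mul(b, s) ^ gf4_mul(a, a)    # b^2 + alpha*a*b + a^2 (multiply + squaring)
    if delta == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    d_inv = gf4_mul(delta, delta)            # subfield inverse: d^-1 = d^2 in GF(4)
    return (gf4_mul(s, d_inv), gf4_mul(a, d_inv))   # two more multiplications
```

The same formula also covers $a = 0$, where it reduces to the subfield reciprocal of $b$.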
Reciprocals: recursive combinational circuit
If $\beta$ is in GF($2^{2m}$), then
\[
(\beta^{2^m+1})^{2^m-1} = \beta^{(2^m+1)(2^m-1)} = \beta^{2^{2m}-1} = 1 .
\]
Since order of $\beta^{2^m+1}$ divides $2^m - 1$, it belongs to small subfield GF($2^m$).
Its reciprocal can be computed using a “small” circuit.
Example: Reciprocal circuit for GF($2^8$) = GF($(2^4)^2$):
[Figure: reciprocal circuit with blocks "16-th power" and "inverse in GF(16)", and signals $\beta$, $\beta^{16}$, $\beta^{17}$, $\beta^{-17}$, $\beta^{-1}$.]
Circuit for $\beta^{2^m}$ is linear, $\approx \frac{1}{2} m^2$ XOR gates. Multipliers use $O(m^2)$ gates.
Subfield reciprocal unit is small. Overall cost of circuit $\approx$ 3 multipliers.
This method was discovered by Itoh and Tsujii in 1988. Hardware implementations of Galois field
arithmetic are presented in Christof Paar’s 1994 University of Essen doctoral thesis, Efficient VLSI
Architectures for Bit-Parallel Computation in Galois Fields.
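The GF($2^8$) example, $\beta^{-1} = \beta^{16} \cdot (\beta^{17})^{-1}$, can be sketched in software as follows. The modulus $x^8 + x^4 + x^3 + x + 1$ (0x11B) is an assumed choice of prime polynomial, and the subfield inverse is computed here as $\gamma^{14}$ purely for illustration; a circuit would use a small dedicated GF(16) unit instead.

```python
def gf256_mul(a, b, p=0x11B):
    """Shift-and-add product in GF(2^8) modulo the assumed polynomial p."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= p
        b >>= 1
    return r


def power16(beta):
    """beta^16 by four squarings (a linear map over GF(2))."""
    for _ in range(4):
        beta = gf256_mul(beta, beta)
    return beta


def subfield_inverse(g):
    """Inverse of nonzero g in the subfield GF(16): g^15 = 1, so g^-1 = g^14."""
    g2 = gf256_mul(g, g)
    g4 = gf256_mul(g2, g2)
    g8 = gf256_mul(g4, g4)
    return gf256_mul(gf256_mul(g2, g4), g8)


def reciprocal(beta):
    if beta == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    b16 = power16(beta)
    b17 = gf256_mul(beta, b16)               # lies in the subfield GF(16)
    return gf256_mul(b16, subfield_inverse(b17))
```

Only two full-width multiplications appear in `reciprocal`; the remaining work is the linear 16th-power map and the small subfield inversion.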