PPT - ece.gmu

Lecture 11
Advanced Dividers
Required Reading
Behrooz Parhami,
Computer Arithmetic: Algorithms and Hardware Design
Chapter 15, Variations in Dividers
15.3, Combinational and Array Dividers
Chapter 16, Division by Convergence
Division versus Multiplication
Division is more complex than multiplication:
Need for quotient digit selection or estimation
Overflow possibility: the high-order k bits of z
must be strictly less than d; this overflow check
also detects the divide-by-zero condition.
Pentium III latencies

Instruction                  Latency   Cycles/Issue
Load / Store                 3         1
Integer Multiply             4         1
Integer Divide               36        36
Double/Single FP Multiply    5         2
Double/Single FP Add         3         1
Double/Single FP Divide      38        38

The ratios haven't changed much in later Pentiums, Atom, or AMD products*
*Source: T. Granlund, "Instruction Latencies and Throughput for AMD and Intel x86 Processors," Feb. 2012
May 2012
Computer Arithmetic, Division
Slide 3
Classification of Dividers
Sequential
    Radix-2
        Restoring
        Non-restoring
            • regular
            • SRT
            • using carry save adders
            • SRT using carry save adders
    High-radix
Array Dividers
Dividers by Convergence
Fractional Division
Unsigned Fractional Division
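Dividend: $z_{frac} = .z_{-1} z_{-2} \ldots z_{-(2k-1)} z_{-2k}$
Divisor: $d_{frac} = .d_{-1} d_{-2} \ldots d_{-(k-1)} d_{-k}$
Quotient: $q_{frac} = .q_{-1} q_{-2} \ldots q_{-(k-1)} q_{-k}$
Remainder: $s_{frac} = .00 \ldots 0 \, s_{-(k+1)} \ldots s_{-(2k-1)} s_{-2k}$ (the leading run of zeros is k bits long)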
Integer vs. Fractional Division
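For Integers:
$z = q d + s$
Multiplying both sides by $2^{-2k}$:
$z \, 2^{-2k} = (q \, 2^{-k}) (d \, 2^{-k}) + s \, 2^{-2k}$
For Fractions:
$z_{frac} = q_{frac} \, d_{frac} + s_{frac}$
where
$z_{frac} = z \, 2^{-2k}$
$d_{frac} = d \, 2^{-k}$
$q_{frac} = q \, 2^{-k}$
$s_{frac} = s \, 2^{-2k}$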
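The scaling relations above can be checked on a small case. The following is a minimal sketch, assuming k = 4 and the integer operands z = 117, d = 10 (values chosen only for illustration); the helper name `to_fractional` is hypothetical.

```python
from fractions import Fraction

def to_fractional(z, d, k):
    """Scale an integer division z = q*d + s into its fractional counterpart.

    Returns (z_frac, d_frac, q_frac, s_frac) as exact fractions.
    """
    q, s = divmod(z, d)                  # integer quotient and remainder
    scale_k = Fraction(1, 2 ** k)        # 2^-k
    scale_2k = Fraction(1, 2 ** (2 * k)) # 2^-2k
    return z * scale_2k, d * scale_k, q * scale_k, s * scale_2k

# Illustrative operands: 117 = 11 * 10 + 7, with k = 4
z_frac, d_frac, q_frac, s_frac = to_fractional(117, 10, 4)

# The fractional identity holds exactly
assert z_frac == q_frac * d_frac + s_frac
# The no-overflow condition for fractions is satisfied
assert z_frac < d_frac
```

Exact rational arithmetic is used so that the identity can be tested for equality rather than within a floating-point tolerance.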
Unsigned Fractional Division Overflow
Condition for no overflow:
$z_{frac} < d_{frac}$
Sequential Fractional Division
Basic Equations
$s^{(0)} = z_{frac}$
$s^{(j)} = 2 s^{(j-1)} - q_{-j} \, d_{frac}$ for $j = 1, \ldots, k$
$2^{k} \cdot s_{frac} = s^{(k)}$
$s_{frac} = 2^{-k} \cdot s^{(k)}$
Fig. 13.2 Examples of sequential division with integer
and fractional operands.
10
Array
Dividers
11
Sequential Fractional Division
Basic Equations
$s^{(0)} = z_{frac}$
$s^{(j)} = 2 s^{(j-1)} - q_{-j} \, d_{frac}$
$s^{(k)} = 2^{k} s_{frac}$
Restoring Unsigned Fractional Division
s(0) = z
for j = 1 to k
    if 2 s(j-1) - d ≥ 0
        q_{-j} = 1
        s(j) = 2 s(j-1) - d
    else
        q_{-j} = 0
        s(j) = 2 s(j-1)
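The restoring recurrence can be sketched in software as follows. This is an illustration under stated assumptions, not a hardware description: operands are exact fractions with 0 ≤ z < d, and the function name `restoring_divide` is hypothetical.

```python
from fractions import Fraction

def restoring_divide(z, d, k):
    """Restoring unsigned fractional division, producing k quotient bits.

    Returns (bits, q, s_frac): bits is [q_-1, ..., q_-k], q is the quotient,
    and s_frac is the true remainder 2^-k * s(k).
    """
    z, d = Fraction(z), Fraction(d)
    if not 0 <= z < d:
        raise ValueError("overflow: need 0 <= z < d")
    s = z                                # s(0) = z
    bits = []
    for _ in range(k):
        trial = 2 * s - d                # trial subtraction
        if trial >= 0:
            bits.append(1)
            s = trial                    # keep the difference
        else:
            bits.append(0)
            s = 2 * s                    # "restore": keep the shifted value
    q = sum(Fraction(b, 2 ** (j + 1)) for j, b in enumerate(bits))
    return bits, q, s / 2 ** k

# Illustrative operands: z = 117/256, d = 10/16, k = 4
bits, q, s_frac = restoring_divide(Fraction(117, 256), Fraction(5, 8), 4)
assert Fraction(117, 256) == q * Fraction(5, 8) + s_frac
```

For these operands the quotient bits come out as 1, 0, 1, 1, that is q = 11/16, with remainder 7/256.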
Restoring Array Divider
[Figure: restoring array divider built from cells that each contain a full subtractor (FS)]
Dividend: $z = .z_{-1} z_{-2} z_{-3} z_{-4} z_{-5} z_{-6}$
Divisor: $d = .d_{-1} d_{-2} d_{-3}$
Quotient: $q = .q_{-1} q_{-2} q_{-3}$
Remainder: $s = .0 \, 0 \, 0 \, s_{-4} s_{-5} s_{-6}$
Non-Restoring Unsigned Fractional Division
s(-1) = z - d
for j = 0 to k-1
    if s(j-1) ≥ 0
        q_{-j} = 1
        s(j) = 2 s(j-1) - d
    else
        q_{-j} = 0
        s(j) = 2 s(j-1) + d
end for
if s(k-1) ≥ 0
    q_{-k} = 1
else
    q_{-k} = 0
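A software sketch of the nonrestoring recurrence is given below, assuming exact fractional operands with 0 ≤ z < d. The function name `nonrestoring_divide` is hypothetical, and the sketch returns only the quotient; a final remainder correction (adding d back when the last partial remainder is negative) is not shown.

```python
from fractions import Fraction

def nonrestoring_divide(z, d, k):
    """Nonrestoring unsigned fractional division.

    Returns (bits, q): bits is [q_0, q_-1, ..., q_-k], where q_0 is the
    integer bit (0 whenever z < d), and q is the resulting quotient.
    """
    z, d = Fraction(z), Fraction(d)
    if not 0 <= z < d:
        raise ValueError("overflow: need 0 <= z < d")
    s = z - d                            # s(-1)
    bits = []
    for _ in range(k):                   # j = 0 .. k-1
        if s >= 0:
            bits.append(1)
            s = 2 * s - d                # subtract after a 1 digit
        else:
            bits.append(0)
            s = 2 * s + d                # add after a 0 digit, no restore
    bits.append(1 if s >= 0 else 0)      # q_-k from the sign of s(k-1)
    q = sum(Fraction(b, 2 ** j) for j, b in enumerate(bits))
    return bits, q

# Illustrative operands: z = 117/256, d = 10/16, k = 4
bits, q = nonrestoring_divide(Fraction(117, 256), Fraction(5, 8), 4)
```

Each partial remainder here equals the trial difference that restoring division would compute for the next digit, so the two methods produce the same quotient bits for these operands.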
Nonrestoring Array Divider
[Figure: nonrestoring array divider built from cells that each contain an XOR gate and a full adder (FA), with the critical path marked]
Similarity to array multiplier is deceiving
Dividend: $z = z_{0} . z_{-1} z_{-2} z_{-3} z_{-4} z_{-5} z_{-6}$
Divisor: $d = d_{0} . d_{-1} d_{-2} d_{-3}$
Quotient: $q = q_{0} . q_{-1} q_{-2} q_{-3}$
Remainder: $s = 0 . 0 \, 0 \, s_{-3} s_{-4} s_{-5} s_{-6}$
Division by Convergence
17
Division by Convergence
Chapter Goals
Show how by using multiplication as the
basic operation in each division step,
the number of iterations can be reduced
Chapter Highlights
Digit-recurrence as convergence method
Convergence by Newton-Raphson iteration
Computing the reciprocal of a number
Hardware implementation and fine tuning
16.1 General Convergence Methods
Sequential digit-at-a-time (binary or high-radix) division
can be viewed as a convergence scheme
As each new digit of q = z / d is determined, the quotient value
is refined, until it reaches the final correct value
Convergence is from below in restoring division and oscillating
in nonrestoring division
Meanwhile, the remainder s = z – qd approaches 0; the scaled remainder is kept in a certain range, such as [–d, d)
[Figure: quotient estimate q, between 0 and 1, plotted against digit position and converging to 0.101101]
Elaboration on Scaled Remainder in Division
The partial remainder $s^{(j)}$ in the division recurrence isn't the true remainder but a version scaled by $2^{j}$
Division with left shifts:
$s^{(j)} = 2 s^{(j-1)} - q_{k-j} (2^{k} d)$, with $s^{(0)} = z$ and $s^{(k)} = 2^{k} s$
(the term $2 s^{(j-1)}$ is the shift; removing $q_{k-j} (2^{k} d)$ from it is the subtract)
Quotient digit selection keeps the scaled remainder bounded (say, in the range –d to d) to ensure the convergence of the true remainder to 0
[Figure: quotient estimate q, between 0 and 1, plotted against digit position and converging to 0.101101]
Recurrence Formulas for Convergence Methods
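Two-variable form:
$u^{(i+1)} = f(u^{(i)}, v^{(i)})$   (converges to a constant)
$v^{(i+1)} = g(u^{(i)}, v^{(i)})$   (converges to the desired function)
Three-variable form:
$u^{(i+1)} = f(u^{(i)}, v^{(i)}, w^{(i)})$
$v^{(i+1)} = g(u^{(i)}, v^{(i)}, w^{(i)})$
$w^{(i+1)} = h(u^{(i)}, v^{(i)}, w^{(i)})$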
Guide the iteration such that one of the values converges
to a constant (usually 0 or 1)
The other value then converges to the desired function
The complexity of this method depends on two factors:
a. Ease of evaluating f and g (and h)
b. Rate of convergence (number of iterations needed)
16.2 Division by Repeated Multiplications
Motivation: Suppose add takes 1 clock and multiply 3 clocks
64-bit divide takes 64 clocks in radix 2, 32 in radix 4
⇒ Division via multiplications is faster if 10 or fewer multiplications are needed
Idea:
$q = \frac{z}{d} = \frac{z \, x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d \, x^{(0)} x^{(1)} \cdots x^{(m-1)}}$
The numerator converges to q when the denominator is forced to 1
Remainder often not needed, but can be obtained
by another multiplication if desired: s = z – qd
To turn the identity into a division algorithm, we face three questions:
1. How to select the multipliers x(i) ?
2. How many iterations (pairs of multiplications)?
3. How to implement in hardware?
Formulation as a Convergence Computation
Idea:
$q = \frac{z}{d} = \frac{z \, x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d \, x^{(0)} x^{(1)} \cdots x^{(m-1)}}$
The numerator converges to q when the denominator is forced to 1
$d^{(i+1)} = d^{(i)} x^{(i)}$   Set $d^{(0)} = d$; make $d^{(m)}$ converge to 1
$z^{(i+1)} = z^{(i)} x^{(i)}$   Set $z^{(0)} = z$; obtain $z/d = q \approx z^{(m)}$
Question 1: How to select the multipliers $x^{(i)}$?
$x^{(i)} = 2 - d^{(i)}$
This choice transforms the recurrence equations into:
$d^{(i+1)} = d^{(i)} (2 - d^{(i)})$   Set $d^{(0)} = d$; iterate until $d^{(m)} \approx 1$
$z^{(i+1)} = z^{(i)} (2 - d^{(i)})$   Set $z^{(0)} = z$; obtain $z/d = q \approx z^{(m)}$
Fits the general form
$u^{(i+1)} = f(u^{(i)}, v^{(i)})$
$v^{(i+1)} = g(u^{(i)}, v^{(i)})$
Determining the Rate of Convergence
$d^{(i+1)} = d^{(i)} x^{(i)}$   Set $d^{(0)} = d$; make $d^{(m)}$ converge to 1
$z^{(i+1)} = z^{(i)} x^{(i)}$   Set $z^{(0)} = z$; obtain $z/d = q \approx z^{(m)}$
Question 2: How quickly does $d^{(i)}$ converge to 1?
We can relate the error in step i + 1 to the error in step i:
$d^{(i+1)} = d^{(i)} (2 - d^{(i)}) = 1 - (1 - d^{(i)})^{2}$
$1 - d^{(i+1)} = (1 - d^{(i)})^{2}$
For $1 - d^{(i)} \le \varepsilon$, we get $1 - d^{(i+1)} \le \varepsilon^{2}$:
Quadratic convergence
In general, for k-bit operands, we need
2m – 1 multiplications and m 2's complementations
where $m = \lceil \log_2 k \rceil$
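The quadratic convergence can be observed numerically. The following is a minimal sketch using exact rational arithmetic, assuming 1/2 ≤ d < 1; the function name `divide_by_repeated_multiplication` and the operands z = 1/2, d = 3/4 are illustrative choices only.

```python
from fractions import Fraction

def divide_by_repeated_multiplication(z, d, m):
    """Approximate z/d with m iterations of the multiplier x = 2 - d.

    Returns (approximation, history), where history[i] is (d_i, z_i).
    """
    z, d = Fraction(z), Fraction(d)
    if not Fraction(1, 2) <= d < 1:
        raise ValueError("need 1/2 <= d < 1")
    history = [(d, z)]
    for _ in range(m):
        x = 2 - d                        # 2's complement of d
        d, z = d * x, z * x              # the pair of multiplications
        history.append((d, z))
    return z, history

# Illustrative operands: z = 1/2, d = 3/4, so q = 2/3 and 1 - d = 1/4
approx, history = divide_by_repeated_multiplication(Fraction(1, 2), Fraction(3, 4), 4)

# The error 1 - d_i is squared at every step: (1/4)^(2^i)
for i, (d_i, z_i) in enumerate(history):
    assert 1 - d_i == Fraction(1, 4) ** (2 ** i)
```

The sketch multiplies d in every iteration so that its error can be inspected; the final d product is not needed for the quotient, which is why the slide counts 2m – 1 multiplications rather than 2m.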
Quadratic Convergence
Table 16.1 Quadratic convergence in computing z/d
by repeated multiplications, where 1/2 ≤ d = 1 – y < 1
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
i    d(i) = d(i–1) x(i–1), with d(0) = d                  x(i) = 2 – d(i)
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
0    1 – y    = (.1xxx xxxx xxxx xxxx)two ≥ 1/2           1 + y
1    1 – y^2  = (.11xx xxxx xxxx xxxx)two ≥ 3/4           1 + y^2
2    1 – y^4  = (.1111 xxxx xxxx xxxx)two ≥ 15/16         1 + y^4
3    1 – y^8  = (.1111 1111 xxxx xxxx)two ≥ 255/256       1 + y^8
4    1 – y^16 = (.1111 1111 1111 1111)two = 1 – ulp
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Each iteration doubles the number of guaranteed leading 1s
(convergence to 1 is from below)
Beginning with a single 1 (d ≥ ½), after log2 k iterations we get
as close to 1 as is possible in a fractional representation
Graphical Depiction of Convergence to q
[Figure: d(i) rises from d toward 1 – ulp while z(i) rises from z toward q – ε, plotted against iteration i = 0 to 6]
Fig. 16.1 Graphical representation of convergence
in division by repeated multiplications.
16.5 Hardware Implementation
Repeated multiplications: each pair of operations involves the same multiplier
$d^{(i+1)} = d^{(i)} (2 - d^{(i)})$   Set $d^{(0)} = d$; iterate until $d^{(m)} \approx 1$
$z^{(i+1)} = z^{(i)} (2 - d^{(i)})$   Set $z^{(0)} = z$; obtain $z/d = q \approx z^{(m)}$
[Figure: pipeline timing of the products d(i) x(i), z(i) x(i), d(i+1) x(i+1), and z(i+1) x(i+1), with a 2's-complement unit producing x(i+1) from d(i+1)]
Fig. 16.6 Two multiplications fully overlapped
in a 2-stage pipelined multiplier.
16.3 Division by Reciprocation
The Newton-Raphson method can be used for finding a root of f(x) = 0
Start with an initial estimate $x^{(0)}$ for the root
Iteratively refine the estimate via the recurrence
$x^{(i+1)} = x^{(i)} - f(x^{(i)}) / f'(x^{(i)})$
Justification:
$\tan \alpha^{(i)} = f'(x^{(i)}) = f(x^{(i)}) / (x^{(i)} - x^{(i+1)})$
where $\alpha^{(i)}$ is the angle of the tangent at $x^{(i)}$
[Figure: curve f(x) with the tangent at x(i) and successive estimates x(i), x(i+1), x(i+2) approaching the root]
Fig. 16.2 Convergence to a root of
f(x) = 0 in the Newton-Raphson method.
Computing 1/d by Convergence
1/d is the root of $f(x) = 1/x - d$
$f'(x) = -1/x^{2}$
Substitute in the Newton-Raphson
recurrence $x^{(i+1)} = x^{(i)} - f(x^{(i)}) / f'(x^{(i)})$ to get:
$x^{(i+1)} = x^{(i)} (2 - x^{(i)} d)$
One iteration = Two multiplications + One 2's complementation
Error analysis: Let $\delta^{(i)} = 1/d - x^{(i)}$ be the error at the ith iteration
$\delta^{(i+1)} = 1/d - x^{(i+1)} = 1/d - x^{(i)} (2 - x^{(i)} d) = d \, (1/d - x^{(i)})^{2} = d \, (\delta^{(i)})^{2}$
Because d < 1, we have $\delta^{(i+1)} < (\delta^{(i)})^{2}$
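The reciprocal recurrence and its error relation can be checked with exact arithmetic. This is a sketch only, assuming d = 3/4 and the starting estimate 1.5 as illustrative values; the function name `reciprocal_newton` is hypothetical.

```python
from fractions import Fraction

def reciprocal_newton(d, x0, iterations):
    """Return the list of estimates x(0), x(1), ... of 1/d.

    Each step uses x <- x * (2 - x * d): two multiplications and
    one 2's complementation.
    """
    d, x = Fraction(d), Fraction(x0)
    estimates = [x]
    for _ in range(iterations):
        x = x * (2 - x * d)
        estimates.append(x)
    return estimates

# Illustrative values: d = 3/4, initial estimate x(0) = 1.5
d = Fraction(3, 4)
estimates = reciprocal_newton(d, Fraction(3, 2), 4)
errors = [1 / d - x for x in estimates]   # delta(i) = 1/d - x(i)

# Each error is d times the square of the previous one
for prev, nxt in zip(errors, errors[1:]):
    assert nxt == d * prev ** 2
```

With these values the first error is –1/6 and the second is 1/48; from the first iteration onward the error is nonnegative, so the estimates approach 1/d from below.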
Choosing the Initial Approximation to 1/d
With $x^{(0)}$ in the range $0 < x^{(0)} < 2/d$, convergence is guaranteed
Justification:
$|\delta^{(0)}| = |x^{(0)} - 1/d| < 1/d$
$|\delta^{(1)}| = |x^{(1)} - 1/d| = d \, (\delta^{(0)})^{2} = (d \, |\delta^{(0)}|) \, |\delta^{(0)}| < |\delta^{(0)}|$
For d in [1/2, 1):
Simple choice
$x^{(0)} = 1.5$
Max error = 0.5 < 1/d
Better approx.
$x^{(0)} = 4(\sqrt{3} - 1) - 2d = 2.9282 - 2d$
Max error ≈ 0.1
[Figure: plot of 1/x against x, used to compare the initial approximations]
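The two maximum-error figures can be estimated by sampling d across [1/2, 1). The sketch below uses floating-point arithmetic and a uniform grid, so it approximates the maxima rather than proving them; the helper name `max_error` and the sample count are assumptions made for illustration.

```python
import math

def max_error(approx, samples=10_000):
    """Largest |approx(d) - 1/d| over a uniform grid of d in [1/2, 1)."""
    worst = 0.0
    for n in range(samples):
        d = 0.5 + 0.5 * n / samples
        worst = max(worst, abs(approx(d) - 1.0 / d))
    return worst

# Simple choice: the constant 1.5
simple = max_error(lambda d: 1.5)

# Better approximation: the linear function 4(sqrt(3) - 1) - 2d
linear = max_error(lambda d: 4 * (math.sqrt(3) - 1) - 2 * d)
```

On this grid the constant estimate is worst at d = 1/2, while the linear estimate is worst in the interior of the range, near d = 0.707.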
16.4 Speedup of Convergence Division
$q = \frac{z}{d} = \frac{z \, x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d \, x^{(0)} x^{(1)} \cdots x^{(m-1)}}$
Compute y = 1/d
Do the multiplication yz
Division can be performed via $2 \lceil \log_2 k \rceil - 1$ multiplications
This is not yet very impressive
64-bit numbers, 3-ns multiplier ⇒ 33-ns division
Three types of speedup are possible:
Fewer multiplications (reduce m)
Narrower multiplications (reduce the width of some $x^{(i)}$s)
Faster multiplications
Initial Approximation via Table Lookup
Convergence is slow in the beginning: it takes 6 multiplications to get
8 bits of convergence and another 5 to go from 8 bits to 64 bits
$d \, x^{(0)} x^{(1)} x^{(2)} = (0.1111\,1111 \ldots)_{two}$
The product $x^{(0)} x^{(1)} x^{(2)}$ is itself an approximation to 1/d
Read this value, $x^{(0+)}$, directly from a table,
thereby reducing 6 multiplications to 2
A $2^{w} \times w$ lookup table is necessary and sufficient for w bits of
convergence after 2 multiplications
Example with 4-bit lookup: d = 0.1011 xxxx . . . (11/16 ≤ d < 12/16)
Inverses of the two extremes are 16/11 ≈ 1.0111 and 16/12 ≈ 1.0101
So, 1.0110 is a good estimate for 1/d
1.0110 × 0.1011 = (11/8) × (11/16) = 121/128 = 0.1111001
1.0110 × 0.1100 = (11/8) × (3/4) = 33/32 = 1.000010
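The arithmetic in the 4-bit lookup example can be reproduced exactly. The sketch below only verifies that worked example; it does not build a full lookup table, and the helper name `to_binary` is hypothetical.

```python
import math
from fractions import Fraction

def to_binary(value, frac_bits):
    """Render a non-negative Fraction in binary, truncated to frac_bits fractional bits."""
    scaled = math.floor(value * 2 ** frac_bits)
    integer, frac = divmod(scaled, 2 ** frac_bits)
    return f"{integer:b}." + format(frac, f"0{frac_bits}b")

# Reciprocals of the two extremes of the interval 11/16 <= d < 12/16
upper_reciprocal = to_binary(Fraction(16, 11), 4)   # truncated 16/11
lower_reciprocal = to_binary(Fraction(16, 12), 4)   # truncated 16/12

# The table entry 1.0110 (binary) equals 11/8
estimate = Fraction(11, 8)
product_low = estimate * Fraction(11, 16)    # d at the low end of the interval
product_high = estimate * Fraction(12, 16)   # d at the high end of the interval
```

The two products, 121/128 and 33/32, bracket 1, which is why the estimate 1.0110 serves the whole interval.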
Visualizing the Convergence with Table Lookup
[Figure: d rises toward 1 – ulp and z rises toward q – ε over the iterations; the table lookup and 1st pair of multiplications replace several iterations, and the curves are shown again after the 2nd pair of multiplications]
Fig. 16.3 Convergence in division by repeated multiplications
with initial table lookup.
Convergence Does Not Have to Be from Below
[Figure: d converges to 1 ± ulp and z converges to q ± ε over the iterations]
Fig. 16.4 Convergence in division by repeated multiplications with
initial table lookup and the use of truncated multiplicative factors.
Sequential Dividers with Carry-Save Adders
Block diagram of a radix-2 SRT divider with partial
remainder in stored-carry form
36
Pentium bug (1)
October 1994
Thomas Nicely, Lynchburg College, Virginia,
finds an error in his computer calculations and traces
it back to the Pentium processor
November 7, 1994
First press announcement, Electronic Engineering Times
Late 1994
Tim Coe, Vitesse Semiconductor,
presents an example with the worst-case error
c = 4 195 835 / 3 145 727
Pentium result = 1.333 739 06...
Correct result = 1.333 820 44...
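The size of the error in this example can be worked out directly. The sketch below takes the Pentium value as the truncated figure quoted on the slide, so the computed relative error is approximate.

```python
from fractions import Fraction

# Exact quotient of the example
correct = Fraction(4_195_835, 3_145_727)

# Value reported for the flawed Pentium, as quoted (truncated) on the slide
pentium = Fraction("1.33373906")

# Relative error of the flawed result
relative_error = abs(correct - pentium) / correct

print(f"correct quotient ~ {float(correct):.8f}")
print(f"relative error   ~ {float(relative_error):.2e}")
```

The relative error is on the order of 6 × 10^-5, so the flawed result already differs from the correct one in the fifth significant decimal digit.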
Pentium bug (2)
Intel admits “subtle flaw”
November 30, 1994
Intel’s white paper about the bug and its possible consequences
Intel - average spreadsheet user affected once in 27,000 years
IBM - average spreadsheet user affected once every 24 days
Replacements based on customer needs
December 20, 1994
Announcement of no-questions-asked replacements
Pentium bug (3)
Error traced back to the look-up table used by
the radix-4 SRT division algorithm
2048 cells, 1066 non-zero values {-2, -1, 1, 2}
5 non-zero values not downloaded correctly
to the lookup table due to an error in the C script
Follow-up
Courses
41
DIGITAL SYSTEMS DESIGN
1. ECE 681 VLSI Design for ASICs
(Fall semesters)
H. Homayoun, project/lab, front-end and back-end
ASIC design with Synopsys tools
2. ECE 699 Digital Signal Processing Hardware Architectures
(Spring semesters)
A. Cohen, project, FPGA design for DSP
3. ECE 682 VLSI Test Concepts
(Spring semesters)
T. Storey, homework
NETWORK AND SYSTEM SECURITY
1. ECE 646 Cryptography and Computer Network Security
(Fall semesters)
K. Gaj, hardware, software, or analytical project
2. ECE 746 Advanced Applied Cryptography
(Spring semesters)
J.-P. Kaps, hardware, software, or analytical project
3. ECE 899 Cryptographic Engineering
(Spring semesters)
J.-P. Kaps, research-oriented project
