Single Precision Reciprocal by Multipartite Table Look-up

Peter Kornerup
University of Southern Denmark
Odense, Denmark
David W. Matula
Southern Methodist University
Dallas, Texas, USA
Abstract— We develop the foundations for confirming
monotonicity of a multi-term reciprocal function approximation. We introduce the concept of operand recoding
to improve the accuracy of multipartite approximation.
The results are applied to propose a four-partite
reciprocal implementation with total table size 27 Kbytes
that yields an IEEE standard single precision format (24-bit)
reciprocal instruction that is a one-ulp monotonic
reciprocal.
I. INTRODUCTION
There has been considerable investigation of bipartite and multipartite function approximations in
the recent literature [1], [2], [3], [4], [5], [6], [7],
[8], [9]. Bipartite reciprocal approximations have
been employed for approximate (low precision) reciprocal instructions in commodity microprocessors,
targeted at multimedia applications. The question of
monotonicity of reciprocal approximations has been
discussed in [5], [6]. In this paper we investigate
the applicability of the multipartite approach to
obtaining an IEEE single precision (24 bit) one-ulp
monotonic reciprocal function.
Summary: Given a single precision divisor $y$, $1 \le y < 2$, we shall show
that it is possible, by a multipartite table look-up
method, to determine an approximate reciprocal
value

$$\mathrm{recip}'(y) = 0.1b_2b_3\cdots b_{24}b_{25}\cdots \qquad (1)$$

with $\frac{1}{2} < \mathrm{recip}'(y) < 1$, or $\mathrm{recip}'(1) = 1$, having relative error less
than $2^{-25}$. This is equivalent to having the absolute
error bound

$$\left|\frac{1}{y} - \mathrm{recip}'(y)\right| < \frac{2^{-25}}{y}. \qquad (2)$$

In Section II we develop the foundations of
monotonic one-ulp reciprocal functions. In particular we introduce and prove a monotonicity theorem. Specifically, if $\mathrm{recip}'(y)$ satisfies (2), then
the reciprocal function $RN_{24}(\mathrm{recip}'(y))$, obtained by
rounding$^{1}$ such an approximate reciprocal function
to nearest at the $2^{-24}$ position, is a one-ulp monotonic single precision reciprocal.
For single precision division with dividend R STUV
,
R R
R , normalized so that R
normalized (rational) exact quotient W
let W be the
XTUV
given by
Y 5\
Z[ ]
for R L
\
W for R_^
]
&
For "!#%$ & ()+* satisfying (1) and (2), let W ( R` '*
be the normalized binary quotient approximation
49b8':cSXT *
&
W ( R` +*a
W W
W
determined by
Ydd !#/$ & (K'*
R
for R L
`
!#/$ & (K'*
& ( +*, Z [d
W R`
d R
for Rfe
`
.
for R
54
& ( '*
L which
W R` g Li
h
Hence g W
from
Iit follows that g W N ( W & ( I R` '** g L
and
I ( & ( +**6k 3 * II
3
I
K
(
j
L
W
W R`
.
For lnmpo qsr5tvuwtxsyz , the fixed-point round-down {}| l1 ,
y"~
round-up {}€ l1 , and round-to-nearest (midpoint down) {}€ l1
y"~
y"~
roundings, each determine either the
 -bit unnormalized
‚
~
„

†
ƒ
…
‘
binary value ˆŠ‡ ‰n‹ŒŽ Œs‘Œ ˆM’’9’ Œ with q”“–•f“—t yŠ˜ u
y
….
Specifically we have
I
1
I
I •
•
I “žl
…
{}€ y ~ l1 ‹w™›šœ
ƒ tvŸ¡ t
ty
y
‡
with similar expressions for {}¢ l1 and {5| y~
y"~ l  . For normalized
‘
input l£m¤o ˆ  , the output is normalized, e.g., {}€ l1 ‹
…
y"~
q  … Œ ˆ Œ1¥ ’’9’ Œ y or {Š€ y"~ l ‹ … .

&
W
W is a
Note that W N ( W ( R` '** *
directed breakpoint in the sense that #N( allows W or W $ to be correctly chosen as the
precise round-down (or round-up) single precision
of
R by . Similarly, W result
for division
&
j ( W ( R` '** k
is a round-to-nearest break
midpoint, allowing #N2( W
R * to dictate the
correct round-to-nearest single precision division
result.
Thus the multipartite table lookup procedure
described here provides for implementing a one-ulp monotonic, single precision reciprocal function,
without the need for a multiplier, and for obtaining a
single precision division result, employing only two
(dependent) single precision multiplications. Our
suggested solution is a four-partite table lookup with
total table size less than 27 Kbytes. These methods allow
relatively low-power implementations of the SSE
paired, single precision reciprocal and division instructions incorporated in current x86 processors,
targeted at low-power multimedia computations.
In Section III we review the fundamentals of
bipartite table construction, and Section IV introduces the notion of operand partial recoding for
constructing multipartite tables. In Section V we
present a four-partite look-up table procedure for
obtaining a single precision, one-ulp monotonic
reciprocal function.
II. ULP-ACCURATE MONOTONIC RECIPROCAL FUNCTIONS
The reciprocal approximation $\mathrm{recip}(y) = 0.1b_2b_3\cdots b_n$ is termed an $n$-bit
one-ulp reciprocal when $|1/y - \mathrm{recip}(y)| < 2^{-n}$
for all normalized $n$-bit binary divisors $y \in [1,2)$, and similarly is an $n$-bit
$\frac{3}{4}$-ulp reciprocal when $|1/y - \mathrm{recip}(y)| < \frac{3}{4}\,2^{-n}$.
Observation 1: An $n$-bit one-ulp reciprocal
$\mathrm{recip}(y)$ is either the round-up or round-down value
of $1/y$ for all normalized binary divisors $y \in [1,2)$.
That is,
$\mathrm{recip}(y) \in \{RD_n(1/y),\, RU_n(1/y)\}$ for all $y \in [1,2)$,
with $\mathrm{recip}(y)$ always being the $n$-bit value nearest $1/y$
in the direction of the approximation.
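The observation can be checked exhaustively for small word lengths. The following is a minimal sketch, assuming the three fixed-point roundings are round-down, round-up, and round-to-nearest with midpoints going down, and assuming an output ulp of $2^{-n}$; the function names `rd`, `ru`, `rn` and `one_ulp_candidates` are illustrative and not taken from the paper.

```python
from fractions import Fraction
from math import floor, ceil

def rd(x, n):
    """Round x down to a multiple of 2**-n."""
    return Fraction(floor(x * 2**n), 2**n)

def ru(x, n):
    """Round x up to a multiple of 2**-n."""
    return Fraction(ceil(x * 2**n), 2**n)

def rn(x, n):
    """Round x to the nearest multiple of 2**-n, midpoints going down."""
    return Fraction(ceil(x * 2**n - Fraction(1, 2)), 2**n)

def one_ulp_candidates(y, n):
    """All n-bit outputs in [1/2, 1] whose error against 1/y is below one ulp."""
    ulp = Fraction(1, 2**n)
    exact = 1 / Fraction(y)
    grid = (Fraction(k, 2**n) for k in range(2**(n - 1), 2**n + 1))
    return [v for v in grid if abs(exact - v) < ulp]

# Every admissible one-ulp output is the round-down or round-up value of 1/y.
n = 6
for i in range(2**(n - 1)):
    y = 1 + Fraction(i, 2**(n - 1))
    assert set(one_ulp_candidates(y, n)) == {rd(1 / y, n), ru(1 / y, n)}
```

Exact rational arithmetic is used so that the strict inequality in the one-ulp definition is tested without floating-point rounding effects.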
Note that a one-ulp reciprocal is
efficiently computable by first obtaining
a multiple term reciprocal approximation
$\mathrm{recip}'(y) = 0.1b_2b_3\cdots b_nb_{n+1}\cdots b_{n+g}$, with $g$
guard bits $b_{n+1}\cdots b_{n+g}$, that satisfies
$|1/y - \mathrm{recip}'(y)| < 2^{-(n+1)}$. Then the guard bits
are rounded off to obtain
the $n$-bit one-ulp reciprocal $\mathrm{recip}(y) = RN_n(\mathrm{recip}'(y))$. Such one-ulp
reciprocals have applications as a short
reciprocal in high radix division algorithms and
as the approximate reciprocal function value
for a reciprocal instruction implementation. For
implementation of a one-ulp reciprocal function
as a reciprocal instruction, it is also desirable to
investigate the monotonicity properties of such an
approximate function.
Rounding Off Guard Bits - Monotonic Reciprocal
Instruction: In the remainder of this section we focus on the important reciprocal function
application where the (exact) inputs $y = 1.b_1b_2\cdots b_{n-1}$, and
(approximate) outputs $\mathrm{recip}(y) = 0.1b_2b_3\cdots b_n$ (or
$\mathrm{recip}(1) = 1$) are both $n$-bit normalized values with $n$
too large for direct lookup to be practical, e.g.
$n = 24$. In this case a multi-term computed reciprocal approximation with guard bits rounded off,
to provide a one-ulp reciprocal, is only guaranteed
monotonic for $y$ over the subinterval $[1, \sqrt{2})$. In
particular, it can be shown that the output step size,
for a one-ulp reciprocal for $y$ over $[1, \sqrt{2})$, can vary
from 0 to 3 ulps, and over $[\sqrt{2}, 2)$ the step size can
be down by as much as two ulps, or reverse direction
and be up by one ulp, contradicting monotonicity.
Figure 1(a) illustrates a 5-bit one-ulp reciprocal
function which systematically chooses the value of
the pair $\{RD_5(1/y),\, RU_5(1/y)\}$ that is at least one half
ulp away from $1/y$, i.e., the value whose error is at least
one half ulp and less than one ulp. The step function graph in
Figure 1(a) clearly illustrates that such a perverse
one-ulp reciprocal can have exaggerated variability
in step size over $[1, \sqrt{2})$ and be non-monotonic to
the extent of virtual oscillation over $[\sqrt{2}, 2)$.
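The "round-away" choice described for Figure 1(a) is easy to reproduce. The sketch below is an illustration under stated assumptions: 5-bit inputs $y = 1 + i/16$, outputs measured in units of $2^{-5}$, and ties (which do not occur for these inputs) resolved upward. The name `round_away_recip` is hypothetical.

```python
from fractions import Fraction
from math import floor, ceil

def round_away_recip(i, n=5):
    """Output, in units of 2**-n, for the input y = 1 + i / 2**(n-1)."""
    y = 1 + Fraction(i, 2**(n - 1))
    scaled = (1 / y) * 2**n              # exact reciprocal in output ulps
    lo, hi = floor(scaled), ceil(scaled)
    if lo == hi:                         # 1/y is representable, nothing to choose
        return lo
    # Pick the neighbour that is farther from the exact value.
    return lo if scaled - lo > hi - scaled else hi

values = [round_away_recip(i) for i in range(16)]
# Positive step: the output decreased; negative step: it increased.
steps = [a - b for a, b in zip(values, values[1:])]
print(values)
print(steps)
```

Every output stays within one ulp of $1/y$, yet the step sequence contains negative entries for inputs above $\sqrt{2}$, which is the non-monotonic behaviour the text describes.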
Note that computing $\mathrm{recip}'(y) = 0.1b_2b_3\cdots b_nb_{n+1}\cdots$ satisfying
$|1/y - \mathrm{recip}'(y)| < \frac{1}{4}\,2^{-n}$ results in $\mathrm{recip}(y) = RN_n(\mathrm{recip}'(y))$ being a
$\frac{3}{4}$-ulp reciprocal.
Figure 1(b) illustrates for $n = 5$
a 5-bit $\frac{3}{4}$-ulp reciprocal function $\mathrm{recip}(y)$ that
chooses the farthest away of $\{RD_5(1/y),\, RU_5(1/y)\}$
whenever the farthest yields $|1/y - \mathrm{recip}(y)| < \frac{3}{4}\,2^{-5}$,
and otherwise chooses the unique one satisfying
the $\frac{3}{4}$-ulp bound.
[Figure 1: two step-function plots of reciprocal approximations, panels (a) and (b); horizontal axis $y$ from 1 to 2, vertical axis from 0.5 to 1.]
Fig. 1. (a): A 5-bit non-monotonic “round-away”, 1-ulp approximation, and (b): A $\frac{3}{4}$-ulp monotonic approximation.

Lemma 2: For a normalized $n$-bit divisor $y$, an $n$-bit $\frac{3}{4}$-ulp reciprocal function
is monotonic over the interval $[1,2)$, and strictly
monotonic over the portion $[1, \sqrt{4/3})$.

Proof: For consecutive exact inputs $y$, $y + 2^{-(n-1)}$
the consecutive reciprocals decrease by

$$\frac{1}{y} - \frac{1}{y + 2^{-(n-1)}} = \frac{2^{-(n-1)}}{y\,(y + 2^{-(n-1)})} > 2^{-(n+1)}.$$

Thus exact outputs decrease by at least one-half
ulp of output. Suppose $\mathrm{recip}(y) < \mathrm{recip}(y + 2^{-(n-1)})$.
Then the sum of the rounding errors satisfies

$$\left(\frac{1}{y} - \mathrm{recip}(y)\right) + \left(\mathrm{recip}(y + 2^{-(n-1)}) - \frac{1}{y + 2^{-(n-1)}}\right) > 2^{-n} + 2^{-(n+1)} = \frac{3}{2}\,2^{-n}.$$

Hence at least one of the rounding errors would be greater than
$\frac{3}{4}$ ulp, a contradiction. Thus
$\mathrm{recip}(y) \ge \mathrm{recip}(y + 2^{-(n-1)})$ holds for all $y \in [1,2)$,
and a $\frac{3}{4}$-ulp reciprocal function is monotonic (i.e.,
monotonically non increasing) over $[1,2)$. Suppose
$y \in [1, \sqrt{4/3})$. Then

$$\frac{1}{y} - \frac{1}{y + 2^{-(n-1)}} > \frac{3}{2}\,2^{-n}.$$

If $\mathrm{recip}(y) = \mathrm{recip}(y + 2^{-(n-1)})$, the sum of rounding errors would
be greater than $\frac{3}{2}$ ulp, a contradiction. Thus a $\frac{3}{4}$-ulp
reciprocal is strictly monotonically decreasing for
$y \in [1, \sqrt{4/3})$.

In practice the “guarded” computation of a multiterm reciprocal approximation $\mathrm{recip}'(y)$ can often
be shown to satisfy a maximum relative error bound
for $y \in [1,2)$, that is of the same order as the
maximum absolute error bound for $y \in [1,2)$.
Importantly, obtaining just one extra bit of precision
in the relative error bound on $\mathrm{recip}'(y)$ before
applying the final rounding is now shown sufficient
to yield monotonicity.
Theorem 3 (Monotonicity Theorem): For a normalized $n$-bit divisor $y = 1.b_1b_2\cdots b_{n-1}$, let
$\mathrm{recip}'(y) = 0.1b_2b_3\cdots b_nb_{n+1}\cdots$ be a reciprocal
approximation with relative error strictly less than $2^{-(n+1)}$
for all $y \in [1,2)$. Then $\mathrm{recip}(y) = RN_n(\mathrm{recip}'(y))$ is a monotonic,
one-ulp reciprocal function over $[1,2)$.

Proof: A reciprocal approximation $\mathrm{recip}'(y)$ with
relative error strictly less than $2^{-(n+1)}$ satisfies
$|1/y - \mathrm{recip}'(y)| < 2^{-(n+1)}/y$. So then $RN_n(\mathrm{recip}'(y))$
is a one-ulp reciprocal for $y \in [1,2)$ satisfying

$$\left|\frac{1}{y} - RN_n(\mathrm{recip}'(y))\right| < 2^{-(n+1)}\left(1 + \frac{1}{y}\right).$$

(Note that this bound scales down towards $\frac{3}{4}$ ulp
as $y \to 2$.) For successive $n$-bit normalized divisors
$y,\ y + 2^{-(n-1)} \in [1,2)$, the difference of their reciprocals satisfies

$$\frac{1}{y} - \frac{1}{y + 2^{-(n-1)}} = \frac{2^{-(n-1)}}{y\,(y + 2^{-(n-1)})} \ge \frac{2^{-n}}{y}.$$

Assume that $RN_n(\mathrm{recip}'(y)) < RN_n(\mathrm{recip}'(y + 2^{-(n-1)}))$. Then the
sum of these successive reciprocal rounding errors satisfies

$$\left(\frac{1}{y} - RN_n(\mathrm{recip}'(y))\right) + \left(RN_n(\mathrm{recip}'(y + 2^{-(n-1)})) - \frac{1}{y + 2^{-(n-1)}}\right) \ge 2^{-n} + \frac{2^{-n}}{y} = 2^{-n}\left(1 + \frac{1}{y}\right).$$

So at least one of these rounding errors
is greater than or equal to $2^{-(n+1)}(1 + 1/y)$, a contradiction.
Thus $RN_n(\mathrm{recip}'(y)) \ge RN_n(\mathrm{recip}'(y + 2^{-(n-1)}))$, and
$\mathrm{recip}(y) = RN_n(\mathrm{recip}'(y))$ is monotonic for $y \in [1,2)$.
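The monotonicity theorem lends itself to an exhaustive check at small word lengths. The sketch below is an illustration, not the authors' verification procedure. It assumes the relative error bound is $2^{-(n+1)}$ (one extra bit beyond the $n$ output bits), an output ulp of $2^{-n}$, and round-to-nearest with midpoints going down. For each divisor it rounds the two extreme admissible values of the unrounded approximation and confirms that even the worst-case pairing of neighbours cannot increase. The names `rn`, `check_theorem` and the `slack` parameter are hypothetical.

```python
from fractions import Fraction
from math import ceil

def rn(x, n):
    """Round x to the nearest multiple of 2**-n, midpoints going down."""
    return Fraction(ceil(x * 2**n - Fraction(1, 2)), 2**n)

def check_theorem(n, slack=Fraction(1, 2**20)):
    """True if every admissible rounded reciprocal is one-ulp and monotonic."""
    ulp_in = Fraction(1, 2**(n - 1))     # spacing of n-bit divisors in [1, 2)
    ulp_out = Fraction(1, 2**n)          # spacing of n-bit outputs in (1/2, 1]
    # Relative error strictly below 2**-(n+1): back off by a small slack.
    delta = Fraction(1, 2**(n + 1)) * (1 - slack)
    prev_lowest = None
    for i in range(2**(n - 1)):
        y = 1 + i * ulp_in
        exact = 1 / y
        lowest = rn(exact * (1 - delta), n)    # smallest possible rounded output
        highest = rn(exact * (1 + delta), n)   # largest possible rounded output
        if abs(exact - lowest) >= ulp_out or abs(exact - highest) >= ulp_out:
            return False                        # one-ulp bound violated
        if prev_lowest is not None and highest > prev_lowest:
            return False                        # output could increase
        prev_lowest = lowest
    return True

print(all(check_theorem(n) for n in range(3, 11)))
```

Because rounding is a non-decreasing function, checking the two extremes of the admissible band covers every approximation that meets the relative error bound.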
III. BIPARTITE TABLES
The bipartite table lookup process for determining
an approximate
reciprocal of a normalized binary
‚19
†
, comprises the use of two
divisor
distinct binary direct lookup tables of comparable
size. These tables are concurrently addressed by
distinct, equivalent length substrings of divisor bits,
with each table fashioned to provide a distinct part
of a carry save or borrow save representation of the
approximate reciprocal. Specifically, our bipartite
reciprocal approximations are of the form
!"#%$()+*a¤!"#%$ (K *2kV!#/$ ()+* , the primary approxWith (K * !
"
%
#
$
imation
is determined by the
`
1
-bit index
. The secondary approximation term is determined by some leading bits and

8„T
8+
some supplementary trailing bits
. The
()+*
"!
%
#
$
approximation may be fashioned so that
is exclusively positive or negative, with magnitudes
less than a unit, or sign-symmetric with magnitude
less than half a unit. Partitioning the operand into
2 illustrates the lookthree equal -bit parts, Figure
up process, employing
leading bits of a higher
precision divisor.
8„‚
8„„ T /8s„
:
precision, at a cost of only twice the table size,
compared to a single direct lookup table. For use as
a seed or short reciprocal in application to division
algorithms, the redundant reciprocal approximation
may be sent directly to an appropriate multiplier
recoder. For reciprocal function output in standard
binary form the two outputs require a supplementary
carry-completion addition.
Determination of the entries in a bipartite lookup
table pair is guided by the following exact expansions particular to the reciprocal function:
Theorem 4 (Bipartite Reciprocal Identities): For
†¤}
k the normalized
binary
divisor
partition
‚T
‚ 8„ 8+
where v and S - the
1 can be expanded to the sum of
reciprocal ]
a primary term, determined by , and a secondary
term of magnitude less than
, according to any
of the following
(borrow save expansion)
(3)
H k
(carry save) (4)
k
6()3k
*
k
„
8
8„ (midpoint)
(5)
k
6() k
*
Proof: Putting the primary and secondary
terms over a common denominator yields an imme
diate reduction
For borrow
] verifying
]T each
]identity.
save ]9] ]] ]9] ]] ] , and similarly
for the carry save and midpoint expansion identities.
[Figure 2: block diagram. Table 1 and Table 2 each supply one term of the approximation; a recoder/adder combines the two terms into recip(y).]
Fig. 2. The Bipartite Table Look-up Method
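The two-table structure of Figure 2 can be sketched in a few lines. The example below is a generic bipartite scheme written for illustration and is not the borrow save, carry save, or midpoint formulation of this paper: it assumes a divisor split into three $k$-bit fields `x1`, `x2`, `x3`, a primary table indexed by (`x1`, `x2`) holding the reciprocal at the midpoint of the `x3` range, and a secondary table indexed by (`x1`, `x3`) holding a first-order slope correction. Table entries are left unrounded, and all names are hypothetical.

```python
def build_bipartite_tables(k):
    """Tables for y = 1 + x1*2**-k + x2*2**-(2k) + x3*2**-(3k), k-bit fields."""
    s1, s2, s3 = 2.0**-k, 2.0**-(2 * k), 2.0**-(3 * k)
    mid_low = (s2 - s3) / 2                  # midpoint of the range spanned by x3
    primary, secondary = {}, {}
    for x1 in range(2**k):
        centre = 1 + x1 * s1 + s1 / 2        # centre of the x1 block
        slope = -1.0 / centre**2             # derivative of 1/y at that centre
        for x2 in range(2**k):
            primary[(x1, x2)] = 1.0 / (1 + x1 * s1 + x2 * s2 + mid_low)
        for x3 in range(2**k):
            secondary[(x1, x3)] = slope * (x3 * s3 - mid_low)
    return primary, secondary

def bipartite_recip(x1, x2, x3, tables):
    """Sum of one primary and one secondary table entry."""
    primary, secondary = tables
    return primary[(x1, x2)] + secondary[(x1, x3)]

def max_abs_error(k):
    """Largest absolute error over all 3k-bit divisors in [1, 2)."""
    tables = build_bipartite_tables(k)
    worst = 0.0
    for x1 in range(2**k):
        for x2 in range(2**k):
            for x3 in range(2**k):
                y = 1 + x1 * 2.0**-k + x2 * 2.0**-(2 * k) + x3 * 2.0**-(3 * k)
                err = abs(1.0 / y - bipartite_recip(x1, x2, x3, tables))
                worst = max(worst, err)
    return worst

print(max_abs_error(3), 2.0**-9)
```

Each table has $2^{2k}$ entries while the combined index covers $3k$ divisor bits, which is the size advantage over a single direct lookup table with $2^{3k}$ entries.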
The compelling advantage of the two table bipartite
lookup process is that it provides a simple
procedure to achieve essentially times the
The claim that bipartite reciprocal approximations
derived from (3) to (5) can have precision bits is
supported by the following
observations. Let and consider the input bits in Figure 2.
A
primary table employing
the bit index
with bits of output, allows the
primary term to be approximated
with error less
than half a unit in the k place. A secondary table
19
a
8„
8+
uses index
, formed by
concatenating leading fraction bits of with
leading bits of . This table can provide a ( k * -bit
output value for !"#%$ ()+* , allowing the secondary
term to be approximated to near the order of a
unit in the place. These arguments will
now be made precise, leading to the specification
of formulas for direct lookup table entries that
minimize the maximum absolute errors in each of
the terms in expansions (3) to (5). For bipartite
expansions it is most convenient to fix a common
last place position for both terms, and minimize the
maximum absolute error contributed by each term.
Note that the primary terms in each of the identities have exact inputs. Their evaluation can provide
entries to -bits-in -bits-out direct lookup tables
8+an
absolute error for each entry bounded by
with
3
, due only to rounding the exact output to
the output table size. E.g., for the borrow save
k expansion with , let the output size be
bits where ^ - is a small number of guard bits.
„ , the primary term
Excluding the special case
k '*
approximation in the
-bits-in (
-bits-out
table is
!"#%$ () *,.N 8':8„ ] ¤- /10
T0303 8':8„T
The primary table size for !"#%$ (K * is 8':8+( k
'* bits with maximum absolute error .
"!#/$ (K'*
The
secondary
term
approximation
borrow save reciprocal expansion (3)
]9] for the
with
, will be determined
from
the
‚
leading fraction bits of , where
along with the
leading bits of with ‚
8„
8+
(K'* !
/
#
$
. Thus we have
!"#%$ () ` * . In terms of the arguments () ` * we
note the following bounds
on each of the factors of
the secondary term ]] .
L k k S L
k k V k k L
‚
with
Lemma 5: For the divisor
O
9
‚9
,
let
,
,
^
8„
8+
‚T
8„T
8+
i
and
.
Then the borrow save expansion secondary term
satisfies the following bounds, which are tight
in the sense that ]9] can be arbitrarily close to
either bound:
(K
k
*()
k
V k
L
L
1
k
()
k
k
*
3
* (6)
Let m () ` * be the midpoint of the interval
determined by (6). The value m (K ` * minimizes
the maximum absolute error for approximation of
separate regions determined
9] ] in each of the
f
8„
8+
by each index
. The
maximum error over all the regions
will
occur
for
the argument pair
with index
`
leading to
Corollary 6: Let (K ` * be the midpoint of the
interval determined
by (6). Then
I
I
I
I
I I
* I
I
)
(
I I
L
`
The secondary term in our bipartite borrow
save approximation is then determined by rounding
m (K ` * to the last place position k k ,
!"#%$ () ` * N 8':8„ ( () ` ** Including the two rounding errors we further obtain
the following from Corollary 6.
Corollary 7: The borrow save bipartite reciprocal approximation for the normalized divisor V
‚T
given by
"!#/$(K'*a.N 8':8„
N 8':8„ ( () ` ** ] satisfies the
bound
I absolute error
I
I I
I
I
k !
/
#
$
K
(
'*
L
]
For
the maximum
error is then
(O k *
with total table size
. With just a few
guard bits we approach
the
case
for where
"!#/$(K'*a ] (K ` *
with g ] !"#%$ ()+* g L
4 . It can be shown that the maximum rela
,
tive
error
for such a bipartite table
occurs for —
(
*
so the precision is at least
-bits.
In practice bipartite tables are found
most effective for total index lengths , where each
part has between and bits. Considering variable
sized parts the preferred partitions of index parts are
* * ( k *
g g ` ( k
g g , and ( k
g g
.
Exploiting Symmetry in Bipartite Tables: The
midpoint expansion (5) allows for design of a sign
symmetric bipartite table process, providing one
additional bit of accuracy. For the symmetric case
some of the input bits and the secondary term
approximation are subject to a conditional complementation.
When the approximate reciprocal type is a reciprocal function defined on ’exact’ input points, the
midpoint expansion (5) can be modified to yield a
symmetric secondary term.
Symmetric Bipartite Reciprocal1Functions:
For
T
, the
the normalized -bit divisor †
„k has
secondary
part
of
the
partition
- ‚ 8„9 8+
with - ` U— .
secondary
The
part can be centered by subtracting
( ¤ * and adding the same to the primary
part. The symmetric divisor partition for the -bit
19
is then
normalized divisor †
8„ * ¤ £ k k—( k (7)
From (7) for any precision the symmetric bipartite
identity for ] is then
HV ( k *
8„
k
„
8
„
8
V *
k 6() k (8)
Then the symmetric
bipartite
reciprocal
function
k *
-bit normalized binary divisor ”
for the (
is determined from (8) with k and by
( * k
8„ S
8„ k V *
(K k 8+
p ( k *,
Here
is
determined
so
that
( * where
8+
8+ 8„ 
8+

8+2  8„T
8„ ‚T
8+
TT
8+ 8„ 8„ This allows the bounds
()
k
*
L
6() k
8„ V
* L
(9)
where the interval midpoint (K ` * from (9)
is used to determine the second term of the
symmetric bipartite approximate reciprocal function. The centering of the secondary part in (7) and
(8) thus provides for a sharp result, since is exact,
and shares
the practical convenience of determining
by a ’s complement.
IV. MULTIPARTITE TABLE LOOK-UP
The bipartite table lookup process for determining
an approximate reciprocal can be expanded to a
tripartite or multipartite process. The result is then
the sum of three or more terms obtained from three
or more table lookups indexed by comparably sized
indices. Tripartite tables in principle should achieve
4
times the precision and cost about times
about
the table size as a single direct lookup table.
In practice multipartite tables are arguably most
effective for tables with total input index lengths
and resulting output approximation precisions in the
JO
range * to
bits. This range can be covered
employing three to four-term sums with primary
table indices bounded by eleven bits. These practical
bounds keep total table size moderate. They also
allow table lookup and subsequent addition time to
be kept small.
For practical primary table indices of size at most
s
bits, the marginal improvement in tripartite and
four-partite table approximations for each additional
part is only 2-3 bits per part. For these index ranges
the multipartite process is conveniently visualized
by recognizing the divisor partition as a partial
recoding operation.
Exploiting Recoding in Multipartite Tables:
‚1
8„
- Definition
Let
with 8„
8„ ' 8:
k . Then for ^ ,
and ž^
`
O
*
a -digit partial recoding (Booth radix ) of ( denotes the expansion
4
8„ k k
T k
k
k
V ( HV * ( ›
the
tail satisfying
with
U
* and .
&
`
` - ` ` ' for
of the tail
Note that the condition
on the range
3
* for 8„ (
8„ * (
1 ¤
makes the expansion unique.
In practice the digits are determined from the
T
8„
concurrently
as in standard
bit triples ‚ 8+
8+ Booth recodings. The tail
is determined
from conditionally complementing
T
8+
8+ T
8„
the bits
depending on bit
as
described for symmetric bipartite expansions. The
notion of partial recoding is extendable to Booth
radix recodings in the obvious way.
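For readers unfamiliar with Booth recoding from overlapping bit triples, the following sketch shows the standard radix-4 case on a non-negative integer. It illustrates ordinary full Booth recoding only, under the assumption that the digit set is $\{-2, \ldots, 2\}$; it does not implement the partial recoding with a symmetric tail defined above, and the function name `booth_radix4` is hypothetical.

```python
def booth_radix4(value, num_bits):
    """Booth radix-4 digits of value, least significant first, each in -2..2."""
    assert 0 <= value < 2**num_bits
    bits = [(value >> i) & 1 for i in range(num_bits)]
    bits += [0, 0]                       # zero extension absorbs the final carry
    if len(bits) % 2:
        bits.append(0)                   # keep an even number of bits
    digits = []
    prev = 0                             # bit just below the current pair
    for i in range(0, len(bits), 2):
        # Overlapping triple (bits[i+1], bits[i], prev) selects one digit.
        digits.append(prev + bits[i] - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits

def from_digits(digits):
    """Value represented by radix-4 signed digits, least significant first."""
    return sum(d * 4**i for i, d in enumerate(digits))

digits = booth_radix4(0b10110111, 8)
print(digits, from_digits(digits))
```

All digits are determined concurrently from their own bit triple, which is why such recoders fit in front of a multiple generator without a carry chain.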
8„ bipartite midThe divisor (input) partition for
the
*k ( * point expansion (5) is † ()k
.
This provides the basis for multipartite (output)
expansions by partial recoding of the secondary
term of (5).
Observation
9: Let the normalized binary divisor
XT *
8„ the recoded
8+
8+ partition
have
tripartite
f () k * k k with ‚T
U - '
L .
`
&
`
` ` ` , and
The primary table
can provide
suitably rounded
+
8
+
8
+
8
.
` ‚]
values for ]
and ]
k
The latter two values are sent
to Booth radix
PPG’s with input digits and , providing
the selected terms for !"#%$ ()+* and !#/$ (K'* . The
final term is provided by a terminal term table with
1
H 8 8 8Mb
by substituting
Proof: The result is obtained
as previously
index
*
k
3
(
the partial recoding
into the described for the recoded tripartite expansion.
bipartite midpoint expansion
(5).
V. A SINGLE PRECISION, MONOTONIC ULP-ACCURATE RECIPROCAL FUNCTION
From (10) with
we obtain a recoded tripartite
expansion for use as a seed or short reciprocal,
USXTa” Let . We split our
!"#%$()+*,¤N 8':8+
8„ k reciprocal function into two cases, corresponding to
8':8+
8+
two sub-intervals:
j 8':8„
8„ k T
(K k *
8+
Case 1:
k_( * k†( * !"#%$ ( 9
H 8'4 8+ * Let have the partition
with
‚K
TK 1T
T
sign-symmetric
fractional
part
K
5
4
The -bit index
can retrieve
both a
TT1 k V
( k k '* -bit output for N 8+
8': ( 8+
* and and .
Employing
]
k k +*
8':8„ (
s*
(
N
+
8
] ] ] a
-bit output for
. The the symmetric bipartite identity ]
‚] iteratively,
we
obtain
second output can be conditionally complemented
— *
54
(
and/or shifted to determine !#/$ (K'* as an approx (
k
*
+
8
imation
of
satisfying
] ] (K *
(K *
I
I
I
I
8+
I
I *
I !#/$ ()+* I „
8
Defining our constant term (
by
I
I
6() k
*
adding half the maximum error
4 8+
8+ 8': k L
HS
7
k
(K *
The approximation forP the terminal term
8+
is handled as
!"#%$ ()+*
7
] ] k ] ‚] ,
we
obtain
that
]
‚
]
for the secondary term in the symmetric
where to a smaller
order g Mg L .
bipartite
expansion,
employing
the
bit
string
ž 8'4 8+7 8+
be partially recoded with two Booth
Let as the index to a
8 digits and ab symmetric tail, then separate terminal term table. The recoded tripartite radix
¡O O '
O
k 3
k with ` &
`
`
`
expansion here employs an intermediate Booth
4
digit in the tripartite divisor partial recoding, to and g g
, where
8„ ( ‚ * obtain a bit enhancement of the precision of the
, ( * result, compared to symmetric bipartite reciprocal
54
S
b 9
Tb k approximation.
Letting
, it can be
Analogous to Observation 9 we can employ a shown that
O
recoded -part divisor partition
including two
inb
O
k
termediate Booth radix digits and obtain a -part
(K *
(K b *
(K b *
identity
8+ with g g L . Then it can be shown that
„
8
„
8
k 6() k *
Šb
)4
54
k
8 8 Mb (K b * () *
(K *
(K *
8„ 8„ (K k *
6() k *
(11)
Then
8+
8„ 8„
6() k
*
8+ „
8
(10)
6() k *
1 S Tv_ L
with g g
. Since
for
, we 1.1 Tb K
1KT)4 K7 1Šb 1 T
can use a 10-bit
index for determining simultane
3
3
5
4
1
6
ously , ] and ‚] , and another 11-bit index
P
determines ] , all with sufficient guard bits. The
first four terms of (11) then provide
a four-partite
Table
1
recode recode reciprocal approximation
to
54 ] , with error bound
arbitrarily close to ] ‚] .
Table 2
Using 4 guard bits so that each of the four terms
7
54
contributes
a
table
based
rounding
error
of
at
most
ulps, where here $ž
, we obtain a four"!#%$2(K'* satisfying
I
I approximation
MG
MG
partite
reciprocal
54
I I
I
I
&
"!
/
#
$
K
(
'*
k
L
I ]
I
.Then
7
] ] 54
I I
I
k
N 54 (K!#/$ & (K'** I L
.
]
] ‚] 4-to-2 Adder
5
4
(red.)
monotonic
Claim 1: N ()"!#%$ & ()+** is a one-ulp
UV reciprocal function over the interval
.
54
.
Claim 2: If N ()"!#%$ & ()+** is not monotonic then Fig. 3. Four-partite table reciprocal look-up for the interval at
one
rounded reciprocal has an error at least
least
8+
ulps.
k
‚]
in the 4-to-2 adder, maintaining guard
bits, to a
Claim 2 can be verified by an argument similar redundant reciprocal in the range , including
to that of the proof of the Monotonicity Theorem two leading guard digits [10].
(Theorem 3). Consider that the maximal
7 total roundFor output as a single precision reciprocal the
k ] ulps.
ing error
is
essentially
bounded
by
redundant result must be compressed by a carry
V completing
adder with rounding and normalization.
the
interval
,
Now ] ] e 7 over
For
use
as
a
divisor reciprocal, the result is recoded
k ] . Therefore the error bound
so 7 k ] e
for multiplication by the single precision dividend,
of k ] ulps is sufficient to guarantee
that
no
to obtain a quotient breakpoint by adaptively round k error after rounding is as large as
] ulps. ing with respect to the rounding mode (see Sec 54 ()"!#%$ &()+**
It follows
that
is monotonic for tion I).
TcV V
.
7
Case
2:
Since k ] L also verifies a one-ulp bound
( 
TK
* for
TcS For
this
region
we
use
11
bits
for
, the four-partite approxima
54
K
primary table
Let have the partition
index.
tion N ()!"#%$ & (K'** , with !#/$ & ()+** given by the the
† K
k( * K
withK ” ‚1
5K4 )4 s and
four terms in (11), is a single precision,
one-ulp
3
* k †
k ( K
TcV
. Proceeding
as
monotonic reciprocal function over .
,
] ‚] b
Figure 3 illustrates an implementation of this in Case 1, the quadratic term is now
yielding an error term ] ]
with g Mg L
,
four-partite reciprocal
function. Table 1 receives
the
T
T 1
s( TT1K
*
after centering by adjustment of . The
10-bit index
and outputs , ‚] O and , with table
values , and all rounded terminal term now satisfies
b
to position
.
k
`
(K K
*
(K *
(K *
The terms and are each input to both multiple generators, MG, where MG functions identically
P
to a Booth radix-8 PPG. Table 2 receives the 11-bit
where ] still can be determined from an 11-bit
P
Tb † T
T „ index
for determining ] , index
, since there is one less
rounded to position
. The sum is compressed trailing bit, and one more leading bit than in Case 1.
We then obtain
K7
k
() *
(K K
*
(K K
*
(K K
*
(12)
S with g g L
for
.
Then the four-partite approximation $\mathrm{recip}'(y)$,
formed from the first four terms of (12) by rounding
the table entries with four guard bits, will
have a maximum error bound of ulps for
. Then $RN_{24}(\mathrm{recip}'(y))$ is a one-ulp
monotonic reciprocal function for
.
Figure 4 illustrates the look-up table structure for
implementing this reciprocal function over
.
The tables of Figures 3 and 4 have combined size
totalling less than 27 Kbytes, and the two structures
can share much of the hardware shown, using suitably
placed multiplexers.

[Figure 4: block diagram with two recoders, Table 1, Table 2, two multiple generators (MG), and a 4-to-2 adder with redundant (red.) output.]
Fig. 4. Four-partite table reciprocal look-up for the interval .

REFERENCES
[1] D. DasSarma and D. Matula, “Faithful Bipartite ROM Reciprocal Tables,” in Proc. 12th IEEE Symposium on Computer Arithmetic. IEEE Computer Society, 1995, pp. 17–28.
[2] H. Hassler and N. Takagi, “Function Evaluation by Table Look-Up and Addition,” in Proc. 12th IEEE Symposium on Computer Arithmetic. IEEE, 1995, pp. 10–16.
[3] M. Schulte and J. Stine, “Approximating Elementary Functions with Symmetric Bipartite Tables,” IEEE Transactions on Computers, vol. 48, no. 8, pp. 842–847, 1999.
[4] J.-M. Muller, “A Few Results on Table-Based Methods,” Reliable Computing, vol. 5, no. 3, pp. 279–288, 1999.
[5] C. Iordache and D. Matula, “Analysis of Reciprocal and Square Root Reciprocal Instructions in the AMD K6-2 Implementation of 3DNow,” Electronic Notes in Theoretical Computer Science, vol. 24, 1999.
[6] F. de Dinechin and A. Tisserand, “Some Improvements on Multipartite Table Methods,” in Proc. 15th IEEE Symposium on Computer Arithmetic. IEEE, 2001, pp. 128–135.
[7] W. Wong and E. Goto, “Fast Evaluation of the Elementary Functions in Single Precision,” IEEE Transactions on Computers, vol. 44, no. 3, pp. 453–457, 1995.
[8] J. Pineiro, J. Bruguera, and J.-M. Muller, “Faithful Powering Computation using Table Look-Up and a Fused Multiplication Tree,” in Proc. 15th IEEE Symposium on Computer Arithmetic. IEEE, 2001, pp. 40–47.
[9] F. de Dinechin and J. Detrey, “Multipartite Tables in JBits for the Evaluation of Functions on FPGA’s,” in IEEE Reconfigurable Architecture Workshop, International Parallel and Distributed Symposium, Fort Lauderdale, Florida. IEEE, April 2002.
[10] P. Kornerup and J.-M. Muller, “Leading Guard Digits in Finite Precision Redundant Representations,” 2004, submitted to ARITH17.