Single Precision Reciprocal by Multipartite Table Look-up

Peter Kornerup, University of Southern Denmark, Odense, Denmark
David W. Matula, Southern Methodist University, Dallas, Texas, USA

Abstract: We develop the foundations for confirming monotonicity of a multi-term reciprocal function approximation. We introduce the concept of operand recoding to improve the accuracy of multipartite approximation. The results are applied to provide a proposed four-partite reciprocal implementation with total table size 27 Kbytes, yielding an IEEE standard, single precision format (24 bit) reciprocal instruction that is a one-ulp monotonic reciprocal.

I. INTRODUCTION

There has been considerable investigation of bipartite and multipartite function approximations in the recent literature [1], [2], [3], [4], [5], [6], [7], [8], [9]. Bipartite reciprocal approximations have been employed for approximate (low precision) reciprocal instructions in commodity microprocessors, targeted at multimedia applications. The question of monotonicity of reciprocal approximations has been discussed in [5], [6]. In this paper we investigate the applicability of the multipartite approach to obtaining an IEEE single precision (24 bit) one-ulp monotonic reciprocal function.

Summary: Given a divisor $x$, $1 \le x < 2$, we shall show that it is possible, by a multipartite table look-up method, to determine an approximate reciprocal value

$$\mathrm{recip}'(x) = 0.1b_2b_3\cdots b_{24}b_{25}\cdots b_{24+g} \quad \text{for } 1 < x < 2, \qquad \text{or } \mathrm{recip}'(1) = 1, \tag{1}$$

with $g$ guard bits, having relative error less than $2^{-25}$. This is equivalent to having the absolute error bound

$$\left| \frac{1}{x} - \mathrm{recip}'(x) \right| < \frac{2^{-25}}{x}. \tag{2}$$

In Section II we develop the foundations of monotonic one-ulp reciprocal functions. In particular we introduce and prove a monotonicity theorem. Specifically, if
$\mathrm{recip}'(x)$ satisfies (2), then the reciprocal function $\mathrm{recip}(x) = RN_{24}(\mathrm{recip}'(x))$, obtained by rounding (see footnote 1) such an approximate reciprocal function to nearest at the $2^{-24}$ position, is a one-ulp monotonic single precision reciprocal.

For single precision division with dividend $y \in [1,2)$, let $q$ be the normalized (rational) exact quotient of $y$ by $x$. For $\mathrm{recip}'(x)$ satisfying (1) and (2), let $q'(y,x)$ be the normalized binary quotient approximation determined by the product $\mathrm{recip}'(x)\,y$. Since $\mathrm{recip}'(x)$ has relative error less than $2^{-25}$, so does $q'(y,x)$ as an approximation of $q$. Hence the 24-bit round-to-nearest value of $q'(y,x)$ differs from $q$ by less than one ulp, and so does the 24-bit round-down value of $q'(y,x)$ plus one half ulp.

Footnote 1: The fixed-point round-down $RD_n(x)$, round-up $RU_n(x)$, and round-to-nearest (midpoint down) $RN_n(x)$ roundings each deliver a multiple of $2^{-n}$. Specifically we have

$$RN_n(x) = 2^{-n} \left\lceil 2^n x - \tfrac{1}{2} \right\rceil,$$

with similar expressions for $RU_n(x)$ and $RD_n(x)$. For normalized input $x \in [1/2, 1)$ the output is either again a normalized $n$-bit value or the value 1.

Note that the round-to-nearest value of $q'(y,x)$ is a directed breakpoint, in the sense that comparing the exact quotient against it allows the precise round-down (or round-up) single precision result for division of $y$ by $x$ to be chosen correctly. Similarly, the round-down value of $q'(y,x)$ plus one half ulp is a round-to-nearest break midpoint, and the same comparison dictates the correct round-to-nearest single precision division result. Thus the multipartite table lookup procedure described here provides for implementing a one-ulp monotonic, single precision reciprocal function without the need for a multiplier, and for obtaining a single precision division result employing only two (dependent) single precision multiplications. Our suggested solution is a four-partite table lookup with total table size 27 Kbytes.
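The monotonicity claim summarized above can be checked exhaustively for small word lengths. The following is a minimal sketch in exact rational arithmetic, not the authors' table-based procedure: `recip'(x)` is modelled (an assumption made only for this test) as $1/x$ perturbed by a relative error of alternating sign that stays just inside the bound $2^{-(n+1)}$, and the names `rn` and `check_monotonic_one_ulp` are hypothetical.

```python
from fractions import Fraction
from math import ceil


def rn(v, n):
    """Round v to the nearest multiple of 2**-n; exact midpoints round down."""
    scale = 1 << n
    return Fraction(ceil(v * scale - Fraction(1, 2)), scale)


def check_monotonic_one_ulp(n, slack=Fraction(1, 256)):
    """Exhaustively test all n-bit divisors x = 1.b1...b(n-1) in [1, 2).

    recip'(x) is a model, not a table look-up: 1/x with a relative error of
    alternating sign and magnitude (1 - slack) * 2**-(n+1), i.e. strictly
    inside the relative error bound assumed by the monotonicity claim.
    Returns (is_monotonic, worst_error_in_ulps).
    """
    ulp = Fraction(1, 1 << n)
    rel = (1 - slack) * Fraction(1, 1 << (n + 1))
    previous = None
    worst = Fraction(0)
    for j in range(1 << (n - 1)):
        x = 1 + Fraction(j, 1 << (n - 1))
        sign = 1 if j % 2 == 0 else -1
        approx = (1 / x) * (1 + sign * rel)   # model of recip'(x)
        r = rn(approx, n)                      # recip(x) = RN_n(recip'(x))
        worst = max(worst, abs(1 / x - r) / ulp)
        if previous is not None and r > previous:
            return False, worst                # output increased: not monotonic
        previous = r
    return True, worst
```

Calling `check_monotonic_one_ulp(10)` walks all 512 ten-bit divisors. The alternating sign is chosen because it pushes neighbouring outputs in opposite directions, which is the situation most likely to break monotonicity.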
These methods allow relatively low-power implementations of the SSE paired, single precision reciprocal and division instructions incorporated in current x86 processors, targeted at low-power multimedia computations.

In Section III we review the fundamentals of bipartite table construction, and Section IV introduces the notion of operand partial recoding for constructing multipartite tables. In Section V we present a four-partite look-up table procedure for obtaining a single precision, one-ulp monotonic reciprocal function.

II. ULP-ACCURATE MONOTONIC RECIPROCAL FUNCTIONS

The reciprocal approximation $\mathrm{recip}(x) = 0.1b_2b_3\cdots b_n$ is termed an $n$-bit one-ulp reciprocal when $|1/x - \mathrm{recip}(x)| < 2^{-n}$ for all normalized binary divisors $x \in [1,2)$, and similarly is an $n$-bit $\tfrac{3}{4}$-ulp reciprocal when $|1/x - \mathrm{recip}(x)| < \tfrac{3}{4}\,2^{-n}$.

Observation 1: An $n$-bit one-ulp reciprocal $\mathrm{recip}(x)$ is either the round-up or round-down value of $1/x$ for all normalized binary divisors $x \in [1,2)$. That is, $\mathrm{recip}(x) \in \{RD_n(1/x), RU_n(1/x)\}$ for all $x \in [1,2)$, with $\mathrm{recip}(x)$ always being the $n$-bit value nearest $1/x$ in the direction of the approximation.

Note that a one-ulp reciprocal is efficiently computable by first obtaining a multiple term reciprocal approximation $\mathrm{recip}'(x) = 0.1b_2b_3\cdots b_nb_{n+1}\cdots b_{n+g}$, with guard bits $b_{n+1}b_{n+2}\cdots b_{n+g}$, that satisfies $|1/x - \mathrm{recip}'(x)| < 2^{-(n+1)}$. Then the guard bits are rounded off to obtain the $n$-bit one-ulp reciprocal $\mathrm{recip}(x) = RN_n(\mathrm{recip}'(x))$. Such one-ulp reciprocals have applications as a short reciprocal in high radix division algorithms and as the approximate reciprocal function value for a reciprocal instruction implementation. For implementation of a one-ulp reciprocal function as a reciprocal instruction, it is also desirable to investigate the monotonicity properties of such an approximate function.

Rounding Off Guard Bits - Monotonic Reciprocal Instruction: In the remainder of this section we focus on the important reciprocal function application where the (exact) inputs $x = 1.b_1b_2\cdots b_{n-1}$ and
(approximate) outputs $\mathrm{recip}(x)$ (or $\mathrm{recip}'(x)$) are both $n$-bit normalized values with $n$ too large for direct lookup to be practical, e.g. $n = 24$. In this case a multi-term computed reciprocal approximation with guard bits rounded off, to provide a one-ulp reciprocal, is only guaranteed monotonic for $x$ over the subinterval $[1, \sqrt{2})$. In particular, it can be shown that the output step size for a one-ulp reciprocal can vary from 0 to 3 ulps for $x$ over $[1, \sqrt{2})$, and over $[\sqrt{2}, 2)$ the step size can be down by as much as two ulps, or reverse direction and be up by one ulp, contradicting monotonicity.

Figure 1(a) illustrates a 5-bit one-ulp reciprocal $\mathrm{recip}_5(x)$ which systematically chooses the value of the pair $\{RD_5(1/x), RU_5(1/x)\}$ that is at least one half ulp away from $1/x$, i.e., where $2^{-6} \le |1/x - \mathrm{recip}_5(x)| < 2^{-5}$. The step function graph in Figure 1(a) clearly illustrates that such a perverse one-ulp reciprocal can have exaggerated variability in step size over $[1, \sqrt{2})$ and be non-monotonic to the extent of virtual oscillation over $[\sqrt{2}, 2)$.

Note that computing $\mathrm{recip}'(x) = 0.1b_2b_3\cdots b_{n+g}$ satisfying $|1/x - \mathrm{recip}'(x)| < \tfrac{1}{4}\,2^{-n}$ results in $\mathrm{recip}(x) = RN_n(\mathrm{recip}'(x))$ being a $\tfrac{3}{4}$-ulp reciprocal. Figure 1(b) illustrates for $n = 5$ a 5-bit $\tfrac{3}{4}$-ulp reciprocal function $\mathrm{recip}_5(x)$ that chooses the farthest away of $\{RD_5(1/x), RU_5(1/x)\}$ whenever the farthest yields $|1/x - \mathrm{recip}_5(x)| < \tfrac{3}{4}\,2^{-5}$, and otherwise chooses the unique one satisfying the $\tfrac{3}{4}$-ulp bound.

[Figure 1: two panels, (a) and (b), each plotting a 5-bit reciprocal step function (vertical axis from 0.5 to 1) against $x$ (horizontal axis from 1 to 2).]

Fig. 1. (a): A 5-bit non-monotonic "round-away", 1-ulp approximation, and (b): A 5-bit $\tfrac{3}{4}$-ulp monotonic approximation.

Lemma 2: For normalized $n$-bit divisors $x$, an $n$-bit $\tfrac{3}{4}$-ulp reciprocal function is monotonic over the interval $[1,2)$, and strictly monotonic over the portion $[1, \sqrt{4/3})$.

Proof: For consecutive exact inputs $x$, $x + 2^{-(n-1)}$ in $[1,2)$ the consecutive reciprocals decrease by

$$\frac{1}{x} - \frac{1}{x + 2^{-(n-1)}} = \frac{2^{-(n-1)}}{x\,(x + 2^{-(n-1)})} > 2^{-(n+1)}.$$

Thus exact outputs decrease by at least one-half ulp of output. Suppose $\mathrm{recip}(x) < \mathrm{recip}(x + 2^{-(n-1)})$.
Then, writing $x' = x + 2^{-(n-1)}$, the sum of the rounding errors satisfies

$$\left(\frac{1}{x} - \mathrm{recip}(x)\right) + \left(\mathrm{recip}(x') - \frac{1}{x'}\right) = \left(\mathrm{recip}(x') - \mathrm{recip}(x)\right) + \left(\frac{1}{x} - \frac{1}{x'}\right) > 2^{-n} + 2^{-(n+1)} = \tfrac{3}{2}\,2^{-n}.$$

Hence at least one of the rounding errors would be greater than $\tfrac{3}{4}$ ulp, a contradiction. Thus $\mathrm{recip}(x) \ge \mathrm{recip}(x')$ holds for all $x \in [1,2)$, and a $\tfrac{3}{4}$-ulp reciprocal function is monotonic (i.e., monotonically non-increasing) over $[1,2)$.

Suppose $x$ and $x'$ both lie in $[1, \sqrt{4/3})$. Then $\frac{1}{x} - \frac{1}{x'} = \frac{2^{-(n-1)}}{x\,x'} > \tfrac{3}{2}\,2^{-n}$. If $\mathrm{recip}(x) = \mathrm{recip}(x')$, the sum of the rounding errors would be greater than $\tfrac{3}{2}$ ulp, a contradiction. Thus a $\tfrac{3}{4}$-ulp reciprocal is strictly monotonically decreasing over $[1, \sqrt{4/3})$.

In practice the "guarded" computation of a multi-term reciprocal approximation $\mathrm{recip}'(x)$ can often be shown to satisfy a maximum relative error bound for $x \in [1,2)$ that is of the same order as the maximum absolute error bound for $x \in [1,2)$. Importantly, obtaining just one extra bit of precision in the relative error bound on $\mathrm{recip}'(x)$ before applying the final rounding is now shown sufficient to yield monotonicity.

Theorem 3 (Monotonicity Theorem): For a normalized $n$-bit divisor $x \in [1,2)$, let $\mathrm{recip}'(x) = 0.1b_2b_3\cdots b_{n+g}$ be a reciprocal approximation with relative error strictly less than $2^{-(n+1)}$ for all $x \in [1,2)$. Then $\mathrm{recip}(x) = RN_n(\mathrm{recip}'(x))$ is a monotonic, one-ulp reciprocal function over $[1,2)$.

Proof: A reciprocal approximation $\mathrm{recip}'(x)$ with relative error strictly less than $2^{-(n+1)}$ satisfies $|1/x - \mathrm{recip}'(x)| < 2^{-(n+1)}/x$. So then $RN_n(\mathrm{recip}'(x))$ is a one-ulp reciprocal for $x \in [1,2)$ satisfying

$$\left| \frac{1}{x} - RN_n(\mathrm{recip}'(x)) \right| < 2^{-(n+1)} \left( 1 + \frac{1}{x} \right).$$

(Note that this bound scales down towards $\tfrac{3}{4}$ ulp as $x \to 2$.) For successive $n$-bit normalized divisors $x$, $x' = x + 2^{-(n-1)}$ in $[1,2)$, the difference of their reciprocals satisfies

$$\frac{1}{x} - \frac{1}{x'} = \frac{2^{-(n-1)}}{x\,x'} \ge 2^{-(n+1)} \left( \frac{1}{x} + \frac{1}{x'} \right),$$

since $x + x' \le 4$. Assume that $RN_n(\mathrm{recip}'(x)) < RN_n(\mathrm{recip}'(x'))$, so that the two rounded values differ by at least $2^{-n}$. Then the sum of these successive reciprocal rounding errors satisfies

$$\left(\frac{1}{x} - RN_n(\mathrm{recip}'(x))\right) + \left(RN_n(\mathrm{recip}'(x')) - \frac{1}{x'}\right) \ge 2^{-n} + 2^{-(n+1)} \left( \frac{1}{x} + \frac{1}{x'} \right).$$

So at least one of these rounding errors is greater than or equal to its bound, $2^{-(n+1)}(1 + 1/x)$ or $2^{-(n+1)}(1 + 1/x')$ respectively, a contradiction. Thus $RN_n(\mathrm{recip}'(x)) \ge RN_n(\mathrm{recip}'(x'))$, and $\mathrm{recip}(x) = RN_n(\mathrm{recip}'(x))$ is monotonic for $x \in [1,2)$.

III. BIPARTITE TABLES

The bipartite table lookup process for determining an approximate reciprocal of a normalized binary divisor $x = 1.b_1b_2\cdots b_{n-1}$ comprises the use of two distinct binary direct lookup tables of comparable size. These tables are concurrently addressed by distinct, equivalent length substrings of divisor bits, with each table fashioned to provide a distinct part of a carry save or borrow save representation of the approximate reciprocal. Specifically, our bipartite reciprocal approximations are of the form $\mathrm{recip}(x) = \mathrm{recip}_1(x) + \mathrm{recip}_2(x)$. The primary approximation $\mathrm{recip}_1(x)$ is determined by the $2k$-bit index $b_1b_2\cdots b_{2k}$. The secondary approximation term $\mathrm{recip}_2(x)$ is determined by some leading bits $b_1b_2\cdots b_k$ and some supplementary trailing bits $b_{2k+1}b_{2k+2}\cdots b_{3k}$. The approximation $\mathrm{recip}_2(x)$ may be fashioned so that it is exclusively positive or negative, with magnitude less than a unit, or sign-symmetric with magnitude less than half a unit. Partitioning the operand into three equal $k$-bit parts, Figure 2 illustrates the look-up process, employing $3k$ leading bits of a higher precision divisor.

The compelling advantage of the two-table bipartite lookup process is that it provides a simple procedure to achieve essentially $\tfrac{3}{2}$ times the precision, at a cost of only twice the table size, compared to a single direct lookup table. For use as a seed or short reciprocal in application to division algorithms, the redundant reciprocal approximation may be sent directly to an appropriate multiplier recoder. For reciprocal function output in standard binary form the two outputs require a supplementary carry-completion addition.
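The look-up process can be sketched for a small $k$ in exact rational arithmetic. This is an illustrative sketch under stated assumptions, not the paper's table specification: the divisor is taken as $x = 1.b_1\cdots b_{3k}$, the primary table holds rounded values of $1/x_h$ for the leading $2k$ bits, the secondary table holds a rounded midpoint of the first-order correction over all exact inputs sharing the $k$ leading and $k$ trailing bits, and the guard bit count `g` and all function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def rn(v, m):
    """Round v to the nearest multiple of 2**-m; exact midpoints round down."""
    scale = 1 << m
    return Fraction(ceil(v * scale - Fraction(1, 2)), scale)


def build_tables(k, g=2):
    """Build primary and secondary tables, both rounded to position 2**-(3k+g)."""
    m = 3 * k + g
    u = Fraction(1, 1 << (2 * k))              # weight of the last bit of x_h
    primary = {h: rn(1 / (1 + h * u), m) for h in range(1 << (2 * k))}
    secondary = {}
    for a in range(1 << k):                    # k leading fraction bits
        lo_h = 1 + Fraction(a, 1 << k)         # smallest x_h with these bits
        hi_h = lo_h + ((1 << k) - 1) * u       # largest x_h with these bits
        for l in range(1 << k):                # k trailing bits
            x_l = Fraction(l, 1 << (3 * k))
            t_max = x_l / (lo_h * (lo_h + x_l))
            t_min = x_l / (hi_h * (hi_h + x_l))
            secondary[(a, l)] = rn((t_min + t_max) / 2, m)
    return primary, secondary


def bipartite_recip(bits, k, primary, secondary):
    """bits holds the 3k fraction bits of x = 1.b1...b3k as an integer."""
    h = bits >> k                              # leading 2k bits: primary index
    a = bits >> (2 * k)                        # leading k bits
    l = bits & ((1 << k) - 1)                  # trailing k bits
    return primary[h] - secondary[(a, l)]      # borrow save style difference


def max_abs_error(k, g=2):
    """Exhaustive maximum of |1/x - recip(x)| over all 3k-bit fractions."""
    primary, secondary = build_tables(k, g)
    worst = Fraction(0)
    for bits in range(1 << (3 * k)):
        x = 1 + Fraction(bits, 1 << (3 * k))
        approx = bipartite_recip(bits, k, primary, secondary)
        worst = max(worst, abs(1 / x - approx))
    return worst
```

With `k = 3` each table has only $2^{6}$ entries, yet `max_abs_error(3)` stays on the order of $2^{-9}$, which is the "three halves of the index length" behaviour described above. A hardware version would keep the two outputs in redundant form instead of subtracting them.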
Determination of the entries in a bipartite lookup table pair is guided by the following exact expansions particular to the reciprocal function:

Theorem 4 (Bipartite Reciprocal Identities): For the normalized binary divisor partition $x = x_h + x_l$, where $x_h = 1.b_1b_2\cdots b_i$ and $0 \le x_l < 2^{-i}$, the reciprocal $1/x$ can be expanded to the sum of a primary term, determined by $x_h$, and a secondary term of magnitude less than $2^{-i}$, according to any of the following:

$$\frac{1}{x} = \frac{1}{x_h} - \frac{x_l}{x_h\,(x_h + x_l)} \qquad \text{(borrow save expansion)} \tag{3}$$

$$\frac{1}{x} = \frac{1}{x_h + 2^{-i}} + \frac{2^{-i} - x_l}{(x_h + 2^{-i})(x_h + x_l)} \qquad \text{(carry save)} \tag{4}$$

$$\frac{1}{x} = \frac{1}{x_h + 2^{-(i+1)}} + \frac{2^{-(i+1)} - x_l}{(x_h + 2^{-(i+1)})(x_h + x_l)} \qquad \text{(midpoint)} \tag{5}$$

Proof: Putting the primary and secondary terms over a common denominator yields an immediate reduction verifying each identity. For borrow save,

$$\frac{1}{x_h} - \frac{x_l}{x_h\,(x_h + x_l)} = \frac{(x_h + x_l) - x_l}{x_h\,(x_h + x_l)} = \frac{1}{x_h + x_l},$$

and similarly for the carry save and midpoint expansion identities.

[Figure 2: the divisor bits address Table 1 and Table 2, whose outputs $\mathrm{recip}_1(x)$ and $\mathrm{recip}_2(x)$ enter a recoder/adder that delivers $\mathrm{recip}(x)$.]

Fig. 2. The Bipartite Table Look-up Method

The claim that bipartite reciprocal approximations derived from (3) to (5) can have precision $3k$ bits is supported by the following observations. Let $i = 2k$ and consider the $3k$ input bits in Figure 2. A primary table employing the $2k$-bit index $b_1b_2\cdots b_{2k}$, with $3k$ or slightly more bits of output, allows the primary term to be approximated with error less than half a unit in its last place. A secondary table uses the $2k$-bit index $b_1b_2\cdots b_k\,b_{2k+1}b_{2k+2}\cdots b_{3k}$, formed by concatenating the $k$ leading fraction bits of $x_h$ with the $k$ leading bits of $x_l$. This table needs only about $k$ output bits (plus guard bits) for $\mathrm{recip}_2(x)$, allowing the secondary term to be approximated to near the order of a unit in the $3k$-th place. These arguments will now be made precise, leading to the specification of formulas for direct lookup table entries that minimize the maximum absolute errors in each of the terms in expansions (3) to (5).

For bipartite expansions it is most convenient to fix a common last place position for both terms, and minimize the maximum absolute error contributed by each term.
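The three expansions of Theorem 4 are instances of one elementary identity, $\frac{1}{a} + \frac{a - x}{a\,x} = \frac{1}{x}$, applied with $a$ equal to $x_h$, to $x_h$ plus one unit in its last place, or to $x_h$ plus half such a unit. The sketch below checks this exactly with rational arithmetic. The function name `expansions` and the parameter `i` (the number of fraction bits of $x_h$) are notation assumed here for illustration.

```python
from fractions import Fraction


def expansions(x_h, x_l, i):
    """Return {name: (primary, secondary)} for the three expansions of 1/(x_h + x_l).

    x_h is assumed to have i fraction bits and 0 <= x_l < 2**-i.
    """
    x = x_h + x_l
    d = Fraction(1, 1 << i)                    # unit in the last place of x_h
    return {
        "borrow": (1 / x_h, -x_l / (x_h * x)),
        "carry": (1 / (x_h + d), (d - x_l) / ((x_h + d) * x)),
        "midpoint": (1 / (x_h + d / 2), (d / 2 - x_l) / ((x_h + d / 2) * x)),
    }
```

In each pair the primary term depends on $x_h$ only, so it can be tabulated exactly up to output rounding, while the secondary term is small: below $2^{-i}$ in magnitude for the borrow save and carry save forms, and below $2^{-(i+1)}$ for the midpoint form, which is what makes the midpoint form attractive for sign-symmetric tables.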
Note that the primary terms in each of the identities have exact inputs. Their evaluation can provide entries to -bits-in -bits-out direct lookup tables 8+an absolute error for each entry bounded by with 3 , due only to rounding the exact output to the output table size. E.g., for the borrow save k expansion with , let the output size be bits where ^ - is a small number of guard bits. , the primary term Excluding the special case k '* approximation in the -bits-in ( -bits-out table is !"#%$ () *,.N 8':8 ] ¤- /10 T0303 8':8T The primary table size for !"#%$ (K * is 8':8+( k '* bits with maximum absolute error . "!#/$ (K'* The secondary term approximation borrow save reciprocal expansion (3) ]9] for the with , will be determined from the leading fraction bits of , where along with the leading bits of with 8 8+ (K'* ! / # $ . Thus we have !"#%$ () ` * . In terms of the arguments () ` * we note the following bounds on each of the factors of the secondary term ]] . L k k S L k k V k k L with Lemma 5: For the divisor O 9 9 , let , , ^ 8 8+ T 8T 8+ i and . Then the borrow save expansion secondary term satisfies the following tight bounds which are tight in the sense that ]9] can be arbitrarily close to either bound: (K k *() k V k L L 1 k () k k * 3 * (6) Let m () ` * be the midpoint of the interval determined by (6). The value m (K ` * minimizes the maximum absolute error for approximation of separate regions determined 9] ] in each of the f 8 8+ by each index . The maximum error over all the regions will occur for the argument pair with index ` leading to Corollary 6: Let (K ` * be the midpoint of the interval determined by (6). Then I I I I I I * I I ) ( I I L ` The secondary term in our bipartite borrow save approximation is then determined by rounding m (K ` * to the last place position k k , !"#%$ () ` * N 8':8 ( () ` ** Including the two rounding errors we further obtain the following from Corollary 6. 
Corollary 7: The borrow save bipartite reciprocal approximation for the normalized divisor V T given by "!#/$(K'*a.N 8':8 N 8':8 ( () ` ** ] satisfies the bound I absolute error I I I I I k ! / # $ K ( '* L ] For the maximum error is then (O k * with total table size . With just a few guard bits we approach the case for where "!#/$(K'*a ] (K ` * with g ] !"#%$ ()+* g L 4 . It can be shown that the maximum rela , tive error for such a bipartite table occurs for ( * so the precision is at least -bits. In practice bipartite tables arefound most effective for total index lengths , where each part has between and bits. Considering variable sized parts the preferred partitions of index parts are * * ( k * g g ` ( k g g , and ( k g g . Exploiting Symmetry in Bipartite Tables: The midpoint expansion (5) allows for design of a sign symmetric bipartite table process, providing one additional bit of accuracy. For the symmetric case some of the input bits and the secondary term approximation are subject to a conditional complementation. When the approximate reciprocal type is a reciprocal function defined on ’exact’ input points, the midpoint expansion (5) can be modified to yield a symmetric secondary term. Symmetric Bipartite Reciprocal1Functions: For T , the the normalized -bit divisor k has secondary part of the partition - 89 8+ with - ` U . secondary The part can be centered by subtracting ( ¤ * and adding the same to the primary part. 
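The centering step can be sketched on the trailing bits alone. The sketch assumes the $t$ trailing bits of an $n$-bit divisor form an integer $L$ with weight $2^{-(n-1)}$ per unit, and that the centering constant is half the largest trailing value, $(2^t - 1)\,2^{-n}$; that constant is an assumption made for illustration, since it follows from requiring symmetry on the exact input points. The function name `centered_tail` is hypothetical.

```python
def centered_tail(L, t):
    """Split the t trailing bits L into a sign and a magnitude index.

    In units of 2**-n the centered tail is 2*L - (2**t - 1), an odd integer
    that is returned here as sign * (2 * magnitude_index + 1).
    """
    mask = (1 << (t - 1)) - 1
    msb = L >> (t - 1)          # leading trailing bit decides the sign
    low = L & mask              # remaining t-1 bits
    if msb:
        return +1, low
    return -1, low ^ mask       # one's complement of the remaining bits
```

Because the magnitude depends only on $t - 1$ conditionally complemented bits, a symmetric secondary table can spend the saved index bit on accuracy, and the sign is applied afterwards by conditionally complementing the table output.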
The symmetric divisor partition for the -bit 19 is then normalized divisor 8 * ¤ £ k k( k (7) From (7) for any precision the symmetric bipartite identity for ] is then HV ( k * 8 k 8 8 V * k 6() k (8) Then the symmetric bipartite reciprocal function k * -bit normalized binary divisor for the ( is determined from (8) with k and by ( * k 8 S 8 k V * (K k 8+ p ( k *, Here is determined so that ( * where 8+ 8+ 8 8+ 8+2 8T 8 T 8+ TT 8+ 8 8 This allows the bounds () k * L 6() k 8 V * L (9) where the interval midpoint (K ` * from (9) is used to determine the second term of the symmetric bipartite approximate reciprocal function. The centering of the secondary part in (7) and (8) thus provides for a sharp result, since is exact, and shares the practical convenience of determining by a ’s complement. IV. M ULTIPARTITE TABLE L OOK - UP The bipartite table lookup process for determining an approximate reciprocal can be expanded to a tripartite or multipartite process. The result is then the sum of three or more terms obtained from three of more table lookups indexed by comparably sized indices. Tripartite tables in principle should achieve 4 times the precision and cost about times about the table size as a single direct lookup table. In practice multipartite tables are arguably most effective for tables with total input index lengths and resulting output approximation precisions in the JO range * to bits. This range can be covered employing three to four-term sums with primary table indices bounded by eleven bits. These practical bounds keep total table size moderate. They also allow table lookup and subsequent addition time to be kept small. For practical primary table indices of size at most s bits, the marginal improvement in tripartite and four-partite table approximations for each additional part is only 2-3 bits per part. For these index ranges the multipartite process is conveniently visualized by recognizing the divisor partition as a partial recoding operation. 
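The recoding view can be illustrated with a standard radix-4 Booth recoding, in which each digit in $\{-2,\ldots,2\}$ is read from an overlapping bit triple, followed by a partial variant that keeps the low-order positions as a single signed tail. This is a sketch under assumptions: it works on unsigned integers instead of divisor fractions, the radix and the tail convention are chosen for illustration, and `booth_radix4` and `partial_recode` are hypothetical names.

```python
def booth_radix4(v, m):
    """Recode the m-bit unsigned integer v into radix-4 digits in {-2,...,2}.

    Digits are returned least significant first; digit j is read from the
    overlapping bit triple (b[2j+1], b[2j], b[2j-1]) with b[-1] = 0.
    """
    def bit(p):
        return (v >> p) & 1 if p >= 0 else 0

    return [bit(2 * j - 1) + bit(2 * j) - 2 * bit(2 * j + 1)
            for j in range(m // 2 + 1)]


def partial_recode(v, m, s):
    """Recode only the leading digits; fold the s low digit positions into a tail.

    Returns (leading_digits, tail) with
    v == sum(d * 4**(j + s) for j, d in enumerate(leading_digits)) + tail
    and the tail lying in the symmetric-style range [-2**(2s-1), 2**(2s-1)).
    """
    digits = booth_radix4(v, m)
    tail = sum(d * 4 ** j for j, d in enumerate(digits[:s]))
    return digits[s:], tail
```

Each leading digit can drive a multiple generator that selects a small multiple of a tabulated value, while the signed tail indexes a terminal table. The digits are independent of one another, so they can be formed concurrently from the bit triples.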
Exploiting Recoding in Multipartite Tables: 1 8 - Definition Let with 8 8 ' 8: k . Then for ^ , and ^ ` O * a -digit partial recoding (Booth radix ) of ( denotes the expansion 4 8 k k T k k k V ( HV * ( the tail satisfying with U * and . & ` ` - ` ` ' for of the tail Note that the condition on the range 3 * for 8 ( 8 * ( 1 ¤ makes the expansion unique. In practice the digits are determined from the T 8 concurrently as in standard bit triples 8+ 8+ Booth recodings. The tail is determined from conditionally complementing T 8+ 8+ T 8 the bits depending on bit as described for symmetric bipartite expansions. The notion of partial recoding is extendable to Booth radix recodings in the obvious way. 8 bipartite midThe divisor (input) partition for the *k ( * point expansion (5) is ()k . This provides the basis for multipartite (output) expansions by partial recoding of the secondary term of (5). Observation 9: Let the normalized binary divisor XT * 8 the recoded 8+ 8+ partition have tripartite f () k * k k with T U - ' L . ` & ` ` ` ` , and The primary table can provide suitably rounded + 8 + 8 + 8 . ` ] values for ] and ] k The latter two values are sent to Booth radix PPG’s with input digits and , providing the selected terms for !"#%$ ()+* and !#/$ (K'* . The final term is provided by a terminal term table with 1 H 8 8 8Mb by substituting Proof: The result is obtained as previously index * k 3 ( the partial recoding into the described for the recoded tripartite expansion. bipartite midpoint expansion (5). V. A S INGLE P RECISION , M ONOTONIC From (10) with we obtain a recoded triparU LP -ACCURATE R ECIPROCAL F UNCTION tite expansion for use as a seed or short reciprocal, USXTa Let . 
We split our !"#%$()+*,¤N 8':8+ 8 k reciprocal function into two cases, corresponding to 8':8+ 8+ two sub-intervals: j 8':8 8 k T (K k * 8+ Case 1: k_( * k( * !"#%$ ( 9 H 8'4 8+ * Let have the partition with K TK 1T T sign-symmetric fractional part K 5 4 The -bit index can retrieve both a TT1 k V ( k k '* -bit output for N 8+ 8': ( 8+ * and and . Employing ] k k +* 8':8 ( s* ( N + 8 ] ] ] a -bit output for . The the symmetric bipartite identity ] ] iteratively, we obtain second output can be conditionally complemented * 54 ( and/or shifted to determine !#/$ (K'* as an approx ( k * + 8 imation of satisfying ] ] (K * (K * I I I I 8+ I I * I !#/$ ()+* I 8 Defining our constant term ( by I I 6() k * adding half the maximum error 4 8+ 8+ 8': k L HS 7 k (K * The approximation forP the terminal term 8+ is handled as !"#%$ ()+* 7 ] ] k ] ] , we obtain that ] ] for the secondary term in the symmetric where to a smaller order g Mg L . bipartite expansion, employing the bit string 8'4 8+7 8+ be partially recoded with two Booth Let as the index to a 8 digits and ab symmetric tail, then separate terminal term table. The recoded tripartite radix ¡O O ' O k 3 k with ` & ` ` ` expansion here employs an intermediate Booth 4 digit in the tripartite divisor partial recoding, to and g g , where 8 ( * obtain a bit enhancement of the precision of the , ( * result, compared to symmetric bipartite reciprocal 54 S b 9 Tb k approximation. Letting , it can be Analogous to Observation 9 we can employ a shown that O recoded -part divisor partition including two inb O k termediate Booth radix digits and obtain a -part (K * (K b * (K b * identity 8+ with g g L . Then it can be shown that 8 8 k 6() k * b )4 54 k 8 8 Mb (K b * () * (K * (K * 8 8 (K k * 6() k * (11) Then 8+ 8 8 6() k * 8+ 8 (10) 6() k * 1 S Tv_ L with g g . 
Since for , we 1.1 Tb K 1KT)4 K7 1b 1 T can use a 10-bit index for determining simultane 3 3 5 4 1 6 ously , ] and ] , and another 11-bit index P determines ] , all with sufficient guard bits. The first four terms of (11) then provide a four-partite Table 1 recode recode reciprocal approximation to 54 ] , with error bound arbitrarily close to ] ] . Table 2 Using 4 guard bits so that each of the four terms 7 54 contributes a table based rounding error of at most ulps, where here $ , we obtain a four"!#%$2(K'* satisfying I I approximation MG MG partite reciprocal 54 I I I I & "! / # $ K ( '* k L I ] I .Then 7 ] ] 54 I I I k N 54 (K!#/$ & (K'** I L . ] ] ] 4-to-2 Adder 5 4 (red.) monotonic Claim 1: N ()"!#%$ & ()+** is a one-ulp UV reciprocal function over the interval . 54 . Claim 2: If N ()"!#%$ & ()+** is not monotonic then Fig. 3. Four-partite table reciprocal look-up for the interval at one rounded reciprocal has an error at least least 8+ ulps. k ] in the 4-to-2 adder, maintaining guard bits, to a Claim 2 can be verified by an argument similar redundant reciprocal in the range , including to that of the proof of the Monotonicity Theorem two leading guard digits [10]. (Theorem 3). Consider that the maximal 7 total roundFor output as a single precision reciprocal the k ] ulps. ing error is essentially bounded by redundant result must be compressed by a carry V completing adder with rounding and normalization. the interval , Now ] ] e 7 over For use as a divisor reciprocal, the result is recoded k ] . Therefore the error bound so 7 k ] e for multiplication by the single precision dividend, of k ] ulps is sufficient to guarantee that no to obtain a quotient breakpoint by adaptively round k error after rounding is as large as ] ulps. ing with respect to the rounding mode (see Sec 54 ()"!#%$ &()+** It follows that is monotonic for tion I). TcV V . 
7 Case 2: Since k ] L also verifies a one-ulp bound ( TK * for TcS For this region we use 11 bits for , the four-partite approxima 54 K primary table Let have the partition index. tion N ()!"#%$ & (K'** , with !#/$ & ()+** given by the the K k( * K withK 1 5K4 )4 s and four terms in (11), is a single precision, one-ulp 3 * k k ( K TcV . Proceeding as monotonic reciprocal function over . , ] ] b Figure 3 illustrates an implementation of this in Case 1, the quadratic term is now yielding an error term ] ] with g Mg L , four-partite reciprocal function. Table 1 receives the T T 1 s( TT1K * after centering by adjustment of . The 10-bit index and outputs , ] O and , with table values , and all rounded terminal term now satisfies b to position . k ` (K K * (K * (K * The terms and are each input to both multiple generators, MG, where MG functions identically P to a Booth radix-8 PPG. Table 2 receives the 11-bit where ] still can be determined from an 11-bit P Tb T T index for determining ] , index , since there is one less rounded to position . The sum is compressed trailing bit, and one more leading bit than in Case 1. We then obtain [4] J.-M. Muller, “A Few Results on Table-Based Methods,” K7 Reliable Computing, vol. 5, no. 3, pp. 279–288, 1999. k [5] C. Iordache and D. Matula, “Analysis of Reciprocal () * (K K * (K K * (K K * and Square Root Reciprocal Instructions in the AMD (12) K6-2 Implementation of 3DNow,” Electronic Notes in S with g g L for . Theoretical Computer Science, vol. 24, 1999. Then the four-partite approximation "!#%$ & ()+* , [6] F. de Dinechin and A. Tisserand, “Some Improvements on Multipartite Table Methods,” in Proc. 15th IEEE formed from the first four terms of (12) by roundSymposium on Computer Arithmetic. IEEE, 2001, pp. ing the table entries with four guard bits, will 128–135. have a maximum error bound of ulps for [7] W. Wong and E. Goto, “Fast Evaluation of the Elemen V 54 . 
Then N (K"!#/$ & ()+** is a one-ulp tary Functions in Single Precision,” IEEE Transactions V3 on Computers, vol. 44, no. 3, pp. 453–457, 1995. monotonic reciprocal function for . [8] J. Pineiro, J. Bruguera, and J.-M. Muller, “Faithful Pow Figure 4 illustrates the look-up table structure for ering Computation using Table Look-Up and a Fused * implementing this reciprocal function over . Multiplication Tree,” in Proc. 15th IEEE Symposium on The tables of Figures 3 and 4 have combined size Computer Arithmetic. IEEE, 2001, pp. 40–47. totalling less than 27 Kbytes, and the two structures [9] F. de Dinechin and J. Detrey, “Multipartite Tables in JBits for the Evaluation of Functions on FPGA’s,” in IEEE Recan share much of the hardware shown, using suitconfigurable Architecture Workshop, International Paralably placed multiplexers. 1.0 K KT)4K7 b 7 4 3 3 recode recode Table 1 4 Table 2 1 MG MG 4-to-2 Adder Fig. 4. (red.) Four-partite table reciprocal look-up for the interval . R EFERENCES [1] D. DasSarma and D. Matula, “Faithful Bipartite ROM Reciprocal Tables,” in Proc. 12th IEEE Symposium on Computer Arithmetic. IEEE Computer Society, 1995, pp. 17–28. [2] H. Hassler and N. Takagi, “Function Evaluation by Table Look-Up and Addition,” in Proc. 12th IEEE Symposium on Computer Arithmetic. IEEE, 1995, pp. 10–16. [3] M. Schulte and J. Stine, “Approximating Elementary Functions with Symmetric Bipartite Tables,” IEEE Transactions on Computers, vol. 48, no. 8, pp. 842–847, 1999. lel and Distributed Symposium, Fort Lauderdale, Florida. IEEE, April 2002. [10] P. Kornerup and J.-M. Muller, “Leading Guard Digits in Finite Precision Redundant Representations,” 2004, submitted to ARITH17.