Optimizing floating point arithmetic via post addition shift probabilities

by JAMES A. FIELD
University of Waterloo, Waterloo, Ontario, Canada

INTRODUCTION

In many computers floating point arithmetic operations are performed by subprograms: software packages in the case of most small computers, and micro-programmed read-only memories in some larger systems. In such a subprogram there are normally several free choices as to which set of conditions gets a speed advantage. If this advantage is given to the most probable case there will be an increase in system performance with no increase in cost.

One area in which this type of optimization is possible is the processing of binary floating point addition and subtraction. Here there exist two possible shift operations: first, to align the binary points before addition or subtraction, and second, to normalize the result. In processing these shifts there are several options as to method, and as to the sequencing of operations within a given method. To choose the variation that optimizes the program it is necessary to know the probability of occurrence of the various possible shift lengths.

Sweeney [1] has reported experimentally determined distributions for shift lengths in alignment and normalization. Unfortunately the data for normalization was presented as total values, whereas subprogram optimization requires normalization shift length probabilities given that a specific alignment shift occurred. This paper presents a method for estimating the required probabilities, and an example of their application in subprogram optimization.

It will be assumed that in all bits other than the leading one there is equal probability of a one or a zero; Appendix A gives the reasons for this assumption. For purposes of analysis the addition operation can be divided into five cases, considered in the following sections.
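The two shifts just described can be illustrated with a minimal model of like-sign floating point addition (a Python sketch over integer magnitudes, not any particular machine's subprogram; the function name and word length are illustrative choices):

```python
def fp_add(ea, ma, eb, mb, n=8):
    """Add two like-signed floating point numbers, each an (exponent,
    normalized n-bit magnitude) pair, showing the two shift operations."""
    if eb > ea:                                     # make A the operand with
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)     # the larger exponent
    mb >>= ea - eb                  # alignment shift (truncating, R = 0)
    total = ma + mb
    if total >> n:                  # (n+1)-bit sum: normalization requires
        return ea + 1, total >> 1   # a one bit right shift
    return ea, total                # already normalized

# Example: exponents differ by 2, no carry out of bit n.
print(fp_add(3, 0b10100000, 1, 0b11000000))
```

Subtraction would follow the same outline with the smaller magnitude complemented, as discussed below.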
While only the addition operation will be specifically considered, the results are also applicable to subtraction, since subtraction is just addition with the sign bit complemented.

From the collection of the Computer History Museum (www.computerhistory.org)
Spring Joint Computer Conference, 1969

Shift length probabilities

A common representation of the fractional part of a floating point number is a sign bit plus a normalized n bit true magnitude. This form will be used in the analysis; the form of the exponent is not of concern. In normalized numbers the leading bit is always a one.

The following representations for shift length probabilities will be used:

    P_-1 - the probability that a one bit right shift is required for normalization.
    P_0  - the probability that no normalization shift is required.
    P_i  - the probability that an i bit left shift is required for normalization (i > 0).

Like signs, equal exponents

When the numbers have equal exponents they may be added immediately, since no alignment shift is required. With the leading bit of both words being a one, the sum will always contain (n + 1) bits. Thus a one bit right shift will always be required to normalize the result of the addition. Therefore

    P_-1 = 1                                                        (1)

Like signs, unequal exponents

When the exponents differ, the smaller number must be shifted right until the binary points are aligned. Figure 1 shows the situation after an alignment shift of s = (n - m) bits has taken place: word A occupies bit positions n through 1 with A_n = 1, and word B, after the shift, occupies positions m through 1 with B_m = 1. The x's indicate bits that may with equal probability be either one or zero. [Figure 1 - Numbers following binary point alignment.]

An (n + 1) bit result will occur, requiring a one bit right shift for normalization, if and only if there is a carry into bit n. Now

    Pr(C_1 = 1) = R/2

where R = 1 for systems where word B is rounded after alignment, and
R = 0 for systems where word B is truncated after alignment. Defining

    A_j = jth bit of word A
    B_j = jth bit of word B (after binary point alignment)
    C_j = carry into the jth position
    S_j = jth bit of the unnormalized sum

it follows that, for the positions in which both A_j and B_j are equally likely to be one or zero,

    Pr(C_{j+1} = 1) = 1/4 + (1/2) Pr(C_j = 1)

which, with the starting value Pr(C_1 = 1) = R/2, gives

    Pr(C_j = 1) = 1/2 + (R - 1)/2^j        ; m >= j >= 1        (2)

and, since B_m = 1,

    Pr(C_{m+1} = 1) = (1/2)[1 - Pr(C_m = 1)] + Pr(C_m = 1)
                    = 1/2 + (1/2) Pr(C_m = 1)

If the alignment shift was one bit then C_{m+1} is C_n. For alignment shifts greater than one bit, however, a carry will propagate from the (m + 1)th to the nth bit only if all bits A_{n-1} through A_{m+1} are ones. Hence

    Pr(C_n = 1) = Pr(C_{m+1} = 1)                                              ; m = n - 1
    Pr(C_n = 1) = Pr(C_{m+1} = 1) Pr(A_{n-1} = 1) ... Pr(A_{m+1} = 1)
                = Pr(C_{m+1} = 1) / 2^{n-m-1}                                  ; m < n - 1

so that

    P_-1 = Pr(S_{n+1} = 1) = Pr(C_n = 1)
         = [3/4 + (R - 1)/2^{m+1}] / 2^{n-m-1}        ; n > s >= 1        (3a)

If there is rounding, overflow can also occur for an alignment shift of n bits, when word A is all ones; hence

    P_-1 = R/2^{n-1}        ; s = n        (3b)

If the exponents differ by n (n + 1 when rounding) or more then no shifting is required, since the larger number is the result. Hence

    P_-1 = 0        ; s > n        (3c)

In all cases the only alternative to a one bit right shift is no shift, therefore

    P_0 = 1 - P_-1        ; s >= 1        (3d)

Unlike signs, equal exponents

Figure 2 shows a tabulation of all possible combinations of two n bit words with unlike signs (n = 4). If all bits but the most significant may be one or zero with equal probability, all the combinations listed in Figure 2 are equally probable. Thus, to obtain the probability of having exactly i leading zeros after forming the sum, it is only necessary to count the number of such sums. When the sum is zero no shift is required, while a sum with i leading zeros requires an i bit left shift for normalization.
Hence

    P_0 = Pr(zero result) = 1/2^{n-1}        (4a)

and, counting the equally probable combinations for non-zero results,

    Pr(exactly i leading zeros) = (2 / 2^{2(n-1)}) sum_{j=2^{n-i-1}}^{2^{n-i}-1} (2^{n-1} - j)

which reduces to

    P_i = (1/2^{i-1}) (1 - 3/2^{i+1} + 1/2^n)        ; 1 <= i < n        (4b)

[Figure 2 - Array of all possible combinations of two n-bit normalized numbers with unlike signs and equal exponents (n = 4). Representative entries: 1110 - 1000 = 0110; 1110 - 1110 = 0000; 1110 - 1111 = -0001; 1111 - 1000 = 0111; 1001 - 1111 = -0110; 1000 - 1000 = 0000.]

Unlike signs, exponents differ by one

This case requires that the smaller number, after the alignment shift, be subtracted from the larger number. The subtraction may be considered as the addition of the one's complement of the smaller number plus a one in the least significant position. The bit alignments are shown in Figure 3. [Figure 3 - Numbers following binary point alignment and one's complementing of word B (exponents differ by one).]

Considering the extra one added into the least significant position as a carry yields

    Pr(C_1 = 1) = 1 - R/2

where R is defined as before. Since the carry recurrence of Equation 2 still applies,

    Pr(C_j = 1) = 1/2 + (1 - R)/2^j        ; m >= j >= 1

and, since B_m = 0 after complementing,

    Pr(C_{m+1} = 1) = (1/2) Pr(C_m = 1)        (5)

Here m = n - 1, and since A_n and the complemented B_n are both ones, S_n = C_n = C_{m+1}; there will be at least one leading zero exactly when C_n = 0. Thus

    Pr(S_n = 1) = Pr(C_{m+1} = 1)

and hence

    Pr(S_n = 0) = 3/4 - (1 - R)/2^n

For the first i bits of the sum to be zero it is required that C_{n-i+1} and A_m be zero, and, for i greater than two, that A_{m-1}, B_{m-1}, ..., A_{n-i+1}, B_{n-i+1} also be zero. Thus

    Pr(S_n = S_{n-1} = ... = S_{n-i+1} = 0) = Pr(C_{n-i+1} = 0) Pr(A_m = 0)        ; i = 2
while for larger i the remaining bit positions contribute further factors:

    Pr(S_n = ... = S_{n-i+1} = 0) = Pr(C_{n-i+1} = 0) Pr(A_m = 0) Pr(A_j = 0) Pr(B_j = 0) ... [j = n-i+1, ..., m-1]   ; 2 < i <= n
                                  = 1/2^{2i-2} - (1 - R)/2^{n+i-2}

An i bit left shift will be required for normalization if there are exactly i leading zeros. Considering exactly one leading zero yields

    P_1 = Pr(S_n = 0, S_{n-1} = 1)
        = Pr(S_n = 0) - Pr(S_n = S_{n-1} = 0)
        = 1/2        (6a)

and for two or more leading zeros

    P_i = Pr(S_n = ... = S_{n-i+1} = 0, S_{n-i} = 1)
        = Pr(S_n = ... = S_{n-i+1} = 0) - Pr(S_n = ... = S_{n-i} = 0)
        = (3/4)(1/4)^{i-1} - (1 - R)/2^{n+i-1}        ; 2 <= i < n        (6b)

No shift is required when S_n = 1, or when the result is all zeros. Hence

    P_0 = Pr(S_n = 1) + Pr(S_n = ... = S_1 = 0)        (6c)

Unlike signs, exponents differ by more than one

This case is very similar to the previous one and can be analyzed by the same method. The bit layout after binary point alignment is shown in Figure 4. It can be seen that Equation 5 is applicable. Only one leading zero can be produced: obtaining a leading zero requires A_{n-1} = 0 and C_{n-1} = 0, and then S_{n-1} = B_{n-1} = 1. Hence no more than a one bit left shift will be required for normalization. If C_{m+1}, or any of A_{n-1}, ..., A_{m+1}, is a one, then C_n = 1 and S_n = 1. Therefore

    P_1 = Pr(S_n = 0) = Pr(C_n = 0) = Pr(A_{n-1} = ... = A_{m+1} = 0) Pr(C_{m+1} = 0)
        = [3/4 - (1 - R)/2^{m+1}] / 2^{n-m-1}        ; 2 <= s < n        (7a)

A shift of n bits can produce a leading zero when word B is rounded after the alignment shift and the last (n - 1) bits of word A are zero; then

    P_1 = Pr(S_n = 0) = R/2^{n-1}        ; s = n        (7b)

As with the case of like signs, if the exponents differ by at least n (n + 1 with rounding) no shifting will be required, as the larger number is the result. Hence

    P_1 = 0        ; s > n        (7c)

As the only alternative to a one bit left shift is no shift,

    P_0 = 1 - P_1        ; s >= 2        (7d)

Application

Table I tabulates the probabilities given by Equations 1, 3, 4, 6 and 7 (assuming that 2^{-n} is negligible). As an example of how these probabilities can be used to optimize subprogram operation, the addition of numbers with unlike signs will be considered.
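The closed forms above can be spot-checked by simulation. The sketch below (Python; truncation after alignment, i.e. R = 0, with illustrative choices of n, s and trial count) draws random normalized operands and tallies the normalization shift:

```python
import random

def shift_stats(n, s, like_signs, trials=100_000, seed=1):
    """Estimate normalization shift probabilities for an s-bit alignment
    shift, assuming uniformly random bits below the leading one (R = 0)."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        a = rng.randrange(1 << (n - 1), 1 << n)       # normalized word A
        b = rng.randrange(1 << (n - 1), 1 << n) >> s  # word B after alignment
        if like_signs:
            shift = -1 if (a + b) >> n else 0          # right shift iff carry out
        else:
            d = abs(a - b)
            shift = None if d == 0 else n - d.bit_length()  # leading zeros
        counts[shift] = counts.get(shift, 0) + 1
    return {k: v / trials for k, v in counts.items()}

# Like signs, s = 2: the estimate of P_-1 should fall near 3/8 (Equation 3a).
print(shift_stats(18, 2, like_signs=True)[-1])
# Unlike signs, s = 3: P_1 should fall near 3/16 (Equation 7a).
print(shift_stats(18, 3, like_signs=False)[1])
```

For unlike signs with s >= 2 the tally never shows more than one leading zero, in line with the argument above.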
[Figure 4 - Numbers following binary point alignment and one's complementing of word B (exponents differ by more than one).]

Table I - Probability P_i of an i-bit normalization shift after an s-bit alignment shift

                              alignment shift s
        i          0        1        2        3        4        5
    like signs
       -1          1       3/4      3/8      3/16     3/32     3/64
        0          0       1/4      5/8      13/16    29/32    61/64
    unlike signs
        0          0       1/4      5/8      13/16    29/32    61/64
        1         1/4      1/2      3/8      3/16     3/32     3/64
        2         5/16     3/16
        3         13/64    3/64
        4         29/256   3/256
        5         61/1024  3/1024

For computers with a "shift-and-count" instruction, which normalizes a number while counting the leading zeros, a relatively standard subprogram is shown in Figure 5a. All normalization is done using the "shift-and-count" instruction. From Table I it is obvious that in many cases a one bit left shift would be enough to normalize the result, with a correspondingly simpler and faster exponent adjustment routine. In most machines, however, there is no direct test for a single leading zero, and a programmed test loses any speed advantage that use of the one bit shift would gain for this special case. For machines with a fast one bit shift, Figure 5b presents an alternative philosophy: try a one bit shift, and if it does not normalize the result proceed with the "shift-and-count." It is anticipated that enough time will be saved on the single leading zero cases to compensate for the time lost on the multi-leading-zero cases.
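The two policies of Figures 5a and 5b can be sketched in Python (the paper's versions are machine-level subprograms; the function names here are illustrative). Each routine returns the normalized magnitude and the left shift count used to adjust the exponent:

```python
def normalize_count(frac, n):
    """Figure 5a: 'shift-and-count' - count the leading zeros, shift once."""
    if frac == 0:
        return frac, 0                 # all-zero result: no shift required
    i = n - frac.bit_length()          # number of leading zeros
    return frac << i, i                # i-bit left shift

def normalize_try_one(frac, n):
    """Figure 5b: try a one-bit shift first, fall back to shift-and-count."""
    if frac == 0 or frac >> (n - 1):   # zero, or already normalized
        return frac, 0
    frac <<= 1                         # speculative one-bit left shift
    if frac >> (n - 1):                # normalized now? (the most likely case)
        return frac, 1
    f, i = normalize_count(frac, n)    # more than one leading zero remained
    return f, i + 1
```

Both routines return the same result; they differ only in how often the fast path is taken, which is exactly what the probabilities of Table I quantify.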
To analyze the relative merits of the normalization schemes of Figures 5a and 5b, define

    r      - time to process a result with no leading zeros
    a + bi - time to normalize a number with i leading zeros and process the result
    c      - time to shift left one bit and check whether the result is normalized
    d      - time to process the result after successfully normalizing via a one bit left shift

For Figure 5a the average normalization time is

    T_a = r P_0 + sum_{i=1}^{n} (a + bi) P_i

while for Figure 5b it is

    T_b = r P_0 + (c + d) P_1 + sum_{i=2}^{n} [c + a + b(i - 1)] P_i
        = T_a + (c - b)(1 - P_0) + (d - a) P_1

[Figure 5 - Possible subprograms for adding numbers with unlike signs: (a) all normalization by "shift-and-count"; (b) a one bit shift tried first; (c) the combined form, in which the test for a zero result is only required when rounding is done in the add operation.]

It is evident that T_b may be greater or less than T_a, depending on the machine characteristics controlling a, b, c and d. For a floating point addition subroutine (with n = 18) for a PDP-9 computer it was found that a = 15.6, b = 0.4, c = 4 and d = 7. These produce the values for T_b shown in Table II. The second method is best for non-equal exponents, but the first method is best for equal exponents. Since the information on whether the exponents were equal is available, the subprogram can be modified to the form shown in Figure 5c. This gives the advantages of Figure 5b for non-equal exponents, but retains the advantages of, and improves on, Figure 5a for equal exponents. While an exact measure of the improvement in normalization is impossible without knowledge of the alignment shift distribution, it would appear that Figure 5c is about 3% better than Figure 5a.
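The entries of Table II follow directly from the difference formula above. A quick check (Python; the constants are the PDP-9 values from the text, and the probabilities P_0 and P_1 come from Table I):

```python
# T_b - T_a = (c - b)(1 - P_0) + (d - a) P_1 with a = 15.6, b = 0.4,
# c = 4, d = 7 (the PDP-9 constants quoted in the text).
a, b, c, d = 15.6, 0.4, 4.0, 7.0

def t_diff(p0, p1):
    """Difference T_b - T_a for given P_0 and P_1."""
    return (c - b) * (1 - p0) + (d - a) * p1

print(round(t_diff(0, 1 / 4), 2))      # s = 0: P_0 = 0,   P_1 = 1/4
print(round(t_diff(1 / 4, 1 / 2), 2))  # s = 1: P_0 = 1/4, P_1 = 1/2
# For s >= 2 the unlike-sign case has P_1 = 1 - P_0, so the difference
# collapses to (c - b + d - a) P_1:
print(round(c - b + d - a, 2))
```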
Table II - Execution time of the subprogram in Figure 5b

    alignment shift        T_b
    0                      T_a + 1.45
    1                      T_a - 1.60
    2 or more              T_a - 5.00 P_1

SUMMARY

In the example above it is unlikely that the second method would have been considered had the table of probabilities, indicating the high incidence of a single leading zero, not been available. Since the final subprogram is an improvement on the second method, made to eliminate a flaw detected during the timing calculations, it is reasonable to assume that the best subprogram would not have evolved without the use of normalization shift probabilities. It must be remembered, however, that Figure 5c is the best form for one particular machine. The only data directly applicable to another machine is the table of probabilities. It is still necessary for the designer to deduce a method by which specific machine features may be exploited to reduce subprogram time; the table of probabilities then allows him to check whether the technique devised yields the expected benefit of a faster program.

REFERENCES

1  D W SWEENEY  An analysis of floating-point addition  IBM Systems Journal Vol 4 No 1 1965 pp 31-42
2  R W HAMMING  Numerical methods for scientists and engineers  McGraw-Hill New York 1962

APPENDIX A

Assuming equal probability of one or zero in all bits but the first implies a uniform distribution of numbers. However, Hamming [2] indicates that during floating point calculations numbers tend to move towards the lower end of the normalization range, following the distribution

    p(x) = 1/(x ln 2)        ; 1/2 <= x < 1

Using the Hamming distribution, the probability that bit A_{n-i} equals one can be calculated by integrating the probability density over the ranges of numbers for which A_{n-i} is one:

    Pr(A_{n-i} = 1) = sum_{j=0}^{2^{i-1}-1} integral from (1 - j/2^i - 1/2^{i+1}) to (1 - j/2^i) of dx/(x ln 2)
                    = (1/ln 2) sum_{j=0}^{2^{i-1}-1} ln[(2^{i+1} - 2j)/(2^{i+1} - 2j - 1)]

Collecting this sum into the logarithm of a ratio of factorials and using Stirling's formula for ln x!,
    ln x! = ln sqrt(2 pi) + (x + 1/2) ln x - x + θ/(12x)        ; 0 < θ < 1, x > 0

the above expression reduces to

    Pr(A_{n-i} = 1) = 1/2 + 2^{-i} φ        ; -0.541 < φ < 0.361

Thus the probability converges to 1/2 as i increases. Table III shows the actual values of Pr(A_{n-i} = 1). In view of the rapid convergence, and since the maximum deviation from 1/2 is not large, it seems reasonable to assume that ones and zeros are equally probable in all bit positions. Using the actual values from Table III for the first few bits would greatly complicate the model without significantly altering the result.

Table III - Pr(A_{n-i} = 1) assuming the Hamming distribution

    i    Pr(A_{n-i} = 1)
    1    0.415
    2    0.456
    3    0.478
    4    0.489
    5    0.494
    6    0.497
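Table III can be reproduced directly from the sum above (a Python sketch; the sum is the exact integral of the Hamming density over the subintervals of [1/2, 1) in which bit A_{n-i} is one, before the Stirling approximation is applied):

```python
import math

def pr_bit_one(i):
    """Pr(A_{n-i} = 1) under the density 1/(x ln 2) on [1/2, 1)."""
    return sum(math.log((2 ** (i + 1) - 2 * j) / (2 ** (i + 1) - 2 * j - 1))
               for j in range(2 ** (i - 1))) / math.log(2)

for i in range(1, 7):
    print(f"i={i}: {pr_bit_one(i):.3f}")
```

The printed values match Table III and approach 1/2 as i grows, as the Stirling bound predicts.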