Lecture Topics
Today: Floating Point Arithmetic (P&H 3.5-3.6)
Next: continued

Announcements
• Milestone #8 (due 4/6)
• Milestone #9 (due 4/13)

Floating Point Numbers
Non-integral values, including very small and very large values
Similar to scientific notation:
• -2.34 × 10^56 (normalized)
• +6.05 × 10^-12 (normalized)
• -0.002 × 10^-4 (not normalized)
• +987.02 × 10^9 (not normalized)

Floating Point Numbers
In mathematics, the real numbers are infinite and continuous. In computing, the floating point numbers are finite and discrete, because a fixed number of bits is used to represent each value: N bits ==> 2^N bit patterns.
Floating point values are therefore a finite subset of the real numbers.

IEEE-754 Floating Point Formats
Developed in response to the divergence of existing representations (portability issues for scientific code)
Now almost universally adopted
Two representations:
• Single precision: 32 bits (4 bytes)
• Double precision: 64 bits (8 bytes)

Example internal representations (hexadecimal):
value     double precision    single precision
-----     ----------------    ----------------
+inf      7FF0000000000000    7F800000
+99.5     4058E00000000000    42C70000
+15.0     402E000000000000    41700000
+10.75    4025800000000000    412C0000
+10.5     4025000000000000    41280000
+10.0     4024000000000000    41200000
+3.3      400A666666666666    40533333
+3.25     400A000000000000    40500000
+3.0      4008000000000000    40400000
+2.1      4000CCCCCCCCCCCD    40066666
+1.5      3FF8000000000000    3FC00000
+1.25     3FF4000000000000    3FA00000
+1.0      3FF0000000000000    3F800000
+0.5      3FE0000000000000    3F000000
+0.25     3FD0000000000000    3E800000
+0.0      0000000000000000    00000000
-0.0      8000000000000000    80000000

Double Precision
Format:
• sign of fraction: 1 bit
• biased exponent: 11 bits
• fraction: 52 bits
Interpretation: $\pm 1.\text{fraction} \times 2^{\text{biased exponent} - \text{0x3FF}}$

Double Precision
Interpretation: $\pm 1.\text{fraction} \times 2^{\text{biased exponent} - \text{0x3FF}}$
• sign: 0 for positive, 1 for negative
• significand: 1.fraction
• true exponent: biased exponent - 0x3FF

Decoding DP Values
The decoding process:
• Partition the 64 bits into the three fields
• Calculate the true exponent
• Use the sign, significand and true exponent to calculate the value

Internal rep: 3FFE000000000000 (base 16)
0011 1111 1111 1110 0000 0000 .... 0000 0000
sign bit: 0 (positive)
biased exponent: 011 1111 1111 (0x3FF)
true exponent: 0x3FF - 0x3FF = 0
significand: 1.111000000000....00000000
value: +1.111 × 2^0 (base 2) = +1.111 (base 2) = +1.875 (base 10)

Internal rep: C019000000000000 (base 16)
1100 0000 0001 1001 0000 0000 .... 0000 0000
sign bit: 1 (negative)
biased exponent: 100 0000 0001 (0x401)
true exponent: 0x401 - 0x3FF = 2
significand: 1.100100000000....00000000
value: -1.1001 × 2^2 (base 2) = -110.01 (base 2) = -6.25 (base 10)

Internal rep: BFC0000000000000 (base 16)
1011 1111 1100 0000 0000 0000 .... 0000 0000
sign bit: 1 (negative)
biased exponent: 011 1111 1100 (0x3FC)
true exponent: 0x3FC - 0x3FF = -3
significand: 1.000000000000....00000000
value: -1.0 × 2^-3 (base 2) = -0.001 (base 2) = -0.125 (base 10)

Encoding DP Values
The encoding process:
• Express the value in binary, then normalize it
• Calculate the biased exponent
• Pack the sign, fraction and biased exponent into the three fields

Value: +1.25 (base 10) = +1.01 (base 2) = +1.01 × 2^0 (base 2)
significand: 1.010000000000....00000000
biased exponent: 0x0 + 0x3FF = 0x3FF = 011 1111 1111
sign bit: 0 (positive)
0011 1111 1111 0100 0000 0000 .... 0000 0000
Internal rep: 3FF4000000000000 (base 16)

Value: -10.75 (base 10) = -1010.11 (base 2) = -1.01011 × 2^3 (base 2)
significand: 1.010110000000....00000000
biased exponent: 0x3 + 0x3FF = 0x402 = 100 0000 0010
sign bit: 1 (negative)
1100 0000 0010 0101 1000 0000 .... 0000 0000
Internal rep: C025800000000000 (base 16)

Value: +0.0625 (base 10) = +0.0001 (base 2) = +1.0 × 2^-4 (base 2)
significand: 1.000000000000....00000000
biased exponent: -0x4 + 0x3FF = 0x3FB = 011 1111 1011
sign bit: 0 (positive)
0011 1111 1011 0000 0000 0000 .... 0000 0000
Internal rep: 3FB0000000000000 (base 16)

Special Values
Four categories of special values:
• Infinity
• Not-a-Number (NaN)
• Zero
• Denormal values
All other values are called "normalized".

Biased exponent field all ones:
• Infinity: fraction field all zeroes (e.g., 7FF0000000000000)
• NaN: fraction field not all zeroes (e.g., 7FF0000000000001)
The sign bit can be 1 (for negative). With 2^52 - 1 nonzero fractions and 2 possible signs, there are 2^53 - 2 different representations of NaN.

Biased exponent field all zeroes:
• Zero: fraction field all zeroes (e.g., 0000000000000000)
• Denormal: fraction field not all zeroes (e.g., 0000000000000001)
The sign bit can be 1 (for negative), so there are likewise 2^53 - 2 different denormal values.

Range of DP exponents:
biased exponent    true exponent
---------------    -------------
111 1111 1111      Special (Infinity, NaNs)
111 1111 1110      Largest (+0x3FF or +1023)
...                ...
100 0000 0001      +2
100 0000 0000      +1
011 1111 1111      +0
011 1111 1110      -1
011 1111 1101      -2
...                ...
000 0000 0001      Smallest (-0x3FE or -1022)
000 0000 0000      Special (Zero, Denormals)

Denormal (Subnormal) Values
Denormal values are smaller than the smallest normalized value; they allow us to represent values that are closer to zero.
Normalized: $1.\text{fraction} \times 2^{\text{biased exponent} - \text{0x3FF}}$
Denormal: $0.\text{fraction} \times 2^{-\text{0x3FE}}$
All denormal values have the same true exponent (-0x3FE, or -1022).

The smallest normalized value: 0000000000010000....0000
Value: 1.0 × 2^-1022
The largest denormal value: 0000000000001111....1111
Value: 0.1111....1111 × 2^-1022

The second largest denormal value: 0000000000001111....1110
Value: 0.1111....1110 × 2^-1022
The smallest denormal value: 0000000000000000....0001
Value: 0.0000....0001 × 2^-1022 (= 2^-1074)

Single Precision
Format:
• sign of fraction: 1 bit
• biased exponent: 8 bits
• fraction: 23 bits
Interpretation: $\pm 1.\text{fraction} \times 2^{\text{biased exponent} - \text{0x7F}}$

Single Precision
Interpretation: $\pm 1.\text{fraction} \times 2^{\text{biased exponent} - \text{0x7F}}$
• sign: 0 for positive, 1 for negative
• significand: 1.fraction
• true exponent: biased exponent - 0x7F

Range of SP exponents:
biased exponent    true exponent
---------------    -------------
1111 1111          Special (Infinity, NaNs)
1111 1110          Largest (+0x7F or +127)
...                ...
1000 0001          +2
1000 0000          +1
0111 1111          +0
0111 1110          -1
0111 1101          -2
...                ...
0000 0001          Smallest (-0x7E or -126)
0000 0000          Special (Zero, Denormals)

Decoding and Encoding SP Values
The decoding process is the same as for DP values, except the fields are 1, 8 and 23 bits.
The encoding process is also the same, except for the sizes of the fields.
Note: the 8-bit biased exponent field determines the biasing factor: 0111 1111 = 0x7F (in general, a k-bit field uses 2^(k-1) - 1, which gives 0x3FF for DP).

Internal rep: 40B80000 (base 16)
0100 0000 1011 1000 0000 0000 0000 0000
sign bit: 0 (positive)
biased exponent: 1000 0001 (0x81)
true exponent: 0x81 - 0x7F = 2
significand: 1.01110000000000000000000
value: +1.0111 × 2^2 (base 2) = +101.11 (base 2) = +5.75 (base 10)

Internal rep: BFC00000 (base 16)
1011 1111 1100 0000 0000 0000 0000 0000
sign bit: 1 (negative)
biased exponent: 0111 1111 (0x7F)
true exponent: 0x7F - 0x7F = 0
significand: 1.10000000000000000000000
value: -1.1 × 2^0 (base 2) = -1.1 (base 2) = -1.5 (base 10)

Value: +10.75 (base 10) = +1010.11 (base 2) = +1.01011 × 2^3 (base 2)
significand: 1.01011000000000000000000
biased exponent: 0x3 + 0x7F = 0x82 = 1000 0010
sign bit: 0 (positive)
0100 0001 0010 1100 0000 0000 0000 0000
Internal rep: 412C0000 (base 16)

Value: -99.5 (base 10) = -1100011.1 (base 2) = -1.1000111 × 2^6 (base 2)
significand: 1.10001110000000000000000
biased exponent: 0x6 + 0x7F = 0x85 = 1000 0101
sign bit: 1 (negative)
1100 0010 1100 0111 0000 0000 0000 0000
Internal rep: C2C70000 (base 16)

6-bit Precision
Assume only 6 bits are available (artificial).
Format:
• sign of fraction: 1 bit
• biased exponent: 2 bits
• fraction: 3 bits
Interpretation: $\pm 1.\text{fraction} \times 2^{\text{biased exponent} - \text{0x1}}$

Range of exponents:
biased exponent    true exponent
---------------    -------------
11                 Special (Infinity, NaNs)
10                 Largest (+0x1 or +1)
01                 Smallest (+0x0 or +0)
00                 Special (Zero, Denormals)
The sets of positive and negative values are symmetric (just like the SP and DP formats).
Clearly artificial, but small enough to enumerate every possibility.

Infinity and NaNs (8 bit patterns)
encoding    binary value    decimal value
--------    ------------    -------------
011111                      NaN
011110                      NaN
011101                      NaN
011100                      NaN
011011                      NaN
011010                      NaN
011001                      NaN
011000                      Infinity

True Exponent of 1 (8 bit patterns)
encoding    binary value        decimal value
--------    ------------        -------------
010111      1.111e+1 (11.11)    3.75
010110      1.110e+1 (11.10)    3.50
010101      1.101e+1 (11.01)    3.25
010100      1.100e+1 (11.00)    3.00
010011      1.011e+1 (10.11)    2.75
010010      1.010e+1 (10.10)    2.50
010001      1.001e+1 (10.01)    2.25
010000      1.000e+1 (10.00)    2.00

True Exponent of 0 (8 bit patterns)
encoding    binary value        decimal value
--------    ------------        -------------
001111      1.111e+0 (1.111)    1.875
001110      1.110e+0 (1.110)    1.750
001101      1.101e+0 (1.101)    1.625
001100      1.100e+0 (1.100)    1.500
001011      1.011e+0 (1.011)    1.375
001010      1.010e+0 (1.010)    1.250
001001      1.001e+0 (1.001)    1.125
001000      1.000e+0 (1.000)    1.000

Zero and Denormals (8 bit patterns)
encoding    binary value        decimal value
--------    ------------        -------------
000111      0.111e+0 (0.111)    0.875
000110      0.110e+0 (0.110)    0.750
000101      0.101e+0 (0.101)    0.625
000100      0.100e+0 (0.100)    0.500
000011      0.011e+0 (0.011)    0.375
000010      0.010e+0 (0.010)    0.250
000001      0.001e+0 (0.001)    0.125
000000                          Zero

All non-negative patterns (gap between adjacent values in parentheses)
encoding            binary value                            decimal value
--------            ------------                            -------------
011111 .. 011001                                            NaN
011000                                                      Infinity
010111 .. 010000    1.111e+1 (11.11) .. 1.000e+1 (10.00)    3.75 .. 2.00 (0.25)
001111 .. 001000    1.111e+0 (1.111) .. 1.000e+0 (1.000)    1.875 .. 1.000 (0.125)
000111 .. 000001    0.111e+0 (0.111) .. 0.001e+0 (0.001)    0.875 .. 0.125 (0.125)
000000                                                      Zero

Summary: Floating Point Representation
N bits ==> 2^N values
Sign and magnitude system: half of the values are negative
Fields: S | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)
$x = (-1)^S \times (1.\text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$

Summary: Floating Point Representation
Infinity and NaNs: biased exponent field all 1's
Zero and Denormals: biased exponent field all 0's

Summary: Floating Point Representation
Only finite sums of powers of 2 (that fit in the fraction and exponent fields) are representable
ex: 2^5 + 2^3 + 2^-1 + 2^-3 + 2^-7 + 2^-12
The gap between adjacent values increases as you move away from zero
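As a sketch (not part of the original slides), the decoding rules above can be written once for any field widths and checked against the worked examples. The helper names `decode`, `dp_bits` and `sp_bits` are hypothetical; the code assumes the bias for a k-bit exponent field is 2^(k-1) - 1, which reproduces 0x3FF (DP), 0x7F (SP) and 0x1 (the 6-bit toy format). Python's standard `struct` module is used only to obtain real DP/SP bit patterns for comparison.

```python
import math
import struct
from fractions import Fraction


def decode(bits: int, exp_bits: int, frac_bits: int):
    """Decode an IEEE-754-style bit pattern with the given field widths.

    Returns (category, value). Finite nonzero values are exact Fractions;
    zero is +/-0.0, infinity is +/-math.inf, NaN is math.nan.
    """
    bias = (1 << (exp_bits - 1)) - 1             # 0x3FF (DP), 0x7F (SP), 0x1 (6-bit)
    sign = (bits >> (exp_bits + frac_bits)) & 1
    biased_exp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    s = -1 if sign else 1

    if biased_exp == (1 << exp_bits) - 1:         # exponent field all ones
        if fraction == 0:
            return "infinity", s * math.inf
        return "nan", math.nan                     # sign and payload ignored here
    if biased_exp == 0:                            # exponent field all zeroes
        if fraction == 0:
            return "zero", -0.0 if sign else 0.0
        # Denormal: 0.fraction x 2^(1 - bias), i.e. 2^-1022 for DP
        significand = Fraction(fraction, 1 << frac_bits)
        return "denormal", s * significand * Fraction(2) ** (1 - bias)
    # Normalized: 1.fraction x 2^(biased exponent - bias)
    significand = 1 + Fraction(fraction, 1 << frac_bits)
    return "normalized", s * significand * Fraction(2) ** (biased_exp - bias)


def dp_bits(x: float) -> int:
    """64-bit pattern of x stored as an IEEE-754 double."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def sp_bits(x: float) -> int:
    """32-bit pattern of x after rounding it to IEEE-754 single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


if __name__ == "__main__":
    # Worked examples from the slides (DP, SP, and the 6-bit toy format).
    print(decode(0xC019000000000000, exp_bits=11, frac_bits=52))  # ('normalized', Fraction(-25, 4))
    print(decode(0x40B80000, exp_bits=8, frac_bits=23))           # ('normalized', Fraction(23, 4))
    print(f"{dp_bits(-10.75):016X} {sp_bits(-99.5):08X}")         # C025800000000000 C2C70000

    # Enumerate all non-negative 6-bit patterns, largest first.
    for bits in range(0b011111, -1, -1):
        category, value = decode(bits, exp_bits=2, frac_bits=3)
        print(f"{bits:06b}  {category:10}  {value}")

    # Gap between adjacent doubles grows with magnitude (math.ulp: Python 3.9+).
    for x in (1.0, 1024.0, 2.0 ** 52):
        print(x, math.ulp(x))
```

Returning exact `Fraction` values (rather than floats) keeps the comparison honest: the smallest DP denormal comes out as exactly 2^-1074, and the 6-bit enumeration reproduces the tables above without any rounding.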