
Lecture Topics

Today: Floating Point Arithmetic
(P&H 3.5-3.6)

Next: continued
Announcements

Milestone #8 (due 4/6)

Milestone #9 (due 4/13)
Floating Point Numbers

Non-integral values, including very small
and very large values

Similar to scientific notation:
• -2.34 × 10^56 (normalized)
• +6.05 × 10^-12 (normalized)
• -0.002 × 10^-4 (not normalized)
• +987.02 × 10^9 (not normalized)
Floating Point Numbers

In mathematics, the real numbers are
infinite and continuous.

In computing, the floating point numbers
are finite and discrete because there is a
fixed number of bits to represent values.
N bits ==> 2^N bit patterns

Floating point values are a subset of the
real numbers.
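
A minimal sketch of what "subset of the real numbers" means in practice, assuming Python's float is an IEEE-754 double (true on common platforms). The helper name stored_exactly is hypothetical, chosen for illustration. It compares the exact value of the stored double against the exact decimal that was written.

```python
from fractions import Fraction

def stored_exactly(text: str) -> bool:
    """True if the decimal literal in `text` is exactly representable as a double."""
    # Fraction(float) gives the exact value of the stored double;
    # Fraction(str) gives the exact value of the decimal literal.
    return Fraction(float(text)) == Fraction(text)

print(stored_exactly("0.5"))   # True: 0.5 is a power of 2
print(stored_exactly("0.1"))   # False: 0.1 is rounded to the nearest double
```

Values such as 0.1 fall between two adjacent floating point values, so the nearest one is stored instead.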
IEEE-754 Floating Point Formats

Developed in response to divergence of
existing representations (portability issues
for scientific code)

Now almost universally adopted

Two representations:
• Single precision (32 bits)
• Double precision (64 bits)
IEEE-754 Floating Point Formats
Single precision: 32 bits (4 bytes)
Double precision: 64 bits (8 bytes)
value     double precision    single precision
------    ----------------    ----------------
+inf      7FF0000000000000    7F800000
+99.5     4058E00000000000    42C70000
+15.0     402E000000000000    41700000
+10.75    4025800000000000    412C0000
+10.5     4025000000000000    41280000
+10.0     4024000000000000    41200000
+3.3      400A666666666666    40533333
+3.25     400A000000000000    40500000
+3.0      4008000000000000    40400000
+2.1      4000CCCCCCCCCCCD    40066666
+1.5      3FF8000000000000    3FC00000
+1.25     3FF4000000000000    3FA00000
+1.0      3FF0000000000000    3F800000
+0.5      3FE0000000000000    3F000000
+0.25     3FD0000000000000    3E800000
+0.0      0000000000000000    00000000
-0.0      8000000000000000    80000000
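
The table can be reproduced with a short sketch, assuming Python's standard struct module and a platform where Python floats are IEEE-754 doubles. The helper names double_hex and single_hex are hypothetical.

```python
import struct

def double_hex(x: float) -> str:
    """Return the 64-bit IEEE-754 pattern of x as 16 hex digits."""
    return struct.pack(">d", x).hex().upper()

def single_hex(x: float) -> str:
    """Return the 32-bit IEEE-754 pattern of x (rounded to single) as 8 hex digits."""
    return struct.pack(">f", x).hex().upper()

# Print a few rows of the table: value, double pattern, single pattern.
for v in (float("inf"), 99.5, 10.75, 3.3, 2.1, 1.0, 0.0, -0.0):
    print(f"{v:>8}  {double_hex(v)}  {single_hex(v)}")
```

Note that 3.3 and 2.1 have repeating hex digits because they are not exactly representable, while values such as 10.75 end in zeroes.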
Double Precision
Format:
sign of fraction: 1 bit
biased exponent: 11 bits
fraction: 52 bits
Interpretation:
+/- 1.fraction x 2^(biased exponent – 0x3FF)
Double Precision
Interpretation:
+/- 1.fraction x 2^(biased exponent – 0x3FF)
sign: 0 for positive, 1 for negative
significand: 1.fraction
true exponent: biased exponent – 0x3FF
Decoding DP Values
The decoding process:

Partition the 64 bits into the three fields

Calculate the true exponent

Use the sign, significand and true
exponent to calculate the value
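
The three decoding steps can be sketched as follows. This is an illustration only: the function name decode_double is hypothetical, and it handles normalized values only (the all-zeroes and all-ones exponent fields are rejected).

```python
def decode_double(hex_pattern: str) -> float:
    """Decode a normalized double precision value from 16 hex digits."""
    bits = int(hex_pattern, 16)

    # Step 1: partition the 64 bits into the three fields.
    sign = bits >> 63                       # 1 bit
    biased = (bits >> 52) & 0x7FF           # 11 bits
    fraction = bits & ((1 << 52) - 1)       # 52 bits

    if biased == 0 or biased == 0x7FF:
        raise ValueError("special value: not handled in this sketch")

    # Step 2: calculate the true exponent.
    true_exp = biased - 0x3FF

    # Step 3: combine sign, significand (1.fraction) and true exponent.
    significand = 1 + fraction / 2**52
    return (-1) ** sign * significand * 2.0 ** true_exp

print(decode_double("3FFE000000000000"))   # 1.875
print(decode_double("C019000000000000"))   # -6.25
```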
Internal rep: 3FFE000000000000 base 16
0011 1111 1111 1110 0000 0000 .... 0000 0000
sign bit: 0 (positive)
biased exponent: 011 1111 1111 (0x3FF)
true exponent: 0x3FF – 0x3FF = 0
significand: 1.111000000000....00000000
value: +1.111 x 2^0 base 2
= +1.111 base 2
= +1.875 base 10
Internal rep: C019000000000000 base 16
1100 0000 0001 1001 0000 0000 .... 0000 0000
sign bit: 1 (negative)
biased exponent: 100 0000 0001 (0x401)
true exponent: 0x401 – 0x3FF = 2
significand: 1.100100000000....00000000
value: -1.1001 x 2^2 base 2
= -110.01 base 2
= -6.25 base 10
Internal rep: BFC0000000000000 base 16
1011 1111 1100 0000 0000 0000 .... 0000 0000
sign bit: 1 (negative)
biased exponent: 011 1111 1100 (0x3FC)
true exponent: 0x3FC – 0x3FF = -3
significand: 1.000000000000....00000000
value: -1.0 x 2^-3 base 2
= -0.001 base 2
= -0.125 base 10
Encoding DP Values
The encoding process:

Express the value in binary, then
normalize it

Calculate the biased exponent

Pack the sign, fraction and biased
exponent into the three fields
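
The encoding steps can be sketched in the same way. The function name encode_double is hypothetical, and the sketch assumes a finite, non-zero value in the normalized range. It uses math.frexp for the normalization step, which is one convenient choice rather than the only one.

```python
import math

def encode_double(value: float) -> str:
    """Encode a normalized value as 16 hex digits of double precision."""
    if value == 0 or math.isinf(value) or math.isnan(value):
        raise ValueError("special value: not handled in this sketch")

    sign = 1 if value < 0 else 0

    # Step 1: normalize. frexp gives abs(value) == m * 2**e with 0.5 <= m < 1,
    # so the 1.fraction form is (2*m) * 2**(e-1).
    m, e = math.frexp(abs(value))
    significand = 2 * m
    true_exp = e - 1

    # Step 2: calculate the biased exponent.
    biased = true_exp + 0x3FF
    if biased < 1:
        raise ValueError("denormal value: not handled in this sketch")

    # Step 3: pack sign, biased exponent and fraction into the three fields.
    fraction = int((significand - 1) * 2**52)
    bits = (sign << 63) | (biased << 52) | fraction
    return f"{bits:016X}"

print(encode_double(1.25))     # 3FF4000000000000
print(encode_double(-10.75))   # C025800000000000
```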
Value: +1.25 base 10
= +1.01 base 2
= +1.01 x 2^0 base 2
significand: 1.010000000000....00000000
biased exponent: 0x0 + 0x3FF = 0x3FF
= 01111111111
sign bit: 0 (positive)
0011 1111 1111 0100 0000 0000 .... 0000 0000
Internal rep: 3FF4000000000000 base 16
Value: -10.75 base 10
= -1010.11 base 2
= -1.01011 x 2^3 base 2
significand: 1.010110000000....00000000
biased exponent: 0x3 + 0x3FF = 0x402
= 10000000010
sign bit: 1 (negative)
1100 0000 0010 0101 1000 0000 .... 0000 0000
Internal rep: C025800000000000 base 16
Value: +0.0625 base 10
= +0.0001 base 2
= +1.0 x 2^-4 base 2
significand: 1.000000000000....00000000
biased exponent: -0x4 + 0x3FF = 0x3FB
= 01111111011
sign bit: 0 (positive)
0011 1111 1011 0000 0000 0000 .... 0000 0000
Internal rep: 3FB0000000000000 base 16
Special Values
Four categories of special values:
• Infinity
• Not-A-Number
• Zero
• Denormal Values

All other values called “normalized”
Biased exponent field all ones

Infinity – fraction field all zeroes
7FF0000000000000

NaN – fraction field not all zeroes
7FF0000000000001
Sign bit can be 1 (for negative); set of 2^53 - 2
different representations of NaN (nearly 2^53)
Biased exponent field all zeroes

Zero – fraction field all zeroes
0000000000000000

Denormal – fraction field not all zeroes
0000000000000001
Sign bit can be 1 (for negative); set of 2^53 - 2
different denormal values (nearly 2^53)
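
The rules for the four categories of special values can be sketched as a small classifier. The function name classify_double is hypothetical; the sign bit is ignored because it does not affect the category.

```python
def classify_double(hex_pattern: str) -> str:
    """Classify a double precision bit pattern given as 16 hex digits."""
    bits = int(hex_pattern, 16)
    biased = (bits >> 52) & 0x7FF        # 11-bit biased exponent field
    fraction = bits & ((1 << 52) - 1)    # 52-bit fraction field

    if biased == 0x7FF:                  # biased exponent field all ones
        return "infinity" if fraction == 0 else "NaN"
    if biased == 0:                      # biased exponent field all zeroes
        return "zero" if fraction == 0 else "denormal"
    return "normalized"

print(classify_double("7FF0000000000000"))   # infinity
print(classify_double("0000000000000001"))   # denormal
```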
Range of DP exponents:
111 1111 1111    Special (Infinity, NaNs)
111 1111 1110    Largest (+0x3FF or +1023)
...
100 0000 0001    +2
100 0000 0000    +1
011 1111 1111    +0
011 1111 1110    -1
011 1111 1101    -2
...
000 0000 0001    Smallest (-0x3FE or -1022)
000 0000 0000    Special (Zero, Denormals)
Denormal (Subnormal) Values
Denormal values are smaller than the
minimal normalized value – they allow us to
have values which are close to zero.
Normalized: 1.fraction x 2^(Biased - 0x3FF)
Denormal: 0.fraction x 2^(-0x3FE)
All denormal values have the same true
exponent (-0x3FE or -1022)
The smallest normalized value:
0000000000010000....0000
Value: 1.0 x 2^-1022
The largest denormal value:
0000000000001111....1111
Value: 0.1111….1111 x 2^-1022
The second largest denormal value:
0000000000001111....1110
Value: 0.1111….1110 x 2^-1022
The smallest denormal value:
0000000000000000....0001
Value: 0.0000….0001 x 2^-1022
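
These boundary values can be checked with a short sketch, assuming Python floats are IEEE-754 doubles with denormals enabled (the usual case). The helper names bits_to_double and denormal_value are hypothetical.

```python
import math
import struct
import sys

def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit integer as a double precision value."""
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]

def denormal_value(fraction: int) -> float:
    """0.fraction x 2^-1022, i.e. fraction x 2^-52 x 2^-1022."""
    return math.ldexp(fraction, -1074)

smallest_normal = bits_to_double(0x0010000000000000)
largest_denormal = bits_to_double(0x000FFFFFFFFFFFFF)
smallest_denormal = bits_to_double(0x0000000000000001)

print(smallest_normal == sys.float_info.min)                     # True
print(largest_denormal == denormal_value(2**52 - 1))             # True
print(smallest_normal - largest_denormal == smallest_denormal)   # True
```

The last line shows that the denormals continue with the same spacing (2^-1074) right up to the smallest normalized value, so there is no extra gap at the boundary.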
Single Precision
Format:
sign of fraction: 1 bit
biased exponent: 8 bits
fraction: 23 bits
Interpretation:
+/- 1.fraction x 2^(biased exponent – 0x7F)
Single Precision
Interpretation:
+/- 1.fraction x 2^(biased exponent – 0x7F)
sign: 0 for positive, 1 for negative
significand: 1.fraction
true exponent: biased exponent – 0x7F
Range of SP exponents:
1111 1111    Special (Infinity, NaNs)
1111 1110    Largest (+0x7F or +127)
...
1000 0001    +2
1000 0000    +1
0111 1111    +0
0111 1110    -1
0111 1101    -2
...
0000 0001    Smallest (-0x7E or -126)
0000 0000    Special (Zero, Denormals)
Decoding and Encoding SP Values
The decoding process is the same as for DP
values, except fields are 1, 8 and 23 bits
The encoding process is also the same, except
for the sizes of the fields
Note: biased exponent field of 8 bits
determines biasing factor: 01111111 = 0x7F
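
Because only the field sizes change, one decoder can serve both formats. The sketch below is an illustration under that assumption; the names bias and decode_normalized are hypothetical, and only normalized values are handled.

```python
def bias(exp_bits: int) -> int:
    """Biasing factor for an exponent field of exp_bits bits: 0 followed by all ones."""
    return (1 << (exp_bits - 1)) - 1

def decode_normalized(bits: int, exp_bits: int, frac_bits: int) -> float:
    """Decode a normalized value with a 1-bit sign and the given field sizes."""
    sign = bits >> (exp_bits + frac_bits)
    biased = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)

    if biased == 0 or biased == (1 << exp_bits) - 1:
        raise ValueError("special value: not handled in this sketch")

    significand = 1 + fraction / (1 << frac_bits)       # 1.fraction
    return (-1) ** sign * significand * 2.0 ** (biased - bias(exp_bits))

print(hex(bias(8)), hex(bias(11)))                 # 0x7f 0x3ff
print(decode_normalized(0x40B80000, 8, 23))        # 5.75 (single precision)
print(decode_normalized(0xC019000000000000, 11, 52))   # -6.25 (double precision)
```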
Internal rep: 40B80000 base 16
0100 0000 1011 1000 0000 0000 0000 0000
sign bit: 0 (positive)
biased exponent: 1000 0001 (0x81)
true exponent: 0x81 – 0x7F = 2
significand: 1.01110000000000000000000
value: +1.0111 x 2^2 base 2
= +101.11 base 2
= +5.75 base 10
Internal rep: BFC00000 base 16
1011 1111 1100 0000 0000 0000 0000 0000
sign bit: 1 (negative)
biased exponent: 0111 1111 (0x7F)
true exponent: 0x7F – 0x7F = 0
significand: 1.10000000000000000000000
value: -1.1 x 2^0 base 2
= -1.1 base 2
= -1.5 base 10
Value: +10.75 base 10
= +1010.11 base 2
= +1.01011 x 2^3 base 2
significand: 1.010110000000....00000000
biased exponent: 0x3 + 0x7F = 0x82
= 10000010
sign bit: 0 (positive)
0100 0001 0010 1100 0000 0000 0000 0000
Internal rep: 412C0000 base 16
Value: -99.5 base 10
= -1100011.1 base 2
= -1.1000111 x 2^6 base 2
significand: 1.10001110000000000000000
biased exponent: 0x6 + 0x7F = 0x85
= 10000101
sign bit: 1 (negative)
1100 0010 1100 0111 0000 0000 0000 0000
Internal rep: C2C70000 base 16
6-bit Precision
Assume only 6 bits available (artificial).
Format:
sign of fraction: 1 bit
biased exponent: 2 bits
fraction: 3 bits
Interpretation:
+/- 1.fraction x 2^(biased exponent – 0x1)
Range of exponents:
11    Special (Infinity, NaNs)
10    Largest (+0x1 or +1)
01    Smallest (+0x0 or +0)
00    Special (Zero, Denormals)
Set of positive and negative values
symmetric (just like SP and DP formats)
Clearly artificial, but able to enumerate all
possibilities
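
Since the format has only 64 bit patterns, the enumeration can be done mechanically. A minimal sketch, with the hypothetical function name decode6; denormals use the smallest true exponent (0), by analogy with the DP rule.

```python
import math

def decode6(bits: int) -> float:
    """Decode a 6-bit pattern: 1 sign bit, 2 exponent bits, 3 fraction bits."""
    sign = -1.0 if bits & 0b100000 else 1.0
    biased = (bits >> 3) & 0b11
    fraction = bits & 0b111

    if biased == 0b11:                       # Infinity and NaNs
        return sign * math.inf if fraction == 0 else math.nan
    if biased == 0b00:                       # Zero and denormals: 0.fraction x 2^0
        return sign * (fraction / 8)
    return sign * (1 + fraction / 8) * 2.0 ** (biased - 1)   # 1.fraction x 2^(biased - 1)

# List the 32 non-negative patterns from largest encoding to smallest.
for bits in range(0b011111, -1, -1):
    print(f"{bits:06b}  {decode6(bits)}")
```

The 32 patterns with the sign bit set give the same magnitudes with a negative sign.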
Infinity and NaNs (8 bit patterns)
encoding    binary value        decimal value
--------    ----------------    -------------
011111                          NaN
011110                          NaN
011101                          NaN
011100                          NaN
011011                          NaN
011010                          NaN
011001                          NaN
011000                          Infinity
True Exponent of 1 (8 bit patterns)
encoding    binary value        decimal value
--------    ----------------    -------------
010111      1.111e+1 (11.11)    3.75
010110      1.110e+1 (11.10)    3.50
010101      1.101e+1 (11.01)    3.25
010100      1.100e+1 (11.00)    3.00
010011      1.011e+1 (10.11)    2.75
010010      1.010e+1 (10.10)    2.50
010001      1.001e+1 (10.01)    2.25
010000      1.000e+1 (10.00)    2.00
True Exponent of 0 (8 bit patterns)
encoding    binary value        decimal value
--------    ----------------    -------------
001111      1.111e+0 (1.111)    1.875
001110      1.110e+0 (1.110)    1.750
001101      1.101e+0 (1.101)    1.625
001100      1.100e+0 (1.100)    1.500
001011      1.011e+0 (1.011)    1.375
001010      1.010e+0 (1.010)    1.250
001001      1.001e+0 (1.001)    1.125
001000      1.000e+0 (1.000)    1.000
Zero and Denormals (8 bit patterns)
encoding    binary value        decimal value
--------    ----------------    -------------
000111      0.111e+0 (0.111)    0.875
000110      0.110e+0 (0.110)    0.750
000101      0.101e+0 (0.101)    0.625
000100      0.100e+0 (0.100)    0.500
000011      0.011e+0 (0.011)    0.375
000010      0.010e+0 (0.010)    0.250
000001      0.001e+0 (0.001)    0.125
000000                          Zero
encoding    binary value        decimal value (gap between neighbors)
--------    ----------------    -------------------------------------
011111                          NaN
..
011000                          Infinity
010111      1.111e+1 (11.11)    3.75
..                              (0.25)
010000      1.000e+1 (10.00)    2.00
001111      1.111e+0 (1.111)    1.875
..                              (0.125)
001000      1.000e+0 (1.000)    1.000
000111      0.111e+0 (0.111)    0.875
..                              (0.125)
000000                          Zero
Summary: Floating Point Representation
N bits ==> 2^N values

Sign and magnitude system: half of values
negative

Field layout: | S | Exponent | Fraction |
S: 1 bit
Exponent: single: 8 bits, double: 11 bits
Fraction: single: 23 bits, double: 52 bits

x = (-1)^S x (1.Fraction) x 2^(Exponent - Bias)
Summary: Floating Point Representation

Infinity and NaNs: biased exponent field
all 1’s

Zero and Denormals: biased exponent field
all 0’s
Summary: Floating Point Representation

Only sums of powers of 2 are representable
ex: 2^5 + 2^3 + 2^-1 + 2^-3 + 2^-7 + 2^-12

Gap between adjacent values increases as
you move away from Zero
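
The growing gap can be measured directly in double precision. A minimal sketch, assuming positive finite inputs and Python floats that are IEEE-754 doubles; the helper names next_up and gap are hypothetical. Adding 1 to the bit pattern of a positive finite double gives the next larger representable value.

```python
import struct

def next_up(x: float) -> float:
    """Next representable double above a positive finite x."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return struct.unpack(">d", struct.pack(">Q", bits + 1))[0]

def gap(x: float) -> float:
    """Distance from x to the next larger representable double."""
    return next_up(x) - x

# The gap doubles each time the true exponent increases by 1.
for x in (1.0, 2.0, 4.0, 1024.0, 2.0**52):
    print(f"{x:>20}  gap = {gap(x)}")
```

At 2^52 the gap reaches 1.0, so beyond that point not every integer is representable as a double.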