Fractions and the Real Numbers

Many interesting quantities are not normally integer-valued:
- the mass of a rocket payload
- the batting average of a baseball player
- the average score on an assignment
Therefore, we have fractions…
… and many fractions can be represented using a positional notation where we allow the
powers of the base to be negative as well as non-negative:
4.237 = 4 × 10^0 + 2 × 10^-1 + 3 × 10^-2 + 7 × 10^-3
Of course, some fractions cannot be represented this way in a finite manner:
2/3 = 6 × 10^-1 + 6 × 10^-2 + 6 × 10^-3 + ... ≈ 0.666
So, rational numbers are a handy idea, but we'll ignore that for now…
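The positional expansion above can be checked exactly with Python's Fraction type (a quick sketch; the digit list simply encodes the 4.237 example):

```python
from fractions import Fraction

# 4.237 as a sum of digits times powers of 10, computed exactly:
digits = [(4, 0), (2, -1), (3, -2), (7, -3)]
value = sum(Fraction(d) * Fraction(10) ** p for d, p in digits)
print(value)                      # 4237/1000

# 2/3 never terminates: each partial sum 0.6, 0.66, 0.666, ... falls short.
partial = sum(Fraction(6) * Fraction(10) ** -k for k in range(1, 4))
print(Fraction(2, 3) - partial)   # 1/1500 -- a nonzero remainder at every step
```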
CS@VT
Computer Organization I
©2005-2014 McQuain
Fractions: Fixed-Point
How can we represent fractions?
- Use a “binary point” to separate non-negative from negative powers of two, just like the decimal point.
- 2’s comp addition and subtraction still work (if binary points are aligned).
But, we cannot represent extremely large or extremely small values, with a reasonable
fixed number of bits, in fixed-point notation, if we must specify an alignment for the
binary point…
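A fixed-point scheme can be sketched in Python by storing scaled integers; the Q8.8 split used here (8 integer bits, 8 fraction bits) is an illustrative choice, not anything the slides prescribe:

```python
# Hypothetical Q8.8 fixed-point: values stored as integers scaled by 2^8,
# so the binary point sits (implicitly) between bits 8 and 7.
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    return round(x * SCALE)

def from_fixed(n):
    return n / SCALE

a = to_fixed(3.8125)      # 976 -- exact, since 3.8125 is 11.1101 in base-2
b = to_fixed(0.25)        # 64
# Plain integer addition works because both binary points are aligned:
print(from_fixed(a + b))  # 4.0625
```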
Floating-point Notation
Some numbers are so large or so small that it's inconvenient to write them in the usual
manner:
462500000000000000000000
0.0000000000000000000248
Therefore, we have scientific or floating-point notation:
462500000000000000000000 = 4.625 × 10^23
0.0000000000000000000248 = 2.48 × 10^-20
But, of course:
4.625 × 10^23 = 4625 × 10^20 = 0.4625 × 10^24
So, we frequently adopt a normalized representation requiring the decimal point to be in a
specific location, say immediately after the first digit:
4.625 × 10^23
Anatomy of a Floating-point Value
A floating-point value can be viewed as a combination of three distinct components:
- sign: the sign of the number
- significand: the normalized value (to be shifted according to the exponent)
- exponent: the amount by which the significand is shifted to obtain the true value

Example: +4.625 × 10^23 has sign +, significand 4.625, and exponent 23.
(The base of representation is implicit.)
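Python's standard library exposes the same decomposition for base 2: math.frexp splits a float into a significand and an exponent (note that frexp normalizes the significand into [0.5, 1) rather than placing the point after the first digit):

```python
import math

# 12.0 == 0.75 * 2**4: frexp returns the (significand, exponent) pair.
m, e = math.frexp(12.0)
print(m, e)              # 0.75 4
print(math.ldexp(m, e))  # 12.0 -- ldexp reassembles the pieces
```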
Converting base-10 to base-2
At the hardware level, we'll represent values in base-2:
0.625 = 1 × 2^-1 + 1 × 2^-3 → 0.101 in base-2
In general, a base-10 value between 0 and 1 can be converted to base-2 by successively
multiplying by 2 and recording whether there was a carry across the decimal point, stopping
when you obtain zero (or enough bits to satisfy your needs):
    fractional value    carry-over?
    .625
    .25                 1
    .5                  0
    .0                  1

Reading the carry bits top to bottom gives 0.101.
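The repeated-doubling procedure can be sketched directly; the function name and the 24-bit cutoff are my own choices:

```python
from fractions import Fraction

def frac_to_base2(x, max_bits=24):
    """Convert a fraction in [0, 1) to a base-2 digit string by repeatedly
    doubling and recording whether there was a carry across the point."""
    bits = []
    x = Fraction(x)
    while x != 0 and len(bits) < max_bits:
        x *= 2
        carry = int(x >= 1)    # did the doubling carry into the ones place?
        bits.append(str(carry))
        x -= carry
    return "0." + "".join(bits)

print(frac_to_base2(Fraction(625, 1000)))    # 0.101
print(frac_to_base2(Fraction(8125, 10000)))  # 0.1101
```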
Converting base-10 to base-2
Here are a couple more examples:
3.8125
    fractional value    carry-over?
    .8125
    .625                1
    .25                 1
    .5                  0
    .0                  1

So, 3.8125 converts to 11.1101 in base-2.

3.1416
    fractional value    carry-over?
    .1416
    .2832               0
    .5664               0
    .1328               1
    .2656               0
    .5312               0
    .0624               1
    .1248               0
    .2496               0
    .4992               0
    .9984               0
    .9968               1
    .9936               1

So, 3.1416 is about 11.001001000011 in base-2.
Taking It to Hardware
We have to decide how to handle the three components of the floating-point representation.
A single bit suffices to represent the sign of the number.
We have to allocate bits for the significand and for the exponent.
- more bits for the significand provides more accuracy (significant digits)
- more bits for the exponent provides a larger range of representation
What about normalization?
- except for zero, every normalized number will have a most-significant bit of 1
- why store it?
IEEE 754 Floating Point Standard(s)
The IEEE 754-1985 floating point standard defines fundamental formats:
- single precision: 1 sign bit, 8-bit exponent, 23-bit significand
- double precision: 1 sign bit, 11-bit exponent, 52-bit significand
For both:
- the exponent is stored as a non-negative integer, with a bias (127, 1023)
- the significand is normalized so that the binary point is to the right of the first nonzero bit (0 being an exception)
- the first bit of the significand is not stored (phantom bit)
- first bit of significand equals 1 unless biased exponent equals 0
- +0 is represented by 32 zero bits; there is also a −0!
There are lots of other details, including denormalized numbers; we will ignore them.
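These rules can be checked from Python by unpacking the raw bits of a single with struct (the helper name is my own):

```python
import struct

def float32_fields(x):
    """Unpack an IEEE 754 single into (sign, biased exponent, fraction).
    The leading 1 of the significand is the unstored phantom bit."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF       # stored with a bias of 127
    fraction = bits & 0x7FFFFF           # the 23 stored significand bits
    return sign, exponent, fraction

print(float32_fields(1.0))    # (0, 127, 0): true exponent 0, biased to 127
print(float32_fields(-0.0))   # (1, 0, 0): negative zero really exists
```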
General Format
The general format is, from most- to least-significant bit: a sign bit, an e-bit biased exponent, and an f-bit fraction, where:
- fraction: the significand, excluding the high-order bit
- f: # of bits in the fraction
- e: # of bits in the biased exponent
IEEE 754 Floating Point Examples
Consider the base-10 value 3.8125.
We saw earlier that this converts to 11.1101 in base-2.
This normalizes to 1.11101 x 2^1.
Sign bit: 0, since the number is nonnegative.
Stored exponent: 1 + 127 = 128, i.e. 10000000.
Fraction: 11101, padded with 0s to 23 bits.

Stored value:  0 | 10000000 | 11101000000000000000000
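The worked example can be confirmed by packing 3.8125 as a single and printing the three fields (a quick check using Python's struct module):

```python
import struct

# Pack 3.8125 as an IEEE single, then pull the raw 32 bits back out.
(bits,) = struct.unpack(">I", struct.pack(">f", 3.8125))
sign = bits >> 31
exponent = (bits >> 23) & 0xFF
fraction = bits & 0x7FFFFF
print(sign, exponent, f"{fraction:023b}")
# 0 128 11101000000000000000000
```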
Floating Point Addition
Addition:
- shift so that the exponents are equal
- add the mantissas
- normalize the result
32.25 = 100000.01 in base-2 = 1.0000001 · 2^5 in IEEE single format
0.1875 = 0.0011 in base-2 = 1.1 · 2^-3 in IEEE single format

Aligning the binary points (shift the smaller operand right by 8):

    1.0000001   · 2^5
  + 0.000000011 · 2^5
  -------------------
    1.000000111 · 2^5 = 100000.0111 in base-2 = 32.4375
The result is already normalized to the IEEE format.
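The three steps can be sketched on (significand, exponent) pairs; in decimal, 1.0000001 base-2 is 1.0078125 and 1.1 base-2 is 1.5. This is an illustrative sketch, not a model of real hardware:

```python
def fp_add(m1, e1, m2, e2):
    """Sketch of floating-point addition: align the exponents,
    add the significands, then renormalize so that 1 <= m < 2."""
    if e1 < e2:                      # make operand 1 the larger one
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 /= 2 ** (e1 - e2)             # step 1: align the binary points
    m, e = m1 + m2, e1               # step 2: add the mantissas
    while m >= 2:                    # step 3: normalize the result
        m /= 2
        e += 1
    return m, e

m, e = fp_add(1.0078125, 5, 1.5, -3)   # 32.25 + 0.1875
print(m * 2 ** e)                      # 32.4375
```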
Floating Point Multiplication
Multiplication:
- multiply the mantissas
- add the exponents
- normalize the result
32.25 = 100000.01 in base-2 = 1.0000001 · 2^5 in IEEE single format
0.1875 = 0.0011 in base-2 = 1.1 · 2^-3 in IEEE single format

Multiplying the mantissas:  1.0000001 · 1.1 = 1.10000011
Adding the exponents:       5 + (-3) = 2

So the product equals:

    1.10000011 · 2^2 = 110.000011 in base-2 = 6.046875
Again, the result is already normalized to the IEEE format.
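The same sketch works for multiplication, again on (significand, exponent) pairs with the decimal equivalents 1.0078125 and 1.5:

```python
def fp_mul(m1, e1, m2, e2):
    """Sketch of floating-point multiplication: multiply the mantissas,
    add the exponents, then renormalize so that 1 <= m < 2."""
    m = m1 * m2                      # step 1: multiply the mantissas
    e = e1 + e2                      # step 2: add the exponents
    while m >= 2:                    # step 3: normalize the result
        m /= 2
        e += 1
    return m, e

m, e = fp_mul(1.0078125, 5, 1.5, -3)   # 32.25 * 0.1875
print(m * 2 ** e)                      # 6.046875
```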
Floating Point Complexities
Operations are somewhat more complicated than integer operations.
In addition to overflow we can have “underflow”: a nonzero result too small to represent.
Representable values are equally spaced only in a relative sense; the absolute gap between neighbors grows with their magnitude.
Accuracy can be a big problem:
- IEEE 754 keeps two extra bits, guard and round
- four rounding modes
- positive divided by zero yields “infinity”
- zero divided by zero yields “not a number” (NaN)
- other complexities
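These special values can be poked at from Python; note that Python itself raises ZeroDivisionError for 1.0 / 0.0 rather than producing infinity, so the IEEE values are constructed directly here:

```python
import math

inf = float("inf")             # what positive / zero yields in IEEE terms
nan = float("nan")             # what zero / zero yields
print(nan == nan)              # False: NaN compares unequal even to itself
print(math.isnan(inf - inf))   # True: inf - inf is another NaN-producing case
```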
Not following the standard can be even worse
- see text for description of 80x86 and Pentium bug!
Unrepresentable Numbers
Of course we know that many fractions cannot be represented in a finite number of digits. But things may be worse than we would naturally expect:
0.1
    fractional value    carry-over?
    .1
    .2                  0
    .4                  0
    .8                  0
    .6                  1
    .2                  1
    .4                  0
    .8                  0
    .6                  1
    .2                  1

So, 0.1 in base-2 is 0.0001100110011… — the block 0011 repeats forever.
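The consequence is visible in any binary floating-point system; Python's Decimal and Fraction types can display the exact double that gets stored instead of 0.1:

```python
from decimal import Decimal
from fractions import Fraction

print(Decimal(0.1))       # the exact double nearest to 0.1 -- not 0.1 itself
print(Fraction(0.1))      # its exact value as a ratio: odd numerator over 2**55
print(0.1 + 0.2 == 0.3)   # False: the classic consequence
```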
Unrepresentable Results
Suppose that X = 1.0 and Y = 1.0 × 2^-24.
Both are representable as normalized IEEE singles.
But:
X + Y = 1.000000000000000000000000
      + 0.000000000000000000000001
      = 1.000000000000000000000001
And that value cannot be stored correctly as an IEEE single; in fact, the result will either
be truncated to 1.0 or rounded up when it is stored.
Either way, an incorrect value will have been stored.
Using IEEE doubles merely changes the scale of the problem…
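This can be reproduced by rounding through single precision with struct (the f32 helper is my own):

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE single."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

x, y = 1.0, 2.0 ** -24
print(f32(x + y) == x)   # True: the sum rounds back to exactly 1.0
print((x + y) == x)      # False in doubles -- which hit the same wall at 2**-53
```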
Some Floating Point Issues
Machine epsilon is defined to be ε = 2^-t, where t is the number of bits in the mantissa.
This is the smallest distinguishable relative difference between two numbers that have
different floating-point representations.
Storage errors are inevitable since finitely-many bits are used in the representation. The
most we can expect is that:
fl(x) = x(1 + δ), where |δ| ≤ ε
Round-off errors are also inevitable, and may be magnified in interesting ways.
Take Numerical Analysis or Numerical Methods to learn more…
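For IEEE doubles, t = 52 stored mantissa bits, and Python reports exactly this ε:

```python
import sys

# sys.float_info.epsilon is the gap between 1.0 and the next larger double.
print(sys.float_info.epsilon)                 # 2.220446049250313e-16
print(sys.float_info.epsilon == 2.0 ** -52)   # True: epsilon = 2^-t with t = 52
```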