1.2 Round-off Errors and Computer Arithmetic

1.2 Round-off Errors and
Computer Arithmetic
1
IEEE floating point numbers
β€’ Binary number: (… 𝑏3 𝑏2 𝑏1 𝑏0 . π‘βˆ’1 π‘βˆ’2 π‘βˆ’3 … )2
β€’ Binary to decimal:
(… 𝑏3 𝑏2 𝑏1 𝑏0 . π‘βˆ’1 π‘βˆ’2 π‘βˆ’3 … )2 =
(… 𝑏3 23 +𝑏 2 22 + 𝑏1 21 + 𝑏0 20 + π‘βˆ’1 2βˆ’1 + π‘βˆ’2 2βˆ’2 + π‘βˆ’3 2βˆ’3 … )10
β€’ Double precision (long real) format
– Example: β€œdouble” in C
β€’ A 64-bit (binary digit) representation
– 1 sign bit (s), 11 exponent bits – characteristic (c), 52 binary fraction bits –
mantissa (f)
Represented number:
βˆ’1 𝑠 2π‘βˆ’1023 (1 + 𝑓)
1023 is called exponent bias
2
β€’
β€’
β€’
β€’
β€’
0 ≀ π‘β„Žπ‘Žπ‘Ÿπ‘Žπ‘π‘‘π‘’π‘Ÿπ‘–π‘ π‘‘π‘–π‘ 𝑐 ≀ 211 βˆ’ 1 = 2047
Smallest normalized positive number on machine has
𝑠 = 0, 𝑐 = 1, 𝑓 = 0: 2βˆ’1022 βˆ™ (1 + 0) β‰ˆ 0.22251 ×
10βˆ’307
Largest normalized positive number on machine has
𝑠 = 0, 𝑐 = 2046, 𝑓 = 1 βˆ’ 2βˆ’52 : 21023 βˆ™ (1 + 1 βˆ’
2βˆ’52 ) β‰ˆ 0.17977 × 10309
Underflow: π‘›π‘’π‘šπ‘π‘’π‘Ÿπ‘  < 2βˆ’1022 βˆ™ (1 + 0)
Overflow: π‘›π‘’π‘šπ‘π‘’π‘Ÿπ‘  > 21023 βˆ™ (2 βˆ’ 2βˆ’52 )
Machine epsilon πœ–π‘šπ‘Žπ‘β„Ž = 2βˆ’52 : this is the
difference between 1 and the smallest machine
floating point number greater than 1.
3
Decimal machine numbers
β€’ k-digit decimal machine numbers:
±0. 𝑑1 𝑑2 … π‘‘π‘˜ × 10𝑛 ,
1 ≀ 𝑑1 ≀ 9, 0 ≀ 𝑑𝑖 ≀ 9
β€’ Any positive number within the numerical range of
machine can be written:
𝑦 = 0. 𝑑1 𝑑2 … π‘‘π‘˜ π‘‘π‘˜+1 π‘‘π‘˜+2 … × 10𝑛
Chopping and Rounding Arithmetic:
Step 1: represent a positive number 𝑦 by
0. 𝑑1 𝑑2 … π‘‘π‘˜ π‘‘π‘˜+1 π‘‘π‘˜+2 … × 10𝑛
Step 2:
– Chopping: chop off after π‘˜ digits:
𝑓𝑙 𝑦 = 0. 𝑑1 𝑑2 … π‘‘π‘˜ × 10𝑛
4
– Rounding: add (5 × 10βˆ’
chopping
π‘˜+1
) × 10𝑛 to 𝑦, then
a) If π‘‘π‘˜+1 β‰₯ 5, add 1 to π‘‘π‘˜ to get 𝑓𝑙 𝑦
b) If π‘‘π‘˜+1 < 5, simply do chopping
Errors and significant digits
5
Finite-digit arithmetic
β€’ Catastrophic events
a) Subtracting nearly equal numbers – this leads to
fewer significant digits.
b) Dividing by a number with small magnitude (or
multiplying by a number with large magnitude).
6
Avoiding loss of accuracy by
reformulating calculations
Quadratic formula to find roots of π‘Žπ‘₯ 2 + 𝑏π‘₯ + 𝑐 =
0, π‘€β„Žπ‘’π‘Ÿπ‘’ π‘Ž β‰  0.
1. π‘₯1 =
2. π‘₯2 =
βˆ’π‘+ 𝑏2 βˆ’4π‘Žπ‘
2π‘Ž
βˆ’π‘βˆ’ 𝑏2 βˆ’4π‘Žπ‘
2π‘Ž
Key: Magnitudes of 𝑏 π‘Žπ‘›π‘‘ 𝑏 2 βˆ’ 4π‘Žπ‘ decide
whether we need to reformulate the formula.
7