Week 9 Module 54

Spring 2015
Week 9 Additional Module
Digital Circuits and Systems
IEEE 754 Format
Shankar Balachandran*
Associate Professor, CSE Department
Indian Institute of Technology Madras
*Currently a Visiting Professor at IIT Bombay

There is no video corresponding to this file.
Floating-Point Number Representation
Layout: S (1 bit) | Exponent E (p bits) | Mantissa M (m bits)

Sign bit: S is the sign of the floating-point number (0 for positive, 1 for negative).
Exponent: p-bit exponent (E) in excess-B code, i.e., the true exponent is E - B.
Mantissa: m-bit unsigned mantissa (M).
Radix: R is the radix of the representation.

Actual value of the number represented above is:

$F = (-1)^{S} \times 1.M \times R^{E-B}$ (if normalized)

$F = (-1)^{S} \times M \times R^{E-B}$ (if unnormalized)
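A minimal Python sketch (not part of the original module) that evaluates the two formulas above exactly with `fractions.Fraction`. It assumes radix R = 2 and reads the m-bit mantissa field M as an unsigned integer, so the binary fraction .M equals M / 2^m; in the unnormalized case the significand is taken to be 0.M. The function name `fp_value` is hypothetical.

```python
from fractions import Fraction


def fp_value(S: int, E: int, M: int, m: int, B: int, normalized: bool = True) -> Fraction:
    """Exact value of a radix-2 (S, E, M) triple with an excess-B exponent.

    Assumption: M is the m-bit mantissa field read as an unsigned integer,
    so the binary fraction .M equals M / 2**m.
    """
    frac = Fraction(M, 2 ** m)                        # .M as an exact fraction
    significand = 1 + frac if normalized else frac    # 1.M or 0.M
    return (-1) ** S * significand * Fraction(2) ** (E - B)


# 1.1 (binary) x 2^(5-4) = 1.5 x 2 = 3, with a 1-bit mantissa and excess-4 exponent
print(fp_value(S=0, E=5, M=1, m=1, B=4))                       # 3
print(fp_value(S=1, E=3, M=1, m=1, B=4, normalized=False))     # -1/4
```

Using exact fractions instead of floats keeps the check independent of the very format being described.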
Arithmetic Circuits
IEEE Floating-Point Format

Single Precision Format (32 bits):
S (1 bit, bit 31) | Exponent E (8 bits, bits 30 to 23) | unsigned Mantissa M (23 bits, bits 22 to 0)

$F = (-1)^{S} \times 1.M \times 2^{E-127}$

Special reserved values:
E = 0 with M = 0 represents ZERO
E = 255 with M = 0 represents ±∞
E = 255 with M ≠ 0 represents NaN

Double Precision Format (64 bits):
S (1 bit, bit 63) | Exponent E (11 bits, bits 62 to 52) | unsigned Mantissa M (52 bits, bits 51 to 0)

$F = (-1)^{S} \times 1.M \times 2^{E-1023}$
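To make the single-precision field boundaries and reserved values concrete, here is a small Python sketch (an illustration, not from the module) that splits a 32-bit word into S, E and M with shifts and masks. The helper names `decode_single` and `float_to_word` are hypothetical; `float_to_word` relies on the standard `struct` module's `'>f'` and `'>I'` formats to obtain a real bit pattern for comparison.

```python
import struct
from fractions import Fraction


def decode_single(word: int):
    """Split a 32-bit IEEE 754 single-precision word and return (S, E, M, value)."""
    S = (word >> 31) & 0x1          # bit 31
    E = (word >> 23) & 0xFF         # bits 30..23, excess-127
    M = word & 0x7FFFFF             # bits 22..0, hidden leading 1 not stored
    if E == 255:                    # reserved: infinity or NaN
        value = float("nan") if M else (float("-inf") if S else float("inf"))
    elif E == 0 and M == 0:         # reserved: zero
        value = -0.0 if S else 0.0
    elif E == 0:
        # Denormalized numbers (E = 0, M != 0) are not covered in this module;
        # IEEE 754 defines them as (-1)^S x 0.M x 2^-126.
        value = float((-1) ** S * Fraction(M, 2 ** 23) * Fraction(2) ** -126)
    else:                           # normalized: (-1)^S x 1.M x 2^(E-127)
        value = float((-1) ** S * (1 + Fraction(M, 2 ** 23)) * Fraction(2) ** (E - 127))
    return S, E, M, value


def float_to_word(x: float) -> int:
    """Bit pattern of x after conversion to single precision (big-endian)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


print(decode_single(float_to_word(462.0)))   # (0, 135, 6750208, 462.0)
```

The mantissa 6750208 is 670000 in hex; the same 23 bits appear below as CE0000 once padded on the right to 24 bits.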
Examples

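Convert $4.62 \times 10^{2}$ to IEEE single precision format:
$4.62 \times 10^{2} = 462 = 111001110.0_2 = 1.110011100 \times 2^{8}$
Mantissa = 110011100 (padded with zeros to 23 bits), Exponent = 8 + 127 = 135 = 10000111
0 1000 0111 1100 1110 0000 0000 0000 000 = 0 87 CE0000
(Notation: sign bit, exponent in hex, then the 23-bit mantissa padded with one trailing 0 to 24 bits and written in hex. As a single 32-bit word this is 43E70000 in hex.)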
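The conversion steps above (normalize, bias the exponent, keep 23 fraction bits) can be sketched in Python with exact fractions. This is an illustrative sketch, not the module's procedure verbatim: the names `encode_single`, `slide_notation` and `to_word` are hypothetical, rounding is assumed to be IEEE's default round-to-nearest-even, and zero, overflow, underflow and denormalized results are not handled.

```python
from fractions import Fraction


def encode_single(x) -> tuple:
    """Encode a nonzero value in the normal single-precision range as (S, E, M).

    Steps: normalize to 1.f x 2^e, add the bias 127, keep 23 fraction bits.
    Rounding is round-to-nearest, ties-to-even; overflow, underflow and
    denormalized numbers are NOT handled.
    """
    x = Fraction(x)
    if x == 0:
        raise ValueError("zero is a reserved pattern (E = 0, M = 0)")
    S = 1 if x < 0 else 0
    mag, e = abs(x), 0
    while mag >= 2:                  # move the radix point left
        mag /= 2
        e += 1
    while mag < 1:                   # move the radix point right
        mag *= 2
        e -= 1
    M = round((mag - 1) * 2 ** 23)   # Fraction rounding: half to even
    if M == 2 ** 23:                 # rounding carried into the hidden 1
        M, e = 0, e + 1
    return S, e + 127, M


def slide_notation(S: int, E: int, M: int) -> str:
    """Sign, exponent in hex, 23-bit mantissa padded to 24 bits in hex."""
    return f"{S} {E:02X} {M << 1:06X}"


def to_word(S: int, E: int, M: int) -> int:
    """Pack the three fields into one 32-bit word."""
    return (S << 31) | (E << 23) | M


print(slide_notation(*encode_single(462)))        # 0 87 CE0000
print(f"{to_word(*encode_single(462)):08X}")      # 43E70000
```

Passing `Fraction("-0.456") / 8` (an exact decimal, not a binary float) reproduces the -0.456 × 2^-3 example, 1 7A D2F1AA.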

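Convert $-0.456 \times 2^{-3}$ to IEEE single precision format:
$-0.456 \times 2^{-3} = -0.0111\ 0100\ 1011\ 1100\ 0110\ 1010\ 0111\ 1110\ldots_2 \times 2^{-3}$
$= -1.1101\ 0010\ 1111\ 0001\ 1010\ 101 \times 2^{-5}$
(23 fraction bits kept; the discarded bits begin 1111..., so the last kept bits 100 round up to 101.)
Exponent = -5 + 127 = 122 = 01111010
1 0111 1010 1101 0010 1111 0001 1010 101 = 1 7A D2F1AA (as a 32-bit word: BD6978D5 in hex)

Convert 1 81 99999A to decimal representation:
Mantissa = -1.1001 1001 1001 1001 1001 1010, Exponent = $81_{16} - 127_{10} = 129 - 127 = 2$
$= -1.1001\ 1001\ 1001\ 1001\ 1001\ 1010 \times 2^{2}$
$= -110.0110\ 0110\ 0110\ 0110\ 0110\ 1 \approx -6.4$ (exactly -6.400000095367431640625)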
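Going the other way, a short Python sketch can decode the slide's "S EE MMMMMM" notation back to an exact value. It assumes the notation used above (exponent in hex, 23-bit mantissa padded with one trailing 0 to 24 bits) and a normalized number; the name `from_slide_notation` is hypothetical.

```python
from fractions import Fraction


def from_slide_notation(text: str) -> Fraction:
    """Decode 'S EE MMMMMM' for a normalized single-precision number.

    Assumption: EE is the exponent in hex and MMMMMM is the 23-bit mantissa
    padded with one trailing 0 to 24 bits, written in hex.
    """
    s, e, m = text.split()
    S, E = int(s), int(e, 16)
    M = int(m, 16) >> 1                       # drop the padding bit
    significand = 1 + Fraction(M, 2 ** 23)    # 1.M
    return (-1) ** S * significand * Fraction(2) ** (E - 127)


v = from_slide_notation("1 81 99999A")
print(v)            # -13421773/2097152
print(float(v))     # close to -6.4, but not exactly -6.4
```

The exact fraction shows why the result is only approximately -6.4: 0.4 has no finite binary expansion, so the stored pattern 0110 0110 ... is cut off and rounded.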
Floating-Point Addition

How to add two floating-point numbers?
(Ma, Ea) + (Mb, Eb)
1. Place the number with the larger exponent in register 1 and the other in register 2.
2. d = E1 - E2
3. Right-shift mantissa M2 by d bits (i.e., move its radix point left by d bits).
4. MSUM = M1 + M2; ESUM = E1
5. If MSUM ≥ 2, renormalize (post-normalization) by dividing MSUM by 2 (shifting right) and incrementing ESUM.
6. Rounding may be required to store the result in the same number of bits (precision) as the inputs.
7. Result = (MSUM, ESUM).
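The seven steps can be sketched in Python on integer mantissas (the hidden 1 restored, then scaled by 2^m). This is a hedged illustration, not the module's circuit: the name `fp_add` and its `(f, E)` pair format are assumptions, only same-sign operands are handled, there is no exponent-overflow check, and rounding is assumed to be round-to-nearest with ties rounded up.

```python
def fp_add(a, b, m):
    """Add two positive normalized numbers given as (f, E) pairs.

    f is the m-bit stored mantissa (hidden leading 1 omitted) and E the biased
    exponent; both operands use the same bias, which the sum keeps.
    """
    (fa, ea), (fb, eb) = a, b
    Ma, Mb = (1 << m) | fa, (1 << m) | fb      # restore hidden 1: 1.f scaled by 2^m

    # Step 1: operand with the larger exponent goes in "register 1".
    if ea >= eb:
        (M1, E1), (M2, E2) = (Ma, ea), (Mb, eb)
    else:
        (M1, E1), (M2, E2) = (Mb, eb), (Ma, ea)

    d = E1 - E2                                 # Step 2
    # Step 3: aligning M2 (right shift by d) is done by shifting M1 left by d
    # instead, so no bits are lost before rounding; both have m + d fraction bits.
    msum = (M1 << d) + M2                       # Step 4
    esum = E1
    frac_bits = m + d

    # Step 5: if MSUM >= 2, divide by 2 (one more fraction bit) and bump ESUM.
    if msum >= 1 << (frac_bits + 1):
        frac_bits += 1
        esum += 1

    # Step 6: round back to m fraction bits (ties rounded up).
    shift = frac_bits - m
    if shift > 0:
        msum = (msum + (1 << (shift - 1))) >> shift
    if msum >= 1 << (m + 1):                    # rounding carried to 10.000...
        msum >>= 1
        esum += 1

    return msum - (1 << m), esum                # Step 7: drop the hidden 1


f, e = fp_add((0b11000000, 0b011), (0b10101100, 0b100), m=8)
print(f"{f:08b}, {e:03b}")                      # 01000110, 101
```

Keeping the shifted-out bits until step 6 mirrors the guard bits a hardware adder would carry; a version that truncates at step 3 gives the same answer on this example because the bits it drops are 0.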
Example: Addition

Add (11000000, 011) to (10101100, 100). Each pair is (mantissa, exponent) of a normalized number, with the hidden leading 1 omitted and the exponent in excess-4 code.

That is, $(1.11000000 \times 2^{011-100}) + (1.10101100 \times 2^{100-100}) = ?$

Since 100 > 011, M1 = 1.10101100, E1 = 100 and M2 = 1.11000000, E2 = 011.
d = E1 - E2 = 100 - 011 = 001
Right-shift M2 by 001 (1) bit → M2 = 0.11100000, E2 = 100
MSUM = 1.10101100 + 0.11100000 = 10.10001100 and ESUM = E1 = 100
Post-normalize: MSUM = 1.01000110 and ESUM = 101
Therefore,
$(1.11000000 \times 2^{011-100}) + (1.10101100 \times 2^{100-100}) = 1.01000110 \times 2^{101-100}$
Or,
(11000000, 011) + (10101100, 100) = (01000110, 101)
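As an independent check (an illustration, not part of the module), the operands and result can be converted to exact rationals. The helper `value` is hypothetical and assumes 8 stored mantissa bits and the excess-4 exponent of this example.

```python
from fractions import Fraction


def value(f: int, E: int, m: int = 8, bias: int = 4) -> Fraction:
    """Exact value of a normalized (f, E) pair: 1.f x 2^(E - bias)."""
    return (1 + Fraction(f, 2 ** m)) * Fraction(2) ** (E - bias)


a = value(0b11000000, 0b011)     # 1.11 x 2^-1
b = value(0b10101100, 0b100)     # 1.10101100 x 2^0
s = value(0b01000110, 0b101)     # 1.01000110 x 2^1
print(a, b, s, a + b == s)       # 7/8 107/64 163/64 True
```

Here the sum is exact: both bits shifted out (during alignment and post-normalization) are 0, so no rounding error occurs.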






Floating-Point Multiplication

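How to multiply two floating-point numbers?
(Ma, Ea) × (Mb, Eb), with M1 = Ma, E1 = Ea, M2 = Mb, E2 = Eb
1. MPROD = M1 × M2
2. EPROD = E1 + E2 - bias (each biased exponent already contains the bias once, so one copy must be removed)
3. Post-normalize MPROD by shifting it by an appropriate amount and then updating EPROD by the same amount (for normalized inputs 1 ≤ MPROD < 4, so at most one right shift is needed).
4. Rounding may be required to store the result in the same number of bits (precision) as the inputs.
5. If rounding carries out (MPROD ≥ 2), normalize again and update EPROD.
6. Result = (MPROD, EPROD).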
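The six steps can likewise be sketched in Python on integer mantissas. The name `fp_mul` and the `(f, E)` pair format are hypothetical; the sketch handles magnitudes only (no sign bit), skips exponent overflow and underflow checks, and assumes round-to-nearest with ties rounded up, which is what the worked example uses.

```python
def fp_mul(a, b, m, bias):
    """Multiply two normalized (f, E) pairs with an excess-`bias` exponent."""
    (fa, ea), (fb, eb) = a, b
    M1, M2 = (1 << m) | fa, (1 << m) | fb   # 1.f scaled by 2^m
    mprod = M1 * M2                          # Step 1: 2m fraction bits, value in [1, 4)
    eprod = ea + eb - bias                   # Step 2: remove the doubled bias
    frac_bits = 2 * m

    # Step 3: post-normalize; since MPROD < 4, at most one right shift.
    if mprod >= 1 << (frac_bits + 1):
        frac_bits += 1
        eprod += 1

    # Step 4: round to m fraction bits (ties rounded up).
    shift = frac_bits - m
    mprod = (mprod + (1 << (shift - 1))) >> shift

    # Step 5: rounding can carry up to 10.000..., so normalize again.
    if mprod >= 1 << (m + 1):
        mprod >>= 1
        eprod += 1

    return mprod - (1 << m), eprod           # Step 6: drop the hidden 1


f, e = fp_mul((0b10101100, 0b0101), (0b11000000, 0b0110), m=8, bias=8)
print(f"{f:08b}, {e:04b}")                   # 01110111, 0100
```

Replacing the ties-up rounding with ties-to-even would change only the last mantissa bit of this particular product.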
Example: Multiplication

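Multiply (10101100, 0101) and (11000000, 0110) (input numbers are in the normalized form with excess-8 exponent).

That is, $(1.10101100 \times 2^{0101-1000}) \times (1.11000000 \times 2^{0110-1000}) = ?$

MPROD = 1.10101100 × 1.11000000 = 10.1110110100000000
EPROD = E1 + E2 - 8 = 5 + 6 - 8 = 3 = 0011
Post-normalize: MPROD = 1.01110110100000000 and EPROD = 0100
Rounding (round to nearest, ties rounded up): MPROD = 1.01110111
(The discarded bits 100000000 are exactly half a unit in the last place, so this is a tie; IEEE 754's default round-to-nearest-even would keep 1.01110110 instead.)
Therefore,
$(1.10101100 \times 2^{0101-1000}) \times (1.11000000 \times 2^{0110-1000}) = 1.01110111 \times 2^{0100-1000}$
Or,
(10101100, 0101) × (11000000, 0110) = (01110111, 0100)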
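A quick exact check (illustrative only; the helper `value` is hypothetical and assumes 8 mantissa bits with an excess-8 exponent) confirms that the true product sits exactly halfway between the two nearest representable results, so the rounding rule alone decides the last bit.

```python
from fractions import Fraction


def value(f: int, E: int, m: int = 8, bias: int = 8) -> Fraction:
    """Exact value of a normalized (f, E) pair: 1.f x 2^(E - bias)."""
    return (1 + Fraction(f, 2 ** m)) * Fraction(2) ** (E - bias)


exact = value(0b10101100, 0b0101) * value(0b11000000, 0b0110)
down = value(0b01110110, 0b0100)   # 1.01110110 x 2^-4 (ties-to-even choice)
up = value(0b01110111, 0b0100)     # 1.01110111 x 2^-4 (ties-up choice)
print(exact, exact == (down + up) / 2)   # 749/8192 True
```

Either neighbor is within half a unit in the last place; ties-to-even is IEEE's default because it avoids the systematic upward bias of always rounding ties up.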
End of Week 9: Additional Module
Thank You