ENGG 1203 Tutorial



Computer Systems
Supplementary Notes
Learning Objectives


Perform computer arithmetic, including IEEE floating point
Evaluate the performance of parallel processing via Amdahl's law
Computer Arithmetic (1)

Convert the following decimal values to binary:
a) 205
b) 2133

Perform the following operations in the 2's complement system. Use eight
bits (including the sign bit) for each number.
a) add +9 to +6
b) add +14 to -17
c) add +19 to -24
Computer Arithmetic (2)

Convert the following decimal values to binary:
a) 205
b) 2133
205_10  = 1 x 2^7 + 1 x 2^6 + 1 x 2^3 + 1 x 2^2 + 1 x 2^0 = 11001101_2
2133_10 = 1 x 2^11 + 1 x 2^6 + 1 x 2^4 + 1 x 2^2 + 1 x 2^0 = 100001010101_2
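The same conversions can be checked programmatically. A minimal Python sketch
(repeated division by 2; the function name is just illustrative):

def to_binary(n: int) -> str:
    """Convert a non-negative decimal integer to a binary string
    by repeated division by 2 (remainders read bottom-up)."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))   # remainder is the next bit
        n //= 2
    return "".join(reversed(bits))

print(to_binary(205))    # 11001101
print(to_binary(2133))   # 100001010101
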
Computer Arithmetic (3)

Perform the following operations in the 2’s
complement system. Use eight bits (including the sign
bit) for each number.
a) add +9 to +6
b) add +14 to -17
c) add +19 to -24
a)   00001001  (+9)
   + 00000110  (+6)
     --------
     00001111  (+15)

b)   00001110  (+14)
   + 11101111  (-17)
     --------
     11111101  (-3)

c)   00010011  (+19)
   + 11101000  (-24)
     --------
     11111011  (-5)

where -24: 24 = 00011000 -> 11100111 (1's complement) -> 11101000 (2's complement)
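A small Python sketch of 8-bit 2's-complement encoding and addition that
reproduces the three results above (function names are illustrative):

def to_twos_complement(value: int, bits: int = 8) -> int:
    """Encode a signed integer as an unsigned 2's-complement bit pattern."""
    return value & ((1 << bits) - 1)

def from_twos_complement(pattern: int, bits: int = 8) -> int:
    """Decode an unsigned bit pattern back to a signed integer."""
    if pattern & (1 << (bits - 1)):        # sign bit set -> negative
        return pattern - (1 << bits)
    return pattern

def add8(a: int, b: int) -> int:
    """Add two signed integers in 8-bit 2's complement and decode the result."""
    result = (to_twos_complement(a) + to_twos_complement(b)) & 0xFF
    return from_twos_complement(result)

print(f"{to_twos_complement(-24):08b}")             # 11101000
print(add8(9, 6), add8(14, -17), add8(19, -24))     # 15 -3 -5
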
Overflow

Overflow: adding two positive numbers gives a negative result, or adding two
negative numbers gives a positive result.
For 2's complement (4-bit examples):
(+1) + (+6) = +7  -> OK
(+1) + (+7) = -8  -> Overflow
(-1) + (-8) = +7  -> Overflow
(-6) + (+7) = +1  -> OK
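In code, this condition can be checked by comparing signs: overflow occurs
when both operands have the same sign but the sum's sign differs. A minimal
sketch (names are illustrative):

def add_with_overflow(a: int, b: int, bits: int = 8):
    """Add two 2's-complement patterns of the given width and report overflow."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    s = (a + b) & mask
    # Overflow iff the operands share a sign and the result's sign differs.
    overflow = ((a & sign) == (b & sign)) and ((a & sign) != (s & sign))
    return s, overflow

print(add_with_overflow(0b0111, 0b0001, bits=4))   # (8, True):  0111 + 0001 wraps to 1000 (-8)
print(add_with_overflow(0b0110, 0b0001, bits=4))   # (7, False): (+6) + (+1) = +7, no overflow
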
Addition using 2’s Complement (1)

Perform the following computations.



Indicate in your answer whether an overflow has occurred.
a) 01000000 + 01000001   (64 + 65)
b) 00000111 − 11111001   (7 − (−7))
Addition using 2’s Complement (2)


a)   01000000  (64)
   + 01000001  (65)
     --------
     10000001  (-127)  -> Overflow

b) 00000111 − 11111001 = 00000111 + (2's complement of 11111001)
                       = 00000111 + 00000111

     00000111  (7)
   + 00000111  (7)
     --------
     00001110  (14)  -> No overflow
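Part (b)'s trick of subtracting by adding the 2's-complement negation can be
sketched as follows (an illustrative helper, not from the slides):

def sub8(a_pattern: int, b_pattern: int) -> int:
    """Subtract two 8-bit 2's-complement patterns by adding the negation of b."""
    neg_b = (~b_pattern + 1) & 0xFF          # 2's-complement negation of b
    return (a_pattern + neg_b) & 0xFF

result = sub8(0b00000111, 0b11111001)        # 7 - (-7)
print(f"{result:08b}")                       # 00001110  (= 14, no overflow)
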
Limitation of Parallel Processing


The major challenge is the proportion of the program that is inherently
sequential.
Suppose we want an 80x speedup from 100 processors. What fraction of the
original program can be sequential?
10%
5%
1%
<1%
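By Amdahl's law, Speedup = 1 / (s + (1 − s)/N) where s is the sequential
fraction; solving for s gives s = (N/Speedup − 1) / (N − 1), which for an 80x
speedup on 100 processors is about 0.25%, i.e. less than 1%. A quick Python
check (derived here, not taken from the slides):

def amdahl_speedup(seq_fraction: float, n_processors: int) -> float:
    """Amdahl's law: overall speedup given the sequential fraction."""
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / n_processors)

# Solve 80 = 1 / (s + (1 - s)/100) for the sequential fraction s.
N, target = 100, 80
s = (N / target - 1) / (N - 1)
print(f"sequential fraction = {s:.4%}")   # about 0.2525%  ->  answer: < 1%
print(amdahl_speedup(s, N))               # 80.0 (up to rounding)
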
Speed-up via Parallel Computation

A uniprocessor computer can operate in either sequential mode or parallel
mode. In parallel mode, computations can be performed nine times faster than
in sequential mode. A benchmark program was run on this computer. Suppose
that 25% of the run time was spent in parallel mode and the remaining
portion in sequential mode.




(a) What is the effective speedup of the above execution as compared with the
condition when parallel mode is not used at all?
(b) What is the fraction of parallelized code of the benchmark program?
(c) Suppose we double the speed ratio between parallel mode and the sequential
mode by hardware improvements. What is the new effective speedup?
(d) Suppose the speedup you calculated in (c) is to be obtained by software
improvement alone in terms of parallelization fraction, instead of by any hardware
improvement. What is the new parallelization fraction required?
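The following Python sketch works through parts (a)-(d) using Amdahl's law;
the numbers are derived here rather than taken from the original slides, so
treat them as a check:

# Normalise the actual run time to 1: 0.25 in parallel mode, 0.75 in sequential mode.
t_parallel, t_sequential, ratio = 0.25, 0.75, 9

# (a) Without parallel mode, the parallel-mode work would take 9x longer.
t_all_sequential = t_sequential + t_parallel * ratio          # = 3.0
print("(a) effective speedup:", t_all_sequential / 1.0)       # 3.0

# (b) Fraction of the (all-sequential) work that is parallelised.
f = (t_parallel * ratio) / t_all_sequential
print("(b) parallelised fraction:", f)                        # 0.75

# (c) Double the speed ratio (9 -> 18) with the same fraction f.
speedup_c = 1.0 / ((1 - f) + f / (2 * ratio))
print("(c) new speedup:", round(speedup_c, 4))                # about 3.4286

# (d) Fraction f_new needed to reach speedup_c with the original ratio of 9:
#     1/((1 - f_new) + f_new/9) = speedup_c  =>  f_new = (1 - 1/speedup_c) * 9/8
f_new = (1 - 1 / speedup_c) * ratio / (ratio - 1)
print("(d) required fraction:", round(f_new, 4))              # about 0.7969
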
Speed-up via N Processors
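(For reference, the standard Amdahl's-law form behind these examples: with a
parallelised fraction P running on N processors,
Speedup(N) = 1 / ((1 − P) + P / N).)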
Speed-up with K Parallelized Tasks

You have to run two applications on a dual-core server, but the
resource requirements are not equal. The first application needs
80% of the resources, and the other only 20% of the resources.




(a) Given that 40% of the first application is parallelizable, how much
speedup would you achieve with that application if run in isolation?
(b) Given that 99% of the second application is parallelizable, how much
speedup would this application observe if run in isolation?
(c) Given that 40% of the first application is parallelizable, how much
overall system speedup would you observe if you parallelized it?
(d) Given that 99% of the second application is parallelizable, how much
overall system speedup would you get?
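A Python sketch of parts (a)-(d), again using Amdahl's law with 2 cores (the
answers are derived here, not taken from the slides):

def amdahl(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law speedup for a single application."""
    return 1.0 / ((1 - parallel_fraction) + parallel_fraction / cores)

# (a) Application 1: 40% parallelisable, run in isolation on 2 cores.
s1 = amdahl(0.40, 2)
print("(a)", s1)                              # 1.25

# (b) Application 2: 99% parallelisable, run in isolation on 2 cores.
s2 = amdahl(0.99, 2)
print("(b)", round(s2, 4))                    # about 1.9802

# (c) Only application 1 (80% of the workload) is parallelised.
t_c = 0.8 / s1 + 0.2
print("(c)", round(1.0 / t_c, 4))             # about 1.1905

# (d) Only application 2 (20% of the workload) is parallelised.
t_d = 0.8 + 0.2 / s2
print("(d)", round(1.0 / t_d, 4))             # about 1.1099
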


In the answers in Parts (c) and (d) above, we assume that each
application completely “owns” the whole system (i.e., both processor
cores) during the time it is running. That is, there is no overlapping of the
two applications in time-sharing the two processor cores.
However, in reality, this might not be the case—there should be some
overlapping of execution of the two applications in order not to waste the
resources. The following is an alternative way to determine an answer
for Part (c).

In the timing diagram (application 1's parallel part runs first on both
cores, its serial part then continues on P1, and P2 afterwards runs
application 2), we assume that in completely serial mode application 1 takes
4T units of time while application 2 takes T units of time, due to the 80%
and 20% requirements respectively. Now, in parallel mode for application 1,
the total time required is 3.2T because the speedup is 1.25.


Furthermore, the parallel part requires 0.8T units of time
while the serial part requires 2.4T units of time because,
as we determined in Part (a), the parallel time versus
serial time ratio is 1 to 3 (0.2 vs. 0.6).
Most importantly, as can be seen from the diagram, processor core P2 is idle
after the first 0.8T units of time, so it can actually run application 2, as
shown. Thus the overall time for running both applications in the system is
still 3.2T. The speedup is therefore 5T/3.2T = 1.5625.
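This overlapped schedule can be checked with a short Python calculation (the
4T/T split and the 1.25 speedup come from the slides above):

T = 1.0                                   # arbitrary time unit
app1_serial_total = 4 * T                 # application 1, fully serial
app2_serial_total = 1 * T                 # application 2, fully serial

# Application 1 parallelised on 2 cores: 40% of its 4T runs on both cores.
app1_parallel_part = 0.4 * app1_serial_total / 2      # 0.8T on P1 and P2
app1_serial_part = 0.6 * app1_serial_total            # 2.4T on P1 only

# P2 is free after the parallel part, so application 2 overlaps with
# application 1's serial part running on P1.
p1_busy = app1_parallel_part + app1_serial_part       # 3.2T
p2_busy = app1_parallel_part + app2_serial_total      # 1.8T
overall = max(p1_busy, p2_busy)                       # 3.2T

print(overall)                                             # 3.2
print((app1_serial_total + app2_serial_total) / overall)   # 1.5625
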
IEEE Floating Point Example 1
Convert 2.625 to 8-bit floating point format.
• For the integral part: 2_10 = 10_2.
• For the fractional part:
  0.625 × 2 = 1.25   ->  1   Generate 1 and continue with the rest.
  0.25  × 2 = 0.5    ->  0   Generate 0 and continue.
  0.5   × 2 = 1.0    ->  1   Generate 1 and nothing remains.
• So 0.625_10 = 0.101_2, and 2.625_10 = 10.101_2.
• Add an exponent part: 10.101_2 = 10.101_2 × 2^0.
• Normalize: 10.101_2 × 2^0 = 1.0101_2 × 2^1.
• Mantissa: 0101
• Exponent: 1 + 3 = 4 = 100_2.
• Sign bit is 0.
The result is 0 100 0101 = 01000101. Represented as hex, that is 45_16.
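A Python sketch of the same conversion. The 8-bit format implied by these
examples appears to be 1 sign bit, a 3-bit exponent with a bias of 3, and a
4-bit mantissa with a hidden leading 1; the function below assumes that
layout, truncates extra mantissa bits (as the examples do), and only handles
normalised, non-zero values:

def encode_fp8(value: float) -> int:
    """Encode a value in the assumed 8-bit format: 1 sign bit, 3-bit
    exponent (bias 3), 4-bit mantissa with a hidden leading 1.
    Truncates extra mantissa bits; normalised values only."""
    sign = 1 if value < 0 else 0
    m = abs(value)
    exponent = 0
    while m >= 2.0:                       # normalise to 1.xxxx * 2^exponent
        m /= 2.0
        exponent += 1
    while m < 1.0:
        m *= 2.0
        exponent -= 1
    mantissa = int((m - 1.0) * 16)        # keep 4 fraction bits (truncate)
    return (sign << 7) | ((exponent + 3) << 4) | mantissa

print(hex(encode_fp8(2.625)))   # 0x45
print(hex(encode_fp8(1.7)))     # 0x3b  (truncated; decodes back to 1.6875)
print(hex(encode_fp8(-11.5)))   # 0xe7
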
IEEE Floating Point Example 2
Convert decimal 1.7 to 8-bit floating point format.
• For the integral part: 1_10 = 1_2. For the fractional part:
  0.7 × 2 = 1.4   ->  1   Generate 1 and continue with the rest.
  0.4 × 2 = 0.8   ->  0   Generate 0 and continue.
  0.8 × 2 = 1.6   ->  1   Generate 1 and continue with the rest.
  0.6 × 2 = 1.2   ->  1   Generate 1 and continue with the rest.
  0.2 × 2 = 0.4   ->  0   Generate 0 and continue.
  0.4 × 2 = 0.8   ->  0   Generate 0 and continue.
  0.8 × 2 = 1.6   ->  1   Generate 1 and continue with the rest.
  0.6 × 2 = 1.2   ->  1   Generate 1 and continue with the rest.
• Normalized: 1.1011_2 = 1.1011_2 × 2^0.
• The reason the process seems to continue endlessly is that it does. The
number 7/10, which makes a perfectly reasonable decimal fraction, is a
repeating fraction in binary, just as the fraction 1/3 is a repeating
fraction in decimal.
• We cannot represent this exactly as a floating point number. The closest
we can come in four bits is 0.1011. Since we already have a leading 1, the
best we can do in this eight-bit format is 1.1011.
• Mantissa: 1011; Exponent: 0 + 3 = 3 = 011_2; Sign bit: 0.
• The result is 0 011 1011 = 00111011, or 3B_16 in hex.
• This is not exact, of course. If you convert it back to decimal, you get
1.6875.
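A small Python sketch of the repeated-doubling step, showing the repeating
pattern for 0.7 (the helper name is illustrative):

def fraction_bits(x: float, n_bits: int):
    """First n_bits of the binary expansion of a fraction 0 <= x < 1,
    generated by repeated doubling."""
    bits = []
    for _ in range(n_bits):
        x *= 2
        bit = int(x)          # the integer part is the next bit
        bits.append(bit)
        x -= bit              # continue with the remaining fraction
    return bits

print(fraction_bits(0.7, 12))       # [1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1] -> repeats
# Truncating to four bits gives 0.1011_2 = 0.6875, hence 1.1011_2 = 1.6875:
print(1 + 1/2 + 0/4 + 1/8 + 1/16)   # 1.6875
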
IEEE Floating Point Example 3
Convert the 8-bit floating point number 26 (in hex) to decimal.
• Convert: 26_16 = 0 010 0110_2.
• Exponent: 010_2 = 2_10; 2 − 3 = −1.
• Denormalize: 1.0110_2 × 2^−1 = 0.10110_2.
• Convert the bits to decimal:
  Place value:  1 (2^0)   0.5 (2^-1)   0.25 (2^-2)   0.125 (2^-3)   0.0625 (2^-4)
  Bit:          0    .    1            0             1              1
  Value: 0.5 + 0.125 + 0.0625 = 0.6875
• Sign: positive.
• Result: 26_16 is 0.6875.
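Decoding can be sketched in Python too, under the same assumed layout (sign
bit, 3-bit exponent with bias 3, 4-bit mantissa with a hidden leading 1):

def decode_fp8(byte: int) -> float:
    """Decode the assumed 8-bit float format (normalised values only)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = ((byte >> 4) & 0b111) - 3        # remove the bias of 3
    mantissa = 1.0 + (byte & 0b1111) / 16.0     # restore the hidden 1
    return sign * mantissa * 2.0 ** exponent

print(decode_fp8(0x26))   # 0.6875
print(decode_fp8(0xE7))   # -11.5  (Example 4 below)
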
IEEE Floating Point Example 4
Convert the 8-bit floating point number E7 (in hex) to decimal.
• Convert: E7_16 = 1 110 0111_2.
• Mantissa: 1.0111
• Exponent: 110_2 = 6_10; 6 − 3 = 3.
• De-normalize: 1.0111_2 × 2^3 = 1011.1_2.
• Convert the bits to decimal:
  Place value:  8 (2^3)   4 (2^2)   2 (2^1)   1 (2^0)   0.5 (2^-1)
  Bit:          1         0         1         1    .    1
  Value: 8 + 2 + 1 + 0.5 = 11.5
• Sign: negative.
• Result: E7_16 is −11.5.
More examples can be found at http://sandbox.mc.edu/~bennet/cs110/flt/