CPE432: Computer Design
Fall 2010
Homework 2 Solution
Total: 26 points

Problem 1: Data dependence (8 points)

Here is an unusual loop. First, list the dependences (output, anti and true) and then rewrite the loop so that it is parallel.

    for (i = 1; i < 100; i = i + 1) {
        a[i] = b[i] + c[i];      // S1
        b[i] = a[i] + d[i];      // S2
        a[i + 1] = a[i] + e[i];  // S3
    }

There are five dependences in the loop.
True dependence from S1 to S2 on a.
Anti dependence from S1 to S2 on b.
Loop-carried true dependence from S3 to S2 on a.
Loop-carried true dependence from S3 to S3 on a.
Loop-carried output dependence from S3 to S1 on a.
To parallelize the loop, we "break" all of the loop-carried dependences. The loop-carried output dependence (S3 to S1) can be removed by the compiler through renaming. The loop-carried true dependences (S3 to S2 and S3 to S3) carry no real data: in iteration i + 1, S1 overwrites a[i + 1] before S2 or S3 reads it, so the value S3 wrote in iteration i is never used. The compiler can therefore remove them by rewriting the code. Here is the parallel version of the loop:

    for (i = 1; i < 100; i = i + 1) {
        a[i] = b[i] + c[i];      // S1
        b[i] = a[i] + d[i];      // S2
    }
    a[100] = a[99] + e[99];      // S3 from the last iteration (i = 99)

Looking at the original code, it should be apparent that S3 does no useful work in any iteration except the last: every value it writes is overwritten by S1 in the following iteration, except a[100], which is written in the final iteration. With that single statement moved after the loop, the transformed version is functionally equivalent to the original loop.

Problem 2 (6 points)

Consider the code fragment used in Homework 1, shown below:

    Loop: LW    R1, 0(R2)
          DADDI R1, R1, 1
          SW    R1, 0(R2)
          DADDI R2, R2, 4
          DADDI R4, R4, -4
          BNEZ  R4, Loop

Consider the standard 5-stage pipeline machine (IF ID EX MEM WB). Assume the initial value of R4 is 396 and all memory accesses hit in the cache.

a. [3 points] Below is the timing for the instruction sequence for the pipeline with full forwarding and bypassing hardware. Assume that branches are resolved in the MEM stage and are predicted as not taken. How does the branch delay slot technique improve performance? Point out where in the timing for the instruction sequence it would be beneficial.

    Instruction         C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12 C13 C14 C15
    LW R1, 0(R2)        F   D   X   M   W
    DADDI R1, R1, 1         F   D   S   X   M   W
    SW R1, 0(R2)                F   S   D   X   M   W
    DADDI R2, R2, 4                     F   D   X   M   W
    DADDI R4, R4, -4                        F   D   X   M   W
    BNEZ R4, Loop                               F   D   X   M   W
    LW R1, 0(R2)                                                F   D   X   M   W

(F = IF, D = ID, X = EX, M = MEM, W = WB, S = stall.)
Solution: The branch delay slot is a position after the branch instruction holding an instruction that is executed regardless of whether the branch is taken. By placing such an instruction after the branch and always executing it, we can do useful work while the processor is still determining whether the branch is taken and what the target address is. More specifically, in the timing above, the LW of the next iteration enters the IF stage only when the branch is in the WB stage (C11). It cannot be fetched earlier because the branch target is not known until the end of the MEM stage, so the fetch slots in C8 through C10 do no useful work. With a branch delay slot, we could fit an extra instruction between the branch and the LW with no penalty.

Grading: 2 points for the explanation, 1 point for pointing out where in the timing it is useful.

b. [3 points] Why does static branch prediction improve performance over no branch prediction?

Solution: Static prediction lets the processor fetch the instruction after a branch, and start it through the pipeline, before the branch is resolved. If the prediction is correct, nothing needs to be undone and we save the cycles that would otherwise be spent waiting for the branch. If it mispredicts, we flush the wrongly fetched instructions and fetch the correct one, so in that case it is no worse than not predicting at all.

Grading: 3 points for the explanation.

Problem 3: Dynamic Branch Prediction (12 points)

Consider the following MIPS code. The register R0 is always 0.

        DADDI R1, R0, 2
    L1: DADDI R12, R0, 4
    L2: DSUBI R12, R12, 1
        BNEZ  R12, L2           -- Branch 1
        DSUBI R1, R1, 1
        BNEZ  R1, L1            -- Branch 2
Each table below refers to only one branch. For instance, branch 1 is executed 8 times, and those 8 executions should be recorded in the table for branch 1. Similarly, branch 2 is executed only 2 times.

Part A [4 points] Assume that 1-bit branch predictors are used. When the processor starts to execute the above code, both predictors contain the value N (not taken). What is the number of correct predictions? Use the following tables to record the prediction and action of each branch. The first entry is filled in for you.

Branch 1:
    Step  Prediction  Actual action
    1     N           T
    2     T           T
    3     T           T
    4     T           N
    5     N           T
    6     T           T
    7     T           T
    8     T           N

Branch 2:
    Step  Prediction  Actual action
    1     N           T
    2     T           N

Number of correct predictions: 4 (branch 1) and 0 (branch 2).

Grading: ¼ point for each table entry. The number of correct predictions follows directly from the correct table entries.
Part B [4 points] Now assume that 4-bit saturating counters are used. When the processor starts to execute the above code, both counters contain the value 7. What is the number of correct predictions? Use the following tables to record the prediction and action of each branch. The first entry is filled in for you.

Branch 1:
    Step                    1     2     3     4     5     6     7     8
    Counter value           0111  1000  1001  1010  1001  1010  1011  1100
    Branch 1 prediction     N     T     T     T     T     T     T     T
    Actual branch 1 action  T     T     T     N     T     T     T     N

Branch 2:
    Step                    1     2
    Counter value           0111  1000
    Branch 2 prediction     N     T
    Actual branch 2 action  T     N

(The counter predicts taken when its value is 8 or more, that is, when its most significant bit is 1. It is incremented after a taken branch and decremented after a not-taken branch, saturating at 0000 and 1111.)
Number of correct predictions: 5 (branch 1) and 0 (branch 2).

Grading: ¼ point for each counter-value and prediction row. The two actual-action rows are the same as in part A.

Part C: [4 points] Now assume that 2-level correlating predictors of the form (2,1) are used. When the processor starts to execute the above code, the outcomes of the previous two branches are not taken (N). Also assume that the initial state of the predictors of all branches is not taken (N). What is the number of correct predictions? Use the following table to record your steps. Record the "New State" of the predictors in the form W/X/Y/Z, where:

W - state corresponding to the case where the last branch and the branch before the last are both TAKEN
X - state corresponding to the case where the last branch is TAKEN and the branch before the last is NOT TAKEN
Y - state corresponding to the case where the last branch is NOT TAKEN and the branch before the last is TAKEN
Z - state corresponding to the case where the last branch and the branch before the last are both NOT TAKEN

The first entry is filled in for you.

Branch 1:
    Step  Prediction  Actual action  New state (W/X/Y/Z)
    1     N           T              N/N/N/T
    2     N           T              N/T/N/T
    3     N           T              T/T/N/T
    4     T           N              N/T/N/T
    5     T           T              N/T/N/T
    6     N           T              T/T/N/T
    7     T           T              T/T/N/T
    8     T           N              N/T/N/T

Branch 2:
    Step  Prediction  Actual action  New state (W/X/Y/Z)
    1     N           T              N/N/T/N
    2     T           N              N/N/N/N

Number of correct predictions: 2 (both for branch 1, at steps 5 and 7; branch 2 has none).

Grading: ¼ point for each row in red. The two rows in black are the same as in part A.