CSE HomeWork #6 Solution

$30.00 $24.00

Assume that the following MIPS code is executed on a pipelined processor with a 5-stage pipeline, full forwarding, and a predict-taken branch predictor: lw r2, 0(r1) label1: beq r2, r0, label2 # not taken once, then taken lw r3, 0(r2) beq r3, r0, label1 # taken add r1, r3, r1 label2: sw r1, 0(r2) Draw…

5/5 – (2 votes)

You’ll get a: zip file solution

 

Description

5/5 – (2 votes)
  1. Assume that the following MIPS code is executed on a pipelined processor with a 5-stage pipeline, full forwarding, and a predict-taken branch predictor:

lw

r2, 0(r1)

label1:

beq

r2, r0, label2

# not taken once, then taken

lw

r3, 0(r2)

beq

r3, r0, label1

# taken

add

r1, r3, r1

label2:

sw

r1, 0(r2)

    1. Draw the pipeline execution diagram for this code, assuming there are no delay slots and that branches execute in the EX stage.

    1. Repeat Part a, but assume that delay slots are used. In the given code, the instruction that follows the branch is now the delay slot instruction for that branch.

    1. One way to move the branch resolution one stage earlier is to not need an ALU operation in conditional branches. The branch instructions would be “bez rd, label” and “bnez rd, label”, and it would branch if the register has and does not have a zero value, respectively. Change the code above to use these branch instructions instead of beq. You can assume that register R8 is available for you to use as a temporary register, and that an seq (set if equal) R-type instruction can be used

(you don;’t have to use either).

    1. As discussed in lecture, we can reduce the control hazards by adding a dedicated comparator in the ID stage (perform comparisons at ID stage instead of EXE stage). Describe THREE scenarios when new hazards are introduced in this setting.

  1. The importance of having a good branch predictor depends on how often conditional branches are executed. Together with branch predictor accuracy, this will determine how much time is spent stalling due to mispredicted branches. In this exercise, assume that the breakdown of dynamic instructions into various instruction categories is as follows:

R-type

BEQ

Jump

LW

SW

40%

25%

5%

25%

5%

The accuracies of three different branch predictors are as follow:

Always-Taken

Always-Not-Taken

2-bit Predictor

45%

55%

85%

  1. Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to mispredicted branches with the always-taken predictor? Assume that branch outcomes are determined in the EX stage, that there are no data hazards, and that no delay slots are used.

  1. Repeat Part a for the “always-not-taken” predictor.

  2. Repeat Part a for the 2-bit predictor.

    1. With the 2-bit predictor, what speedup would be achieved if we could convert half of the branch instructions in a way that replaces a branch instruction with an ALU instruction? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced.

    1. With the 2-bit predictor, what speedup would be achieved if we could convert half of the branch instructions in a way that replaced each branch instruction with two ALU instructions? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced.

  1. In this exercise we compare the performance of 1-issue and 2-issue processors, taking into account program transformations that can be made to optimize for 2-issue execution. Problems in this exercise refer to the following loop (written in C):

for(i = 0; i != j; i += 2)

b[i] = a[i] – a[i + 1];

a.

Translate this C code into MIPS instructions. Your translation should be direct,

without rearranging instructions to achieve better performance. When writing

MIPS code, assume that variables are kept in registers as follows, and that all

registers except those indicated as Free are used to keep various variables, so they

cannot be used for anything else.

i

j

a

b

c

Free

R5

R6

R1

R2

R3

R10, R11, R12

b.

If the loop exits after executing only two iterations, draw a pipeline diagram for

your MIPS code from Part a executed on a 2-issue processor. Assume the

processor has perfect branch prediction and can fetch any two instructions (not

just consecutive instructions) in the same cycle.

c.

Rearrange your code from Part a to achieve better performance on a 2-issue

statically scheduled processor.

d.

Repeat Part b, but this time use your MIPS code from Part c.

e.

What is the speedup of going from a 1-issue processor to a 2-issue processor? Use

your code from Part a for both 1-issue and 2-issue, and assume that 1,000,000

iterations of the loop are executed. As in Part b, assume that the processor has perfect branch predictions, and that a 2-issue processor can fetch any two instructions in the same cycle.

CSE HomeWork #6 Solution
$30.00 $24.00