Solution for CS 2506 Spring 2020 MIPS03 ------------------------------------------------------------------------------- For questions 1 through 4, refer to the incomplete preliminary pipeline design, shown below, which includes the interstage buffers needed to synchronize signals and data with the instructions, and support for forwarding operands. However, this design has no support for load-use hazard detection/stall, or for properly handling beq instructions if the branch is taken. This datapath supports correct execution of any sequence of the following MIPS instructions: add, sub, and, or, slt, lw, and sw. (See the assignment statement for the diagram.) 1. Consider the following sequence of instructions, which involves a number of data hazards that will be handled by the existing forwarding logic: add $t0, $t1, $t2 # 1.1 add $t1, $t0, $t2 # 1.2 reads $t0 after #1.1 writes $t0 sub $t2, $t0, $t1 # 1.3 reads $t0 after #1.1 writes $t0 # reads $t1 after #1.2 writes $t1 lw $t1, 0($t2) # 1.4 reads $t2 after #1.3 writes $t2 sw $t2, 4($t0) # 1.5 reads $t2 after #1.3 writes $t2 beq $t0, $t2, exit # 1.6 Note: there are no load-use hazards here that are not resolved by the existing forwarding logic. a) [15 points] Identify all such hazards, and complete a table like the following one. The first line in the table below IS a correct response, and shows how a row in the table should be filled in. Writer Reader Register Source of forwarded value ------------------------------------------------------ #1.1 #1.2 $t0 EX/MEM buffer Answer: Writer Reader Register Source of forwarded value ------------------------------------------------------ #1.1 #1.2 $t0 EX/MEM buffer #1.1 #1.3 $t0 MEM/WB buffer #1.2 #1.3 $t1 EX/MEM buffer #1.3 #1.4 $t2 EX/MEM buffer #1.3 #1.5 $t2 MEM/WB buffer Aside: there are several other dependencies that are not hazards because the reader is 3 or more cycles behind the writer b) [5 points] How many clock cycles would be required to execute the given sequence of instructions on the pipeline design shown in Figure 1? Answer: Once four cycles have passed, each stage in the pipeline will contain an instruction; after that, an instruction will be completed on each clock cycle. There are six instructions, so it will take 10 cycles to complete all. 2. [10 points] Consider the following sequence of instructions, which involves one or more load-use hazards: lw $t1, 8($t0) # 2.1 sw $t1, 4($t0) # 2.2 load-use hazard wrt $t1 add $s0, $t1, $t2 # 2.3 lw $s2, 0($s0) # 2.4 add $s3, $s0, $s2 # 2.5 load-use hazard wrt $s2 add $s3, $s2, $s3 # 2.6 The sequence also may include data hazards that will be handled correctly by the forwarding logic the pipeline includes; you should ignore those data hazards in your answer. In order for the sequence of instructions to be executed correctly on this pipeline design, one or more nop instructions must be inserted (so that the existing forwarding logic can do the rest). Rewrite the sequence of instructions to show how the sequence of instructions could be "fixed" by inserting the smallest possible number of nop instructions. Answer: A load-use hazard occurs when a lw instruction writes to a register that is read by the instruction that is one cycle behind the lw in the pipeline. A load-use hazard can be corrected by delaying the reading instruction by a single cycle. lw $t1, 8($t0) # 2.1 nop sw $t1, 4($t0) # 2.2 load-use hazard wrt $t1 add $s0, $t1, $t2 # 2.3 lw $s2, 0($s0) # 2.4 nop add $s3, $s0, $s2 # 2.5 load-use hazard wrt $s2 add $s3, $s2, $s3 # 2.6 Note that, if the reading instruction is two or more cycles behind the lw, the forwarding logic that's included in this pipeline design handles the issue, and no stall (nop instructions) are required. 3. [10 points] Consider the following sequence of instructions, which involves both lw and beq instructions: lw $t0, 8($t4) # 3.1 beq $t0, $t1, L1 # 3.3 load-use hazard wrt $t0 lw $t2, 4($t1) # 3.2 branch decision control hazard L1: sub $t4, $t2, $t0 # 3.6 load-use hazard wrt $t2 if branch not taken add $t2, $t0, $t2 # 3.4 The sequence also may include data hazards that will be handled correctly by the forwarding logic the pipeline includes; you should ignore those data hazards in your answer. In order for the sequence of instructions to be executed correctly on this pipeline design, one or more nop instructions must be inserted (so that the existing forwarding logic can do the rest). Rewrite the sequence of instructions to show how the sequence of instructions could be "fixed" by inserting the smallest possible number of nop instructions. Answer: lw $t0, 8($t4) # 3.1 nop beq $t0, $t1, L1 # 3.3 now, forwarding handles dependency wrt $t0 nop nop lw $t2, 4($t1) # 3.2 now, branch decision is made before fetch nop L1: sub $t4, $t2, $t0 # 3.6 now, forwarding handles dependency wrt $t2 # if branch is not taken add $t2, $t0, $t2 # 3.4 4. Consider the alternate implementation of the MIPS pipeline sets the Write register # for each instruction in the ID stage, as shown in Figure 2: (See the assignment statement for the diagram.) Suppose the following sequence of instructions is sent into the pipeline shown above, and the registers initially store the values shown in the table below: register initial value $t1 1000 $t2 2000 $t3 3000 $t4 4000 $t5 5000 add $t1, $t2, $t3 # 4.1 add $t5, $t1, $t2 # 4.2 add $t2, $t5, $t1 # 4.3 add $t3, $t1, $t1 # 4.4 add $t1, $t5, $t1 # 4.5 Aside: What's wrong with this design? It sets the Write Register number based on the instruction that's currently in the ID stage of the pipeline, but the write operation is actually going to be performed by the instruction that is currently in the WB stage of the pipeline. So, the effect is that the writing instruction will have its destination register chosen by the instruction that's three cycles behind it, in the ID stage. Note, though, that the Write register number is still sent forward via the interstage buffers, and the Forwarding Unit will still see the intended Write register numbers for the instructions that are in the MEM and WB stages, so the correct forwarding decisions will still be made. And, note that the correct (i.e., intended) data is still delivered to the Write data port on the Register unit, while the writing instruction is in the WB stage. a) [5 points] Which register will instruction #4.1 actually write its result to? Answer: Instruction #4.1 must be in WB in order to write data; at that time, the instruction in the ID stage will be #4.4. So, the destination register for #4.1 will actually be $t3. b) [5 points] What value will instruction #4.4 actually read from $t3? Answer: Instruction #4.4 reads its registers during the second half of its cycle in the ID stage; by then, instruction #4.1 will have written to its destination register, so #4.4 will read the value 5000. c) [5 points] What value will instruction #4.2 actually send to the ALU for its first (left) operand? Answer: Instruction #4.1 will compute the sum 5000 before it exits the EX stage. That value will be correctly forwarded to instruction #4.2, which will then send that value to the ALU when #4.2 is in the EX stage. For questions 5 through 7, refer to the pipeline design with forwarding and (load-use) hazard detection, shown below, which supports execution any sequence of the following MIPS instructions: add, sub, and, or, slt, lw, and sw. (See the assignment statement for the diagram.) 5. [15 points] Suppose that, due to a manufacturing defect, the InhibitFetch control signal from the load-use Hazard Detection unit suffers a stuck-at-0 error. That is, the InhibitFetch control signal is initialized to 0 until the first instruction enters the ID stage, and the Hazard Detection unit then always sets InhibitFetch to 0, no matter what the inputs to the Hazard Detection unit are. Assume that the rest of the hardware operates as designed. Suppose that, initially, the registers used below are set as shown below. Suppose that all the memory words are initialized to be 8000. Consider the execution of the following code in this buggy pipeline. Determine the final values of the $t1, $t3, and $t4 registers after all the instructions leave the pipeline (no other instructions modify any of these registers). register initial value $t1 1000 $t2 2000 $t3 3000 $t4 4000 $t5 5000 add $t1, $t2, $t3 # 5.1 lw $t3, 0($t1) # 5.2 add $t4, $t2, $t3 # 5.3 load-use hazard wrt $t3, but no stall add $t3, $t1, $t2 # 5.4 Aside: If InhibitFetch is stuck-at-0, instruction #5.3 will be fetched when lw is in the ID stage, and enter the ID stage as lw moves into the EX stage. So, #5.3 will read the value 3000 from $t3, and that value will not be replaced via forwarding before #5.3 sends it to the ALU. Answer: Here's an outline of what each instruction will do: add $t1, $t2, $t3 # 5.1 $t1 <-- 2000 + 3000 = 5000 lw $t3, 0($t1) # 5.2 $t3 <-- Mem[5000] = 8000 add $t4, $t2, $t3 # 5.3 $t4 <-- 2000 + 3000 = 5000 add $t3, $t1, $t2 # 5.4 $t3 <-- 5000 + 2000 = 7000 6. Suppose the following instructions are in the pipeline shown in Figure 3, in the stages indicated: lw $t1, 0($t3) # 6.1 is in the EX stage add $t2, $t1, $t5 # 6.2 is in the ID stage The load-use Hazard Detection Unit receives three register numbers as input: (See the assignment statement for the diagram.) a) [5 points] What register number will the Hazard Detection unit receive for input A? Answer: This will be the rs register field from the instruction in the ID stage, so it will be the register number for $t1, which is 9 or 01001. b) [5 points] What register number will the Hazard Detection unit receive for input B? Answer: This will be the rt register field from the instruction in the ID stage, so it will be the register number for $t5, which is 13 or 01101. c) [5 points] What register number will the Hazard Detection unit receive for input C? Answer: This will be the rt register field from the instruction in the EX stage, so it will be the register number for $t1, which is 9 or 01001. 7. [12 points] Review the discussions of the Forwarding Unit and the load-use Hazard Detection Unit. Suppose that the branch target address computation and the register comparison (for beq instructions) are moved into the ID stage, as discussed in class. We have seen that if there is a read-after-write data hazard, where the reading instruction is beq, then it may be necessary to stall the beq instruction, for one or two cycles. For each instruction sequence shown below, state whether beq would need to be stalled 0, 1 or 2 cycles, and state the name(s) of the register(s) involved in the hazard that leads to the need to stall. (We assume that the datapath design has been modified to include forwarding hardware so that operands can be substituted into the equality unit that compares registers in the ID stage.) (See the assignment statement for the diagram.) Aside: Following the discussion in lecture, we have the following facts: - if BEQ has a dependency for an input, the correct value must now be forwarded into the ID stage - if BEQ has a hazard and the writing instruction is R-type, then 1 stall is required iff the writing instruction is one stage ahead of the BEQ, and a single stall is sufficient - if BEQ has a hazard and the writing instruction is LW, then 1 stall is required if the LW is two stages ahead of the BEQ, and 2 stalls are required if the LW is only one stage ahead of the BEQ a) add $t2, $t0, $t1 add $s5, $s3, $s2 beq $t2, $s5, L1 Answer: 1 stall due to $s5 b) add $s5, $s3, $s2 add $t2, $t0, $t1 beq $t2, $s5, L1 Answer: 1 stall due to $t2 c) lw $s5, ($s3) lw $t2, ($t1) beq $t2, $s5, L1 Answer: 2 stalls due to $t2 d) lw $t2, ($t1) add $s5, $s3, $s2 beq $t2, $s5, L1 Answer: 1 stall due to $s5 and/or $t2 (Each requires a single stall.)