Solution for CS 2506 Spring 2020 MIPS03
-------------------------------------------------------------------------------

For questions 1 through 4, refer to the incomplete preliminary pipeline design,
shown below, which includes the interstage buffers needed to synchronize signals
and data with the instructions, and support for forwarding operands.  However,
this design has no support for load-use hazard detection/stall, or for properly
handling beq instructions if the branch is taken. This datapath supports correct
execution of any sequence of the following MIPS instructions:

   add, sub, and, or, slt, lw, and sw.  

(See the assignment statement for the diagram.)

1.  Consider the following sequence of instructions, which involves a number of
    data hazards that will be handled by the existing forwarding logic:

      add    $t0, $t1, $t2     # 1.1
      add    $t1, $t0, $t2     # 1.2  reads $t0 after #1.1 writes $t0
      sub    $t2, $t0, $t1     # 1.3  reads $t0 after #1.1 writes $t0
                               #      reads $t1 after #1.2 writes $t1
      lw     $t1, 0($t2)       # 1.4  reads $t2 after #1.3 writes $t2
      sw     $t2, 4($t0)       # 1.5  reads $t2 after #1.3 writes $t2
      beq    $t0, $t2, exit    # 1.6

    Note:  there are no load-use hazards here that are not resolved by the
           existing forwarding logic.

    a) [15 points] Identify all such hazards, and complete a table like the
       following one.  The first line in the table below IS a correct response,
       and shows how a row in the table should be filled in.

         Writer   Reader   Register   Source of forwarded value
         ------------------------------------------------------
           #1.1     #1.2     $t0         EX/MEM buffer

       Answer:
         Writer   Reader   Register   Source of forwarded value
         ------------------------------------------------------
           #1.1     #1.2     $t0         EX/MEM buffer
           #1.1     #1.3     $t0         MEM/WB buffer
           #1.2     #1.3     $t1         EX/MEM buffer
           #1.3     #1.4     $t2         EX/MEM buffer
           #1.3     #1.5     $t2         MEM/WB buffer

         Aside:  there are several other dependencies that are not hazards
                 because the reader is 3 or more cycles behind the writer


    b) [5 points] How many clock cycles would be required to execute the given
       sequence of instructions on the pipeline design shown in Figure 1?

       Answer:
       Once four cycles have passed, each stage in the pipeline will contain
       an instruction; after that, an instruction will be completed on each
       clock cycle.

       There are six instructions, so it will take 10 cycles to complete all.


2.  [10 points] Consider the following sequence of instructions, which involves
    one or more load-use hazards:

         lw    $t1, 8($t0)       # 2.1
         sw    $t1, 4($t0)       # 2.2  load-use hazard wrt $t1
         add   $s0, $t1, $t2     # 2.3
         lw    $s2, 0($s0)       # 2.4
         add   $s3, $s0, $s2     # 2.5  load-use hazard wrt $s2
         add   $s3, $s2, $s3     # 2.6

    The sequence also may include data hazards that will be handled correctly by
    the forwarding logic the pipeline includes; you should ignore those data
    hazards in your answer. 

    In order for the sequence of instructions to be executed correctly on this
    pipeline design, one or more nop instructions must be inserted (so that the
    existing forwarding logic can do the rest).  Rewrite the sequence of
    instructions to show how the sequence of instructions could be "fixed" by
    inserting the smallest possible number of nop instructions.

    Answer:
    A load-use hazard occurs when a lw instruction writes to a register that is
    read by the instruction that is one cycle behind the lw in the pipeline.  A
    load-use hazard can be corrected by delaying the reading instruction by a
    single cycle.

         lw    $t1, 8($t0)       # 2.1
         nop
         sw    $t1, 4($t0)       # 2.2  load-use hazard wrt $t1
         add   $s0, $t1, $t2     # 2.3
         lw    $s2, 0($s0)       # 2.4
         nop
         add   $s3, $s0, $s2     # 2.5  load-use hazard wrt $s2
         add   $s3, $s2, $s3     # 2.6

    Note that, if the reading instruction is two or more cycles behind the lw,
    the forwarding logic that's included in this pipeline design handles the
    issue, and no stall (nop instructions) are required.


3.  [10 points] Consider the following sequence of instructions, which involves
    both lw and beq instructions:

           lw    $t0, 8($t4)       # 3.1
           beq   $t0, $t1, L1      # 3.3  load-use hazard wrt $t0
           lw    $t2, 4($t1)       # 3.2  branch decision control hazard
      L1:  sub   $t4, $t2, $t0     # 3.6  load-use hazard wrt $t2 if branch not taken
           add   $t2, $t0, $t2     # 3.4

    The sequence also may include data hazards that will be handled correctly
    by the forwarding logic the pipeline includes; you should ignore those
    data hazards in your answer. 
 
    In order for the sequence of instructions to be executed correctly on this
    pipeline design, one or more nop instructions must be inserted (so that the
    existing forwarding logic can do the rest).  Rewrite the sequence of
    instructions to show how the sequence of instructions could be "fixed" by
    inserting the smallest possible number of nop instructions.

    Answer:

           lw    $t0, 8($t4)       # 3.1
           nop
           beq   $t0, $t1, L1      # 3.3  now, forwarding handles dependency wrt $t0
           nop
           nop
           lw    $t2, 4($t1)       # 3.2  now, branch decision is made before fetch
           nop
      L1:  sub   $t4, $t2, $t0     # 3.6  now, forwarding handles dependency wrt $t2
                                   #           if branch is not taken
           add   $t2, $t0, $t2     # 3.4


4.  Consider the alternate implementation of the MIPS pipeline sets the Write
    register # for each instruction in the ID stage, as shown in Figure 2:

    (See the assignment statement for the diagram.)

    Suppose the following sequence of instructions is sent into the pipeline
    shown above, and the registers initially store the values shown in the
    table below:

      register initial value
      $t1        1000
      $t2        2000
      $t3        3000
      $t4        4000
      $t5        5000

      add   $t1, $t2, $t3     # 4.1
      add   $t5, $t1, $t2     # 4.2
      add   $t2, $t5, $t1     # 4.3
      add   $t3, $t1, $t1     # 4.4
      add   $t1, $t5, $t1     # 4.5

    Aside:
    What's wrong with this design?  It sets the Write Register number based on
    the instruction that's currently in the ID stage of the pipeline, but the
    write operation is actually going to be performed by the instruction that
    is currently in the WB stage of the pipeline.

    So, the effect is that the writing instruction will have its destination
    register chosen by the instruction that's three cycles behind it, in the 
    ID stage.

    Note, though, that the Write register number is still sent forward via the
    interstage buffers, and the Forwarding Unit will still see the intended
    Write register numbers for the instructions that are in the MEM and WB
    stages, so the correct forwarding decisions will still be made.

    And, note that the correct (i.e., intended) data is still delivered to the
    Write data port on the Register unit, while the writing instruction is in
    the WB stage.

    a) [5 points] Which register will instruction #4.1 actually write its
       result to?

       Answer:
       Instruction #4.1 must be in WB in order to write data; at that time, 
       the instruction in the ID stage will be #4.4.  So, the destination
       register for #4.1 will actually be $t3.

    b) [5 points] What value will instruction #4.4 actually read from $t3?

       Answer:
       Instruction #4.4 reads its registers during the second half of its
       cycle in the ID stage; by then, instruction #4.1 will have written
       to its destination register, so #4.4 will read the value 5000. 

    c) [5 points] What value will instruction #4.2 actually send to the ALU
       for its first (left) operand?

       Answer:
       Instruction #4.1 will compute the sum 5000 before it exits the EX
       stage.  That value will be correctly forwarded to instruction #4.2,
       which will then send that value to the ALU when #4.2 is in the EX
       stage.

For questions 5 through 7, refer to the pipeline design with forwarding and
(load-use) hazard detection, shown below, which supports execution any sequence
of the following MIPS instructions:  add, sub, and, or, slt, lw, and sw.  

    (See the assignment statement for the diagram.)


5.  [15 points] Suppose that, due to a manufacturing defect, the InhibitFetch
    control signal from the load-use Hazard Detection unit suffers a stuck-at-0
    error.  That is, the InhibitFetch control signal is initialized to 0 until
    the first instruction enters the ID stage, and the Hazard Detection unit
    then always sets InhibitFetch to 0, no matter what the inputs to the Hazard
    Detection unit are.  Assume that the rest of the hardware operates as
    designed. 

    Suppose that, initially, the registers used below are set as shown below.
    Suppose that all the memory words are initialized to be 8000. Consider
    the execution of the following code in this buggy pipeline. Determine the
    final values of the $t1, $t3, and $t4 registers after all the instructions
    leave the pipeline (no other instructions modify any of these registers).

      register initial value
      $t1        1000
      $t2        2000
      $t3        3000
      $t4        4000
      $t5        5000

      add  $t1, $t2, $t3   # 5.1
      lw   $t3, 0($t1)     # 5.2
      add  $t4, $t2, $t3   # 5.3  load-use hazard wrt $t3, but no stall
      add  $t3, $t1, $t2   # 5.4

    Aside:
    If InhibitFetch is stuck-at-0, instruction #5.3 will be fetched when lw
    is in the ID stage, and enter the ID stage as lw moves into the EX stage.
    So, #5.3 will read the value 3000 from $t3, and that value will not be
    replaced via forwarding before #5.3 sends it to the ALU.

    Answer:
    Here's an outline of what each instruction will do: 

      add  $t1, $t2, $t3   # 5.1  $t1 <-- 2000 + 3000 =  5000
      lw   $t3, 0($t1)     # 5.2  $t3 <-- Mem[5000]   =  8000
      add  $t4, $t2, $t3   # 5.3  $t4 <-- 2000 + 3000 =  5000
      add  $t3, $t1, $t2   # 5.4  $t3 <-- 5000 + 2000 =  7000 

6.  Suppose the following instructions are in the pipeline shown in Figure 3,
    in the stages indicated:

      lw   $t1, 0($t3)     # 6.1 is in the EX stage
      add  $t2, $t1, $t5   # 6.2 is in the ID stage

    The load-use Hazard Detection Unit receives three register numbers as input:

    (See the assignment statement for the diagram.)


    a) [5 points] What register number will the Hazard Detection unit receive
       for input A?

       Answer:
       This will be the rs register field from the instruction in the ID stage,
       so it will be the register number for $t1, which is 9 or 01001.

    b) [5 points] What register number will the Hazard Detection unit receive
       for input B?

       Answer:
       This will be the rt register field from the instruction in the ID stage,
       so it will be the register number for $t5, which is 13 or 01101.

    c) [5 points] What register number will the Hazard Detection unit receive
       for input C?

       Answer:
       This will be the rt register field from the instruction in the EX stage,
       so it will be the register number for $t1, which is 9 or 01001.

7.  [12 points] Review the discussions of the Forwarding Unit and the load-use
    Hazard Detection Unit.  Suppose that the branch target address computation
    and the register comparison (for beq instructions) are moved into the ID
    stage, as discussed in class.  We have seen that if there is a
    read-after-write data hazard, where the reading instruction is beq, then
    it may be necessary to stall the beq instruction, for one or two cycles.
    For each instruction sequence shown below, state whether beq would need to
    be stalled 0, 1 or 2 cycles, and state the name(s) of the register(s)
    involved in the hazard that leads to the need to stall.  (We assume that
    the datapath design has been modified to include forwarding hardware so
    that operands can be substituted into the equality unit that compares
    registers in the ID stage.)

    (See the assignment statement for the diagram.)

    Aside:
       Following the discussion in lecture, we have the following facts:
         - if BEQ has a dependency for an input, the correct value must now
           be forwarded into the ID stage
         - if BEQ has a hazard and the writing instruction is R-type, then
           1 stall is required iff the writing instruction is one stage ahead
           of the BEQ, and a single stall is sufficient
         - if BEQ has a hazard and the writing instruction is LW, then 1
           stall is required if the LW is two stages ahead of the BEQ, and
           2 stalls are required if the LW is only one stage ahead of the BEQ

    a) add   $t2, $t0, $t1
       add   $s5, $s3, $s2
       beq   $t2, $s5, L1

       Answer:
        1 stall due to $s5  

    b) add   $s5, $s3, $s2
       add   $t2, $t0, $t1
       beq   $t2, $s5, L1

       Answer:
        1 stall due to $t2  

    c) lw    $s5, ($s3) 
       lw    $t2, ($t1) 
       beq   $t2, $s5, L1

       Answer:
        2 stalls due to $t2  

    d) lw    $t2, ($t1)
       add   $s5, $s3, $s2
       beq   $t2, $s5, L1

       Answer:
        1 stall due to $s5  and/or $t2

       (Each requires a single stall.)