Tomasulo's Algorithm

Introduction

This text is a compilation of the notes I've taken while studying computer architecture. I intended to write about some aspects of Tomasulo's algorithm that aren't clear to other students like me using the RISC-V architecture.

Description of Tomasulo's Algorithm

Parts of the Tomasulo's Machine

First, let's start reviewing the parts of a Tomasulo machine. Note that this example uses floating point, but it can be used with other types of data. Tomasulo's Machine Parts
  1. The Instruction Queue, where instructions are issued in order, one each cycle, in green.
  2. The Reservation Stations where operations are waiting to be executed in blue.
  3. Each RE holds the operation to be performed, the operands, and a busy bit:
  4. The Floating Point Registers in red, RISC-V has 32, but 5 will suffice for this example.
  5. Each register has a field that indicates which Reservation Station holds the value that should be written to said register.
  6. The memory Load and Store Buffers, which hold the operations waiting for memory, in yellow.
  7. The Common Data Bus where FP and memory results are broadcast wherever they're needed, in violet.​

Stages of a Tomasulo's Machine

  1. Issue:
  2. Gets the next instruction from the Instruction Queue. If there is a Reservation Station empty, the operation is sent there, otherwise stall. Copies the values of operands to the Reservation Station (register renaming), eliminating WAR/WAW hazzards. If there are missing operands, wait for them to be broadcast through the Common Data Bus.
  3. Execute:
  4. The operation in a Reservation Station waits until it has all operands. When it's ready, the operation is sent to the corresponding functional unit.
  5. Write Result:.
  6. Write the result of the instruction to the Common Data Bus and then to registers and pending Reservation Stations.

Examples of the Tomasulo's Algorithm

The following example was taken from this video.

Download files tomasuloV2.pptx tomasuloV2.pdf

The following example is based in this pdf.

Download files tomasuloloop.pptx tomasuloloop.pdf

Unclear aspects of the Tomasulo's Algorithm

Exceptions

No instruction is executed until the branch that precedes a instruction has completed. This guarantees that an instruction that causes an exception really would have been executed.

In a bare Tomasulo Machine you will get imprecise exceptions. This happens because instructions are completed out of order. That means that some instructions following the causing exception may execute before the exception-controlling code executes.

WAW Hazards

How does Tomasulo's Algorithm guarantee that this sequence gets the correct result, when it's obvous the fadd instruction will complete before the fdiv?

 1: fdiv.s f0, f4, f5
2: fadd.s f0, f1, f2
 

Each register has a field that points to which Reservation Station should hold its value. Since instructions are issued in order, the register F0 will be marked as waiting for the fdiv fist, and then for the fadd, as it should be, so it will hold the correct value in the end.

What if two instructions need to broadcast through the Common Data Bus at the same time?

I haven't found a clear answer to this. My guess is that arbitration mechanisms would have to be implemented.