Introduction
This text is a compilation of the notes I've taken while studying computer architecture. I intended to write about some aspects of Tomasulo's algorithm that aren't clear
to other students like me using the RISC-V architecture.
Description of Tomasulo's Algorithm
- Allows out-of-order execution. It avoids RAW hazards waiting for the operands while executing other instructions.
- Avoids WAR and WAW hazards through register renaming.
- Hardware loop unrolling.
- No need for special compilers.
Parts of the Tomasulo's Machine
First, let's start reviewing the parts of a Tomasulo machine. Note that this example uses floating point, but it can be used with other types of data.
- The Instruction Queue, where instructions are issued in order, one each cycle, in green.
- The Reservation Stations where operations are waiting to be executed in blue.
Each RE holds the operation to be performed, the operands, and a busy bit:
- The Floating Point Registers in red, RISC-V has 32, but 5 will suffice for this example.
Each register has a field that indicates which Reservation Station holds the value that should be written to said register.
- The memory Load and Store Buffers, which hold the operations waiting for memory, in yellow.
- The Common Data Bus where FP and memory results are broadcast wherever they're needed, in violet.
Stages of a Tomasulo's Machine
- Issue:
Gets the next instruction from the Instruction Queue.
If there is a Reservation Station empty, the operation is sent there, otherwise stall.
Copies the values of operands to the Reservation Station (register renaming), eliminating WAR/WAW hazzards.
If there are missing operands, wait for them to be broadcast through the Common Data Bus.
- Execute:
The operation in a Reservation Station waits until it has all operands.
When it's ready, the operation is sent to the corresponding functional unit.
- Write Result:.
Write the result of the instruction to the Common Data Bus and then to registers and pending Reservation Stations.
Examples of the Tomasulo's Algorithm
The following example was taken from this video.
Download files tomasuloV2.pptx tomasuloV2.pdf
The following example is based in this pdf.
Download files tomasuloloop.pptx tomasuloloop.pdf
Unclear aspects of the Tomasulo's Algorithm
Exceptions
No instruction is executed until the branch that precedes a instruction has completed. This guarantees that an instruction that causes an exception really would have been executed.
In a bare Tomasulo Machine you will get imprecise exceptions. This happens because instructions are completed out of order.
That means that some instructions following the causing exception may execute before the exception-controlling code executes.
WAW Hazards
How does Tomasulo's Algorithm guarantee that this sequence gets the correct result, when it's obvous the fadd instruction will complete before the fdiv?
1: fdiv.s f0, f4, f5
2: fadd.s f0, f1, f2
Each register has a field that points to which Reservation Station should hold its value.
Since instructions are issued in order, the register F0 will be marked as waiting for the fdiv fist, and then for the fadd, as it should be, so it will hold the correct value in the end.
What if two instructions need to broadcast through the Common Data Bus at the same time?
I haven't found a clear answer to this. My guess is that arbitration mechanisms would have to be implemented.