Introduction
This text is a compilation of the notes I took while studying the RISC-V architecture and assembly language. It probably contains many errors, and it is not meant as a substitute for the official specs or a good book on the subject. You can consider it a mini-tutorial on RISC-V assembly language, or a gentle introduction to the RISC-V world.
Architecture
Only the RV32I subset of RISC-V is covered in this text. It comprises the most basic integer registers and operations, which is enough for an introduction. The following table lists the 32-bit general-purpose integer registers and their intended function in assembly. Note that the function given is optional, since you can use any register any way you want (except x0, which always reads as zero). The rightmost column indicates who is responsible for saving that register across a procedure call. Again, this indication is a mere convention.
Register List
Register | Name | Description | Saver
x0 | zero | Always zero | -
x1 | ra | Return Address | Caller
x2 | sp | Stack Pointer | Callee
x3 | gp | Global Pointer | -
x4 | tp | Thread Pointer | -
x5 | t0 | Temporary / Alternate Link Reg | Caller
x6-x7 | t1-t2 | Temporaries | Caller
x8 | s0 / fp | Saved Register / Frame Pointer | Callee
x9 | s1 | Saved Register | Callee
x10-x11 | a0-a1 | Function Arguments / Return Values | Caller
x12-x17 | a2-a7 | Function Arguments | Caller
x18-x27 | s2-s11 | Saved Registers | Callee
x28-x31 | t3-t6 | Temporaries | Caller
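To make the Saver column a bit more concrete, here is a minimal sketch of a procedure that calls another one. The names sum3 and add2 are made up for this example, and GNU-style assembler syntax is assumed. sum3 saves ra because its own call overwrites it, and saves s0 because s0 is callee-saved. It does not rely on any t register surviving the call.

```asm
# sum3 (hypothetical): returns a0 + a1 + a2 in a0
sum3:
    addi sp, sp, -16     # make room on the stack
    sw   ra, 0(sp)       # ra will be overwritten by the call below
    sw   s0, 4(sp)       # s0 is callee-saved: preserve the caller's value
    mv   s0, a2          # keep the third argument in a saved register
    call add2            # a0 = a0 + a1; may clobber t and a registers
    add  a0, a0, s0      # s0 is still valid after the call
    lw   s0, 4(sp)       # restore the caller's s0
    lw   ra, 0(sp)       # restore our return address
    addi sp, sp, 16      # release the stack space
    ret

# add2 (hypothetical): returns a0 + a1 in a0
# it is a leaf procedure, so it needs no stack frame
add2:
    add  a0, a0, a1
    ret
```

A caller that wants to keep a value in a t or a register across `call sum3` has to save it itself before the call.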
Instructions
Keep in mind that most integer instructions treat numbers as two's complement values, which has the advantage that addition and subtraction use the same circuitry as for plain unsigned integers. Instructions that treat numbers as unsigned integers are usually explicitly marked.
All instructions have the same length: 32 bits (4 bytes).
Instruction Table
Instruction | Name | Format | Description
add rd, rs1, rs2 | ADD | R | rd=rs1+rs2
sub rd, rs1, rs2 | SUBTRACT | R | rd=rs1-rs2
and rd, rs1, rs2 | AND | R | rd=rs1 AND rs2
or rd, rs1, rs2 | OR | R | rd=rs1 OR rs2
xor rd, rs1, rs2 | XOR | R | rd=rs1 XOR rs2
sll rd, rs1, rs2 | Shift Left Logical | R | rd=rs1 << rs2
srl rd, rs1, rs2 | Shift Right Logical | R | rd=rs1 >> rs2
sra rd, rs1, rs2 | Shift Right Arithmetic | R | rd=rs1 >> rs2 (signed)
slt rd, rs1, rs2 | Set Less Than | R | if (rs1<rs2) rd=1 else rd=0; (signed)
sltu rd, rs1, rs2 | Set Less Than Unsigned | R | if (rs1<rs2) rd=1 else rd=0; (unsigned)
addi rd, rs1, immediate | ADD Immediate | I | rd=rs1+immediate
andi rd, rs1, immediate | AND Immediate | I | rd=rs1 AND immediate
ori rd, rs1, immediate | OR Immediate | I | rd=rs1 OR immediate
xori rd, rs1, immediate | XOR Immediate | I | rd=rs1 XOR immediate
slli rd, rs1, immediate | Shift Left Logical Immediate | I | rd=rs1 << immediate
srli rd, rs1, immediate | Shift Right Logical Immediate | I | rd=rs1 >> immediate
srai rd, rs1, immediate | Shift Right Arithmetic Immediate | I | rd=rs1 >> immediate (signed)
slti rd, rs1, immediate | Set Less Than Immediate | I | if (rs1<immediate) rd=1 else rd=0; (signed)
sltiu rd, rs1, immediate | Set Less Than Immediate Unsigned | I | if (rs1<immediate) rd=1 else rd=0; (unsigned)
lb rd, offset(rs1) | Load Byte | I | rd=sign_extend(Memory_byte[rs1+offset])
lh rd, offset(rs1) | Load Halfword | I | rd=sign_extend(Memory_halfword[rs1+offset])
lw rd, offset(rs1) | Load Word | I | rd=Memory[rs1+offset]
lbu rd, offset(rs1) | Load Byte Unsigned | I | rd=zero_extend(Memory_byte[rs1+offset])
lhu rd, offset(rs1) | Load Halfword Unsigned | I | rd=zero_extend(Memory_halfword[rs1+offset])
sb rs2, offset(rs1) | Store Byte | S | Memory[rs1+offset]=lower_byte(rs2)
sh rs2, offset(rs1) | Store Halfword | S | Memory[rs1+offset]=lower_halfword(rs2)
sw rs2, offset(rs1) | Store Word | S | Memory[rs1+offset]=rs2
beq rs1, rs2, label | Branch if Equal | B | if (rs1==rs2) PC=label;
bne rs1, rs2, label | Branch if Not Equal | B | if (rs1!=rs2) PC=label;
blt rs1, rs2, label | Branch if Less Than | B | if (rs1<rs2) PC=label;
bge rs1, rs2, label | Branch if Greater or Equal | B | if (rs1>=rs2) PC=label;
bltu rs1, rs2, label | Branch if Less Than Unsigned | B | if (rs1<rs2) PC=label; (unsigned)
bgeu rs1, rs2, label | Branch if Greater or Equal Unsigned | B | if (rs1>=rs2) PC=label; (unsigned)
jal rd, label | Jump And Link | J | rd=PC+4; PC=label
jalr rd, offset(rs1) | Jump And Link Register | I | rd=PC+4; PC=rs1+offset
lui rd, immediate | Load Upper Immediate | U | rd=immediate<<12
auipc rd, immediate | Add Upper Immediate to PC | U | rd=PC+(immediate<<12)
ecall | Environment Call | I | Transfer Control to the OS
ebreak | Environment Break | I | Transfer Control to Debugger
add rd, rs1, rs2
Adds registers rs1 and rs2 and puts the result in rd.
sub rd, rs1, rs2
Subtracts rs2 from rs1 and puts the result in rd.
and rd, rs1, rs2
Logically ANDs registers rs1 and rs2 and puts the result in rd.
or rd, rs1, rs2
Logically ORs registers rs1 and rs2 and puts the result in rd.
xor rd, rs1, rs2
Logically XORs registers rs1 and rs2 and puts the result in rd.
sll rd, rs1, rs2
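Shifts rs1 left by the number of bits given in the lower 5 bits of rs2 and puts the result in rd. Zeros are shifted into the lower bits.
srl rd, rs1, rs2
Shifts rs1 right by the number of bits given in the lower 5 bits of rs2 and puts the result in rd. Zeros are shifted into the upper bits.
sra rd, rs1, rs2
Shifts rs1 right arithmetically by the number of bits given in the lower 5 bits of rs2 and puts the result in rd. Copies of the sign bit are shifted into the upper bits.
If you need more info about arithmetic shifts, try https://en.wikipedia.org/wiki/Arithmetic_shift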
slt rd, rs1, rs2
Sets rd to 1 if rs1 is less than rs2 using signed number comparison, else it sets rd to 0.
sltu rd, rs1, rs2
Sets rd to 1 if rs1 is less than rs2 using unsigned number comparison, else it sets rd to 0.
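The difference between slt and sltu only shows up when the sign bit is set. Here is a small sketch of that case. The choice of registers is arbitrary, and the values in the comments assume 32-bit registers (RV32I).

```asm
addi t0, zero, -1    # t0 = 0xffffffff (-1 signed, 4294967295 unsigned)
addi t1, zero, 1     # t1 = 1
slt  t2, t0, t1      # signed:   -1 < 1, so t2 = 1
sltu t3, t0, t1      # unsigned: 4294967295 < 1 is false, so t3 = 0
```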
addi rd, rs1, immediate
Adds register rs1 and a sign-extended 12-bit immediate value and puts the result in rd.
andi rd, rs1, immediate
Logically ANDs register rs1 and a sign-extended 12-bit immediate value and puts the result in rd.
ori rd, rs1, immediate
Logically ORs register rs1 and a sign-extended 12-bit immediate value and puts the result in rd.
xori rd, rs1, immediate
Logically XORs register rs1 and a sign-extended 12-bit immediate value and puts the result in rd.
slli rd, rs1, immediate
Shifts rs1 left by a 5-bit immediate value and puts the result in rd.
srli rd, rs1, immediate
Shifts rs1 right by a 5-bit immediate value and puts the result in rd.
srai rd, rs1, immediate
Shifts rs1 right arithmetically by a 5-bit immediate value and puts the result in rd.
If you need more info about arithmetic shifts, try https://en.wikipedia.org/wiki/Arithmetic_shift
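To see how the logical and arithmetic shifts differ, it helps to shift a negative number. This is a small sketch with an arbitrary choice of registers, and the results in the comments assume 32-bit registers (RV32I).

```asm
addi t0, zero, -16   # t0 = 0xfffffff0 (-16)
srli t1, t0, 2       # logical:    zeros shifted in,    t1 = 0x3ffffffc
srai t2, t0, 2       # arithmetic: sign bit shifted in, t2 = 0xfffffffc (-4)
slli t3, t0, 2       # left shift: zeros shifted in,    t3 = 0xffffffc0 (-64)
```

The arithmetic shift behaves like a signed division by 4 here, while the logical shift turns the value into a large positive number.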
slti rd, rs1, immediate
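Sets rd to 1 if rs1 is less than the sign-extended 12-bit immediate using signed number comparison, else it sets rd to 0.
sltiu rd, rs1, immediate
Sets rd to 1 if rs1 is less than the immediate using unsigned number comparison, else it sets rd to 0. The 12-bit immediate is still sign-extended first, and the result is then treated as an unsigned number.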
lb rd, offset(rs1)
Loads a byte from the memory position rs1+offset, sign extends it, and writes it to rd. Offset is 12 bits signed.
If you need more info about sign extension, try https://en.wikipedia.org/wiki/Sign_extension
lh rd, offset(rs1)
Loads a halfword from the memory position rs1+offset, sign extends it, and writes it to rd. Offset is 12 bits signed.
If you need more info about sign extension, try https://en.wikipedia.org/wiki/Sign_extension
lw rd, offset(rs1)
Loads a word from the memory position rs1+offset, and writes it to rd. Offset is 12 bits signed.
lbu rd, offset(rs1)
Loads a byte from the memory position rs1+offset, zero extends it, and writes it to rd. Offset is 12 bits signed.
If you need more info about zero extension, try https://en.wikipedia.org/wiki/Sign_extension
lhu rd, offset(rs1)
Loads a halfword from the memory position rs1+offset, zero extends it, and writes it to rd. Offset is 12 bits signed.
If you need more info about zero extension, try https://en.wikipedia.org/wiki/Sign_extension
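The signed and unsigned loads give different results only when the top bit of the loaded value is set. Here is a minimal sketch of that case. It assumes an assembler that supports the .data, .text and .byte directives and the la pseudoinstruction (GNU as does; simpler simulators may not), and the label value is made up for the example.

```asm
.data
value:
    .byte 0x80           # bit 7 is set, so this is -128 as a signed byte

.text
    la   t0, value       # t0 = address of value
    lb   t1, 0(t0)       # sign-extended: t1 = 0xffffff80 (-128)
    lbu  t2, 0(t0)       # zero-extended: t2 = 0x00000080 (128)
```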
sb rs2, offset(rs1)
Writes the bits 0 to 7 (byte) of register rs2 to the memory position rs1+offset. Offset is 12 bits signed.
sh rs2, offset(rs1)
Writes the bits 0 to 15 (half word) of register rs2 to the memory position rs1+offset. Offset is 12 bits signed.
sw rs2, offset(rs1)
Writes the content of register rs2 to the memory position rs1+offset. Offset is 12 bits signed.
beq rs1, rs2, label
Jumps to label if rs1 and rs2 are equal. Internally, at instruction level, the target is encoded as a signed 12-bit value that is shifted left once and added to the PC, giving a jump range of ±4 KiB.
bne rs1, rs2, label
Jumps to label if rs1 and rs2 are not equal. Internally, at instruction level, the target is encoded as a signed 12-bit value that is shifted left once and added to the PC, giving a jump range of ±4 KiB.
blt rs1, rs2, label
Jumps to label if rs1 is less than rs2. Internally, at instruction level, the target is encoded as a signed 12-bit value that is shifted left once and added to the PC, giving a jump range of ±4 KiB.
bge rs1, rs2, label
Jumps to label if rs1 is greater than or equal to rs2. Internally, at instruction level, the target is encoded as a signed 12-bit value that is shifted left once and added to the PC, giving a jump range of ±4 KiB.
bltu rs1, rs2, label
Jumps to label if rs1 is less than rs2 using unsigned number comparison. Internally, at instruction level, the target is encoded as a signed 12-bit value that is shifted left once and added to the PC, giving a jump range of ±4 KiB.
bgeu rs1, rs2, label
Jumps to label if rs1 is greater than or equal to rs2 using unsigned number comparison. Internally, at instruction level, the target is encoded as a signed 12-bit value that is shifted left once and added to the PC, giving a jump range of ±4 KiB.
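The typical use of the branch instructions is building loops. As a small sketch, the following code adds the numbers from 1 to 10. The label sumloop and the choice of registers are arbitrary.

```asm
    addi t0, zero, 0      # t0 = running sum
    addi t1, zero, 1      # t1 = counter, starts at 1
    addi t2, zero, 10     # t2 = upper limit
sumloop:
    add  t0, t0, t1       # sum = sum + counter
    addi t1, t1, 1        # counter = counter + 1
    bge  t2, t1, sumloop  # repeat while limit >= counter (signed)
    # at this point t0 = 55 (1 + 2 + ... + 10)
```

Note that there is no "branch if less or equal" base instruction. Swapping the operands of bge gives the same effect, which is exactly what the ble pseudoinstruction does.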
jal rd, label
Stores the address of the next instruction in register rd and jumps to label. Internally, at instruction level, the target is encoded as a signed 20-bit value that is shifted left once and added to the PC, giving a jump range of ±1 MiB.
jalr rd, offset(rs1)
Stores the address of the next instruction in register rd and jumps to rs1+offset. Internally, at instruction level, offset is a signed 12 bit value that is added to the register rs1 and the least significant bit is set to 0.
This instruction is made so that the program can jump to any 32-bit address since any arbitrary value can be loaded into rs1.
lui rd, immediate
Puts immediate in the upper 20 bits of rd and fills the lower 12 bits with zeros. This instruction is designed to work in pairs with addi, which fills in the lower 12 bits, effectively loading a 32-bit constant. Consider the following example:
lui a0, %hi(PRIMESTRING) # this loads the top 20 bits
addi a0, a0, %lo(PRIMESTRING) # this loads the bottom 12 bits
You might think there is an error here, since addi sign-extends its 12-bit immediate before adding it to the upper 20 bits of the address, but there is not. The assembler modifiers %hi and %lo are made to be used in pairs, and %hi already takes the sign extension performed by addi into account when it returns the upper 20 bits, so that both instructions together give the correct address of PRIMESTRING.
The alternative is to use the la (Load Address) pseudoinstruction.
auipc rd, immediate
Shifts the 20-bit immediate left by 12 bits (filling the lower 12 bits with zeros), adds it to the address of the current instruction (the PC), and stores the result in rd.
This instruction is made so that addresses relative to the PC can be loaded. Consider the following code:
1:
auipc a0, %pcrel_hi(msg)
addi a0, a0, %pcrel_lo(1b)
In this code, %pcrel_hi(msg) returns the upper 20 bits of the distance from the auipc instruction to msg, while %pcrel_lo(1b) returns the lower 12 bits of that same distance. Note that %pcrel_lo() takes the label of the auipc instruction (1b, the nearest label 1 looking backwards) rather than msg, because it is designed to be paired with the %pcrel_hi() found there. The assembler and the linker do the magic.
Take a look at https://sourceware.org/binutils/docs/as/RISC_002dV_002dModifiers.html or https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md for more information.
The following code makes an infinite loop, since auipc loads ra with its own address and jalr then jumps back to it.
auipc ra,0x0
jalr ra,0(ra)
ecall
Transfers control to the Operating System. The exact functioning of this instruction depends on the machine/environment running the program.
ebreak
Transfers control to the debugger. The exact functioning of this instruction depends on the machine/environment running the program.
Pseudoinstructions
Pseudoinstruction Table
Pseudoinstruction | Base Instructions | Description
la rd, symbol | auipc rd, delta[31:12] + delta[11]; addi rd, rd, delta[11:0] | Load Absolute Address where delta = (symbol − PC)
l{b|h|w|d} rd, symbol | auipc rd, delta[31:12] + delta[11]; l{b|h|w|d} rd, delta[11:0](rd) | Load byte/halfword/word/double from any 32-bit address
s{b|h|w|d} rd, symbol, rt | auipc rt, delta[31:12] + delta[11]; s{b|h|w|d} rd, delta[11:0](rt) | Store byte/halfword/word/double to any 32-bit address
nop | addi x0, x0, 0 | No Operation
li rd, immediate | lui rd, immediate[31:12] + immediate[11]; addi rd, rd, immediate[11:0] | Load 32-bit immediate
mv rd, rs | addi rd, rs, 0 | Move Register
not rd, rs | xori rd, rs, -1 | Not Register/One’s complement
neg rd, rs | sub rd, x0, rs | Negate Register/Two’s complement
seqz rd, rs | sltiu rd, rs, 1 | Set if = zero
snez rd, rs | sltu rd, x0, rs | Set if <> zero
sltz rd, rs | slt rd, rs, x0 | Set if < zero
sgtz rd, rs | slt rd, x0, rs | Set if > zero
beqz rs, label | beq rs, x0, offset | Branch if = zero
bnez rs, label | bne rs, x0, offset | Branch if <> zero
blez rs, label | bge x0, rs, offset | Branch if <= zero
bgez rs, label | bge rs, x0, offset | Branch if >= zero
bltz rs, label | blt rs, x0, offset | Branch if < zero
bgtz rs, label | blt x0, rs, offset | Branch if > zero
bgt rs, rt, offset | blt rt, rs, offset | Branch if >
ble rs, rt, offset | bge rt, rs, offset | Branch if <=
bgtu rs, rt, offset | bltu rt, rs, offset | Branch if > (unsigned comp.)
bleu rs, rt, offset | bgeu rt, rs, offset | Branch if <= (unsigned comp.)
j label | jal x0, label | Jump
jal label | jal x1, label | Jump And Link
jr rs | jalr x0, 0(rs) | Jump Register
jalr rs | jalr x1, 0(rs) | Jump And Link Register
ret | jalr x0, 0(x1) | Return from subroutine
call label | auipc x1, delta[31:12] + delta[11]; jalr x1, delta[11:0](x1) | Call far-away subroutine where delta = (label − PC)
tail label | auipc x6, delta[31:12] + delta[11]; jalr x0, delta[11:0](x6) | Tail call far-away subroutine where delta = (label − PC)
Many of these pseudoinstructions use two instructions to load a 32-bit value. If you have problems visualizing it, consider the following example.
You want to load the 32-bit value 0x0ff0fff0:
1. Divide the value into the upper 20 bits (0x0ff0f) and the lower 12 bits (0xff0).
2. Add 1 to the upper 20 bits, since the lower 12 bits are negative when sign-extended (their sign bit is 1). You get 0x0ff10.
3. Load that value into the register you want with the lui/auipc instructions. That way you get the value 0x0ff10000 in that register.
4. Take the lower 12 bits and sign-extend them, since they will be added via addi/jalr. You get 0xfffffff0.
5. Add the sign-extended 12 bits (0xfffffff0) to the value already in the register (0x0ff10000). The carry out of bit 31 is discarded and you get the value you wanted (0x0ff0fff0).
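The steps above can be written down as the following pair of instructions. This is only a sketch of the arithmetic: the registers a0 and a1 are chosen arbitrarily, and the li line is there just to show that the pseudoinstruction should give the same result.

```asm
lui  a0, 0x0ff10     # a0 = 0x0ff10000 (upper 20 bits, already incremented by 1)
addi a0, a0, -16     # -16 is 0xff0 sign-extended, so a0 = 0x0ff0fff0
li   a1, 0x0ff0fff0  # the li pseudoinstruction produces the same value in a1
```

If bit 11 of the value were 0, the lower 12 bits would be positive and step 2 would not be needed.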
Examples
To run these examples I used the BRISC-V simulator. I chose it because you don't need to install anything to run your RISC-V assembly programs: you simply load and run them on its website. Here you can find a compilation of examples and all the system calls it supports. But it also has problems. BRISC-V doesn't support all the directives that other, more complete assemblers have, so you need to be careful about how your programs are structured.
The first example is here:
# kernel code
# it sets up the stack pointer and calls main
addi zero,zero,0
kernel:
addi sp,zero,1536
call main
addi zero,zero,0
mv s1,a0
addi zero,zero,0
addi zero,zero,0
auipc ra,0x0
jalr ra,0(ra)
addi zero,zero,0
addi zero,zero,0
#here it goes the read only data
.rodata
.HELLO:
# the string "Hello World!" followed by a carriage return (0x0D), split into little-endian words
.word 0x6C6C6548
.word 0x6F57206F
.word 0x21646C72
.word 0x0000000D
#here it goes the code
.text
main:
# print the string .HELLO
addi t0, zero, 3 # this is the string printing syscall
lui a0, %hi(.HELLO) # this loads the top 20 bits
# of .HELLO address into a0
addi a0, a0, %lo(.HELLO) # this loads the bottom 12 bits
addi a1, zero, 13 # length of the string
ecall
#ask the user for a number
addi t0, zero, 4
ecall
#now a0 contains the number
#do a countdown
countdown:
# print the number
addi t0, zero, 1
ecall
#iterate until a0 is negative
addi a0, a0, -1
bge a0, zero, countdown
For my first program I kept it easy. It prints the string "Hello World!", asks for a number, and prints a countdown to zero. Note that BRISC-V doesn't support the asciiz directive, so we need to specify the words that make up the string, with the characters of each word in reverse order because the words are stored in little-endian order. The rest is pretty straightforward: simply make the calls to the simulator using the ecall instruction. You can ignore the first chunk of code. It's just there to initialize the stack pointer and call the main function.
The second example is a bit more complex:
# kernel code
# it sets up the stack pointer and calls main
addi zero,zero,0
kernel:
addi sp,zero,1536
call main
addi zero,zero,0
mv s1,a0
addi zero,zero,0
addi zero,zero,0
auipc ra,0x0
jalr ra,0(ra)
addi zero,zero,0
addi zero,zero,0
#here it goes the read only data
.rodata
PRIMESTRING:
.word 0x4D495250
.word 0x00000D45
NOTPRIMESTRING:
.word 0x20544F4E
.word 0x4D495250
.word 0x00000D45
#here it goes the code
.text
main:
addi a0, zero, 2
addi t0, zero, 1 #number print service
ecall #call OS, print the number in a0
call isprime #check if it's prime
addi a0, zero, 5
addi t0, zero, 1 #number print service
ecall #call OS, print the number in a0
call isprime #check if it's prime
addi a0, zero, 4
addi t0, zero, 1 #number print service
ecall #call OS, print the number in a0
call isprime #check if it's prime
addi a0, zero, 10
addi t0, zero, 1 #number print service
ecall #call OS, print the number in a0
call isprime #check if it's prime
addi a0, zero, 11
addi t0, zero, 1 #number print service
ecall #call OS, print the number in a0
call isprime #check if it's prime
addi a0, zero, 43
addi t0, zero, 1 #number print service
ecall #call OS, print the number in a0
call isprime #check if it's prime
addi a0, zero, 44
addi t0, zero, 1 #number print service
ecall #call OS, print the number in a0
call isprime #check if it's prime
j programexit
# subroutine divide
# divides a0 (dividend) by a1 (divisor)
# returns a0 (remainder), a1 (quotient)
# uses t registers
divide:
addi t0, zero, 0 #reset temporary quotient
divideloop:
blt a0, a1, divideexit #exit if dividend less than divisor
sub a0, a0, a1
addi t0, t0, 1 #add 1 to quotient
jal zero, divideloop #pseudoinstruction j
divideexit:
addi a1, t0, 0
jalr zero, ra, 0 #pseudoinstruction ret
# subroutine isprime
# prints "PRIME" or "NOT PRIME" according to the number passed in a0
isprime:
addi sp,sp,-16
sw ra,0(sp) #save return address
#
addi t6, zero, 2
addi t5, a0, -1
#
isprimeloop:
blt t5, t6, isprimeprintprime
# save t registers used
sw t5, 4(sp)
sw t6, 8(sp)
sw a0, 12(sp)
#
# a0 contains the initial argument
mv a1, t5
jal ra, divide #pseudoinstruction call, we call subroutine divide
beq a0, zero, isprimeprintnotprime # if remainder is zero, we have a divisor
# restore t registers used
lw t5, 4(sp)
lw t6, 8(sp)
lw a0, 12(sp)
#
addi t5, t5, -1
j isprimeloop
#
isprimeprintprime:
addi t0, zero, 3 # this is the string printing syscall
lui a0, %hi(PRIMESTRING) # this loads the top 20 bits
addi a0, a0, %lo(PRIMESTRING) # this loads the bottom 12 bits
addi a1, zero, 6 # length of the string
ecall
j isprimeexit
isprimeprintnotprime:
addi t0, zero, 3 # this is the string printing syscall
lui a0, %hi(NOTPRIMESTRING) # this loads the top 20 bits
addi a0, a0, %lo(NOTPRIMESTRING) # this loads the bottom 12 bits
addi a1, zero, 10 # length of the string
ecall
j isprimeexit
isprimeexit:
lw ra,0(sp) #restore return address
addi sp,sp,16
ret
#
# program exit point
programexit:
It checks whether a number is prime or not and prints a message accordingly. It has two subroutines: isprime, which checks whether the number passed in a0 is prime, and divide, which isprime relies on to test the candidate divisors of that number.
Recently I found a better online RISC-V assembly simulator called VENUS. It's much more complete than other toys on the internet. I recommend it.
Here you can find its documentation.
I wrote one more example for VENUS. This time it prints a given number of Fibonacci numbers.
The VENUS example is here:
.data
carrret:
.asciiz "\n"
seed0:
.word 0
seed1:
.word 1
spacestr:
.asciiz " "
.text
li s0, 5 #number of fibonacci numbers to print
#https://github.com/61c-teach/venus/wiki/Environmental-Calls
lw a0, seed0 #load the first two numbers
lw a1, seed1
fiboloop:
ble s0, zero, programexit
call fibonacci
addi s0, s0, -1
j fiboloop
#prints the next fibonacci number
# a0 and a1 contain the two previous fibonacci numbers
#returns the next two fibonacci numbers in a0 and a1 too
fibonacci:
add t0, a0, a1
add a0, a1, zero
add a1, t0, zero
#
addi sp, sp, -8
# save the numbers in a0, a1
sw a0, 0(sp)
sw a1, 4(sp)
# print the number in a1
addi a0, zero, 1
ecall
# print space
li a0, 4 #print space string
la a1, spacestr #asciiz string addr
ecall
# restore the numbers in a0, a1
lw a0, 0(sp)
lw a1, 4(sp)
addi sp, sp, 8
ret
programexit:
li a0, 4 #print string carriage return
la a1, carrret #asciiz string addr
ecall