RVE - RISC-V Emulator
A deep dive into instruction decoding, CSRs, traps, ELF loading, Sv32 MMU, and Linux boot in a ~1000-line C++ RISC-V 32b emulator

Overview
RVE is a RISC-V emulator written in C++ that boots a Linux 6.1.14 rv32nommu kernel. This post walks through the full source - from how a 32-bit instruction word is parsed, through CSR shadowing and trap delegation, all the way to the Sv32 MMU two-level page walk and Linux boot setup.
Despite being roughly 1000 lines of C++, the emulator is complete enough to run a real Linux kernel with a BusyBox userspace, handle supervisor/machine-mode privilege transitions, respond to timer and UART interrupts, and optionally render a framebuffer via SDL2.

Try It in Your Browser
The emulator has been compiled to WebAssembly and WebGL and runs entirely in-browser - no install required.
Part 1 - System Architecture
Before diving into any individual subsystem, it helps to understand how the pieces fit together.
Source Layout
The emulator is split across five source units (the CPU core is a header/implementation pair):
| File | Responsibility |
|---|---|
| rv32.h / rv32.cpp | CPU state, memory access, CSRs, traps, CLINT, UART |
| emu.cpp | Instruction decode and dispatch (insSelect), instruction implementations |
| loader.cpp | ELF loading and Linux flat image loading |
| main.cpp | Entry point, headless vs. GUI mode |
| app.cpp | SDL2 + ImGui GUI shell |
The CPU State
The CPU is modeled as a single RV32 class:
class RV32 {
    u32 clock;           // cycle counter
    u32 xreg[32];        // x0 – x31
    u32 pc;              // program counter
    u8 *mem;             // 128 MiB flat RAM
    csr_state csr;       // 4096 × u32
    clint_state clint;   // timer
    uart_state uart;     // serial
    bool reservation_en; // LR/SC
};
RAM is a single flat uint8_t buffer on the heap. Address bit 31 distinguishes RAM from MMIO - any address with bit 31 set is RAM, everything below is a peripheral register. The physical offset into the 128 MiB buffer is simply addr & 0x7FFFFFFF.
Reset state: pc = 0x80000000, all xreg = 0, x11 = 0x1020 (DTB pointer for Linux), privilege = PRIV_MACHINE (3).
Physical Memory Map
0x00001020 – 0x00001FFF DTB blob
0x02000000 – 0x0200BFFF CLINT
0x10000000 – 0x10000007 UART 16550
0x80000000 – 0x87FFFFFF 128 MiB RAM
memGetByte first tests addr & 0x80000000. MMIO addresses fall into a switch table; all other accesses resolve directly to mem[addr & 0x7FFFFFFF]. No DMA, no PCI, no GPU - just CLINT and UART. That's all Linux needs for a console boot.
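The decode described above can be sketched as follows. This is a minimal illustration, not the emulator's code: the `Region` enum, the `Decoded` struct, and `decode()` are hypothetical names, and only the address ranges are taken from the memory map.

```cpp
#include <cstdint>

// Hypothetical names for illustration; only the ranges come from the memory map.
enum class Region { Ram, Clint, Uart, Dtb, Unmapped };

struct Decoded {
    Region   region;
    uint32_t offset;  // byte offset inside the region
};

// Bit 31 selects RAM; everything below it is matched against the MMIO ranges.
// A real implementation would also bounds-check the RAM offset against 128 MiB.
constexpr Decoded decode(uint32_t addr) {
    if (addr & 0x80000000u)
        return {Region::Ram, addr & 0x7FFFFFFFu};
    if (addr >= 0x02000000u && addr <= 0x0200BFFFu)
        return {Region::Clint, addr - 0x02000000u};
    if (addr >= 0x10000000u && addr <= 0x10000007u)
        return {Region::Uart, addr - 0x10000000u};
    if (addr >= 0x00001020u && addr <= 0x00001FFFu)
        return {Region::Dtb, addr - 0x00001020u};
    return {Region::Unmapped, 0};
}
```

The RAM test comes first because it is by far the most frequent case; the MMIO comparisons only run for the rare peripheral access.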
Part 2 - Instruction Encoding
The 6 RISC-V Instruction Formats
All instructions in the base ISA and the extensions covered here are 32 bits wide and fall into one of six encoding formats. A key design principle: rs1 is always at [19:15] and rs2 at [24:20], so the register file can be read before the opcode is fully decoded.
| Format | Key Fields | Used By |
|---|---|---|
| R | funct7 + rs2 + rs1 + fn3 + rd + opcode | add sub mul div and or xor sll sra |
| I | imm[11:0] + rs1 + fn3 + rd + opcode | addi lw jalr ecall csrrw |
| S | imm[11:5] + rs2 + rs1 + fn3 + imm[4:0] + opcode | sw sh sb |
| B | scattered imm bits + rs2 + rs1 + fn3 + opcode | beq bne blt bge bltu bgeu |
| U | imm[31:12] + rd + opcode | lui auipc |
| J | scrambled imm[20:1] + rd + opcode | jal |
The S-type and B-type formats split the immediate across two fields to keep rs1/rs2 at their fixed positions - stores have no destination register, so the bits that would be rd are reused for the low immediate bits.
B-type branches reassemble bits from four separate locations: {imm[12], imm[11], imm[10:5], imm[4:1], 0}. J-type (JAL) is the most scrambled: {imm[20], imm[19:12], imm[11], imm[10:1], 0}. The scrambling is intentional - it maximizes bit overlap with other formats to reduce the critical path in hardware decoders.
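To make the J-type scrambling concrete, here is a small sketch of how the immediate can be reassembled. The function name `decode_j_imm` is hypothetical; the bit positions follow the standard JAL encoding (inst[31] = imm[20], inst[30:21] = imm[10:1], inst[20] = imm[11], inst[19:12] = imm[19:12]).

```cpp
#include <cstdint>

// Reassemble the J-type (JAL) immediate from its four instruction fields.
constexpr int32_t decode_j_imm(uint32_t word) {
    uint32_t imm = ((word & 0x80000000u) ? 0xFFF00000u : 0u)  // sign-extend from bit 20
                 | (word & 0x000FF000u)                        // bits 19:12 stay in place
                 | ((word >> 9) & 0x00000800u)                 // inst[20]    -> bit 11
                 | ((word >> 20) & 0x000007FEu);               // inst[30:21] -> bits 10:1
    return static_cast<int32_t>(imm);
}
```

For example, `jal x0, -4` is encoded as 0xFFDFF06F and decodes back to an offset of -4, while `jal x1, 8` (0x008000EF) decodes to 8.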
Format Parsers - Bit Extraction
All six format parsers run at the top of insSelect() before any dispatch. The compiler eliminates dead computations for formats not used by the matched instruction:
FormatR parse_FormatR(u32 word) {
    FormatR ret;
    ret.rd = (word >> 7) & 0x1f;
    ret.rs1 = (word >> 15) & 0x1f;
    ret.rs2 = (word >> 20) & 0x1f;
    ret.rs3 = (word >> 27) & 0x1f;
    return ret;
}
B-type immediates require sign-extension by checking the MSB and OR-ing in the sign mask:
ret.imm =
    (word & 0x80000000 ? 0xfffff000 : 0)
    | ((word << 4) & 0x00000800)   // bit 11
    | ((word >> 20) & 0x000007e0)  // bits 10:5
    | ((word >> 7) & 0x0000001e);  // bits 4:1
Part 3 - Instruction Decode and Dispatch
insSelect() - Multi-Stage Masked Dispatch
The 7-bit opcode field alone isn't enough to fully identify a RISC-V instruction. Different instructions live at different opcode/funct3/funct7 combinations, so insSelect() uses seven progressive mask stages:
| Mask | Bits Exposed | Examples |
|---|---|---|
| 0x0000007f | opcode only | lui jal auipc |
| 0x0000707f | opcode + funct3 | addi lw beq csrrw |
| 0xf800707f | AMO ops | amoswap amoadd |
| 0xfc00707f | shift immediates | slli srli srai |
| 0xfe00707f | R-type arithmetic | add sub and mul |
| 0xfe007fff | sfence.vma | - |
| 0xffffffff | exact match | ecall mret |
Each stage is a switch over the masked instruction word. A match returns immediately; if no case fires, the next mask is applied and the next switch runs. CSR instructions get a pre-read before any dispatch - if the instruction looks like a CSR op ((ins_word & 0x73) == 0x73), the current CSR value is read into ins_FormatCSR.value so instructions receive the old value atomically.
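The staged dispatch can be illustrated with a stripped-down sketch. This is not insSelect() itself: `classify` is a hypothetical function that returns a mnemonic instead of executing anything, and it shows only five of the seven stages. The case values are the standard RISC-V encodings with the masked-out fields set to zero.

```cpp
#include <cstdint>
#include <string>

// Hypothetical mini-decoder: each switch exposes more bits than the previous one.
std::string classify(uint32_t w) {
    switch (w & 0x0000007fu) {          // opcode only
        case 0x37: return "lui";
        case 0x17: return "auipc";
        case 0x6f: return "jal";
    }
    switch (w & 0x0000707fu) {          // opcode + funct3
        case 0x00000013: return "addi";
        case 0x00002003: return "lw";
        case 0x00000063: return "beq";
    }
    switch (w & 0xfc00707fu) {          // shift immediates
        case 0x00001013: return "slli";
        case 0x00005013: return "srli";
        case 0x40005013: return "srai";
    }
    switch (w & 0xfe00707fu) {          // R-type arithmetic
        case 0x00000033: return "add";
        case 0x40000033: return "sub";
        case 0x02000033: return "mul";
    }
    switch (w) {                        // exact match
        case 0x00000073: return "ecall";
        case 0x30200073: return "mret";
    }
    return "illegal";                   // nothing matched at any stage
}
```

The ordering matters: a narrower mask must not claim an encoding that a later, wider mask needs to split further, which is why slli/srli/srai have no entry in the opcode + funct3 stage.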
The imp/run Macro DSL
Instruction implementations are defined with imp and registered with run:
// imp: define an instruction handler
#define imp(name, fmt_t, code) \
    void Emulator::emu_##name(u32 w, ins_ret *ret, fmt_t ins) { code }
// run: case label + dispatch + early return
#define run(name, opcode, insf) \
    case opcode: \
        if (debugMode) ins_p(name) \
        emu_##name(ins_word, &ret, insf); \
        return ret;
Result helpers write into the ins_ret struct:
#define WR_RD(code) { ret->write_reg = ins.rd; ret->write_val = AS_UNSIGNED(code); }
#define WR_PC(code) { ret->pc_val = code; }
#define WR_CSR(code) { ret->csr_write = ins.csr; ret->csr_val = code; }
Implementations read like inline specifications:
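imp(add, FormatR, {
    WR_RD(AS_SIGNED(cpu.xreg[ins.rs1]) + AS_SIGNED(cpu.xreg[ins.rs2]));
})
imp(beq, FormatB, {
    if (cpu.xreg[ins.rs1] == cpu.xreg[ins.rs2])
        WR_PC(cpu.pc + ins.imm);
})
imp(amoswap_w, FormatR, { // rv32a atomic swap
    u32 tmp = cpu.memGetWord(cpu.xreg[ins.rs1]);
    cpu.memSetWord(cpu.xreg[ins.rs1], cpu.xreg[ins.rs2]);
    WR_RD(tmp)
})
AS_SIGNED / AS_UNSIGNED reinterpret the bits through a pointer cast instead of a value conversion. Casting between the signed and unsigned variants of the same integer type is permitted by the C++ aliasing rules, so the reinterpretation itself is well defined. The signed add in the first handler can still overflow, which C++ leaves undefined; performing the addition on the unsigned values produces the same 32-bit result with defined wraparound.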
ins_ret - The Result Bus
Instructions don't modify CPU state directly. They populate an ins_ret struct; emulate() commits all side-effects after insSelect() returns:
typedef struct {
    u32 write_reg; // rd index
    u32 write_val; // rd value
    u32 pc_val;    // next PC
    u32 csr_write; // CSR addr
    u32 csr_val;   // CSR value
    Trap trap;     // exception
} ins_ret;
insReturnNoop() zeros the struct and sets pc_val = pc + 4. Handlers only populate fields they affect. write_reg == 0 is silently discarded at commit time - x0's hard-wired zero is enforced without special-casing inside individual implementations.
After commit, handleIrqAndTrap(&ret) checks whether the instruction raised a trap and whether any interrupts are pending and enabled.
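A minimal sketch of the commit step, under the assumption that it does nothing more than apply the register write and the new PC. `InsRet`, `Cpu`, `noop_result`, and `commit` are trimmed-down, hypothetical stand-ins for the emulator's own types and functions.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down versions of the structures described above.
struct InsRet {
    uint32_t write_reg;  // rd index (0 means "no register write")
    uint32_t write_val;  // rd value
    uint32_t pc_val;     // next PC
};

struct Cpu {
    uint32_t xreg[32] = {};
    uint32_t pc = 0x80000000u;
};

// Default result: no register write, fall through to the next instruction.
InsRet noop_result(const Cpu &cpu) {
    return InsRet{0, 0, cpu.pc + 4};
}

// Apply the side effects in one place; writes to x0 are dropped here.
void commit(Cpu &cpu, const InsRet &ret) {
    if (ret.write_reg != 0)
        cpu.xreg[ret.write_reg & 31] = ret.write_val;
    cpu.pc = ret.pc_val;
}
```

Because the x0 check lives in `commit`, a handler such as `addi x0, x0, 0` can write its result unconditionally and still leave x0 at zero.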
Part 4 - Privileged Architecture
Privilege Levels
Three privilege levels are implemented: Machine (3), Supervisor (1), User (0). The emulator boots in Machine mode.
#define PRIV_USER 0
#define PRIV_SUPERVISOR 1
#define PRIV_MACHINE 3
// stored in csr.privilege, changed by traps and xRET
mret / sret restore privilege from MSTATUS.MPP / SSTATUS.SPP and re-enable interrupts via MPIE → MIE (SPIE → SIE for sret). Delegation registers MIDELEG / MEDELEG control which privilege level handles each trap - typically, Linux runs with most traps delegated to S-mode so the kernel handles them without bouncing through M-mode firmware.
CSR Architecture
The CSR file is a flat 4096-entry u32 array. The address itself encodes privilege requirements:
- Bits [9:8] - minimum privilege level required to access
- Bits [11:10] == 0b11 - read-only; writes trap with IllegalInstruction
typedef struct {
    u32 data[4096]; // all CSRs by addr
    u32 privilege;  // current mode
} csr_state;
bool hasCsrAccessPrivilege(u32 addr) {
    u32 req = (addr >> 8) & 0x3;
    return req <= csr.privilege;
}
Key CSRs and their roles:
| Address | Name | Purpose |
|---|---|---|
| 0x300 | MSTATUS | Global IE, MPP/SPP fields |
| 0x303 | MIDELEG | Delegate interrupts to S-mode |
| 0x304 | MIE | Machine interrupt enable |
| 0x305 | MTVEC | Trap handler address |
| 0x341 | MEPC | Return address after trap |
| 0x342 | MCAUSE | Trap cause code |
| 0x344 | MIP | Interrupt pending |
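Both address-encoded rules (minimum privilege in bits [9:8], read-only marker in bits [11:10]) can be checked with a few lines. The helper names below are hypothetical and the sketch ignores any per-CSR special cases the emulator may handle.

```cpp
#include <cstdint>

constexpr uint32_t PRIV_USER = 0, PRIV_SUPERVISOR = 1, PRIV_MACHINE = 3;

// Bits [9:8] of the CSR address give the lowest privilege allowed to touch it.
constexpr bool csr_accessible(uint32_t addr, uint32_t priv) {
    return ((addr >> 8) & 0x3u) <= priv;
}

// Bits [11:10] == 0b11 mark the CSR as read-only.
constexpr bool csr_read_only(uint32_t addr) {
    return ((addr >> 10) & 0x3u) == 0x3u;
}

// A write is legal only if the CSR is reachable and not read-only.
constexpr bool csr_write_allowed(uint32_t addr, uint32_t priv) {
    return csr_accessible(addr, priv) && !csr_read_only(addr);
}
```

With these rules, MSTATUS (0x300) is reachable only from M-mode, while the cycle counter at 0xC00 is readable from any mode but can never be written.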
Shadow Registers - SSTATUS ⊂ MSTATUS
SSTATUS, SIE, and SIP have no backing storage. They are computed on read as masked views of their M-mode counterparts:
u32 readCsrRaw(u32 addr) {
    switch (addr) {
        case CSR_SSTATUS:
            // SSTATUS is MSTATUS masked to S-visible bits only
            return csr.data[CSR_MSTATUS] & 0x000de162;
        case CSR_SIE:
            return csr.data[CSR_MIE] & 0x222;
        case CSR_SIP:
            return csr.data[CSR_MIP] & 0x222;
        case CSR_CYCLE: return clock;
        case CSR_TIME:  return clint.mtime_lo;
        default:
            return csr.data[addr & 0xfff]; // 12-bit CSR address, stays inside data[4096]
    }
}
The 0x000de162 mask exposes only S-visible fields of MSTATUS: MXR (19), SUM (18), XS (16:15), FS (14:13), SPP (8), UBE (6), SPIE (5), SIE (1). SD (bit 31) is not covered by the mask as written; including it would require 0x800de162. M-mode fields (MPP, MPIE, MIE) are invisible to S-mode. The 0x222 mask (0b001000100010) exposes only the supervisor-visible interrupt bits within MIE/MIP: SEIP (9), STIP (5), SSIP (1).
Writes to SSTATUS/SIE/SIP merge back into the M-mode registers using the same masks - there is one source of truth.
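The read and write sides of such a shadow register are symmetric. A minimal sketch, assuming the write path is a plain masked merge (the function names are hypothetical; the mask is the one quoted above):

```cpp
#include <cstdint>

constexpr uint32_t SSTATUS_MASK = 0x000de162u;  // mask quoted above

// Read view: only the S-visible bits of MSTATUS.
constexpr uint32_t read_sstatus(uint32_t mstatus) {
    return mstatus & SSTATUS_MASK;
}

// Write view: replace the S-visible bits, keep every M-only bit untouched.
constexpr uint32_t write_sstatus(uint32_t mstatus, uint32_t value) {
    return (mstatus & ~SSTATUS_MASK) | (value & SSTATUS_MASK);
}
```

An S-mode write of all ones therefore cannot flip MPP, MPIE, or MIE, and a write of zero cannot clear them. The same pattern with the 0x222 mask would cover SIE and SIP.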
Part 5 - Interrupts and Traps
Trap Handling
handleIrqAndTrap() runs after every instruction commit. It first checks for a synchronous trap from the instruction just executed, then scans MIP & MIE for pending interrupts. Synchronous traps take priority. IRQ scan order: MEIP → MSIP → MTIP → SEIP → SSIP → STIP - first match wins.
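The first-match-wins scan can be sketched as below. `pick_irq` is a hypothetical helper that covers only the priority order; it leaves out the global enable bits in MSTATUS and the delegation check, both of which the real handler has to consider. The bit positions are the standard MIP/MIE layout.

```cpp
#include <cstdint>

// Standard bit positions in MIP/MIE.
constexpr uint32_t SSIP = 1u << 1, MSIP = 1u << 3;
constexpr uint32_t STIP = 1u << 5, MTIP = 1u << 7;
constexpr uint32_t SEIP = 1u << 9, MEIP = 1u << 11;

// Return the highest-priority interrupt that is both pending and enabled, or 0.
uint32_t pick_irq(uint32_t mip, uint32_t mie) {
    static const uint32_t order[] = {MEIP, MSIP, MTIP, SEIP, SSIP, STIP};
    const uint32_t ready = mip & mie;
    for (uint32_t bit : order)
        if (ready & bit) return bit;
    return 0;
}
```

If a machine timer interrupt and a supervisor external interrupt are pending at the same time, the timer wins because MTIP appears earlier in the scan order.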
handleTrap() performs the full trap entry sequence:
1. Determine target privilege - check MIDELEG/MEDELEG to see if the trap is delegated to S-mode
2. Write trap registers - xEPC = pc, xCAUSE = type, xTVAL = bad address or instruction
3. Jump to handler - read TVEC; if vectored mode (TVEC[1:0] != 0), jump to base + 4 × cause
4. Update MSTATUS - MIE → MPIE, MIE = 0, current privilege → MPP, set new privilege
mret reverses step 4: MPIE → MIE, MPP → privilege.
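The MSTATUS bookkeeping on trap entry and on mret can be sketched for the M-mode case. `Hart`, `enter_m_trap`, and `do_mret` are hypothetical names; the field positions are the standard ones (MIE = bit 3, MPIE = bit 7, MPP = bits 12:11). The two extra writes at the end of `do_mret` follow the privileged spec and may or may not match what the emulator does.

```cpp
#include <cstdint>

constexpr uint32_t MSTATUS_MIE       = 1u << 3;
constexpr uint32_t MSTATUS_MPIE      = 1u << 7;
constexpr uint32_t MSTATUS_MPP_SHIFT = 11;
constexpr uint32_t MSTATUS_MPP_MASK  = 3u << MSTATUS_MPP_SHIFT;

struct Hart {            // hypothetical container for the state involved
    uint32_t mstatus;
    uint32_t privilege;  // 0 = U, 1 = S, 3 = M
};

// Trap entry into M-mode: MIE -> MPIE, MIE = 0, old privilege -> MPP.
void enter_m_trap(Hart &h) {
    uint32_t s = h.mstatus;
    s = (s & MSTATUS_MIE) ? (s | MSTATUS_MPIE) : (s & ~MSTATUS_MPIE);
    s &= ~MSTATUS_MIE;
    s = (s & ~MSTATUS_MPP_MASK) | (h.privilege << MSTATUS_MPP_SHIFT);
    h.mstatus = s;
    h.privilege = 3;
}

// mret: MPIE -> MIE, privilege <- MPP. Per the spec, MPIE is then set to 1
// and MPP falls back to the least-privileged mode (U here).
void do_mret(Hart &h) {
    uint32_t s = h.mstatus;
    s = (s & MSTATUS_MPIE) ? (s | MSTATUS_MIE) : (s & ~MSTATUS_MIE);
    h.privilege = (s & MSTATUS_MPP_MASK) >> MSTATUS_MPP_SHIFT;
    s |= MSTATUS_MPIE;
    s &= ~MSTATUS_MPP_MASK;
    h.mstatus = s;
}
```

A trap taken from S-mode with interrupts enabled and the matching mret round-trip back to S-mode with MIE set again, which is the property the handler relies on.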
CLINT - Core-Local Interruptor
The CLINT provides a 64-bit memory-mapped timer and a software interrupt register:
0x02000000 MSIP (software interrupt)
0x02004000 MTIMECMP lo
0x02004004 MTIMECMP hi
0x0200BFF8 MTIME lo
0x0200BFFC MTIME hi
Timer interrupt flow: the OS writes MTIMECMP = MTIME + period. The emulator increments MTIME each cycle. When MTIME >= MTIMECMP, it sets MIP.MTIP = 1. On the next handleIrqAndTrap() call with MIE.MTIE set, a MachineTimerInterrupt trap fires. This is how Linux implements its scheduler tick.
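The timer flow boils down to a 64-bit compare built from two 32-bit halves. A minimal sketch, assuming one tick per call and a level-style MTIP that follows the comparison; `Clint` and `clint_tick` are hypothetical names, and only `mtime_lo` appears in the emulator's source shown above.

```cpp
#include <cstdint>

constexpr uint32_t MIP_MTIP = 1u << 7;

struct Clint {  // hypothetical layout: two 32-bit halves per register
    uint32_t mtime_lo = 0, mtime_hi = 0;
    uint32_t mtimecmp_lo = 0, mtimecmp_hi = 0;
};

// Advance MTIME by one tick and return the updated MIP value.
uint32_t clint_tick(Clint &c, uint32_t mip) {
    if (++c.mtime_lo == 0) ++c.mtime_hi;  // carry into the high word
    const uint64_t now = (uint64_t(c.mtime_hi) << 32) | c.mtime_lo;
    const uint64_t cmp = (uint64_t(c.mtimecmp_hi) << 32) | c.mtimecmp_lo;
    return (now >= cmp) ? (mip | MIP_MTIP) : (mip & ~MIP_MTIP);
}
```

Because MTIP tracks the comparison, the guest clears the interrupt simply by writing a later deadline into MTIMECMP.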
UART - Serial Interrupt Path
The UART 16550 is mapped at 0x10000000. Eight registers are packed into two u32 fields accessed via UART_GET1/2 shift macros. The IIR update rule:
- RBR != 0 && IER.RXINT → IIR_RD_AVAILABLE (4)
- THR == 0 && IER.THRE → IIR_THR_EMPTY (2)
- Otherwise → IIR_NO_INTERRUPT (7)
When the UART has a pending interrupt, uart.interrupting = true. emu.cpp reads this flag and sets MIP.SEIP, which causes handleIrqAndTrap() to fire a SupervisorExternalInterrupt - routed through the standard IRQ delegation path.
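The IIR update rule is a three-way priority check. A minimal sketch, assuming the standard 16550 IER layout (bit 0 enables the receive interrupt, bit 1 the THR-empty interrupt); the IIR values are the ones quoted in the rule, and the function names are hypothetical.

```cpp
#include <cstdint>

// Values as quoted in the rule above.
constexpr uint8_t IIR_THR_EMPTY    = 2;
constexpr uint8_t IIR_RD_AVAILABLE = 4;
constexpr uint8_t IIR_NO_INTERRUPT = 7;

// Assumed standard 16550 IER bits.
constexpr uint8_t IER_RXINT = 1u << 0;
constexpr uint8_t IER_THRE  = 1u << 1;

// Receive data wins over THR-empty; anything else means no interrupt.
constexpr uint8_t update_iir(uint8_t rbr, uint8_t thr, uint8_t ier) {
    if (rbr != 0 && (ier & IER_RXINT)) return IIR_RD_AVAILABLE;
    if (thr == 0 && (ier & IER_THRE))  return IIR_THR_EMPTY;
    return IIR_NO_INTERRUPT;
}

// The flag that would feed MIP.SEIP in this sketch.
constexpr bool uart_interrupting(uint8_t iir) {
    return iir != IIR_NO_INTERRUPT;
}
```

A received byte with the receive interrupt masked produces IIR_NO_INTERRUPT, so a guest that polls the UART never sees a spurious external interrupt.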
Part 6 - Loading and Booting Linux
ELF Loading
loadElf() in loader.cpp reads the ELF32 header, iterates section headers, and copies SHT_PROGBITS sections into the emulated RAM buffer:
// Collect loadable sections
for (const auto &sh : sh_tbl) {
    if (sh.sh_type == SHT_PROGBITS) {
        ElfSection section{
            sh.sh_addr & 0x7FFFFFFF, // strip bit 31 for physical offset
            sh.sh_offset,
            sh.sh_size
        };
        sections.push_back(section);
    }
}
// Copy sections into emulated RAM
for (auto &s : sections) {
    s.sData.resize(s.size);
    lseek(fd, s.offset, SEEK_SET);
    read(fd, s.sData.data(), s.size);
    std::copy(s.sData.begin(), s.sData.end(), data + s.addr_real);
}
ELF virtual addresses start at 0x80000000. Masking with 0x7FFFFFFF gives the physical offset into the 128 MiB flat buffer.
Linux Image Loading and Boot ABI
The Linux kernel image is not an ELF - it is a flat binary (Linux 6.1.14 rv32nommu + BusyBox initramfs, ~7–8 MiB). loadLinuxImage() reads it directly into data[0]:
// CPU reset state for Linux boot:
pc = 0x80000000;   // → mem[0]
xreg[10] = 0x0000; // a0 = hart ID (0)
xreg[11] = 0x1020; // a1 = DTB pointer
The RISC-V Linux boot ABI requires exactly two things: a0 = hart ID and a1 = physical DTB address. The kernel reads the DTB at startup to discover the memory map, configure UART and CLINT drivers, and start the scheduler. No firmware layer (BBL/OpenSBI) is needed - the kernel boots directly in M-mode from instruction one.
Device Tree Blob
The DTB is mapped at physical address 0x1020, below the RAM region, and is backed by a separate cpu.dtb pointer. It describes:
- One hart at 100 MHz
- 128 MiB RAM at 0x80000000
- UART at 0x10000000
- CLINT at 0x02000000
The DTB is the sole channel through which the kernel learns about the machine. No BIOS, no ACPI tables, no firmware calls - just a small binary blob at a known address.
Part 7 - The Sv32 MMU
The Sv32 MMU is the most complex single subsystem in the emulator. It is only activated by writing to SATP (0x180) with mode bit 31 set - for nommu Linux, it is never used.
SATP and Page Table Structure
mmuUpdate() is called on every csrw satp:
void mmuUpdate(u32 satp) {
    mmu.mode = (satp >> 31) & 1;
    // 0 = MMU_MODE_OFF (bare/physical)
    // 1 = MMU_MODE_SV32 (paged Sv32)
    mmu.ppn = satp & 0x3fffff;
    // root page dir PA = mmu.ppn × 4096
}
A 32-bit virtual address decomposes as: VPN[1] [31:22], VPN[0] [21:12], page offset [11:0].
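The address split and the root-table address can be written out directly. `Sv32Va`, `split_va`, and `root_table_pa` are hypothetical names used only for this sketch.

```cpp
#include <cstdint>

struct Sv32Va {
    uint32_t vpn1;    // index into the root table, bits [31:22]
    uint32_t vpn0;    // index into the second-level table, bits [21:12]
    uint32_t offset;  // byte offset inside the 4 KiB page, bits [11:0]
};

constexpr Sv32Va split_va(uint32_t va) {
    return {(va >> 22) & 0x3ffu, (va >> 12) & 0x3ffu, va & 0xfffu};
}

// Physical address of the root page table from the 22-bit PPN field in SATP.
// Sv32 physical addresses are 34 bits wide, hence the 64-bit return type.
constexpr uint64_t root_table_pa(uint32_t satp) {
    return uint64_t(satp & 0x3fffffu) << 12;
}
```

For the virtual address 0xC0123456 this gives VPN[1] = 0x300, VPN[0] = 0x123, and offset 0x456.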
PTE Format
Each page table entry (PTE) is 32 bits:
| Field | Bits | Description |
|---|---|---|
| PPN[1] | [31:20] | Upper physical page number |
| PPN[0] | [19:10] | Lower physical page number |
| D | [7] | Dirty - page has been written |
| A | [6] | Accessed - page has been read or written |
| U | [4] | User page - accessible in U-mode |
| X | [3] | Executable |
| W | [2] | Writable |
| R | [1] | Readable |
| V | [0] | Valid |
A PTE is a leaf if R == 1 || X == 1. A PTE is a pointer (non-leaf) if R == 0 && X == 0. V == 0 or (!R && W) → immediate page fault.
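The three-way classification above fits in one small function. `PteKind` and `classify_pte` are hypothetical names; the flag positions are the ones in the table.

```cpp
#include <cstdint>

enum class PteKind { Invalid, Pointer, Leaf };

constexpr PteKind classify_pte(uint32_t pte) {
    const bool v = pte & 0x1u;
    const bool r = pte & 0x2u;
    const bool w = pte & 0x4u;
    const bool x = pte & 0x8u;
    if (!v || (!r && w)) return PteKind::Invalid;  // V == 0, or W without R
    if (r || x)          return PteKind::Leaf;     // maps a page directly
    return PteKind::Pointer;                       // points at the next table
}
```

A PTE of 0x1 (only V set) is a pointer, 0x9 (V + X) is an execute-only leaf, and 0x5 (V + W without R) faults immediately.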
Two-Level Page Walk
mmuTranslate() is called on every fetch, load, and store when mmu.mode == MMU_MODE_SV32:
// V, R, W, X are the flag bits of the current pte;
// super is set when the leaf was found at level 0.
for (int level = 0; level < 2; level++) {
    u32 page_addr;
    if (level == 0) {
        // L0: root page directory, indexed by VPN[1]
        page_addr = mmu.ppn * 4096u
                  + ((addr >> 22) & 0x3ff) * 4u;
    } else {
        // L1: L0 PTE's PPN, indexed by VPN[0]
        page_addr = (ppn0 | (ppn1 << 10)) * 4096u
                  + ((addr >> 12) & 0x3ff) * 4u;
    }
    u32 pte = memGetWord(page_addr);
    ppn0 = (pte >> 10) & 0x3ff;
    ppn1 = (pte >> 20) & 0xfff;
    if (!V || (!R && W)) MMU_FAULT;  // invalid PTE
    if (R || X) break;               // leaf found, stop walking
    else if (level == 1) MMU_FAULT;  // L1 non-leaf = fault
}
// Assemble physical address
u32 pa = addr & 0xfff;
pa |= super ? ((addr>>12)&0x3ff)<<12 : ppn0<<12;
pa |= ppn1 << 22;
return pa;
4 MiB superpages are supported: a leaf at level 0 uses VPN[0] as part of the offset. PPN[0] must be 0 for a valid superpage; otherwise a fault fires.
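For experimenting with the walk outside the emulator, here is a self-contained sketch over a toy memory. It is not mmuTranslate(): `PhysMem`, `load_word`, and `translate` are hypothetical, permission and A/D checks are left out, and physical addresses are truncated to 32 bits as in the excerpt.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Toy physical memory: word address -> value, standing in for memGetWord().
using PhysMem = std::unordered_map<uint32_t, uint32_t>;

inline uint32_t load_word(const PhysMem &m, uint32_t pa) {
    auto it = m.find(pa);
    return it == m.end() ? 0u : it->second;  // unmapped words read as an invalid PTE
}

// Two-level Sv32 walk. Returns the physical address, or nullopt for a page fault.
std::optional<uint32_t> translate(const PhysMem &m, uint32_t root_ppn, uint32_t va) {
    uint32_t table = root_ppn << 12;
    for (int level = 0; level < 2; level++) {
        const uint32_t vpn = (level == 0) ? (va >> 22) & 0x3ffu : (va >> 12) & 0x3ffu;
        const uint32_t pte = load_word(m, table + vpn * 4u);
        const bool v = pte & 1u, r = pte & 2u, w = pte & 4u, x = pte & 8u;
        const uint32_t ppn0 = (pte >> 10) & 0x3ffu;
        const uint32_t ppn1 = (pte >> 20) & 0xfffu;
        if (!v || (!r && w)) return std::nullopt;        // invalid PTE
        if (r || x) {                                     // leaf
            if (level == 0) {                             // 4 MiB superpage
                if (ppn0 != 0) return std::nullopt;       // misaligned superpage
                return (ppn1 << 22) | (va & 0x3fffffu);
            }
            return (ppn1 << 22) | (ppn0 << 12) | (va & 0xfffu);
        }
        if (level == 1) return std::nullopt;              // pointer at the last level
        table = ((ppn1 << 10) | ppn0) << 12;              // descend one level
    }
    return std::nullopt;
}
```

With a root table at PPN 0x80000, a pointer PTE 0x20000401 in slot 1, and a leaf PTE 0x20000807 at the start of the second-level table, the virtual address 0x00400123 translates to 0x80002123. A leaf placed directly in the root table maps a whole 4 MiB region, with the low 22 bits of the virtual address passed through as the offset.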
Permission Checks and Special Bits
After the walk, access permissions are gated on privilege level and access type:
- MSTATUS.SUM allows S-mode to access U-mode pages (needed for kernel copy_to_user)
- MSTATUS.MXR makes executable pages readable (simplifies kernel text mapping)
- Hardware does not auto-set A or D bits - the OS must set them before a page is used, or accesses fault
Page Faults
Three page fault causes propagate through the standard trap path:
| Cause | Code | Trigger |
|---|---|---|
| InstructionPageFault | 12 | Fetch translation failed |
| LoadPageFault | 13 | Data read translation failed |
| StorePageFault | 15 | Data write translation failed |
All three set ret.trap.en and flow through handleIrqAndTrap(). If MEDELEG has the corresponding bit set, the kernel's page fault handler runs in S-mode.
nommu Fast Path
The rv32nommu Linux kernel never executes csrw satp, so mmu.mode stays MMU_MODE_OFF = 0 for the entire run:
// Fast exit for bare mode - always taken by nommu
if (mmu.mode == MMU_MODE_OFF)
    return addr;
The full Sv32 walk is compiled in but never reached during nommu Linux operation. Zero overhead for the common case.
Part 8 - Floating-Point Extensions (RV32F/D)
Register File and NaN Boxing
The FP register file is 32 × 64-bit entries, shared between F (single) and D (double) extensions:
u64 freg[32]; // reset: canonical qNaN-boxed
Single-precision values are NaN-boxed per the RISC-V spec §11.3 - the upper 32 bits are set to 0xFFFFFFFF on write, and validated on read:
// Write single-precision
cpu.freg[rd] = 0xFFFFFFFF00000000ULL | bits;
// Read single-precision - validate NaN-box
u64 v = cpu.freg[rs];
if ((v >> 32) != 0xFFFFFFFFu)
    return canonical_qNaN; // upper half corrupted, return NaN
Double-precision writes use the full 64-bit value with no boxing. FCSR (0x003) holds the 3-bit rounding mode frm (bits 7:5) and the 5-bit exception flags fflags (bits 4:0).
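The boxing and unboxing steps can be tried in isolation. A minimal sketch with hypothetical helper names; `std::memcpy` is used for the float/integer bit copy, and 0x7fc00000 is the canonical single-precision quiet NaN.

```cpp
#include <cstdint>
#include <cstring>

constexpr uint32_t CANONICAL_QNAN_F32 = 0x7fc00000u;

// Box a float into a 64-bit register value: upper 32 bits all ones.
inline uint64_t box_f32(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // bit copy, no numeric conversion
    return 0xFFFFFFFF00000000ull | bits;
}

// Unbox: anything that is not properly boxed reads back as the canonical NaN.
inline float unbox_f32(uint64_t reg) {
    const uint32_t bits = ((reg >> 32) == 0xFFFFFFFFu) ? uint32_t(reg)
                                                        : CANONICAL_QNAN_F32;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Boxing 1.0f yields 0xFFFFFFFF3F800000. The same low word with a zeroed upper half is not a valid box, so reading it as a single returns NaN instead of 1.0f.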
All rv32f and rv32d ISA tests pass. A hello_linux binary that calls printf with floats runs correctly - musl libc soft-float helpers like __adddf3 and __floatsidf execute via the native F/D extension instructions.
Part 9 - Linux Framebuffer Demo
The emulator exposes /dev/fb0 to the Linux guest via MMIO. A demo program in hello_linux/framebuff.c queries screen dimensions with ioctl(fd, FBIOGET_VSCREENINFO, &vinfo), renders a pattern into a pixel buffer, then writes it out with a single write(fd, g_buf, w * h * 4).
Ten render patterns are included:
| Pattern | Technique |
|---|---|
| SMPTE colour bars | Broadcast colour reference |
| HSV gradient / colour wheel | hsv2rgb() using fmodf / fabsf |
| Mandelbrot / Julia set | Fixed-point 4.12 arithmetic (int64 multiply) |
| Plasma | Sine lookup table with interference |
| 3-D wireframe cube | cosf / sinf rotation + float perspective divide |
| Sierpinski / rings / Lissajous | Procedural geometry |
The graphics stack runs entirely as normal Linux userspace - no special emulator hooks. The floating-point patterns use native RV32F instructions executed by the emulator's F extension.
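The fixed-point technique listed for the Mandelbrot pattern can be sketched as follows. This is an illustration of 4.12 arithmetic with a 64-bit intermediate product, not the code from framebuff.c; `fx_mul` and `mandel_iters` are hypothetical names, and values are stored in 32-bit integers so intermediate results above the 4-bit integer range do not wrap.

```cpp
#include <cstdint>

constexpr int     FX_SHIFT = 12;             // 12 fractional bits (the ".12" in 4.12)
constexpr int32_t FX_ONE   = 1 << FX_SHIFT;  // 1.0 in fixed point

// Multiply two fixed-point values; the 64-bit intermediate keeps the full product.
// Relies on arithmetic right shift for negative values (guaranteed since C++20).
constexpr int32_t fx_mul(int32_t a, int32_t b) {
    return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> FX_SHIFT);
}

// Escape-time iteration count for the point c = cx + i*cy (both in fixed point).
int mandel_iters(int32_t cx, int32_t cy, int max_iter) {
    int32_t x = 0, y = 0;
    for (int i = 0; i < max_iter; i++) {
        const int32_t x2 = fx_mul(x, x);
        const int32_t y2 = fx_mul(y, y);
        if (x2 + y2 > 4 * FX_ONE) return i;  // |z|^2 > 4: the point escapes
        const int32_t xy = fx_mul(x, y);
        x = x2 - y2 + cx;
        y = 2 * xy + cy;
    }
    return max_iter;                         // treated as inside the set
}
```

Points such as c = 0 and c = -1 never escape and return `max_iter`, while c = 3 escapes after a single iteration.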
Part 10 - Building and Testing
Build Targets
make all    # builds rve (g++ -std=c++17 -O2, SDL2 + OpenGL)
make run    # GUI mode - SDL2 window + ImGui
make isas   # runs all 60 ISA compliance tests
make linuxn # headless Linux boot (-n flag, raw terminal I/O)
make linux  # GUI Linux boot (-r flag, UART in ImGui console)
ISA Test Suite
The test suite covers 60 bare-metal ELF32 binaries organised by extension:
| Group | Count | Tests |
|---|---|---|
| rv32ui-p-* | 43 | RV32I: add/sub/load/store/branch/jump/lui/auipc |
| rv32um-p-* | 8 | RV32M: mul/mulh/div/rem |
| rv32ua-p-* | 9 | RV32A: amoswap/amoadd/amoand/lrsc |
| rv32mi/si-p-* | - | Machine/Supervisor CSR tests |
Each test runs in M-mode with no trap delegation. Pass: the test writes 0x55 to SYSCON at 0x11100000, which sets syscon_cmd = 0x5555 and halts the emulator cleanly. Fail: any illegal instruction trap or loop timeout before the SYSCON write. .dump disassembly files ship alongside each binary for cross-referencing a failing PC.
Summary - Three Key Insights
Format parsers and CSR pre-reads are unconditional. All six format structs are populated before any dispatch, and CSR pre-reads happen if the opcode looks like a CSR instruction. Dead computations are eliminated by the compiler. This keeps the dispatch code simple: flat masked switches with early returns.
Sv32 is ~20 lines. Two memGetWord() calls plus permission gates. nommu Linux bypasses the entire thing in the first branch - zero overhead for the common case.
Linux boot needs very little. A flat kernel image at mem[0] plus the two boot registers, a0 = hart ID and a1 = DTB pointer. The DTB encodes the full machine description - memory, UART, CLINT. No firmware (OpenSBI/BBL) layer is needed; the kernel runs directly in M-mode from instruction one.