Bare Assembly Bring-Up on the CH32V003
The CH32V003 is strange in a useful way: a RISC-V microcontroller with 16 KiB of flash, 2 KiB of SRAM, GPIO, timers, UART, SPI, I2C, and a price low enough to make experiments feel cheap.
Most projects start from a vendor SDK, an IDE template, or a prepared C runtime. That is practical, but it hides the most interesting part: what happens between reset and the first useful instruction of your program.
Here we do the opposite. No C runtime. No vendor startup file. No library. Just one assembly file, one linker script, the CH32V003 reference manual, and a minimal loop that toggles a GPIO pin.
The goal is not to write a complete firmware. The goal is to understand the boot path well enough that the rest of the firmware no longer feels magical.
Hardware and Tools
You need:
- a CH32V003 board;
- a WCH-Link or compatible probe;
- an LED connected to PD4, or a board whose user LED is already on that pin;
- a bare-metal RISC-V toolchain with riscv32-unknown-elf-as, riscv32-unknown-elf-ld, and riscv32-unknown-elf-objcopy;
- a flashing tool such as minichlink.
The CH32V003 uses a QingKe RV32E core. With GNU binutils, a sensible assembler line is:
riscv32-unknown-elf-as -g -mabi=ilp32e -march=rv32ec_zicsr \
-o startup.o startup.S
Two options matter:
- -march=rv32ec_zicsr: RV32E base ISA, compressed instructions, and CSR access;
- -mabi=ilp32e: the ABI for RV32E, where only integer registers x0 through x15 exist.
Reference Manual Sections
For a minimal bring-up, keep these parts of the CH32V003 reference manual open:
- memory map: 16 KiB of Code Flash, and 2 KiB of SRAM starting at 0x20000000;
- RCC: peripheral clock enables, especially R32_RCC_APB2PCENR at 0x40021018;
- PFIC: exception and interrupt vector table, plus the mtvec CSR;
- GPIO: GPIOx_CFGLR, GPIOx_OUTDR, and GPIOx_BSHR;
- FLASH: R32_FLASH_ACTLR at 0x40022000 for flash latency when increasing the system clock.
The model is direct: peripherals are controlled through memory-mapped registers. To blink an LED, enable the GPIO port clock, configure the pin as an output, then write to the output register.
Minimal Linker Script
The linker must place the vector table at the start of the image and expose the RAM top for the stack.
A minimal linker script looks like this:
ENTRY(vector_base)
MEMORY
{
    flash (rx)  : ORIGIN = 0x00000000, LENGTH = 16K
    ram   (xrw) : ORIGIN = 0x20000000, LENGTH = 2K
}
SECTIONS
{
    . = 0x00000000;
    .vectors :
    {
        KEEP(*(.vectors))
    } > flash
    .text :
    {
        *(.text*)
        *(.rodata*)
    } > flash
    . = ALIGN(4);
    .bss (NOLOAD) :
    {
        _bss_start = .;
        *(.bss*)
        *(COMMON)
        . = ALIGN(4);
        _bss_end = .;
    } > ram
    _ram_start = ORIGIN(ram);
    _ram_end = ORIGIN(ram) + LENGTH(ram);
    _stack_top = _ram_end;
}
_stack_top becomes 0x20000800: SRAM starts at 0x20000000 and is 2 KiB
long. The stack grows downward from there.
KEEP(*(.vectors)) forces the vector table to remain in the output image. In a
bare-metal program, the table may not look referenced from normal code, but the
hardware depends on it directly after reset.
Vector Table
The CH32V003 expects 4-byte vector entries: 0x00000000, 0x00000004,
0x00000008, and so on. For the first step, only the reset entry matters: it
must transfer control to _start.
    .section .vectors, "ax"
    .option norvc
    .align 2
    .globl vector_base
vector_base:
    j _start
    .rept 255
    j default_handler
    .endr
default_handler:
    j default_handler
    .option rvc
The critical detail is .option norvc. The core supports 16-bit compressed
instructions, but the vector table must remain a table of 4-byte entries. If
j _start were assembled as a compressed instruction, the table layout would be
wrong.
For a first boot, every non-reset entry can point to an infinite loop. Interrupts can wait.
The _start Entry Point
After reset, put the processor into a known state:
- disable interrupts;
- initialize sp;
- clear .bss;
- initialize the GPIO;
- enter the main loop.
    .section .text
    .align 2
    .globl _start
_start:
    csrci mstatus, 0x8
    la sp, _stack_top
    la t0, _bss_start
    la t1, _bss_end
bss_clear_loop:
    bgeu t0, t1, bss_clear_done
    sw zero, 0(t0)
    addi t0, t0, 4
    j bss_clear_loop
bss_clear_done:
    call led_init
    j main_loop
Even if the first program does not use global variables, clearing .bss is the
right habit. Once you add a counter, buffer, or global state variable, it starts
from zero instead of whatever happened to be in RAM.
Configure PD4 as an Output
This example uses PD4 as the LED output.
From the reference manual:
- R32_RCC_APB2PCENR is at 0x40021018;
- IOPDEN, bit 5, enables the port D clock;
- R32_GPIOD_CFGLR is at 0x40011400;
- R32_GPIOD_OUTDR is at 0x4001140C;
- each pin uses 4 bits in GPIOx_CFGLR.
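For PD4, the configuration field starts at bit 4 * 4 = 16 and spans bits 19:16. The value
0b0001 configures the pin as a 10 MHz push-pull output. The low two bits (MODE = 01) select
output mode at up to 10 MHz, and the high two bits (CNF = 00) select general-purpose push-pull.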
.equ RCC_APB2PCENR, 0x40021018
.equ RCC_IOPDEN,    (1 << 5)
.equ GPIOD_CFGLR,   0x40011400
.equ GPIOD_OUTDR,   0x4001140C
.equ LED_PIN,       4
    .globl led_init
led_init:
    li t0, RCC_APB2PCENR
    lw t1, 0(t0)
    li t2, RCC_IOPDEN
    or t1, t1, t2
    sw t1, 0(t0)
    li t0, GPIOD_CFGLR
    lw t1, 0(t0)
    li t2, ~(0xF << 16)
    and t1, t1, t2
    li t2, (0x1 << 16)
    or t1, t1, t2
    sw t1, 0(t0)
    ret
The pattern is deliberate: read the register, clear only the field you need, set the new value, then write it back. That avoids clobbering the configuration of other pins on the same port.
Minimal Work Loop
Now that the pin is configured, the smallest visible workload is a blink loop:
main_loop:
    call led_toggle
    call delay
    j main_loop
led_toggle:
    li t0, GPIOD_OUTDR
    lw t1, 0(t0)
    li t2, (1 << LED_PIN)
    xor t1, t1, t2
    sw t1, 0(t0)
    ret
delay:
    li t0, 100000
delay_loop:
    addi t0, t0, -1
    bnez t0, delay_loop
    ret
This delay is not a timer. It depends on CPU frequency and instruction
execution. For first bring-up, that is fine: the goal is a visible proof that
reset, vector dispatch, _start, and GPIO writes all work.
Complete Program
Here is a standalone startup.S:
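/* Register addresses from the CH32V003 reference manual */
.equ RCC_APB2PCENR, 0x40021018
.equ RCC_IOPDEN,    (1 << 5)            /* port D clock enable */
.equ GPIOD_CFGLR,   0x40011400
.equ GPIOD_OUTDR,   0x4001140C
.equ LED_PIN,       4                   /* PD4 */
/* Vector table: fixed 4-byte entries, so no compressed instructions here */
    .section .vectors, "ax"
    .option norvc
    .align 2
    .globl vector_base
vector_base:
    j _start                            /* entry 0: reset */
    .rept 255
    j default_handler                   /* every other entry: park the CPU */
    .endr
default_handler:
    j default_handler
    .option rvc
/* Reset path */
    .section .text
    .align 2
    .globl _start
_start:
    csrci mstatus, 0x8                  /* clear MIE: interrupts off */
    la sp, _stack_top                   /* stack grows down from the top of SRAM */
    la t0, _bss_start
    la t1, _bss_end
1:                                      /* zero .bss one word at a time */
    bgeu t0, t1, 2f
    sw zero, 0(t0)
    addi t0, t0, 4
    j 1b
2:
    call led_init
main_loop:
    call led_toggle
    call delay
    j main_loop
/* Enable the port D clock, then make PD4 a 10 MHz push-pull output */
led_init:
    li t0, RCC_APB2PCENR
    lw t1, 0(t0)
    li t2, RCC_IOPDEN
    or t1, t1, t2
    sw t1, 0(t0)
    li t0, GPIOD_CFGLR
    lw t1, 0(t0)
    li t2, ~(0xF << 16)                 /* clear the PD4 field, bits 19:16 */
    and t1, t1, t2
    li t2, (0x1 << 16)                  /* MODE = 01, CNF = 00 */
    or t1, t1, t2
    sw t1, 0(t0)
    ret
/* Flip PD4 with a read-modify-write of OUTDR */
led_toggle:
    li t0, GPIOD_OUTDR
    lw t1, 0(t0)
    li t2, (1 << LED_PIN)
    xor t1, t1, t2
    sw t1, 0(t0)
    ret
/* Crude busy-wait: its length depends on the core clock */
delay:
    li t0, 100000
1:
    addi t0, t0, -1
    bnez t0, 1b
    ret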
Save the linker script from earlier as ch32v003-min.ld, then build with:
riscv32-unknown-elf-as -g -mabi=ilp32e -march=rv32ec_zicsr \
-o startup.o startup.S
riscv32-unknown-elf-ld -g -T ch32v003-min.ld \
startup.o -o blink.elf
riscv32-unknown-elf-objcopy -O binary blink.elf blink.bin
Flash it:
minichlink -w blink.bin flash -b
If the LED blinks, the base path is validated: the image is linked at the right address, the vector table is usable, the stack is initialized, code executes from flash, and the GPIO registers respond.
Next Steps: Clock, Flash, Interrupts
The program above runs from the reset clock. A common next step is switching to a faster system clock through the PLL, which doubles its input (HSI or HSE). For example, the 24 MHz HSI becomes 48 MHz.
Before increasing the clock, check FLASH_ACTLR.LATENCY. The manual specifies:
- 00: 0 wait states, recommended up to 24 MHz;
- 01: 1 wait state, recommended up to 48 MHz.
Set flash latency before speeding up the core. Otherwise the CPU may request instructions faster than flash can deliver them.
Interrupts come after that. The mtvec CSR (0x305) contains the vector table
base address. In offset-by-interrupt-number mode, hardware jumps to:
base + cause * 4
That is why the vector table must stay aligned and consist only of 4-byte entries.
On the CH32V003, the INTSYSCR CSR (0x804) can also enable hardware stacking
and interrupt nesting. That is useful for a more serious firmware, but it is not
required for the first blink.
Bring-Up Checklist
When nothing starts:
- check that .vectors is placed at 0x00000000;
- check that vector entries are 4 bytes wide;
- check that _stack_top is 0x20000800;
- keep interrupts disabled until handlers are ready;
- start with one GPIO before UART, SPI, or timers;
- enable the GPIO port clock before touching GPIO registers;
- switch to 48 MHz only after setting FLASH_ACTLR.LATENCY;
- add mtvec, PFIC, and real interrupts only after the minimal boot path is stable.
This is intentionally small. On a chip with 16 KiB of flash and 2 KiB of SRAM, understanding the first ten instructions is often more valuable than starting from an opaque software stack.