May 4, 2026

Part 2: Bare Metal Drivers

Tags: embedded, stm32, drivers, c++

Writing SPI, I2C, and UART Drivers on Bare Metal STM32

This is part 2 of my series on building a flight controller from scratch. In the last post I explained why I'm doing this. Now let's get into the actual code: writing peripheral drivers from scratch on the STM32F411. No HAL, no libraries.

Starting point

When I say "from scratch," I mean it. The starting point is a blank main.cpp, a startup assembly file, a linker script, and the CMSIS device header that gives you register definitions. That's it. No STM32CubeIDE, no Arduino framework, no vendor libraries.

The CMSIS header is just a giant file full of things like:

#define GPIOA ((GPIO_TypeDef *) GPIOA_BASE)
#define SPI1  ((SPI_TypeDef *)  SPI1_BASE)

These are pointers to memory-mapped peripheral registers. When you write GPIOA->MODER = something, you're literally writing to a physical register on the chip. That's all "bare metal" means: you're talking directly to the hardware.
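To make that concrete, here is a minimal sketch of a read-modify-write on a mode register. It assumes the usual STM32F4 layout of two MODER bits per pin. setPinMode is a hypothetical helper (not from my driver code as shown here), and the register is passed by reference so the same function can be tried against an ordinary variable on a PC.

```cpp
#include <cstdint>

// Hypothetical helper: set the 2-bit mode field for one pin in a MODER-style
// register (00 = input, 01 = output, 10 = alternate function, 11 = analog).
inline void setPinMode(volatile std::uint32_t& moder, unsigned pin, std::uint32_t mode) {
    const unsigned shift = pin * 2u;      // two bits per pin
    std::uint32_t value = moder;          // read the current register value
    value &= ~(0x3u << shift);            // clear this pin's field
    value |= (mode & 0x3u) << shift;      // insert the new mode
    moder = value;                        // write the result back
}
```

On the target, a call like setPinMode(GPIOA->MODER, 5, 1) would turn PA5 into an output while leaving every other pin's mode untouched.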

UART first - because you need debug output

The first driver I wrote was UART. You need some way to see what your code is doing, and blinking an LED in morse code isn't great for debugging.

UART is one of the simplest peripherals on the STM32. You configure the baud rate, set up the GPIO pins as alternate function, enable TX and RX, and you can send bytes.

The whole driver is about 50 lines. A write() function that waits for the transmit buffer to be empty and then writes a byte to the data register. A print() function that sends a string character by character. A printHex() function for dumping register values.
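A rough sketch of the shape of such a driver follows. It is not the actual driver: UartRegs is a stand-in struct for the first four USART registers so the code also runs on a PC, the function names are made up for illustration, and the bit positions are the STM32F4 ones (TXE is bit 7 of SR; RE, TE and UE are bits 2, 3 and 13 of CR1). The baud divisor assumes the default 16x oversampling.

```cpp
#include <cstdint>

// Stand-in for the first registers of an STM32F4 USART block.
// On the target you would use the CMSIS USART_TypeDef instead.
struct UartRegs {
    volatile std::uint32_t SR;
    volatile std::uint32_t DR;
    volatile std::uint32_t BRR;
    volatile std::uint32_t CR1;
};

constexpr std::uint32_t kUartSrTxe = 1u << 7;   // transmit data register empty
constexpr std::uint32_t kUartCr1Re = 1u << 2;   // receiver enable
constexpr std::uint32_t kUartCr1Te = 1u << 3;   // transmitter enable
constexpr std::uint32_t kUartCr1Ue = 1u << 13;  // USART enable

// With 16x oversampling, BRR is the peripheral clock divided by the baud
// rate, rounded to the nearest integer.
constexpr std::uint32_t baudDivisor(std::uint32_t clockHz, std::uint32_t baud) {
    return (clockHz + baud / 2u) / baud;
}

inline void uartInit(UartRegs& u, std::uint32_t clockHz, std::uint32_t baud) {
    u.BRR = baudDivisor(clockHz, baud);
    u.CR1 = kUartCr1Ue | kUartCr1Te | kUartCr1Re;
}

// Blocks until the transmit buffer is empty, then queues one byte.
inline void uartWrite(UartRegs& u, char c) {
    while (!(u.SR & kUartSrTxe)) { }
    u.DR = static_cast<std::uint8_t>(c);
}

// Sends a NUL-terminated string one character at a time.
inline void uartPrint(UartRegs& u, const char* s) {
    while (*s) uartWrite(u, *s++);
}
```

As an example of the arithmetic, a 16 MHz peripheral clock (an assumed value) at 115200 baud gives a divisor of 139.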

I later added printInt() and printFloat() because I needed them for sensor data. These are just integer-to-string conversions done manually. You pull digits off with modulo and division, then print them in reverse order. The float version prints the integer part, a dot, then the fractional digits.

Nothing fancy, but having UART working early saved me countless hours. Every single driver I wrote after this started with printing debug values over serial to see if things were working.

SPI - talking to the radio

SPI was next because the NRF24L01 radio uses it. SPI is straightforward: there's a clock line (SCK), a data-out line (MOSI), a data-in line (MISO), and a chip select (CS).

The driver configures three GPIO pins as alternate function (AF5 on the STM32F411 for SPI1) and one as a regular output for chip select. Then you set up the SPI peripheral: master mode, clock polarity and phase (the NRF uses mode 0), prescaler for the clock speed, software slave management.
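As a sketch of what that peripheral setup boils down to, the function below builds a CR1 value from those choices. The bit positions are the STM32F4 SPI_CR1 ones; spiMasterCr1 is a hypothetical helper, and the prescaler setting in the usage note is only an example, not the value my driver uses. The SPE enable bit is kept separate so it can be set once the rest of the configuration is written.

```cpp
#include <cstdint>

// SPI_CR1 bit positions on the STM32F4 family.
constexpr std::uint32_t kSpiCr1Cpha = 1u << 0;  // clock phase
constexpr std::uint32_t kSpiCr1Cpol = 1u << 1;  // clock polarity
constexpr std::uint32_t kSpiCr1Mstr = 1u << 2;  // master mode
constexpr unsigned      kSpiCr1BrShift = 3;     // BR[2:0], divider = 2^(BR+1)
constexpr std::uint32_t kSpiCr1Spe  = 1u << 6;  // peripheral enable
constexpr std::uint32_t kSpiCr1Ssi  = 1u << 8;  // internal slave select level
constexpr std::uint32_t kSpiCr1Ssm  = 1u << 9;  // software slave management

// Master mode with software slave management. SSI is held high so the
// peripheral does not see itself as deselected from master mode.
constexpr std::uint32_t spiMasterCr1(std::uint32_t brBits, bool cpol, bool cpha) {
    return kSpiCr1Mstr | kSpiCr1Ssm | kSpiCr1Ssi
         | ((brBits & 0x7u) << kSpiCr1BrShift)
         | (cpol ? kSpiCr1Cpol : 0u)
         | (cpha ? kSpiCr1Cpha : 0u);
}
```

Mode 0 with a divide-by-16 prescaler would be spiMasterCr1(3, false, false), followed by OR-ing in kSpiCr1Spe to switch the peripheral on.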

The core of it is one function, transfer(). It waits for the transmit buffer to be empty, writes a byte, waits for the receive buffer to have data, and reads the byte back. SPI is full duplex, so every byte you send also receives a byte.
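Those four steps look roughly like this. SpiRegs is a stand-in for the first four registers of the SPI block (the CMSIS SPI_TypeDef on the target), and spiTransfer is an illustrative free function rather than the exact method from my driver. RXNE and TXE are bits 0 and 1 of SR on the STM32F4.

```cpp
#include <cstdint>

// Stand-in for the first registers of an STM32F4 SPI block.
struct SpiRegs {
    volatile std::uint32_t CR1;
    volatile std::uint32_t CR2;
    volatile std::uint32_t SR;
    volatile std::uint32_t DR;
};

constexpr std::uint32_t kSpiSrRxne = 1u << 0;  // receive buffer not empty
constexpr std::uint32_t kSpiSrTxe  = 1u << 1;  // transmit buffer empty

// Sends one byte and returns the byte clocked in at the same time.
inline std::uint8_t spiTransfer(SpiRegs& spi, std::uint8_t out) {
    while (!(spi.SR & kSpiSrTxe)) { }    // wait for room in the transmit buffer
    spi.DR = out;                        // start the transfer
    while (!(spi.SR & kSpiSrRxne)) { }   // wait for the received byte
    return static_cast<std::uint8_t>(spi.DR);
}
```

Run against a plain SpiRegs variable with both flags pre-set, the function simply returns whatever it wrote to DR, which makes for a quick sanity check on a PC. Note that both loops block forever if the flag never shows up.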

To test it without any external hardware, I put a jumper wire between MOSI (PA7) and MISO (PA6). Send 0xA5, get 0xA5 back. If that works, SPI is working. Simple, but effective.

The loopback test passed on the first try. I felt like a genius. That feeling didn't last long.

I2C - the cursed protocol

I2C is... something. On paper it's simple: two wires (clock and data), multiple devices on a bus, each with an address. In practice, it's the most annoying protocol I've dealt with.

The pins have to be open-drain (not push-pull like SPI), and the bus needs pull-up resistors. The protocol itself is a state machine with start conditions, stop conditions, address phases, ACK/NACK bits, and repeated starts. And on the STM32, you have to read certain status registers in the right order to clear flags (the ADDR flag, for example, is cleared by reading SR1 and then SR2), or the peripheral just freezes.

The read sequence is particularly fun. To read a register from a device, you actually need two phases on the bus. First you write the register address (start, send device address + write bit, send register address). Then you do a repeated start, send the device address again with the read bit, read the data byte, send NACK (to tell the device you're done), and generate a stop condition.
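Here is the bus-level order of that sequence as a sketch. It deliberately avoids STM32 register details: readRegister is written against a hypothetical bus interface (start, write, read, stop), and TraceBus is a test double that just records what it was asked to do. On the real peripheral, the exact moment at which NACK and STOP get programmed depends on the hardware, so treat this as the protocol order only.

```cpp
#include <cstdint>
#include <string>

// Test double for a hypothetical low-level I2C bus interface.
// It records each step as text instead of touching hardware.
struct TraceBus {
    std::string trace;
    std::uint8_t nextByte = 0;  // value handed back by read()

    static std::string hex(std::uint8_t b) {
        const char digits[] = "0123456789ABCDEF";
        return std::string{digits[b >> 4], digits[b & 0xF]};
    }
    void start()                { trace += "S "; }
    void stop()                 { trace += "P "; }
    void write(std::uint8_t b)  { trace += "W" + hex(b) + " "; }
    std::uint8_t read(bool ack) { trace += ack ? "R+ACK " : "R+NACK "; return nextByte; }
};

// Reads one register from a device with a 7-bit address.
template <typename Bus>
std::uint8_t readRegister(Bus& bus, std::uint8_t devAddr7, std::uint8_t reg) {
    bus.start();
    bus.write(static_cast<std::uint8_t>(devAddr7 << 1));         // address + write bit (0)
    bus.write(reg);                                              // which register to read
    bus.start();                                                 // repeated start, no stop in between
    bus.write(static_cast<std::uint8_t>((devAddr7 << 1) | 1u));  // address + read bit (1)
    const std::uint8_t value = bus.read(false);                  // NACK: this is the last byte
    bus.stop();
    return value;
}
```

Because the sequence is a template over the bus type, the same function could be driven by a real peripheral-backed implementation on the target and by TraceBus in a unit test.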

If you get any of this wrong, the I2C bus hangs and you have to reset the peripheral. Ask me how I know.

The driver ended up at about 100 lines. It works reliably now, but I definitely spent more time debugging I2C than anything else in this project.

The FreeRTOS task system (and the memcpy bug)

All the drivers run inside FreeRTOS tasks. I wrote a template base class called Task<T> that wraps FreeRTOS's xTaskCreate. You inherit from it, implement a run() method, and call start() to launch the task.

There's a subtle problem with this design that took me hours to figure out.

When you call start(), the method allocates memory on the FreeRTOS heap, copies the task object there with memcpy, and creates the task pointing to the copy. This is necessary because the original object usually lives on the caller's stack, which is gone once the caller returns.

The problem shows up when one member of a task holds a pointer or reference to another member. For example, my NRF test task has both a Spi object and an Nrf24l01 object. The NRF driver stores a reference to the SPI driver so it can do transfers. After memcpy, the Spi object exists at a new address on the heap, but the NRF's reference still points to the old (now dead) address on the stack.

The symptoms were bizarre. Raw SPI transfers worked perfectly. I could read the NRF's STATUS register and get 0x0E, which is correct. But calling nrf.writeReg(), which does the exact same SPI transfers through the driver's reference, would hang forever.

The fix was changing the reference to a pointer and adding a setSpi() method that gets called at the start of run(), after the copy has happened:

void run() {
    nrf.setSpi(&spi);  // Fix pointer after memcpy
    // ... rest of task
}
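The problem and the fix can be reproduced on a PC in a few lines. Everything below is a simplified stand-in: DemoSpi, DemoNrf, DemoTask and copyLikeStart are hypothetical names, and copyLikeStart only mimics the byte-wise copy that start() performs.

```cpp
#include <cstring>
#include <type_traits>

struct DemoSpi { int id = 0; };

struct DemoNrf {
    DemoSpi* spi = nullptr;             // pointer to the SPI driver it should use
    void setSpi(DemoSpi* s) { spi = s; }
};

struct DemoTask {
    DemoSpi spi;
    DemoNrf nrf;
    DemoTask() { nrf.setSpi(&spi); }    // points at THIS object's spi member
    void run() { nrf.setSpi(&spi); }    // re-point at the copy's own spi member
};

static_assert(std::is_trivially_copyable<DemoTask>::value,
              "a byte-wise copy is only valid for trivially copyable types");

// Mimics the copy step: raw bytes into other storage, no constructor runs.
inline DemoTask* copyLikeStart(const DemoTask& original, void* storage) {
    std::memcpy(storage, &original, sizeof(DemoTask));
    return static_cast<DemoTask*>(storage);
}
```

Right after the copy, the copy's nrf.spi still holds the address of the original's spi member, because memcpy duplicates the pointer value and knows nothing about what it points to. Only once run() executes on the copy does nrf.spi point at the copy's own spi.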

This is the kind of bug that makes you question everything. The code looks correct, the logic is correct, but the runtime memory layout is wrong. I only figured it out by adding print statements between every single line of code until I narrowed it down to the function call itself being the problem, not anything inside it.

What I'd do differently

If I started over, I'd add a timeout to every peripheral wait loop. Right now, if a transfer fails, the code hangs forever waiting for a status flag that will never be set. In a flight controller, hanging forever means crashing, so every while (!(SR & flag)) loop should have a timeout and error return.
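A bounded wait could look something like the sketch below. waitForFlag is a hypothetical helper, and it counts loop iterations rather than real time; a production version would more likely compare against a tick counter.

```cpp
#include <cstdint>

// Spins until (reg & flag) is non-zero or the spin budget runs out.
// Returns true if the flag was seen, false on timeout.
inline bool waitForFlag(const volatile std::uint32_t& reg,
                        std::uint32_t flag,
                        std::uint32_t maxSpins) {
    while ((reg & flag) == 0u) {
        if (maxSpins-- == 0u) {
            return false;  // give up so the caller can report an error
        }
    }
    return true;
}
```

Each while (!(SR & flag)) loop would then become something like if (!waitForFlag(SR, flag, limit)) return an error, which lets the caller decide whether to retry, reset the peripheral, or fail safe.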

I'd also think harder about the task object lifetime issue. The memcpy + reference problem is a footgun that will bite anyone who uses this pattern. A better design might be to initialize peripherals inside run() instead of in constructors, or to use a different task creation pattern that doesn't involve copying.
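One possible shape for a no-copy pattern is sketched below; I haven't built this, so take it as an idea rather than a tested design. The task object is never copied: start() hands its own address to the task entry point, which means the object has to outlive the task (static storage is the easy way). fakeTaskCreate is a stand-in for the RTOS creation call so the sketch runs on a PC, and all the type names are made up.

```cpp
// Stand-in for the RTOS task-creation call. It runs the entry point
// immediately; a real RTOS would schedule it on its own stack later.
using TaskEntry = void (*)(void*);
inline void fakeTaskCreate(TaskEntry entry, void* arg) { entry(arg); }

template <typename Derived>
class StaticTask {
public:
    // Passes the object's own address to the task instead of copying it.
    // The object must outlive the task.
    void start() {
        fakeTaskCreate(&StaticTask::trampoline, static_cast<Derived*>(this));
    }

private:
    static void trampoline(void* arg) { static_cast<Derived*>(arg)->run(); }
};

struct DemoBus { int transfers = 0; };

struct DemoRadio {
    explicit DemoRadio(DemoBus& b) : bus(b) {}
    void poke() { ++bus.transfers; }
    DemoBus& bus;  // a reference is safe here because nothing gets moved
};

class RadioTask : public StaticTask<RadioTask> {
public:
    void run() { radio.poke(); }
    DemoBus bus;
    DemoRadio radio{bus};  // bus is declared first, so it exists before radio
};
```

With a static RadioTask, calling start() runs run() on the original object, so the reference inside the radio driver stays valid and no fix-up call is needed.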

But for now, it works. And "it works" is the most important state your code can be in when you're trying to build a plane.

Next up

In Part 3, I'll talk about getting the NRF24L01 radio module working. This is where things got really messy. Wrong pinouts, clone chips with broken registers, 30cm jumper wires acting as antennas, and the discovery that my "NRF24L01" isn't actually made by Nordic.

Code: GitHub.