OS3: Architecture of a Deterministic Event-Driven Microkernel

The Starting Point

Most embedded projects inherit a mental model from desktop operating systems: threads, a scheduler, background tasks, preemption. These abstractions exist for good reasons in general-purpose computing. In deeply embedded systems, they introduce a specific problem: when something goes wrong, you cannot easily answer the question "why did this happen now?"

OS3 starts from a different premise. The kernel has one purpose: preserve causal integrity in time. Every behavior must be traceable to an explicit cause. Latency is a signal. Ambiguity in the execution model is a defect, not a feature.


The Constitutional Model

Before any code, OS3 defines a set of structural invariants in a constitution document. These are not guidelines — they are hard constraints enforced by the architecture itself. The key principle: correctness built on a structure that allows ambiguity is incorrect by definition.

The constitution defines a small number of core concepts.

Causal Effectiveness is the primary metric: Determinism minus Degeneracy. Determinism means the same cause always produces the same effect. Degeneracy means multiple distinct causes collapse into indistinguishable outcomes — a single opaque error code for three different failures, a monolithic dispatch blob that hides which command triggered which path. The kernel must maximize causal effectiveness. Distinct causes must remain distinguishable at every level of the system.

Event Supremacy defines the only valid execution model. All control flow originates from events. The canonical pipeline is:

ISR → event_enqueue → event_loop → dispatch → handler / fsm_step

No shortcuts. No direct invocations that bypass the queue. No background work. If an alternate execution path exists, it will be used. Therefore it must not exist.

ISR Minimalism constrains interrupt handlers to three operations: acknowledge the hardware source, capture minimal immutable data, enqueue exactly one event. An ISR may not allocate, block, spin, execute protocol logic, or modify FSM state. The ISR preserves time — it does not interpret it.
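The three permitted operations can be sketched in a few lines. This is an illustrative sketch, not the real OS3 ABI: the names (`uart_rx_isr`, `EV_ID_UART_RX`, `fake_uart_rxdata`) are hypothetical, and a toy ring buffer stands in for the kernel queue.

```c
#include <stdint.h>
#include <stdbool.h>

#define EV_KIND_IRQ   1u
#define EV_ID_UART_RX 4u   /* hypothetical event id for this sketch */

typedef struct { uint32_t header, data0, data1; } event_t;

/* Toy single-producer ring buffer standing in for the kernel queue. */
#define QCAP 16
static event_t q[QCAP];
static volatile uint32_t q_head, q_tail;

static bool event_enqueue(uint32_t kind, uint32_t id, uint32_t d0, uint32_t d1) {
    uint32_t next = (q_head + 1) % QCAP;
    if (next == q_tail) return false;      /* queue full: drop, never block */
    q[q_head] = (event_t){ (kind << 24) | (id << 16), d0, d1 };
    q_head = next;
    return true;
}

/* Simulated MMIO receive register; a real driver reads hardware here. */
static volatile uint32_t fake_uart_rxdata;

void uart_rx_isr(void) {
    uint32_t byte = fake_uart_rxdata;  /* 1. ack the source, 2. capture data */
    event_enqueue(EV_KIND_IRQ, EV_ID_UART_RX, byte, 0);  /* 3. one event */
    /* no allocation, no blocking, no protocol logic: return immediately */
}

/* Demo: fire the simulated ISR once and inspect the queued event. */
static int demo_isr(void) {
    fake_uart_rxdata = 0x41;
    uart_rx_isr();
    return (q_head - q_tail) == 1
        && q[q_tail].header == ((EV_KIND_IRQ << 24) | (EV_ID_UART_RX << 16))
        && q[q_tail].data0 == 0x41;
}
```

Note that a full queue is handled by dropping, not waiting: an ISR that can block on the queue would violate the contract it exists to uphold.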

Layered Sovereignty defines strict, unidirectional dependencies between layers. Hardware knowledge lives only in drivers. The core kernel contains no MMIO addresses, no IRQ numbers, no clock constants. Domain FSMs never call hardware directly. Nothing calls upward across layers.

Non-Bypassability states that all invariants must be enforced structurally. If correctness relies on developer discipline rather than structure, the architecture is invalid.


Execution Model

The kernel has no scheduler and no threads. There is one execution context: the event loop.

At boot, the system initializes hardware, configures the interrupt controller, and enters event_loop. From that point on, the kernel does nothing autonomously. It waits.

When an interrupt fires, the ISR enqueues an event and returns. The event loop wakes, dequeues the event, and dispatches it to the appropriate handler or FSM. The handler runs to completion and returns to the loop. The loop waits again.

There is no preemption in the main context. Events are processed one at a time, in FIFO order. Causality is preserved structurally — not by convention.
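The loop itself is small enough to sketch. Assuming the same three-word event format and hypothetical names (`event_loop_step`, `dispatch`), a single iteration looks roughly like this; a real loop would wait for the next interrupt (e.g. WFI) when the queue is empty instead of returning.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t header, data0, data1; } event_t;
typedef void (*handler_fn)(uint32_t d0, uint32_t d1);

#define QCAP 16
static event_t q[QCAP];
static uint32_t q_head, q_tail;

static bool dequeue(event_t *ev) {
    if (q_tail == q_head) return false;
    *ev = q[q_tail];
    q_tail = (q_tail + 1) % QCAP;
    return true;
}

/* kind x id dispatch table: each (kind, id) pair maps to one handler. */
#define N_KINDS 3
#define N_IDS   8
static handler_fn dispatch[N_KINDS][N_IDS];

/* Process one queued event to completion; false means the queue is idle. */
static bool event_loop_step(void) {
    event_t ev;
    if (!dequeue(&ev)) return false;
    uint32_t kind = ev.header >> 24;
    uint32_t id   = (ev.header >> 16) & 0xFFu;
    handler_fn h = dispatch[kind][id];
    if (h) h(ev.data0, ev.data1);    /* runs to completion, no preemption */
    return true;
}

static uint32_t last_d0;
static void demo_handler(uint32_t d0, uint32_t d1) { (void)d1; last_d0 = d0; }

/* Demo: enqueue one event, dispatch it, confirm the queue drains. */
static int demo_dispatch(void) {
    dispatch[1][2] = demo_handler;
    q[q_head] = (event_t){ (1u << 24) | (2u << 16), 77, 0 };
    q_head = (q_head + 1) % QCAP;
    return event_loop_step() && last_d0 == 77 && !event_loop_step();
}
```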

Event Format

Every event is a fixed-size structure of three words:

Word 0: header = (kind << 24) | (id << 16) | flags
Word 1: data0  — primary payload
Word 2: data1  — secondary payload

Three kinds exist:

  • EV_KIND_IRQ — hardware interrupt events
  • EV_KIND_TIMER — software timer expiry
  • EV_KIND_SOFT — application-defined continuations

The kind/id pair indexes into a two-dimensional dispatch table. Each (kind, id) pair maps to exactly one handler. There is no runtime routing ambiguity.
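The header packing can be captured in a handful of helpers. These macro names are illustrative, not the real OS3 headers; they simply encode the layout above, with the kind in bits 31..24, the id in bits 23..16, and the flags in the low 16 bits.

```c
#include <stdint.h>

/* Hypothetical pack/unpack helpers matching the three-word event layout. */
#define EV_HEADER(kind, id, flags) \
    (((uint32_t)(kind) << 24) | (((uint32_t)(id) & 0xFFu) << 16) | \
     ((uint32_t)(flags) & 0xFFFFu))

#define EV_KIND(h)   ((h) >> 24)             /* bits 31..24 */
#define EV_ID(h)     (((h) >> 16) & 0xFFu)   /* bits 23..16 */
#define EV_FLAGS(h)  ((h) & 0xFFFFu)         /* bits 15..0  */
```

Because the fields round-trip losslessly, a handler can always recover exactly which (kind, id) pair caused it to run: no causal information is lost in the header.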


The Fundamental Tradeoff

This execution model has a price. It is not hidden, and it is not optional.

Every event handler must have bounded execution time.

In a threaded system, a slow task delays only itself — the scheduler eventually preempts it and runs something else. In OS3, there is no scheduler. The event loop is a strict FIFO queue processed one entry at a time. If a handler runs for too long, every other event waits. If a handler blocks — on a busy-wait, on a hardware poll, on a loop with a non-deterministic exit condition — the entire system stops.

This is not a pathological case. It is the normal consequence of the design. And it is the right tradeoff for the class of systems OS3 targets. The alternative — hiding latency behind a scheduler — does not eliminate the problem, it only makes it harder to observe.

The practical implications are significant:

  • No polling loops in handlers. If a hardware peripheral is not ready, the handler must return and reschedule via a timer event or a deferred soft event. It cannot spin.
  • No synchronous I2C or SPI transactions of unbounded length in the main event path. Long transfers must be delegated to DMA with a completion event.
  • No unbounded data processing in a single handler. If a large buffer must be processed, the work must be broken into bounded increments, each triggered by a continuation event.
  • No recursive event generation. A handler that enqueues a hundred events before returning has shifted latency rather than eliminating it.
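The third rule — bounded increments driven by continuation events — can be sketched as follows. The names (`process_step`, the `CHUNK` bound, the counter standing in for `event_enqueue(EV_KIND_SOFT, ...)`) are assumptions for illustration only.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define CHUNK 64   /* hard bound on work per handler invocation */

static const uint8_t *job_buf;
static size_t job_len, job_pos;
static uint32_t job_sum;
static int continuations;  /* stands in for enqueueing a soft continuation event */

/* Handler body: process at most CHUNK bytes, then finish or continue. */
static bool process_step(void) {
    size_t end = job_pos + CHUNK;
    if (end > job_len) end = job_len;
    while (job_pos < end)
        job_sum += job_buf[job_pos++];
    if (job_pos < job_len) {
        continuations++;   /* real code: event_enqueue(EV_KIND_SOFT, ...) */
        return false;      /* not done; the loop re-dispatches us later */
    }
    return true;           /* done */
}

/* Demo: sum a 200-byte buffer of ones in bounded 64-byte increments. */
static uint32_t demo_chunked(void) {
    static uint8_t buf[200];
    for (size_t i = 0; i < sizeof buf; i++) buf[i] = 1;
    job_buf = buf; job_len = sizeof buf;
    job_pos = 0; job_sum = 0; continuations = 0;
    while (!process_step())
        ;  /* the event loop would interleave other events between steps */
    return job_sum;
}
```

Each invocation of `process_step` is bounded by `CHUNK`, so the worst-case latency contribution of the job is fixed regardless of the buffer size.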

This constraint is what makes the FSM and handler taxonomy meaningful. The distinction between a handler (a bounded reaction to a single event) and a service (infrastructure capability with no autonomous behavior) exists precisely because the execution model requires it. If a module cannot guarantee bounded execution, it is not a valid handler — it needs to be restructured as a state machine that advances incrementally.

The upside of this discipline is also significant: the system is auditable. Because every handler is bounded, you can reason about worst-case latency for any event by inspecting the handler list. There is no hidden preemption, no scheduler-induced jitter, no background task that silently competes for CPU time. If something is slow, it is slow in a traceable, identifiable way.


Layer Structure

OS3 defines four layers with strictly unidirectional dependencies.

core/ owns the event engine, FSM engine, timer service, and dispatch infrastructure. It provides explicit ABIs to the layers above. It contains no hardware addresses and no board-specific assumptions. The event queue, dispatch table, and timer deadlines live here.

drivers/ owns hardware. All MMIO register addresses, IRQ numbers, pin configurations, and peripheral initialization sequences live exclusively in driver files, organized by target architecture. Drivers call core ABIs (event_enqueue, register_isr). They never implement domain logic.

subfsm/ contains domain FSMs — modules with explicit state that evolves over time in response to events. An FSM in this layer owns one domain (radio management, host protocol) and communicates with other FSMs exclusively through the event queue.

handlers/ contains bounded event reactions — modules that respond to a single event, perform fixed work, and return. A handler has no state transitions and no progression semantics. It is a reaction, not a process.

The dependency direction is strictly downward. A handler may call a service. A service may call a driver. A driver may call core ABIs. Nothing calls upward.


The FSM Engine

A module qualifies as an FSM only if it satisfies three conditions: it has explicit state, its transitions are table-driven, and fsm_step is the sole authority for state mutation.

The engine takes three inputs at initialization: a transition table, an action table, and an initial state. At runtime, fsm_step(event_id, data0, data1) performs a single lookup:

(next_state, action_index) = table[current_state][event_id]
action_table[action_index](data0, data1)
current_state = next_state

Actions may enqueue events for other FSMs. They never write state directly. State transitions happen only inside fsm_step, after the action returns.

This separation is not aesthetic. It means every possible state change in the system is visible in a table, auditable without reading any code. It means actions cannot create hidden transitions. It means the FSM is mechanically provable.
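A minimal concrete instance of the engine, assuming hypothetical type and state names (the real OS3 headers may differ), shows how the lookup, action, and transition stay mechanically separated:

```c
#include <stdint.h>

typedef struct { uint8_t next_state; uint8_t action; } fsm_cell_t;
typedef void (*action_fn)(uint32_t d0, uint32_t d1);

#define N_STATES 2
#define N_EVENTS 2

enum { S_IDLE, S_BUSY };                 /* states      */
enum { EV_START, EV_DONE };              /* event ids   */
enum { A_NONE, A_BEGIN, A_FINISH };      /* action ids  */

static int began, finished;
static void act_none(uint32_t a, uint32_t b)   { (void)a; (void)b; }
static void act_begin(uint32_t a, uint32_t b)  { (void)a; (void)b; began++; }
static void act_finish(uint32_t a, uint32_t b) { (void)a; (void)b; finished++; }

static const action_fn action_table[] = { act_none, act_begin, act_finish };

/* Every possible state change is visible here, auditable without code. */
static const fsm_cell_t table[N_STATES][N_EVENTS] = {
    [S_IDLE] = { [EV_START] = {S_BUSY, A_BEGIN}, [EV_DONE] = {S_IDLE, A_NONE}   },
    [S_BUSY] = { [EV_START] = {S_BUSY, A_NONE},  [EV_DONE] = {S_IDLE, A_FINISH} },
};

static uint8_t current_state = S_IDLE;

/* Sole authority for state mutation: action runs first, transition after. */
void fsm_step(uint8_t event_id, uint32_t d0, uint32_t d1) {
    fsm_cell_t cell = table[current_state][event_id];
    action_table[cell.action](d0, d1);
    current_state = cell.next_state;
}

/* Demo: drive the FSM through one start/done cycle. */
static int fsm_demo(void) {
    fsm_step(EV_START, 0, 0);
    int mid = (current_state == S_BUSY && began == 1);
    fsm_step(EV_DONE, 0, 0);
    return mid && current_state == S_IDLE && finished == 1;
}
```

Note that `act_begin` and `act_finish` never touch `current_state`; only `fsm_step` writes it, after the action returns.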

Designing for Bounded Execution in FSMs

The table-driven model has a direct relationship with the bounded execution requirement.

Because each state/event pair maps to exactly one action, and actions are written as discrete units of work, the FSM structure naturally discourages unbounded paths. An action that would require a loop with non-deterministic exit is a design signal: the operation should be split across multiple states, with the loop body becoming a handler for a timer or completion event.

This is visible in the display pipeline, for instance. Flushing an SSD1306 display over I2C involves sending a window command followed by a full page payload — potentially hundreds of bytes. Rather than doing this in a single action, the flush is structured as a two-phase FSM: phase 0 sends the command and advances to phase 1 via a completion event; phase 1 sends the payload and advances to the next dirty page. Each phase is bounded. Other events can be processed between phases.
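The phase structure of that flush can be sketched abstractly. Everything here is hypothetical scaffolding (`flush_step`, the phase and page counters); the actual I2C sends would be async calls whose completion events re-enter the step function.

```c
#include <stdbool.h>

enum { PH_CMD, PH_DATA };   /* phase 0: window command, phase 1: payload */

static int phase = PH_CMD;
static int page, pages_sent;
#define N_PAGES 8           /* SSD1306-style page count, for the sketch */

/* One bounded step per completion event; true when the flush is done. */
static bool flush_step(void) {
    if (phase == PH_CMD) {
        /* send window command for `page` (async); completion re-enters us */
        phase = PH_DATA;
        return false;
    }
    /* PH_DATA: send one page payload (async), then advance to next page */
    pages_sent++;
    phase = PH_CMD;
    return ++page >= N_PAGES;
}

/* Demo: drive the flush to completion, two bounded steps per page. */
static int flush_demo(void) {
    while (!flush_step())
        ;  /* the event loop would service other events between steps */
    return pages_sent;
}
```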

Inter-FSM Communication

FSMs communicate exclusively through the event queue. FSM A emits a notification event. The event loop delivers it to FSM B at the next dispatch cycle. There are no direct calls between FSMs, no shared state, no implicit coupling.

This enforces a clean ownership model: each FSM owns its state entirely. No other module can corrupt it.


The Timer Service

The timer service provides software timers backed by a single hardware compare register (MTIMECMP). Multiple concurrent timers can be active simultaneously.

Each timer has an absolute deadline in hardware ticks. The service maintains the active deadline set and programs MTIMECMP to the nearest expiry. When the hardware fires, the ISR enqueues a timer event. The event loop delivers it to the registered handler.

Timers are a key mechanism for enforcing the bounded execution requirement. Rather than polling in a loop, a handler that needs to wait for a condition arms a timer and returns. The timer event reactivates the handler after the delay. The system processes other events in between.

Deadline comparison uses signed arithmetic on a circular timeline to handle hardware counter wrap-around correctly. The service is documented as single-hart only; a multi-hart port would require atomic operations on the deadline table.
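The signed-arithmetic trick fits in two functions. This sketch uses a 32-bit counter for brevity (the RISC-V machine timer is 64-bit); it stays correct as long as deadlines are less than half the counter range apart.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if deadline `a` expires no later than `b`, even across counter
 * wrap-around: the unsigned subtraction wraps, and reinterpreting the
 * difference as signed yields the correct ordering. */
static bool deadline_before_eq(uint32_t a, uint32_t b) {
    return (int32_t)(a - b) <= 0;
}

/* True if `deadline` has already expired at time `now`. */
static bool deadline_expired(uint32_t now, uint32_t deadline) {
    return (int32_t)(now - deadline) >= 0;
}
```

A deadline set just before the counter wraps (e.g. `0xFFFFFFF0`) still compares as earlier than a time just after the wrap (e.g. `5`), which a plain unsigned comparison would get backwards.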


Services and the Async Pattern

Services provide infrastructure capability. They have no autonomous behavior, no internal scheduling, and no implicit progression. They enable; they do not decide.

The async bus pattern is the canonical example of how the bounded execution requirement shapes service design.

A naive I2C write would issue a start condition, clock out each byte, poll the hardware status register until done, and return. This works. It also holds the event loop hostage for the entire transfer duration — typically several hundred microseconds at standard I2C speeds, which is an eternity in event time.

The async version does none of this. The service accepts a write request and immediately returns. It starts the DMA transfer, arms a timeout timer, and sets a busy flag. When DMA completion arrives as a hardware IRQ, the ISR enqueues a completion event. The event loop delivers it to the service handler. The handler clears the busy flag, cancels the timeout, and enqueues a logical completion event for the original caller.

The entire transfer happens in hardware, off the event loop. The event loop is free to process other events — sensor reads, UART output, radio state transitions — while the bytes are being clocked out.

Race Prevention with Generation Counters

The async pattern introduces a specific race: the timeout handler fires and clears state, a new transaction starts immediately, then the stale DMA completion from the previous transaction arrives. Without protection, the stale event would corrupt the new transaction’s state.

OS3 services use a generation counter (txn_seq) to prevent this. The counter is incremented on each accepted transaction. Both the completion handler and the timeout handler check the current counter against the value captured at request time. If they differ, the event is discarded. The sequence counter is wrap-safe and uses 0 as the idle sentinel.
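The guard can be sketched in miniature. The names (`bus_write_async`, `bus_complete`) and the counters standing in for DMA start and logical completion are assumptions; the real service would also arm and cancel the timeout timer.

```c
#include <stdint.h>
#include <stdbool.h>

static uint32_t txn_seq;   /* current transaction generation, 0 = idle */
static bool busy;
static int completions, stale_drops;

/* Accept a request: bump the generation (skipping 0) and start "DMA". */
static uint32_t bus_write_async(void) {
    if (busy) return 0;                /* rejected; caller retries via an event */
    if (++txn_seq == 0) txn_seq = 1;   /* wrap-safe: 0 stays the idle sentinel  */
    busy = true;
    return txn_seq;                    /* token captured at request time */
}

/* Completion (or timeout) handler: drop events from a dead generation. */
static void bus_complete(uint32_t seq_at_request) {
    if (seq_at_request != txn_seq || !busy) {
        stale_drops++;                 /* stale event from a superseded txn */
        return;
    }
    busy = false;
    completions++;                     /* real code: enqueue logical completion */
}

/* Demo of the race: txn 1 times out, txn 2 starts, then the stale DMA
 * completion from txn 1 arrives and is discarded harmlessly. */
static bool demo_race(void) {
    uint32_t t1 = bus_write_async();
    bus_complete(t1);                  /* timeout fires, tears down txn 1 */
    uint32_t t2 = bus_write_async();
    bus_complete(t1);                  /* stale DMA completion from txn 1 */
    bus_complete(t2);                  /* real completion for txn 2       */
    return stale_drops == 1 && completions == 2;
}
```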


What the Kernel Is Not

No scheduler. No threads. No dynamic memory allocation. No blocking APIs. No background tasks. No preemptive concurrency beyond the interrupt boundary.

These are not omissions — they are definitions. Each absent feature would introduce a category of ambiguity that the architecture is specifically designed to eliminate.

A scheduler introduces implicit ordering and makes worst-case latency analysis depend on the full set of active tasks. Threads introduce shared state and require synchronization primitives that are themselves sources of non-determinism. Blocking APIs hide causality behind a synchronous facade and make the bounded execution requirement impossible to enforce structurally.

Removing them does not make the problems disappear. It makes them visible. The developer must decompose work that would otherwise block into a sequence of bounded steps connected by events. This is harder to write than a blocking loop. It produces a system whose latency behavior is fully observable and auditable, which is the point.

The kernel is an event instrument. Its correctness is measured by how clearly each behavior traces to its cause — and by how rigorously every module that runs on it respects the only contract that makes the model work: finish what you started, quickly, and return.