STM32 → Jetson Orin — Full Pipeline Architecture

Overview

A sensor aggregation system running at 100Hz (one complete data frame every 10ms):

STM32 (Zephyr RTOS) — real-time sensor hub
  Reads the IMU over I2C at 100Hz
  Receives wheel speed over CAN (event-driven)
  Aggregates via ZBus, serializes with nanopb
  Sends frames over SPI as a slave
Jetson Orin (Linux + ROS2) — high-level compute node
  SPI master: starts one transfer every 10ms (100Hz)
  Deserializes nanopb frames
  Publishes to ROS2 topics

System Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                        STM32 (Zephyr RTOS)                          │
│                                                                     │
│  ┌──────────┐  I2C    ┌─────────────┐                              │
│  │ IMU      │────────►│ imu_thread  │──► zbus_chan_pub(imu_chan)    │
│  │ ICM42688 │         │  (100Hz)    │                    │          │
│  └──────────┘         └─────────────┘                    │          │
│                                                           ▼          │
│  ┌──────────┐  CAN    ┌─────────────┐          ┌──────────────────┐│
│  │ Wheel    │────────►│ can_thread  │──────────►│  packer_thread   ││
│  │ encoders │         │  (event)    │           │  (100Hz, 10ms)   ││
│  └──────────┘         └─────────────┘           │  nanopb encode   ││
│                                                  │  buffer swap     ││
│  ┌──────────┐  UART   ┌─────────────┐           └────────┬─────────┘│
│  │ GPS/etc  │────────►│ uart_thread │──────────►         │          │
│  └──────────┘         └─────────────┘                    │          │
│                                                           ▼          │
│                                                  ┌──────────────────┐│
│                                                  │  SPI slave + DMA ││
│                                                  │  double buffer   ││
│                                                  └────────┬─────────┘│
└───────────────────────────────────────────────────────────┼──────────┘
                                                            │ SPI
                                                            │ 10MHz
                                                            │ 10ms frames
┌───────────────────────────────────────────────────────────┼──────────┐
│                      Jetson Orin (Linux + ROS2)            │          │
│                                                            ▼          │
│                                                  ┌──────────────────┐│
│                                                  │  spidev master   ││
│                                                  │  100Hz timer     ││
│                                                  └────────┬─────────┘│
│                                                           │           │
│                                                           ▼           │
│                                                  ┌──────────────────┐│
│                                                  │  nanopb decode   ││
│                                                  └────────┬─────────┘│
│                                                           │           │
│                              ┌────────────────────────────┤           │
│                              ▼                            ▼           │
│                    /imu/raw (100Hz)          /wheel_speed (100Hz)     │
│                    sensor_msgs/Imu           geometry_msgs/Twist      │
└─────────────────────────────────────────────────────────────────────┘

Data Flow Per 10ms Frame

t = 0ms   : k_msleep(10) expires — IMU thread wakes
t = 0.5ms : I2C read completes → zbus_chan_pub(&imu_chan, &msg)
t = async : CAN frame may arrive at any point in the cycle → zbus_chan_pub(&wheel_chan, &msg)
t = 1.0ms : packer_thread wakes (periodic 10ms)
t = 1.0ms : zbus_chan_read(&imu_chan)  — grab latest IMU
t = 1.0ms : zbus_chan_read(&wheel_chan) — grab latest wheel data
t = 1.2ms : nanopb pb_encode() → ~150 bytes
t = 1.5ms : buffer swap — idle buffer becomes DMA active buffer
            (3.5ms safety margin before Jetson reads)
t = 5.0ms : Jetson 100Hz ROS2 timer fires
t = 5.0ms : Jetson asserts CS low
t = 5.0ms : CS ISR on STM32 — DMA already loaded, starts shifting
t = 5.12ms: SPI transfer complete (~120µs for 150 bytes @ 10MHz)
t = 5.2ms : Jetson nanopb decode (~0.1ms, Python)
t = 5.3ms : rclpy publisher.publish() (ROS2 topic updated)
t = 10ms  : cycle repeats

This leaves a 3.5ms safety margin between the STM32 finishing the frame (t = 1.5ms) and the Jetson reading it (t = 5.0ms).
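
The wire time and the margin follow from simple arithmetic: each byte takes 8 SPI clock cycles. The sketch below is only a back-of-the-envelope check; it ignores CS setup time and any inter-byte gaps the controller may add, and the function names are illustrative.

```python
def spi_transfer_time_us(num_bytes: int, clock_hz: float) -> float:
    """Wire time in microseconds: 8 clock cycles per byte."""
    return num_bytes * 8 * 1e6 / clock_hz


def safety_margin_ms(frame_ready_ms: float, master_read_ms: float) -> float:
    """Time between the slave finishing a frame and the master reading it."""
    return master_read_ms - frame_ready_ms


# Numbers taken from the timeline: 150-byte payload, 10MHz clock,
# buffer swap at t = 1.5ms, CS asserted at t = 5.0ms.
payload_us = spi_transfer_time_us(150, 10e6)
framed_us = spi_transfer_time_us(150 + 5, 10e6)  # plus sync, length, CRC
margin_ms = safety_margin_ms(1.5, 5.0)

print(f"payload only : {payload_us:.0f} us")
print(f"with framing : {framed_us:.0f} us")
print(f"safety margin: {margin_ms:.1f} ms")
```

Even with framing overhead the transfer occupies a little over 1% of the 10ms cycle, so the link itself is far from being the bottleneck.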

Thread Architecture on STM32

Thread         Priority      Stack      Wakeup                 Job
SPI CS ISR     2 (highest)   ISR stack  CS GPIO falling edge   Reload DMA pointer
CAN RX         3             512B       CAN frame interrupt    Decode → zbus pub
IMU (I2C)      5             1024B      k_msleep(10)           Read sensor → zbus pub
Other sensors  6             1024B      k_msleep(10)           Read → zbus pub
Packer         8             2048B      k_msleep(10)           Read zbus → nanopb encode → buffer swap
Logger         10 (lowest)   1024B      zbus subscriber        Log to flash/UART

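Lower priority number = higher priority in Zephyr. The packer has a lower priority than the sensor threads, so any sensor thread that is ready to run preempts it. A sensor thread that is blocked (for example, waiting on I2C) does not hold the packer back; in that case the packer reads the most recently published value from the channel.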
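
One caveat on the k_msleep(10) wakeups listed above: k_msleep() is a relative sleep, so a loop that does its work and then sleeps 10ms has a period of 10ms plus the work time. The sketch below estimates the effect, assuming a plain read-then-sleep loop (an assumption, since the thread bodies are not shown here) and the 0.5ms I2C read from the timeline.

```python
def effective_rate_hz(sleep_ms: float, work_ms: float) -> float:
    """Loop rate when each iteration does work_ms of work, then sleeps sleep_ms."""
    return 1000.0 / (sleep_ms + work_ms)


def drift_per_second_ms(sleep_ms: float, work_ms: float, target_hz: float) -> float:
    """How far (in ms) the loop falls behind an ideal target_hz schedule each second."""
    ideal_period_ms = 1000.0 / target_hz
    actual_period_ms = sleep_ms + work_ms
    iterations_per_second = 1000.0 / actual_period_ms
    return (actual_period_ms - ideal_period_ms) * iterations_per_second


rate = effective_rate_hz(sleep_ms=10.0, work_ms=0.5)
drift = drift_per_second_ms(sleep_ms=10.0, work_ms=0.5, target_hz=100.0)
print(f"effective rate: {rate:.1f} Hz")
print(f"drift vs. 100Hz schedule: {drift:.1f} ms per second")
```

If an exact 10ms period matters, a periodic k_timer is one common way to get it, since its period does not depend on how long the work takes.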

Why These Design Choices?

Why SPI master on Jetson (not STM32)?

Jetson controls the clock → deterministic 100Hz on the ROS2 side
STM32 just needs data ready before CS goes low — easier to guarantee with double buffering
If the STM32 were master, the link would need a “ready” handshake GPIO, which adds two-way complexity

Why ZBus instead of shared globals + mutex?

Thread-safe by design — no manual locking in application code
Multiple threads can subscribe to the same channel (packer + logger both read IMU)
Decouples sensor drivers from packer completely — swap out IMU driver without touching packer

Why nanopb instead of raw structs?

Endianness handled automatically
Versioning: add a field → old receivers ignore it safely
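Schema-defined layout: both sides generate code from the same .proto, so nothing depends on compiler-specific struct padding or alignment. Encoded messages are not self-delimiting, which is why the SPI frame carries an explicit length
Compact encoding: small integers use fewer bytes (varint); float fields always take 4 bytes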
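
To make the varint point concrete, here is a minimal Python sketch of the protobuf wire encoding for an unsigned integer and for a float field. It is written by hand for illustration; it is not nanopb's code, and the helper names are made up for this example.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Protobuf base-128 varint: 7 data bits per byte, MSB set on all but the last."""
    if value < 0:
        raise ValueError("this sketch only handles unsigned values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def encode_float_field(field_number: int, value: float) -> bytes:
    """Tag (field number + wire type 5) followed by a little-endian 32-bit float."""
    tag = encode_varint((field_number << 3) | 5)
    return tag + struct.pack("<f", value)


print(encode_varint(1).hex())                   # 1 byte
print(encode_varint(300).hex())                 # 2 bytes
print(len(encode_varint(2**64 - 1)))            # worst case for uint64
print(len(encode_float_field(1, 9.81)))         # tag + 4 bytes, whatever the value
```

A small `seq` counter costs one or two bytes, while every float field costs five bytes (tag plus value) regardless of magnitude.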

Why double buffering for SPI?

The Jetson starts clocking data out of the STM32 as soon as CS goes low, so the frame must already be in place
The CPU cannot safely write a buffer while DMA is reading from it
Double buffer: DMA reads buffer A while CPU writes buffer B → no torn frames, as long as the swap never happens mid-transfer
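
The swap logic can be sketched in Python as follows. This is a host-side illustration of the idea only, not the firmware: the class name is hypothetical, it assumes a single writer (the packer), and the lock stands in for whatever mechanism on the STM32 keeps the swap from happening while a transfer is in progress.

```python
import threading


class DoubleBuffer:
    """Two fixed-size buffers: the reader sees one while the writer fills the other."""

    def __init__(self, size: int) -> None:
        self._buffers = [bytearray(size), bytearray(size)]
        self._lengths = [0, 0]
        self._active = 0                 # index the reader (DMA) uses
        self._lock = threading.Lock()    # guards the swap and the read

    def write_and_swap(self, frame: bytes) -> None:
        """Fill the idle buffer, then make it the active one."""
        idle = 1 - self._active
        if len(frame) > len(self._buffers[idle]):
            raise ValueError("frame larger than buffer")
        self._buffers[idle][: len(frame)] = frame
        self._lengths[idle] = len(frame)
        with self._lock:                 # swap only between reads
            self._active = idle

    def read_active(self) -> bytes:
        """Copy out the current frame; the swap waits until this returns."""
        with self._lock:
            idx = self._active
            return bytes(self._buffers[idx][: self._lengths[idx]])


buf = DoubleBuffer(256)
buf.write_and_swap(b"frame-1")
print(buf.read_active())
buf.write_and_swap(b"frame-2")
print(buf.read_active())
```

The writer never touches the buffer the reader is using, and the only shared state is the one-word index that is flipped at the end.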

Protobuf Schema (the shared contract)

// sensor_frame.proto: shared between STM32 firmware and Jetson software
syntax = "proto3";

message ImuData {
    float accel_x       = 1;
    float accel_y       = 2;
    float accel_z       = 3;
    float gyro_x        = 4;
    float gyro_y        = 5;
    float gyro_z        = 6;
    uint64 timestamp_us = 7;
}

message WheelData {
    float speed_fl      = 1;
    float speed_fr      = 2;
    float speed_rl      = 3;
    float speed_rr      = 4;
    uint64 timestamp_us = 5;
}

message SensorFrame {
    uint32 seq    = 1;
    ImuData imu   = 2;
    WheelData wheel = 3;
}

Field numbers (1, 2, 3…) are permanent — never reuse a retired number.
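
The encoded size of a SensorFrame can be estimated directly from the schema. The sketch below applies the protobuf wire-format rules by hand (one-byte tags for field numbers 1 to 15, 4-byte floats, varint integers, length-prefixed submessages). It assumes every field is non-zero, since proto3 omits zero-valued fields; the function names are illustrative.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer takes as a protobuf varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


TAG = 1                  # field numbers 1..15 fit in a one-byte tag
FLOAT_FIELD = TAG + 4    # tag + 32-bit float


def uint_field(value: int) -> int:
    return TAG + varint_size(value)


def submessage_field(body_size: int) -> int:
    return TAG + varint_size(body_size) + body_size


def sensor_frame_size(seq: int, imu_ts_us: int, wheel_ts_us: int) -> int:
    imu = 6 * FLOAT_FIELD + uint_field(imu_ts_us)      # ImuData body
    wheel = 4 * FLOAT_FIELD + uint_field(wheel_ts_us)  # WheelData body
    return uint_field(seq) + submessage_field(imu) + submessage_field(wheel)


one_hour_us = 3_600_000_000
print(sensor_frame_size(1000, one_hour_us, one_hour_us))    # after ~1h of uptime
print(sensor_frame_size(2**32 - 1, 2**64 - 1, 2**64 - 1))   # worst case
```

Under these rules the schema as shown tops out at 82 bytes, so a payload in the 150-byte range would correspond to a frame carrying additional fields beyond the IMU and wheel data listed here.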

Wire Frame Format (SPI packet)

Byte 0:        0xAA          — sync byte (detect desync)
Byte 1-2:      length        — big-endian uint16, payload size
Byte 3..N+2:   nanopb payload — SensorFrame encoded bytes
Byte N+3..N+4: CRC16-CCITT   — integrity check

Total: payload + 5 bytes (1 sync + 2 length + 2 CRC)
Typical payload: ~120-200 bytes depending on how many fields are present

The sync byte lets the Jetson detect if it missed a byte and needs to re-lock.
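
A minimal Python sketch of packing and validating this frame layout on the Jetson side follows. Several details are assumptions, because the layout above does not pin them down: the CRC is taken over the payload only, it uses the 0xFFFF-initialised CCITT variant, and it is sent big-endian like the length field. The function names are illustrative.

```python
import binascii
import struct
from typing import Optional

SYNC = 0xAA


def crc16_ccitt(data: bytes) -> int:
    """CRC-16 with polynomial 0x1021; init 0xFFFF is an assumed variant."""
    return binascii.crc_hqx(data, 0xFFFF)


def pack_frame(payload: bytes) -> bytes:
    """sync (1) + big-endian length (2) + payload (N) + CRC16 (2)."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit a uint16 length")
    header = struct.pack(">BH", SYNC, len(payload))
    return header + payload + struct.pack(">H", crc16_ccitt(payload))


def unpack_frame(raw: bytes) -> Optional[bytes]:
    """Return the payload, or None if sync, length, or CRC do not check out."""
    if len(raw) < 5 or raw[0] != SYNC:
        return None
    (length,) = struct.unpack_from(">H", raw, 1)
    end = 3 + length
    if len(raw) < end + 2:
        return None                      # truncated frame
    payload = raw[3:end]
    (crc,) = struct.unpack_from(">H", raw, end)
    return payload if crc == crc16_ccitt(payload) else None


frame = pack_frame(b"example-payload")
print(len(frame), unpack_frame(frame))
# A fixed-size SPI read may carry padding after the CRC; it is ignored.
print(unpack_frame(frame + b"\x00" * 8))
```

Because the length field bounds the payload, the master can always clock a fixed-size transfer and let the parser discard whatever follows the CRC.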