Prerequisite: 07-cascaded-loops.md (MCU cascade), 08-trajectory-tracking.md (Jetson path following)
Unlocks: 10-advanced-control.md (when this architecture isn’t enough)
The warehouse robot runs two computers that must cooperate in real-time:
| STM32 MCU | Jetson (Linux + ROS2) | |
|---|---|---|
| Clock | 168 MHz bare-metal / RTOS | 1.5 GHz Linux + preempt_RT |
| Latency | Deterministic (µs) | Best-effort (ms) |
| Role | Motor control, safety | Navigation, planning, perception |
| Rate | 10–20 kHz ISRs | 20–50 Hz ROS nodes |
| Failure mode | Watchdog reset | Process crash, OOM kill |
The SPI bus is the handshake between these two worlds. When the handshake breaks, the robot drifts, stops, or (worst case) drives into a rack.
Real AMR failures from architecture issues: - Jetson GC pause (50 ms) → no SPI update → MCU watchdog → emergency stop (Incident-C) - SPI frame corruption → MCU interprets noise as speed command → motor spike (Incident-A) - Clock drift between MCU and Jetson → cmd_vel timestamps misaligned → estimator rejects data
Jetson (SPI Master) STM32 (SPI Slave)
┌──────────────┐ ┌──────────────┐
│ │────── SCLK ────→│ │
│ │────── MOSI ────→│ SPI DMA │
│ │←───── MISO ─────│ │
│ │────── CS ────→│ │
│ SPI @ │ │ SPI @ │
│ 2 MHz │ │ 2 MHz │
└──────────────┘ └──────────────┘
AMR SPI parameters: - Clock: 2 MHz (500 ns per bit) - Frame size: 64 bytes (32 bytes each direction) - Transfer rate: 1 kHz (every 1 ms) - Protocol: nanopb (protocol buffers for embedded)
Jetson → MCU (command frame):
message MotorCommand {
uint32 timestamp_us = 1; // Jetson's microsecond clock
float left_speed_cmd = 2; // rad/s, at wheel
float right_speed_cmd = 3; // rad/s, at wheel
uint8 control_mode = 4; // 0=idle, 1=speed, 2=position, 3=raw_duty
uint8 enable = 5; // 0=disabled, 1=enabled
uint16 crc16 = 6; // CRC-16/CCITT
}
MCU → Jetson (feedback frame):
message MotorFeedback {
uint32 timestamp_us = 1; // MCU's microsecond clock
float left_speed = 2; // rad/s, measured
float right_speed = 3; // rad/s, measured
float left_current = 4; // Amps, measured
float right_current = 5; // Amps, measured
int32 left_encoder = 6; // Cumulative encoder count
int32 right_encoder = 7; // Cumulative encoder count
uint8 fault_flags = 8; // Overcurrent, overtemp, encoder error
uint16 crc16 = 9;
}
Every frame includes a CRC-16 checksum. If the CRC doesn’t match, the MCU ignores the command and holds the last valid command (for up to the watchdog timeout).
// CRC-16/CCITT (polynomial 0x1021)
uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
uint16_t crc = 0xFFFF;
for (size_t i = 0; i < len; i++) {
crc ^= (uint16_t)data[i] << 8;
for (int j = 0; j < 8; j++) {
if (crc & 0x8000)
crc = (crc << 1) ^ 0x1021;
else
crc <<= 1;
}
}
return crc;
}
AMR failure rate: Approximately 1 in 10,000 SPI frames is corrupted (0.01%), usually from EMI near the motor drivers. The CRC catches all of these.
1 ms SPI cycle
├──────────────────────────────────┤
Jetson: │ Prepare cmd ─→ SPI write ─→ Wait │
│ 100 µs 200 µs 700 µs│
MCU: │ SPI DMA recv ─→ Validate ─→ Apply │
│ 200 µs 10 µs 10 µs │
│ ↓ │
│ Speed PID runs at 10 kHz │
│ (every 100 µs within) │
Key contract: The Jetson must deliver a new command every 1 ms ± 0.2 ms. If a command doesn’t arrive within 2 ms (2 missed cycles), the MCU enters hold mode (maintains last speed command). If no command arrives within 10 ms (10 missed cycles), the MCU enters emergency deceleration.
Timer-based watchdog
┌─────────────────┐
│ Counter = 10 │
│ Decrement @ │
│ 1 kHz │
│ │
SPI valid frame ──→ Reset to 10 │
│ │
Counter = 5 ─────→ HOLD mode │ (reduce speed by 50%)
Counter = 0 ─────→ E-STOP │ (brake motors)
└─────────────────┘
// MCU watchdog for SPI communication
#define WDG_RELOAD 10 // 10 ms timeout
#define WDG_HOLD_LEVEL 5 // 5 ms → enter hold
volatile uint8_t spi_watchdog_counter = 0;
// Called in SPI RX complete callback
void spi_frame_received_ok(void) {
spi_watchdog_counter = WDG_RELOAD;
}
// Called at 1 kHz by timer
void watchdog_tick(void) {
if (spi_watchdog_counter > 0) {
spi_watchdog_counter--;
}
if (spi_watchdog_counter == 0) {
// E-STOP: brake motors immediately
motor_brake();
set_fault_flag(FAULT_COMM_LOST);
} else if (spi_watchdog_counter <= WDG_HOLD_LEVEL) {
// HOLD: reduce speed command by 50%
speed_setpoint_left /= 2;
speed_setpoint_right /= 2;
}
}
The ROS2 node that sends SPI commands must respect the timing contract:
import rclpy
from rclpy.node import Node
import spidev
import time
class MotorBridge(Node):
def __init__(self):
super().__init__('motor_bridge')
self.spi = spidev.SpiDev()
self.spi.open(0, 0)
self.spi.max_speed_hz = 2000000
# Timer at 1 kHz (1 ms period)
self.timer = self.create_timer(0.001, self.spi_callback)
self.last_cmd_vel = (0.0, 0.0) # (v, omega)
self.cmd_vel_stamp = self.get_clock().now()
def cmd_vel_callback(self, msg):
self.last_cmd_vel = (msg.linear.x, msg.angular.z)
self.cmd_vel_stamp = self.get_clock().now()
def spi_callback(self):
# Check cmd_vel freshness
age = (self.get_clock().now() - self.cmd_vel_stamp).nanoseconds / 1e9
if age > 0.2: # 200 ms stale → stop
v, omega = 0.0, 0.0
else:
v, omega = self.last_cmd_vel
# Inverse kinematics
wL = (v - omega * WHEELBASE / 2) / WHEEL_RADIUS
wR = (v + omega * WHEELBASE / 2) / WHEEL_RADIUS
# Build command frame
frame = build_motor_command(wL, wR)
# SPI transfer
response = self.spi.xfer2(frame)
# Parse feedback
feedback = parse_motor_feedback(response)
self.publish_odom(feedback)
From Nav2 decision to motor torque change:
| Stage | Latency | Source |
|---|---|---|
| Nav2 controller → cmd_vel publish | 0–20 ms | ROS2 DDS, timer jitter |
| cmd_vel → motor_bridge callback | 0–5 ms | ROS2 intra-process: ~0.1 ms, inter-process: ~5 ms |
| Motor bridge processing | 0.1 ms | Inverse kinematics |
| SPI transfer | 0.2 ms | 64 bytes @ 2 MHz |
| MCU validates + applies | 0.01 ms | CRC check + register write |
| Next speed PID cycle | 0–0.1 ms | Waits for next 10 kHz tick |
| Current PID → PWM update | 0.05 ms | One current loop cycle |
| Motor electrical response | 0.5 ms | Electrical time constant |
| Total (typical) | ~5 ms | |
| Total (worst case) | ~30 ms |
A 30 ms delay at 0.5 m/s means the robot travels 15 mm before responding to a command change. For corridor driving, this is fine. For docking (±5 mm precision), this is borderline.
Control theory perspective: A time delay $\tau$ in the loop adds phase lag:
$$\angle G(j\omega) = -\omega \tau \text{ radians}$$
At the speed loop’s 50 Hz bandwidth ($\omega = 100\pi$): $-100\pi \times 0.030 = -9.4°$ of extra phase lag.
The phase margin typically has ~40° to spare, so 9.4° is acceptable. But at 100 Hz bandwidth, the lag would be 18.8° — getting tight.
robot-specific: If the Jetson’s Nav2 controller runs at 20 Hz and the motor bridge runs at 1 kHz, there are ~50 SPI cycles between each new cmd_vel. The motor bridge sends the same cmd_vel 50 times.
If Nav2 misses a cycle (garbage collection, CPU spike), there’s a 100 ms gap in cmd_vel updates. The velocity smoother helps, but the MCU sees stale commands.
Worse: If the Nav2 controller’s timer fires late, the cmd_vel can be timestamped incorrectly, causing the estimator to compute wrong odometry deltas.
The Jetson’s clock and the MCU’s clock are independent. Over time, they drift: - Jetson: NTP-synced, typical drift ~1 ms/hour - MCU: Crystal oscillator, drift ~50 ppm = 50 µs/second = 180 ms/hour
After 1 hour, the clocks can be 200 ms apart. If the Jetson stamps a command at time $t$ but the MCU thinks the current time is $t + 200ms$, the estimator’s time-interpolation breaks.
Every SPI frame includes timestamps from both sides. The Jetson computes the offset:
class ClockSync:
def __init__(self):
self.offset_us = 0
self.alpha = 0.01 # Low-pass filter coefficient
def update(self, jetson_send_time_us, mcu_timestamp_us, jetson_recv_time_us):
"""Estimate MCU clock offset using SPI round-trip."""
# Round-trip time
rtt = jetson_recv_time_us - jetson_send_time_us
# Assume symmetric delay: MCU received at send_time + rtt/2
estimated_mcu_time_at_midpoint = jetson_send_time_us + rtt // 2
# Offset = Jetson time - MCU time
raw_offset = estimated_mcu_time_at_midpoint - mcu_timestamp_us
# Exponential moving average to filter noise
self.offset_us = int(self.alpha * raw_offset + (1 - self.alpha) * self.offset_us)
def mcu_to_jetson_time(self, mcu_time_us):
return mcu_time_us + self.offset_us
| Failure | Detection | MCU response | Jetson response |
|---|---|---|---|
| SPI frame corrupt | CRC mismatch | Hold last valid command | Log warning, retry |
| SPI completely lost | Watchdog timeout (10 ms) | Emergency brake | Publish /motor_fault |
| Jetson frozen/crashed | No SPI writes | Watchdog → brake | (dead) Systemd restarts |
| MCU frozen | No SPI response | (dead) Hardware WDG resets | Detect stale feedback → stop |
| Motor overcurrent | Shunt ADC > threshold | Immediate PWM disable | Publish fault diagnostic |
| Encoder failure | Speed estimate unreasonable | Reduce to open-loop + brake | Log, request service |
MCU rule: Never trust that Jetson commands are sane. Always validate: - Speed command within ±$v_{max}$? If not, clamp. - CRC valid? If not, use last good command. - Command fresh? If not, decelerate.
Jetson rule: Never trust that MCU feedback is accurate. Always validate: - Speed physically possible? (> 2× max speed = encoder error) - Current within range? (negative current = sensor failure) - Encoder monotonically consistent with speed sign?
// MCU-side command validation
bool validate_motor_command(MotorCommand *cmd) {
// Range check
if (fabsf(cmd->left_speed_cmd) > MAX_WHEEL_SPEED) return false;
if (fabsf(cmd->right_speed_cmd) > MAX_WHEEL_SPEED) return false;
// Mode check
if (cmd->control_mode > 3) return false;
// Rate-of-change check (no impossible acceleration)
float dv_left = fabsf(cmd->left_speed_cmd - prev_cmd.left_speed_cmd);
if (dv_left > MAX_ACCEL * SPI_PERIOD) return false;
return true;
}
When things go wrong, degrade smoothly:
NORMAL → REDUCED → HOLD → STOP → BRAKE
↑ │
└─────── Recovery ────────────────┘
NORMAL: Full speed, full control
REDUCED: 50% max speed, increased monitoring
HOLD: Maintain last velocity, start decelerating
STOP: Zero velocity command, PID still active
BRAKE: PWM off, motor shorted (dynamic braking), no PID