Exercises: Multi-Sensor ZBus + nanopb Bridge
Covers: Deep-dive session 09 — SensorFrame proto, ZBus multi-channel, length-prefix framing, 3 failure points
Section A — Conceptual Questions
A1. Your SensorFrame protobuf has three sub-messages: ImuData, EncoderData, GpsData.
During encoding with pb_encode(), which fields are omitted from the output bytes?
Give one example for each sub-message where a real sensor reading would be silently omitted.
Answer
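Proto3 omits any singular scalar field equal to its type's default: `0` for numbers, `false` for booleans,
`""` for strings. Sub-message fields work differently: nanopb generates a `has_<field>` flag for each one, and the sub-message is written whenever that flag is `true`, even if every field inside it is at its default.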
**ImuData:** `angular_velocity_z = 0.0`: the robot is stationary, not rotating, so the field is omitted. The
Jetson decoder sees 0.0 (the default), which is correct but indistinguishable from "the STM32 never set
this field". If the IMU driver had a bug that returned 0 for the z-axis, nothing on the wire would reveal it.
**EncoderData:** `left_vel = 0` and `right_vel = 0`: the robot is stopped, so both integer fields are omitted
(protobuf has no 16-bit type; these are varints on the wire even if the C struct stores them as `int16_t`).
`timestamp_ms = 0` would also be omitted, although the timestamp should never be 0 in practice; if it
is, check that the encoder timer is initialised before the first CAN frame is sent.
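**GpsData:** `latitude = 0.0`, `longitude = 0.0`: the GPS has no fix and the driver reports zeros (or,
unlikely but valid, the robot really is on the equator at the prime meridian). `has_fix = false` is
also omitted (bool default). Every field is at its default, so the GpsData body encodes to 0 bytes; if
nanopb's `has_` flag for the GPS field is set, the outer SensorFrame still carries the field's tag and a
zero length byte. From latitude, longitude and altitude alone, the Jetson cannot tell "no fix" from
"fix at 0°, 0°, 0m".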
**Rule:** Give every sub-message you need to distinguish from "absent" an explicit `has_fix` / `is_valid`
field, and set it to `true` whenever the data is valid. `true` is non-default, so the flag is always on the wire when it matters.
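To see the omission rule concretely, here is a minimal hand-rolled encoder for a GpsData-like message that applies proto3 implicit presence. It is not nanopb's code, only the same wire rule; the `gps_data_t` struct and the field numbers (1 = latitude, 2 = longitude, 3 = has_fix) are assumptions for illustration, not taken from `sensor_frame.proto`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical GpsData layout; field numbers are assumed for illustration. */
typedef struct {
    double latitude;   /* field 1, wire type 1 (64-bit) */
    double longitude;  /* field 2, wire type 1 (64-bit) */
    bool has_fix;      /* field 3, wire type 0 (varint) */
} gps_data_t;

/* Write tag + 8 little-endian bytes, unless the value is +0.0 (the proto3 default). */
static size_t put_double(uint8_t *out, uint32_t field, double v) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    if (bits == 0) {
        return 0;  /* default value: the field is simply not written */
    }
    out[0] = (uint8_t)((field << 3) | 1u);
    for (int i = 0; i < 8; i++) {
        out[1 + i] = (uint8_t)(bits >> (8 * i));
    }
    return 9;
}

/* Encode the sub-message body into out (needs room for 20 bytes); returns bytes written. */
static size_t encode_gps(const gps_data_t *g, uint8_t *out) {
    size_t n = 0;
    n += put_double(out + n, 1, g->latitude);
    n += put_double(out + n, 2, g->longitude);
    if (g->has_fix) {                   /* false is the default, so only true reaches the wire */
        out[n++] = (uint8_t)(3u << 3);  /* field 3, wire type 0 */
        out[n++] = 1;
    }
    return n;
}
```

An all-default `gps_data_t` encodes to 0 bytes, while a fix reported at exactly 0°, 0° encodes to just the two `has_fix` bytes (`0x18 0x01`); those two bytes are the only thing that tells the cases apart.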
A2. Your SPI DMA frame is 130 bytes: 2-byte length prefix + up to 128 bytes of protobuf payload.
Explain what happens when pb_encode() produces 60 bytes for one SensorFrame but 110 bytes for
the next (GPS fix just acquired). How does the receiver know which is which?
Answer
The 2-byte big-endian length prefix tells the receiver exactly how many protobuf bytes to decode:
```
Frame N:   [0x00][0x3C] [60 bytes of proto]  [68 bytes of stale DMA buffer data]
Frame N+1: [0x00][0x6E] [110 bytes of proto] [18 bytes of stale DMA buffer data]
```
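The Jetson reads `length = (frame[0] << 8) | frame[1]` and then decodes only those bytes:
```python
from sensor_frame_pb2 import SensorFrame  # protoc output for sensor_frame.proto (module name assumed)

# frame: the 130 bytes received over SPI
length = (frame[0] << 8) | frame[1]
msg = SensorFrame()
msg.ParseFromString(frame[2:2 + length])  # bytes after 2 + length are never parsed
```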
The stale bytes after `2 + length` are never parsed, so correctness does not depend on their contents.
Zeroing the unused tail every cycle (`memset(dma_buf, 0, sizeof(dma_buf))` before the fill) is still a
cheap defensive habit: old sensor data no longer leaks onto the bus, and if a receiver ever ignores the
prefix, the first padding byte `0x00` decodes as field number 0, which is invalid, so the Python parser
raises an error instead of silently misparsing.
**If you skip the length prefix** and always call `ParseFromString` on all 128 bytes, the parser
decodes the fresh fields and then carries on into the stale bytes left over from an earlier frame.
Depending on where that leftover data starts, you get either a `DecodeError` or, worse, a message that
parses cleanly: stale scalar fields overwrite fresh ones (last value wins), stale sub-messages merge
into fresh ones, and unrecognised field numbers are kept as unknown fields, all with no error.
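Both sides of the framing fit in a few lines. This is a minimal sketch assuming the 130-byte layout described above; `frame_pack`, `frame_unpack`, `FRAME_SIZE` and `FRAME_MAX_PAYLOAD` are hypothetical names, and the payload would come from `pb_encode()` on target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE        130u                /* 2-byte prefix + 128-byte payload area */
#define FRAME_MAX_PAYLOAD (FRAME_SIZE - 2u)

/* Sender: big-endian length prefix, payload, then zero the unused tail. Returns 0 on success. */
static int frame_pack(uint8_t frame[FRAME_SIZE], const uint8_t *payload, size_t len) {
    if (len > FRAME_MAX_PAYLOAD) {
        return -1;
    }
    frame[0] = (uint8_t)((len >> 8) & 0xFFu);
    frame[1] = (uint8_t)(len & 0xFFu);
    memcpy(&frame[2], payload, len);
    memset(&frame[2 + len], 0, FRAME_MAX_PAYLOAD - len);  /* no stale bytes on the bus */
    return 0;
}

/* Receiver: validate the prefix and return the span that should be handed to the decoder. */
static int frame_unpack(const uint8_t frame[FRAME_SIZE], const uint8_t **payload, size_t *len) {
    size_t n = ((size_t)frame[0] << 8) | frame[1];
    if (n > FRAME_MAX_PAYLOAD) {
        return -1;  /* corrupt prefix: drop the frame rather than over-read */
    }
    *payload = &frame[2];
    *len = n;
    return 0;
}
```

Rejecting a prefix larger than 128 on the receiver matters as much as honouring it: a corrupted prefix byte would otherwise make the decoder read past the frame.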
A3. You have three ZBus channels: imu_chan, encoder_chan, gps_chan. The packer thread
subscribes to all three and packs them into one SensorFrame every 10ms. Explain the race condition
that can produce a SensorFrame where imu data is 9ms old but encoder data is 0.1ms old.
Answer
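ZBus channels store only the **latest value**: a subscriber's queue holds references to the channels
that were published, not copies of their values. The packer thread blocks in `zbus_sub_wait()` and
wakes whenever any of its channels is published. A normal cycle looks like this: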
1. IMU publishes at t=0ms.
2. Packer wakes and calls `zbus_chan_read(&imu_chan)` at t=0ms → 0ms old. ✓
3. Packer sleeps waiting for next wakeup.
4. Encoder publishes at t=9.9ms.
5. IMU publishes at t=9.95ms.
6. Encoder publishes at t=9.99ms.
7. Packer wakes (triggered by encoder at t=9.99ms).
8. Packer reads `imu_chan` and gets the t=9.95ms value, only 0.04ms old; in this cycle every reading is fresh.
The 9ms-stale case differs only in ordering: the encoder publishes at t=9ms and the packer wakes
immediately, reading a 0.1ms-old encoder value. The IMU thread has been preempted and has not yet
published in this cycle, so `imu_chan` still holds the t=0ms value, now 9ms stale. The IMU publishes
at t=10ms, after the frame has already been packed.
**Mitigation:** Read all channels in one burst immediately before the DMA buffer fill, not at wakeup,
using `K_NO_WAIT`, and check the return value: `zbus_chan_read()` returns a negative error code if it
cannot take the channel lock without waiting, leaving the destination untouched. Accept that every
reading is "as fresh as the last publish in the 10ms window". Alternatively, trigger the pack from a
dedicated 10ms periodic timer rather than waking on channel events. Either way, the timestamps inside
each sub-message record the actual measurement time, so the Jetson can detect staleness.
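A minimal sketch of that staleness check, written in C like the firmware code in this exercise. It assumes every sub-message carries a millisecond timestamp from one shared clock and that a frame-level pack time `pack_ms` is also sent; only `EncoderData.timestamp_ms` is named above, so the `sample_times_t` struct, the IMU/GPS timestamps and `pack_ms` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-sensor sample times copied out of a decoded SensorFrame (ms, shared clock). */
typedef struct {
    uint32_t imu_ms;
    uint32_t encoder_ms;
    uint32_t gps_ms;
} sample_times_t;

enum {
    STALE_IMU = 1u << 0,
    STALE_ENCODER = 1u << 1,
    STALE_GPS = 1u << 2,
};

/* Unsigned subtraction yields the correct age even across a uint32 millisecond wrap. */
static bool older_than(uint32_t now_ms, uint32_t sample_ms, uint32_t max_age_ms) {
    return (uint32_t)(now_ms - sample_ms) > max_age_ms;
}

/* Bitmask of sub-messages whose sample is older than its own limit at pack time. */
static unsigned stale_mask(const sample_times_t *t, uint32_t pack_ms,
                           uint32_t imu_max_ms, uint32_t enc_max_ms, uint32_t gps_max_ms) {
    unsigned mask = 0;
    if (older_than(pack_ms, t->imu_ms, imu_max_ms)) {
        mask |= STALE_IMU;
    }
    if (older_than(pack_ms, t->encoder_ms, enc_max_ms)) {
        mask |= STALE_ENCODER;
    }
    if (older_than(pack_ms, t->gps_ms, gps_max_ms)) {
        mask |= STALE_GPS;
    }
    return mask;
}
```

Per-sensor limits are the design choice here: a GPS that publishes at a few hertz would trip a 10ms limit on nearly every frame, while the IMU should never be more than one cycle old.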
A4. pb_encode() returns false. Your code logs an error and re-sends the previous frame.
Name three distinct root causes that make pb_encode() return false, and the fix for each.
Answer
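1. **Output buffer too small.** The `pb_ostream_t` is sized at 128 bytes but the encoded message
exceeds that. `pb_encode` writes until a write no longer fits, then returns false, leaving a partial
message in the buffer.
Fix: if every field is bounded (strings and bytes have `max_size`, repeated fields have `max_count`),
nanopb generates a `SensorFrame_size` compile-time constant; check it against the 128-byte payload
area at build time and enlarge the SPI frame if it does not fit. With unbounded or callback fields no
such constant exists, so call `pb_get_encoded_size()` before encoding and reject oversized frames.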
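2. **Missing `required` field (proto2 syntax only; proto3 has no required fields).** A project can
still mix proto2 and proto3 .proto files. nanopb can only detect a missing required field when it is
pointer-allocated (`FT_POINTER`) and left `NULL`; `pb_encode` then returns false. A statically
allocated required field has no presence flag, so it is always encoded with whatever the struct
holds, and an uninitialised one produces garbage rather than an error.
Fix: audit the .proto and use proto3 syntax, where no field is required.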
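3. **Custom encode callback returns false.** If you used a callback field (e.g. a `bytes` field
with a custom encoder for the raw GPS NMEA sentence) and that callback returns false (e.g. the
sentence is longer than allowed), the whole encode fails. Because GpsData is a sub-message, nanopb
calls the callback twice, once to measure the sub-message size and once to write it; both passes
must produce identical output.
Fix: validate the data before encoding, propagate the return values of the `pb_encode_*` calls
inside the callback, and make sure the output buffer has room for the largest sentence.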
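Cause 1 leaves a partial message behind, which is why `bytes_written` cannot be trusted after a failure. The toy bounded stream below models that behaviour; it is loosely based on the idea of nanopb's buffer stream, not its source, and `bounded_stream_t`, `stream_write` and `write_all` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy bounded output stream: a write that would overflow is rejected outright. */
typedef struct {
    uint8_t *buf;
    size_t max_size;
    size_t bytes_written;
} bounded_stream_t;

static bool stream_write(bounded_stream_t *s, const uint8_t *src, size_t count) {
    if (count > s->max_size - s->bytes_written) {
        return false;  /* buffer full: nothing from this write is copied */
    }
    memcpy(s->buf + s->bytes_written, src, count);
    s->bytes_written += count;
    return true;
}

/* Write each chunk (think: one encoded field) in turn; stop at the first that does not fit.
 * Chunks written before the failure stay in the buffer and in bytes_written. */
static bool write_all(bounded_stream_t *s, const uint8_t *const chunks[],
                      const size_t lens[], size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!stream_write(s, chunks[i], lens[i])) {
            return false;
        }
    }
    return true;
}
```

With a 10-byte buffer and two 6-byte "fields", the call fails but `bytes_written` is 6: a length prefix built from it would describe a truncated message.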
Section B — Practical / Debug Scenarios
B1. Your Jetson receives SensorFrame messages and logs `ParseFromString failed: invalid tag` on
approximately every 50th frame. Other frames decode correctly. The STM32 pb_encode() always
returns true. What is the most likely cause?
Answer
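**DMA cache coherency (or, less likely, SPI frame-boundary corruption).** The firmware never cleans
the Cortex-M7 D-cache for the DMA output buffer. The CPU writes each new frame into write-back cache
lines; most of the time those lines have been evicted to SRAM by the time the DMA reads them, but
roughly one transfer in 50 starts while some are still dirty. The Jetson then receives either:
- Stale bytes from a previous frame (no line written back yet), or
- A mix of new and old data at 32-byte cache-line granularity (some lines written back, some not).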
The `invalid tag` error means the Jetson's protobuf decoder read a tag it cannot accept. A tag is a
varint holding `(field_number << 3) | wire_type`; when a stale byte lands where the parser expects a
tag, it may decode to field number 0 or to wire type 6 or 7, none of which exist, and the parse
fails. (A valid wire type with an unknown field number would not fail: it is kept silently as an
unknown field.)
**Diagnosis:** Log the raw bytes of a failing frame on the Jetson and compare them with the frames
before it. If the bytes after position ~60, or whole 32-byte blocks, match an earlier frame's payload,
the cache is the culprit.
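**Fix:** Call `SCB_CleanDCache_by_Addr(dma_buf, sizeof(dma_buf))` after writing both the length
prefix and the proto payload and **before** starting the DMA transfer (Zephyr's portable equivalent is
`sys_cache_data_flush_range()`). Cache maintenance works on whole 32-byte lines, so aligning `dma_buf`
to 32 bytes is good practice. Alternatively, place `dma_buf` in a non-cacheable region with Zephyr's
`__nocache` attribute (requires `CONFIG_NOCACHE_MEMORY=y`).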
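To locate the byte that triggers `invalid tag` in a captured frame, a small wire-format walker can report the first offset where a tag or field is malformed. This is a minimal sketch written for this exercise, not part of any protobuf library: `first_bad_offset` is a hypothetical name, it only walks top-level fields (no recursion into sub-messages), and it treats the group wire types 3 and 4 as errors because proto3 messages do not use them.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a base-128 varint; returns bytes consumed, or 0 if truncated or longer than 10 bytes. */
static size_t read_varint(const uint8_t *p, size_t avail, uint64_t *out) {
    uint64_t v = 0;
    for (size_t i = 0; i < avail && i < 10; i++) {
        v |= (uint64_t)(p[i] & 0x7Fu) << (7 * i);
        if ((p[i] & 0x80u) == 0) {
            *out = v;
            return i + 1;
        }
    }
    return 0;
}

/* Walk top-level fields; return the offset of the first malformed tag/field, or -1 if clean. */
static long first_bad_offset(const uint8_t *buf, size_t len) {
    size_t pos = 0;
    while (pos < len) {
        uint64_t tag = 0, v = 0;
        size_t start = pos;
        size_t n = read_varint(buf + pos, len - pos, &tag);
        if (n == 0 || (tag >> 3) == 0) {
            return (long)start;  /* truncated tag, or field number 0 (illegal) */
        }
        pos += n;
        switch (tag & 7u) {
        case 0:  /* VARINT */
            n = read_varint(buf + pos, len - pos, &v);
            if (n == 0) {
                return (long)start;
            }
            pos += n;
            break;
        case 1:  /* I64 */
            if (len - pos < 8) {
                return (long)start;
            }
            pos += 8;
            break;
        case 2:  /* LEN: length varint, then that many bytes */
            n = read_varint(buf + pos, len - pos, &v);
            if (n == 0 || v > len - pos - n) {
                return (long)start;
            }
            pos += n + (size_t)v;
            break;
        case 5:  /* I32 */
            if (len - pos < 4) {
                return (long)start;
            }
            pos += 4;
            break;
        default:  /* 3/4: groups (unused in proto3); 6/7: not defined */
            return (long)start;
        }
    }
    return -1;
}
```

Run it over the decoded span of a failing capture; if the reported offset falls inside a region that matches an earlier frame's payload, the stale-cache diagnosis is confirmed.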
B2. Your sensor_frame.proto defines GPS lat/lon as float (32-bit). You notice that at
53°N, 6°W your position jumps by ±10m randomly. You switch to double (64-bit) and it stops.
Explain why.
Answer
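A `float32` has a 24-bit significand (23 stored bits plus an implicit leading 1), giving ~7 significant
decimal digits. At latitude 53°N:
- 53.36134° in float32 = `0x42557203` ≈ 53.3613396..., the nearest representable value
The spacing between adjacent float32 values depends only on the exponent. 53 lies in [2^5, 2^6), so:
- One ULP at 53 = 2^(5−23) = 2^(−18) ≈ **3.8e-6 degrees ≈ 0.42 metres** of latitude (1° of latitude ≈ 111 km)
So a `float32` latitude can only represent positions about 0.4m apart at 53°N; longitude at 6°W is
finer (2^(2−23) ≈ 4.8e-7 degrees ≈ 3 cm east-west at 53°N) because the value is smaller. That grid is
hidden under metre-level GNSS noise but ruins RTK-grade (centimetre) fixes. Quantisation alone gives
sub-metre steps, so jumps as large as ±10m suggest the rounding is being amplified by further float32
arithmetic on the coordinates (unit conversion, differencing, filtering); moving the whole path to
`double` removes both effects.
`double` (64-bit, 53-bit significand) has ULP at 53 = 2^(5−52) = 2^(−47) ≈ **7.1e-15 degrees ≈ 0.8 nm**,
far below any GNSS accuracy.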
**Rule:** Always use `double` for GPS coordinates. `float` is sufficient for velocities,
accelerations, and small-range values. Check your `sensor_frame.proto` — nanopb generates `float`
for `.proto` `float` type, `double` for `.proto` `double` type.
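The ULP figures above can be checked by stepping to the next representable value through the bit pattern. A minimal sketch, valid only for finite positive inputs that are not the largest value of their type; `ulp_f32`, `ulp_f64` and the `111320.0` metres-per-degree constant (an approximation) are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

#define METRES_PER_DEG_LAT 111320.0  /* approximate; varies slightly with latitude */

/* Gap from x to the next larger float32 (1 ULP), for finite x > 0. */
static double ulp_f32(float x) {
    uint32_t bits;
    float next;
    memcpy(&bits, &x, sizeof bits);
    bits += 1u;  /* for positive floats, the next bit pattern is the next value up */
    memcpy(&next, &bits, sizeof next);
    return (double)next - (double)x;
}

/* Same idea for double. */
static double ulp_f64(double x) {
    uint64_t bits;
    double next;
    memcpy(&bits, &x, sizeof bits);
    bits += 1u;
    memcpy(&next, &bits, sizeof next);
    return next - x;
}
```

`ulp_f32(53.36134f)` is exactly 2^(−18) degrees, about 0.42 m of latitude when multiplied by `METRES_PER_DEG_LAT`, while `ulp_f64(53.36134)` is 2^(−47) degrees.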
Section C — Code Reading
C1. Find all bugs in this packer thread:
```c
void packer_thread(void *a, void *b, void *c) {
    struct sensor_frame_t frame;
    uint8_t dma_buf[130];
    SensorFrame proto_msg = SensorFrame_init_zero;

    while (1) {
        zbus_sub_wait(&packer_sub, &frame.imu_chan, K_FOREVER);

        /* Read all channels */
        zbus_chan_read(&imu_chan, &frame.imu, K_NO_WAIT);
        zbus_chan_read(&encoder_chan, &frame.encoder, K_NO_WAIT);
        zbus_chan_read(&gps_chan, &frame.gps, K_NO_WAIT);

        /* Encode */
        pb_ostream_t stream = pb_ostream_from_buffer(dma_buf + 2, 128);
        fill_proto(&proto_msg, &frame);
        pb_encode(&stream, SensorFrame_fields, &proto_msg);

        /* Length prefix */
        dma_buf[0] = (stream.bytes_written >> 8) & 0xFF;
        dma_buf[1] = stream.bytes_written & 0xFF;

        /* Trigger SPI DMA */
        spi_write(spi_dev, &spi_cfg, &spi_buf_set);
    }
}
```
Answer
**Four bugs:**
1. **`zbus_sub_wait` second argument is wrong.** `zbus_sub_wait` takes a `const struct
zbus_observer *` (the subscriber), a `const struct zbus_channel **` (output, set to the channel that
triggered the wakeup) and a timeout. `&frame.imu_chan` is the address of a field of `sensor_frame_t`,
not a variable that can receive a channel pointer. This should be:
```c
const struct zbus_channel *chan;
zbus_sub_wait(&packer_sub, &chan, K_FOREVER);
```
2. **`pb_encode()` return value ignored.** If encoding fails (buffer too small, callback error),
`stream.bytes_written` counts only the bytes written before the failure. The length prefix then
describes a truncated (possibly zero-length) message, and the Jetson either fails to parse it or,
worse, decodes a partial frame without error. Add:
```c
if (!pb_encode(&stream, SensorFrame_fields, &proto_msg)) {
    LOG_ERR("pb_encode failed: %s", PB_GET_ERROR(&stream));
    continue;
}
```
3. **`proto_msg` is not reset between iterations.** `SensorFrame_init_zero` runs only once, at
declaration, and nanopb never modifies the struct while encoding. Any field that `fill_proto()` leaves
untouched keeps its value from the previous frame: if GPS had a fix in frame N but loses it in frame
N+1, and `fill_proto()` only writes GPS fields when there is a fix, frame N's GPS data is sent again. Add:
```c
proto_msg = (SensorFrame)SensorFrame_init_zero;
```
at the top of each loop iteration.
4. **No D-cache clean (flush) before `spi_write`.** `dma_buf` lives on the thread's stack in cacheable
SRAM. The CPU writes the proto bytes and the length prefix into write-back cache lines; without
`SCB_CleanDCache_by_Addr(dma_buf, sizeof(dma_buf))`, the DMA controller can read stale bytes from SRAM
that the cache has not yet written back. Insert the clean between the length-prefix write and `spi_write`.
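The corrected order of operations for bugs 2 and 4 (encode, check, prefix, zero the tail, clean the cache, then hand off to SPI) can be pulled into a portable helper so it is unit-testable on a host. A sketch only: `build_frame`, `encode_fn` and `cache_clean_fn` are hypothetical; on target the encode hook would reset `proto_msg` (bug 3), call `fill_proto()` and `pb_encode()`, and the clean hook would wrap `SCB_CleanDCache_by_Addr()`. The `spi_write()` call stays in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE  130u
#define PAYLOAD_MAX (FRAME_SIZE - 2u)

/* Hypothetical hooks standing in for the nanopb encode and the CMSIS cache clean. */
typedef bool (*encode_fn)(uint8_t *out, size_t cap, size_t *written, void *ctx);
typedef void (*cache_clean_fn)(void *addr, size_t len);

/* Returns true when frame is ready for spi_write(); false means skip this cycle. */
static bool build_frame(uint8_t frame[FRAME_SIZE], encode_fn encode, void *ctx,
                        cache_clean_fn clean) {
    size_t written = 0;

    if (!encode(&frame[2], PAYLOAD_MAX, &written, ctx) || written > PAYLOAD_MAX) {
        return false;  /* bug 2: never send a frame after a failed encode */
    }
    frame[0] = (uint8_t)((written >> 8) & 0xFFu);  /* big-endian length prefix */
    frame[1] = (uint8_t)(written & 0xFFu);
    memset(&frame[2 + written], 0, PAYLOAD_MAX - written);  /* no stale tail */
    clean(frame, FRAME_SIZE);  /* bug 4: clean after the last CPU write, before DMA starts */
    return true;
}
```

Keeping the cache clean as the last step inside the helper makes it impossible to add another CPU write to the buffer after the clean without noticing.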