Exercises: Jetson Orin NX — RT Setup + SPI Baseline

Covers: Deep-dive session 10 — nvpmodel, jetson_clocks, isolcpus, cyclictest, spidev loopback, ioctl timing

Section A — Conceptual Questions

A1. After flashing JetPack 5.x, what three commands do you run in order to maximise CPU performance before benchmarking? Explain what each one does at the hardware level.

Answer

sudo nvpmodel -m 0          # 1. Set power mode
sudo jetson_clocks          # 2. Lock all clocks at max
sudo systemctl disable irqbalance && sudo systemctl stop irqbalance  # 3. Freeze IRQ affinity

**1. `nvpmodel -m 0`** — selects the MAX power envelope. Allows all CPU cores to be active at maximum frequency. Lower modes (1, 2, etc.) cap TDP by reducing core count or frequency, which adds governor-driven frequency scaling that introduces latency jitter. **2. `jetson_clocks`** — pins GPU, CPU, EMC (memory bus), and DLA clocks at their maximum advertised frequency. Without this, the cpufreq governor (typically `schedutil`) scales clocks down during idle periods. When your RT thread wakes, it may run at reduced frequency for the first few microseconds while the PLL ramps up → unpredictable worst-case latency spike. **3. `systemctl disable irqbalance`** — the `irqbalance` daemon periodically migrates hardware interrupts between CPU cores to balance load. If it migrates your SPI IRQ away from the isolated core mid-transfer, the interrupt handler runs on a shared core, adding scheduler contention latency. Disabling it locks IRQ affinity permanently to the kernel's default assignment.

A2. Explain the purpose of isolcpus=3 nohz_full=3 rcu_nocbs=3 in /boot/extlinux/extlinux.conf. What is “tick noise” and how does nohz_full eliminate it? What is an RCU callback and why does rcu_nocbs matter for a 100Hz RT thread?

Answer

These three kernel command-line parameters together create a **dedicated isolated CPU core** (core 3). **`isolcpus=3`:** Removes CPU 3 from the Linux scheduler's runqueue. No ordinary (non-RT) processes will be scheduled there. Only threads that explicitly call `pthread_setaffinity_np()` or `sched_setaffinity()` to pin themselves to core 3 will run on it. **`nohz_full=3`:** Suppresses the periodic scheduler tick (normally 250Hz or HZ=1000) on core 3 when there is exactly one runnable task. The tick is a timer interrupt that wakes the kernel every 1-4ms regardless of workload. On a 100Hz thread (10ms period), one tick interrupt per period is ~1-10% additional interrupt latency overhead at worst timing. `nohz_full` eliminates this → the isolated core runs interrupt-free between hardware events. **`rcu_nocbs=3`:** Read-Copy-Update (RCU) is the kernel's lock-free synchronisation mechanism. Periodically, the kernel runs RCU callbacks (deferred memory frees, list updates) on each core. Without `rcu_nocbs`, these callbacks run on core 3 in your RT thread's context window, adding unpredictable 10-100μs jitter. `rcu_nocbs=3` offloads all RCU callbacks from core 3 to a dedicated kthread on another core. Combined: core 3 has **no scheduler ticks, no RCU callbacks, and no involuntary process migrations** — it is as close to bare-metal as Linux gets.

A3. You run cyclictest --mlockall -t1 -p99 -i 1000 -n -a 3 --duration=30s and get:

T: 0 ( 1234) P:99 I:1000 C:  30000 Min:    8 Act:   11 Avg:   12 Max:   847

Interpret every field. Is this result acceptable for a 100Hz (10ms period) SPI bridge? What would make it unacceptable?

Answer

- `T: 0 ( 1234)` — Thread 0, PID 1234. - `P:99` — Running at SCHED_FIFO priority 99 (highest real-time priority). - `I:1000` — Timer interval 1000 µs (1ms). - `C: 30000` — 30,000 measurement cycles completed (30,000 × 1ms = 30s). - `Min: 8` — Minimum latency observed: 8 µs (best case, core idle, immediate wakeup). - `Act: 11` — Most recent measurement: 11 µs. - `Avg: 12` — Average latency: 12 µs. - `Max: 847` — **Worst-case latency: 847 µs** (nearly 1ms). **Acceptability for 100Hz SPI bridge (10ms period):** The 847 µs worst case is within the 10ms period — so the thread always meets its deadline, but only with 9.15ms of margin. This is **acceptable** for a sensor bridge where the protocol tolerates sub-millisecond jitter. It would be **unacceptable** if: - The period were shorter (e.g. 1ms, 500Hz), where 847µs jitter would cause missed deadlines. - You observe Max > 1000 µs — this indicates a non-RT interrupt or process stole the core. - You observe Max growing over time (run for 5+ minutes) — suggests a periodic kernel event (RCU grace period, memory compaction) that `rcu_nocbs` didn't fully mitigate. Target for production: Max < 100 µs (good), < 500 µs (acceptable), > 1000 µs (investigate).

A4. The Jetson spidev physical setup requires a jetson-io.py step before spidev_test works. What does jetson-io.py configure, and why can’t you just modprobe spidev and expect /dev/spidev0.0 to appear?

Answer

`jetson-io.py` configures the **pin multiplexer (pinmux)** — the hardware register that determines whether each GPIO pad is connected to SPI, I2C, UART, or plain GPIO function. On the Jetson Orin NX, the SPI controller exists in silicon, but by default the pins that would be routed to SPI are configured as GPIO or another function. `modprobe spidev` loads the kernel driver, which creates the `/sys/bus/spi/` device tree. But if the pinmux has not been set to route the SPI signals to the 40-pin header pads, the SPI controller is connected to nothing — there are no rising/falling edges on any physical pin. `jetson-io.py` writes the pinmux configuration to the device tree overlay and saves it persistently. After reboot, the pins are committed to SPI function, the kernel's device tree enumerates the SPI controller, `spidev` binds to it, and `/dev/spidev0.0` appears. **Shortcut diagnostic:** After `jetson-io.py` + reboot, run:

ls /dev/spidev*
# Should show: /dev/spidev0.0  /dev/spidev0.1

If nothing appears, check `dmesg | grep spi` — the driver bound but no device node means the udev rule is missing or the device tree overlay didn't apply.

A5. You measure ioctl latency for SPI_IOC_MESSAGE(1) on Jetson Orin NX and get p99 = 4.8ms. Your target is <2ms. Name two hardware causes and two software causes, and for each, the diagnostic command or change you would try.

Answer

**Hardware causes:** 1. **CPU not locked at max frequency.** The cpufreq governor scaled down core 3 after the first measurement. The SPI transfer starts slowly while the PLL ramps. Diagnostic: `cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq` — should match `cpuinfo_max_freq`. Fix: re-run `sudo jetson_clocks` and keep it running as a service. 2. **SPI clock rate too low.** At low SPI clock (1MHz), 130 bytes takes 1.04ms + interrupt overhead. Raising to 10MHz drops transfer time to 104µs. Diagnostic: Check `spi_ioc_transfer.speed_hz` in your code. Test with `spidev_test -D /dev/spidev0.0 -s 10000000 -p "DEADBEEF..."`. **Software causes:** 3. **Thread not pinned to isolated core.** Without `pthread_setaffinity_np` + `SCHED_FIFO`, the ioctl call may migrate to a busy core mid-execution. Diagnostic: `ps -eLo pid,psr,cls,pri,cmd | grep your_process` — PSR shows current core. Fix: call `set_cpu_affinity(3)` and `set_sched_fifo(90)` before the main loop. 4. **Measurement includes Python/ctypes overhead.** If you're measuring with Python `time.perf_counter()` around `fcntl.ioctl()`, Python's global interpreter lock (GIL) adds unpredictable overhead. Fix: measure from C using `clock_gettime(CLOCK_MONOTONIC_RAW)` immediately before and after the ioctl syscall — inside the same C function, not from Python.

Section B — Practical / Debug Scenarios

B1. spidev_test -D /dev/spidev0.0 -p "DEADBEEF" returns data but it is all zeros, not the loopback echo you expected. The MISO and MOSI pins are confirmed connected with a jumper wire. What is the most likely cause?

Answer

**SPI mode mismatch (CPOL/CPHA).** The default `spidev_test` uses SPI mode 0 (CPOL=0, CPHA=0). If the jumper wire loopback is working correctly (MISO=MOSI), you should get an echo. All-zeros suggests data is not being sampled at the right clock edge. Secondary possibility: **the `D/R` receive-enable pin on the SN65HVD230 transceiver** (if the transceiver is in the signal path — but for a bare loopback with jumper, this doesn't apply). For pure loopback test (no STM32 connected):

# Force mode 0, speed 1MHz, explicit loopback
spidev_test -D /dev/spidev0.0 -s 1000000 -p "HELLO" -v

If still all zeros: check `dmesg | grep spi` for `spi_imx: unhandled interrupt` or FIFO errors. If there are no errors: the spidev device was opened but the pinmux is still not applied — `/dev/spidev0.0` exists but the pins are routing to GPIO, not SPI hardware. Repeat `jetson-io.py` and reboot.

B2. cyclictest is running on core 3 with isolcpus=3 set, but Max latency is still 1200 µs after 60 seconds. dmesg shows no errors. What are three things to check?

Answer

1. **`irqbalance` is still running.** Even after `systemctl disable`, if it was not stopped for the current session: `systemctl stop irqbalance`. Verify: `ps aux | grep irqbalance`. 2. **A non-RT interrupt is routed to core 3.** Check `/proc/interrupts` — look for any IRQ line with a non-zero count in the "CPU3" column. Common culprits: SPI1, I2C, ethernet. Migrate them away with `echo 7 > /proc/irq//smp_affinity` (bitmask for cores 0,1,2 only). 3. **`rcu_nocbs` is not applied.** Verify the kernel was actually booted with the parameter: `cat /proc/cmdline | grep rcu_nocbs`. If absent, the overlay change in `extlinux.conf` didn't take effect — check for a syntax error or that the correct `extlinux.conf` was modified (the Jetson has multiple and uses the one specified by the bootloader chainloading).