Prerequisites: Chapter 01 (basic voltage, pull-ups) Unlocks: Debug consoles, GPS modules, RS-485 industrial sensors, DMA patterns
printk() and Zephyr shell output goes through UARTParallel: Send 8 bits simultaneously on 8 wires. Fast per-clock, but: - 8 wires + clock + ground = 10+ wires - At high speed, wires have slightly different propagation delays → bits arrive at different times (skew) - Skew limits maximum clock rate to ~100MHz for short distances
Serial: Send 1 bit at a time on 1 wire. Fewer wires, no skew problem. - Modern serial can run at GHz speeds (USB 3: 5Gbps, PCIe: 16Gbps) - Won the “war” for almost everything except memory buses (DDR) and internal chip buses
PARALLEL (old printer port, DDR memory):
Data[0] ────────────────────
Data[1] ────────────────────
Data[2] ────────────────────
Data[3] ────────────────────
Data[4] ────────────────────
Data[5] ────────────────────
Data[6] ────────────────────
Data[7] ────────────────────
Clock ────────────────────
GND ────────────────────
10+ wires, all bits arrive "simultaneously" (but skew problems at speed)
SERIAL (UART, SPI, USB):
Data ────────────────────
(Clock ──────────────────── ) ← optional, depends on protocol
GND ────────────────────
2-4 wires total. Bits arrive one at a time. Simple, reliable.
| Type | Clock Wire? | Examples | How Receiver Knows When to Sample |
|---|---|---|---|
| Synchronous | Yes (separate clock) | SPI, I2C | Samples on clock edges |
| Asynchronous | No clock wire | UART | Both sides agree on timing (baud rate) beforehand |
UART is asynchronous — there’s no shared clock. Both sides must be configured to the same baud rate. If they disagree by more than ~2%, data gets corrupted.
Idle / Logic 1: 3.3V (or 5V for 5V logic)
Active / Logic 0: 0V
This is "active low" for the start bit:
the line sits HIGH when idle, drops LOW to signal "data coming"
Idle / Logic 1: -3V to -15V (typically -12V)
Active / Logic 0: +3V to +15V (typically +12V)
INVERTED compared to TTL! And much higher voltages!
You CANNOT directly connect RS-232 to a 3.3V MCU. It will destroy the input pin.
The MAX3232 level shifter converts between RS-232 (±12V) and CMOS (0/3.3V). It uses a charge pump with external capacitors to generate ±8V from 3.3V.
RS-232 device ──── MAX3232 ──── STM32 UART pins
(±12V) (level (0V / 3.3V)
shifter)
Nowadays, RS-232 is rare. Most “serial” connections are TTL UART through a USB-to-UART bridge (CP2102, CH340, FTDI FT232R). These ICs appear as a virtual COM port on your computer.
One UART frame (8N1 = 8 data bits, No parity, 1 stop bit):
┌─────┬────┬────┬────┬────┬────┬────┬────┬────┬──────┐
│START│ D0 │ D1 │ D2 │ D3 │ D4 │ D5 │ D6 │ D7 │ STOP │
│ 0 │LSB │ │ │ │ │ │ │MSB │ 1 │
└─────┴────┴────┴────┴────┴────┴────┴────┴────┴──────┘
↑ ↑
Falling edge Returns to
triggers receiver idle (HIGH)
to start sampling
Step by step:
Total frame size: 1 (start) + 8 (data) + 0 (no parity) + 1 (stop) = 10 bit-times per byte
The character ‘U’ is 0x55 = 0101_0101 in binary.
Baud rate: 115200 bits/second → 1 bit-time = 1/115200 = 8.68µs
Binary of 0x55: 0 1 0 1 0 1 0 1
Transmitted LSB first: D0=1, D1=0, D2=1, D3=0, D4=1, D5=0, D6=1, D7=0
Voltage on TX line:
3.3V ┤‾‾‾‾‾│ │‾‾‾‾│ │‾‾‾‾│ │‾‾‾‾│ │‾‾‾‾‾‾‾
│ │ │ │ │ │ │ │ │
0V ┤ │‾‾‾‾│ │‾‾‾‾│ │‾‾‾‾│ │‾‾‾‾│
│
│START│ D0 │ D1 │ D2 │ D3 │ D4 │ D5 │ D6 │ D7 │STOP│
│ 0 │ 1 │ 0 │ 1 │ 0 │ 1 │ 0 │ 1 │ 0 │ 1 │
0 8.7 17 26 35 43 52 61 69 78 87 µs
Wait — let me redo the diagram. 0x55 = bits [0]=1,[1]=0,[2]=1,[3]=0,[4]=1,[5]=0,[6]=1,[7]=0:
Voltage:
___ ___ ___ ___
3.3V ‾‾‾| |‾‾| |‾‾| |‾‾| |‾‾| |‾‾‾‾‾‾‾
| | | | | | | | | |
0V |___| |___| |___| |___| |___|
S D0 D1 D2 D3 D4 D5 D6 D7 STOP
0 1 0 1 0 1 0 1 0 1
S = Start bit (always 0 = LOW)
STOP = Stop bit (always 1 = HIGH)
D0-D7 = data LSB first: 1,0,1,0,1,0,1,0 = 0x55
Total time = 10 × 8.68µs = 86.8µs per byte
Why ‘U’ (0x55) is special: It creates a perfect alternating 0-1 pattern on the wire. This is why auto-baud detection circuits often ask you to send ‘U’ first — they measure the bit-time from the alternating edges.
UART transmits the least significant bit first. This seems backwards, but it has a practical reason: the receiver can start processing the data before all bits arrive. For example, “is the number odd?” can be answered after the first bit.
But beware: SPI is typically MSB first (configurable). When writing protocol bridges between UART and SPI devices, bit order differences can produce mystifying byte value swaps.
Baud rate = number of bit-times per second. At 115200 baud, each bit occupies 1/115200 = 8.68µs.
The receiver uses the falling edge of the start bit to synchronize, then samples in the MIDDLE of each subsequent bit-time:
TX line: __|‾‾‾‾|____|‾‾‾‾|____|‾‾‾‾|...
↑
Start bit
falling edge
Receiver samples: (waits 1.5 bit-times from start edge)
↓ ↓ ↓ ↓
Sampling points: __|‾‾‾‾|____|‾‾‾‾|____|...
↑ ↑ ↑ ↑
D0 D1 D2 D3 ...
(sample in the middle of each bit)
Key insight: The receiver re-synchronizes on EVERY start bit. It does NOT maintain a free-running clock. This means baud rate errors accumulate within one frame (10 bits) but reset at the next byte.
The receiver samples in the middle of each bit. If the clocks disagree, the sampling point drifts. After enough bits, it drifts past the edge of a bit-time → wrong bit sampled → corruption.
BAUD RATE MISMATCH:
TX at 115200: | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 |
RX at 112000: | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 X|
↑
Sample drifts into
adjacent bit → error!
Calculating when corruption starts:
Mismatch per bit = |f_tx - f_rx| / f_tx
At 3% mismatch: error per bit = 0.03 × 1 bit-time = 0.03 bit-times
After N bits: total drift = N × 0.03
Corruption at: drift > 0.5 (sampling past the bit boundary)
N = 0.5 / 0.03 = 16.7 bits
A 10-bit frame (8N1) → safe (drift = 10 × 0.03 = 0.3 bit-times)
A 20-byte message = 200 bits → BUT! Re-sync at each byte start.
So: per-byte drift = 10 × 0.03 = 0.3 bit-times → marginal but works.
At 5% mismatch:
Per-byte drift = 10 × 0.05 = 0.5 → right at the edge → unreliable!
Rule of thumb: Keep baud rate error below ±2% for reliable communication. Most MCU UARTs derive baud from their clock: if the clock crystal is ±0.01%, this is trivially met. Problems arise with: - RC oscillators (±2-5% on some cheap MCUs) - USB-UART bridges with non-standard baud rates
| Baud Rate | Bit Time | Bytes/sec | Use Case |
|---|---|---|---|
| 9600 | 104µs | 960 | Slow sensors, GPS NMEA, legacy |
| 115200 | 8.68µs | 11,520 | Debug console, common default |
| 921600 | 1.09µs | 92,160 | High-speed embedded |
| 1,000,000 (1M) | 1.00µs | 100,000 | Maximal standard UART |
| 3,000,000 (3M) | 0.33µs | 300,000 | STM32 LPUART, some bridges |
Throughput is not baud rate! At 115200 baud with 8N1 framing:
Effective data rate = 115200 / 10 bits per byte = 11,520 bytes/sec
The 2 overhead bits (start + stop) cost you 20% of the bandwidth.
┌─────────────────────────────────────────────────┐
│ UART Peripheral │
│ │
│ TX Path: │
│ CPU writes byte → [TX Holding Reg] → [TX Shift Reg] → TX pin
│ ↑ clocked by baud generator
│ │
│ RX Path: │
│ RX pin → [RX Shift Reg] → [RX Holding Reg] → CPU reads byte
│ ↑ clocked by 16× baud generator │
│ (samples 16 times per bit-time, │
│ takes the middle sample) │
│ │
│ FIFO (optional): │
│ [TX FIFO] → multiple bytes buffered for TX │
│ [RX FIFO] → multiple bytes buffered for RX │
│ STM32: FIFO depth varies by UART instance │
│ │
│ Interrupts: TXE (TX empty), RXNE (RX not empty), │
│ IDLE (line idle after last byte), │
│ ORE (overrun error — RX not read │
│ before next byte arrived) │
└─────────────────────────────────────────────────┘
The 16× oversampling: The UART peripheral doesn’t just sample once per bit. It samples 16 times per bit-time and uses a majority vote from the middle 3 samples. This improves noise immunity.
Two extra wires: - RTS (Request to Send): “I’m ready to receive data” — asserted by receiver - CTS (Clear to Send): “You may send to me” — the other side’s RTS, wired to my CTS input
Device A Device B
TX ──────────────────────────────── RX
RX ──────────────────────────────── TX
RTS ─────────────────────────────── CTS
CTS ─────────────────────────────── RTS
GND ─────────────────────────────── GND
When Device B's RX buffer is almost full:
B de-asserts RTS → A sees CTS de-asserted → A stops sending
When B processes data and buffer has space:
B re-asserts RTS → A sees CTS asserted → A resumes
When it’s needed: High-speed UART (≥921600) with bursts of data that can exceed buffer size. At 115200 for debug logging, usually not needed.
The receiver sends special bytes in-band: - XOFF (0x13): “Stop sending!” — receiver’s buffer is full - XON (0x11): “Resume sending!” — receiver has space
Why it fails for binary data: If your data happens to contain 0x13, the transmitter interprets it as flow control and stops sending. Never use XON/XOFF for binary protocols. Use hardware flow control or a framing protocol that escapes special bytes.
Without DMA (interrupt per byte):
At 1,000,000 baud (1Mbaud), 8N1:
Bytes/sec = 100,000
→ 100,000 interrupts per second
→ Each interrupt takes ~1µs overhead (save registers, branch, restore)
→ 100,000 × 1µs = 100ms/sec = 10% CPU just handling UART interrupts!
With DMA (interrupt per buffer):
DMA transfers bytes directly from UART RX register to RAM buffer.
CPU is interrupted only when:
1. Buffer is full (e.g., every 256 bytes → 390 interrupts/sec)
2. IDLE line detected (no more bytes coming → grab partial buffer)
CPU overhead: ~0.04% instead of 10%. The DMA controller does the work.
For variable-length messages, you can’t wait for a full buffer. The idle-line interrupt fires when the RX line has been HIGH (idle) for one full frame time after receiving data.
RX data: [byte][byte][byte][byte]...[last byte]___IDLE___
↑
Idle-line interrupt fires
"Message is complete"
DMA has already copied all bytes to RAM buffer.
Interrupt handler reads how many bytes DMA transferred.
/* Zephyr DMA UART pattern (conceptual): */
const struct device *uart = DEVICE_DT_GET(DT_NODELABEL(usart3));
uint8_t rx_buf[256];
/* Configure UART for async (DMA) mode */
uart_callback_set(uart, uart_callback, NULL);
uart_rx_enable(uart, rx_buf, sizeof(rx_buf), 100 /* timeout_us */);
/* Callback fires when: buffer full, timeout, or error */
void uart_callback(const struct device *dev, struct uart_event *evt, void *data) {
if (evt->type == UART_RX_RDY) {
/* evt->data.rx.buf + evt->data.rx.offset → received data */
/* evt->data.rx.len → number of bytes */
process_data(evt->data.rx.buf + evt->data.rx.offset, evt->data.rx.len);
}
}
UART (TTL) is point-to-point and limited to ~1 meter. RS-485 adds: - Differential signaling: Two wires (A, B), measures voltage difference → noise rejection - Multi-drop: Up to 32 devices on the same two wires - Long distance: Up to 1200 meters at low baud rates - Half-duplex: Only one device transmits at a time (they share the bus)
RS-485 bus topology:
┌────┐ ┌────┐ ┌────┐ ┌────┐
│Dev1│ │Dev2│ │Dev3│ │Dev4│
└─┬──┘ └─┬──┘ └─┬──┘ └─┬──┘
│ │ │ │
──┼────────┼────────┼────────┼── Wire A
──┼────────┼────────┼────────┼── Wire B
│ │
┌┴┐ 120Ω ┌┴┐ 120Ω ← termination at BOTH ends
└┬┘ └┬┘
A──B A──B
MCU RS-485 Transceiver Bus
TX ──────── DI A ────────
RX ──────── RO B ────────
GPIO ────── DE/RE̅ GND ──────
DE (Driver Enable): HIGH = transmitting, LOW = receiving
RE̅ (Receiver Enable): LOW = receiving, HIGH = disabled
Tie DE and RE̅ together → one GPIO controls direction:
GPIO HIGH → transmit mode
GPIO LOW → receive mode
Critical timing: After transmitting the last byte, you must wait until the byte has fully shifted out the TX register before switching DE LOW (back to receive mode). If you switch too early, the last byte gets truncated. Use the UART TC (Transmit Complete) interrupt, not TXE (TX Empty — buffer empty, but byte still shifting out).
The CP2102, CH340, and FTDI FT232R are ICs that present as a USB device to the host computer and translate USB data to/from UART signals. They contain: - USB transceiver - Internal clock synthesis (no external crystal on CP2102) - UART engine with configurable baud rate - Optional RTS/CTS/DTR lines
From the MCU’s perspective, it’s just talking UART. From the PC side, it’s a virtual COM port.
PC ──[USB cable]── CP2102 ──[TX/RX wires]── STM32 UART
On PC: /dev/ttyUSB0 (Linux) or COM3 (Windows)
Terminal: minicom, picocom, PuTTY, or VS Code Serial Monitor
Human-readable. Each value is transmitted as ASCII characters:
Sending the number 1234:
Bytes on wire: '1' '2' '3' '4' '\r' '\n'
Hex: 0x31 0x32 0x33 0x34 0x0D 0x0A
6 bytes for a 16-bit number! Inefficient but easy to debug with a terminal.
Examples: GPS NMEA ($GPGGA,123519,4807.038,N,...), AT commands (AT+OK\r\n), Zephyr shell.
Values transmitted as raw bytes:
Sending the number 1234 (0x04D2) in binary:
Bytes on wire: 0x04 0xD2
2 bytes! But if you look at it in a terminal, you see garbage characters.
The mixing bug: If you send binary data through a system that interprets special ASCII characters: - 0x0A in your binary data = ‘\n’ → line-ending handling may eat it or add 0x0D - 0x13 in your binary data = XOFF → software flow control stops transmission - 0x00 in your binary data = null → C string functions terminate early
Fix: Use a proper framing protocol (header + length + payload + CRC) and disable any text-mode processing on the UART.
| Symptom | Likely Cause | How to Diagnose | Fix |
|---|---|---|---|
| Garbage characters on terminal | Wrong baud rate | Try all common baud rates. Logic analyzer shows correct timing | Match baud rate (check both sides!) |
| Nothing received | TX and RX swapped | TX of device A must go to RX of device B, and vice versa | Swap TX/RX wires |
| Characters work sometimes | Baud rate mismatch >2% | Count error rate over 1000 bytes. Use scope to measure actual bit time | Fix clock source or baud divisor |
| Corruption in long messages | No flow control, RX buffer overflow | ORE (overrun) flag in UART status reg | Enable RTS/CTS or increase buffer/use DMA |
| Works on bench but not in field | Missing GND wire | Measure voltage between device GNDs — should be <0.5V | Connect GND between devices |
| ±12V on MCU pin → dead pin | RS-232 connected to TTL UART | Measure voltage at MCU pin with scope | Add MAX3232 level shifter |
| RS-485 bus unreliable | Missing termination resistors | Scope A-B: look for ringing after edges | Add 120Ω at each end of bus |
| Last RS-485 byte corrupted | DE switched before TX complete | Check timing of DE vs actual TX finish | Wait for TC interrupt, not TXE |
| Binary data truncated | XON/XOFF enabled, 0x13 in data | Disable XON/XOFF in UART config | Use hardware flow control or no flow control |
| Characters received but wrong | Wrong data bits, parity, stop bits | 8N1 vs 7E1, etc. Check both sides match | Match configurations exactly |
| Single byte OK, multi-byte fail | Missing level shifter → voltage marginal | Scope TX line: 3.3V sagging to 2.5V | Check wiring, power supply, add level shifter if needed |
┌────────────────────────────── UART SERIAL CHEAT SHEET ──────────────────────────────────────┐
│ │
│ FRAME FORMAT (8N1): [START=0] [D0] [D1] [D2] [D3] [D4] [D5] [D6] [D7] [STOP=1] │
│ 10 bit-times per byte, LSB first │
│ │
│ BAUD → BIT TIME THROUGHPUT (8N1) │
│ 9600 104.2 µs 960 bytes/sec │
│ 115200 8.68 µs 11,520 bytes/sec │
│ 921600 1.09 µs 92,160 bytes/sec │
│ 1000000 1.00 µs 100,000 bytes/sec │
│ │
│ BAUD TOLERANCE: ±2% max for reliable 8N1 │
│ DRIFT = bits_per_frame × (baud_error%) → must stay < 50% of 1 bit-time │
│ │
│ WIRING: TX(A) → RX(B), RX(A) → TX(B), GND(A) → GND(B) │
│ │
│ VOLTAGE LEVELS: │
│ TTL/CMOS: HIGH=VCC (3.3V or 5V), LOW=0V │
│ RS-232: HIGH=-12V, LOW=+12V (inverted, higher voltage) │
│ RS-485: Differential: A-B > +200mV = 1, A-B < -200mV = 0 │
│ │
│ DMA PATTERN: uart_rx_enable() with double-buffer + idle-line timeout │
│ USB-UART: CP2102, CH340, FTDI → virtual COM port on PC │
│ │
│ RS-485: 2-wire (A,B) + GND, 120Ω at each end, DE pin toggles TX/RX mode │
│ GOTCHA: swap TX/RX first if nothing works. Then check baud. Then check GND. │
│ │
└──────────────────────────────────────────────────────────────────────────────────────────────┘