A coroutine is a function that can suspend its execution and later resume from where it left off. Unlike regular functions which run to completion once called, coroutines can pause mid-execution, return control to the caller, and pick up exactly where they stopped.
C++20 coroutines are stackless — the coroutine frame (local variables, suspension point) is allocated on the heap (or elided by the compiler), not on the call stack. This makes them extremely lightweight compared to threads or fibers.
Any function containing co_await, co_yield, or co_return is a coroutine. The compiler transforms it into a state machine:
Original coroutine:
Task<int> compute() {
    int a = co_await fetch_data();
    int b = co_await process(a);
    co_return a + b;
}
Compiler generates (conceptually):
- Allocates a coroutine frame on the heap
- Stores local variables (a, b) in the frame
- Creates a state machine with suspension points
- Each co_await becomes a state transition
The coroutine frame contains:
- The promise object
- Parameters (copies)
- Local variables
- Suspension point index (which co_await are we at?)
- Temporaries spanning suspension points
co_await expr: Suspends the coroutine until the awaited expression is ready. The expression must be an Awaitable.
auto result = co_await some_async_operation();
// Execution resumes here when the operation completes
co_yield expr: Yields a value to the caller and suspends. Shorthand for:
co_await promise.yield_value(expr);
Used primarily in generators — coroutines that produce a sequence of values lazily.
co_return expr / co_return: Completes the coroutine, optionally returning a value. Calls promise.return_value(expr) or promise.return_void().
Every coroutine has an associated promise type that controls its behavior. The compiler finds it via:
// For a coroutine returning ReturnType:
using promise_type = typename std::coroutine_traits<ReturnType, Args...>::promise_type;
// Usually defined as ReturnType::promise_type
The promise type must provide:
| Method | Purpose |
|---|---|
| get_return_object() | Creates the return object (e.g., Generator<T>, Task<T>) |
| initial_suspend() | Return awaitable: suspend at start? (suspend_always = lazy, suspend_never = eager) |
| final_suspend() noexcept | Return awaitable: suspend at end? (usually suspend_always so caller can inspect result) |
| unhandled_exception() | Called if coroutine throws; typically stores std::current_exception() |
| return_value(T) / return_void() | Called on co_return |
| yield_value(T) | Called on co_yield (optional, for generators) |
std::coroutine_handle<Promise> is a non-owning pointer to the coroutine frame:
std::coroutine_handle<promise_type> handle;
handle.resume(); // Resume execution
handle.done(); // Check if coroutine has finished
handle.destroy(); // Destroy the coroutine frame (free memory)
handle.promise(); // Access the promise object
Critical: Someone must call handle.destroy() or the coroutine frame leaks. Typically the return object (Generator, Task) owns the handle and destroys it in its destructor.
An Awaitable is anything you can co_await. The compiler converts it to an Awaiter with three methods:
struct MyAwaiter {
    bool await_ready() const noexcept;
    // Return true if result is already available (skip suspension)
    void await_suspend(std::coroutine_handle<> h) noexcept;
    // Called when the coroutine suspends.
    // Can also return bool (false = don't actually suspend)
    // Or return coroutine_handle<> for symmetric transfer
    T await_resume() noexcept;
    // Called when the coroutine resumes. Returns the co_await result.
};
The standard provides two built-in awaiters:
- std::suspend_always — await_ready() returns false (always suspends)
- std::suspend_never — await_ready() returns true (never suspends)
Generators produce values on-demand using co_yield:
Generator<int> fibonacci() {
    int a = 0, b = 1;
    while (true) {
        co_yield a;
        auto next = a + b;
        a = b;
        b = next;
    }
}
// Consumer pulls values lazily:
for (int val : fibonacci()) {
    if (val > 1000) break;
    std::cout << val << "\n";
}
Key properties:
- Lazy: values computed only when requested
- Infinite sequences: the generator can run forever; consumer controls termination
- Memory efficient: only one value in flight at a time
- initial_suspend returns suspend_always so the generator doesn’t run until first value is requested
A Task<T> represents a computation that will eventually produce a single value:
Task<std::string> fetch_url(std::string url) {
    auto response = co_await http_get(url);
    co_return response.body;
}
Task<int> compute() {
    auto data = co_await fetch_url("https://example.com");
    co_return data.size();
}
Key properties:
- Lazy start: initial_suspend returns suspend_always
- Chainable: co_await-ing one Task from another suspends the caller
- Single value: produces exactly one result (unlike generators)
When coroutine A awaits coroutine B, and B completes, B needs to resume A. Without symmetric transfer, this creates a chain of resume() calls on the stack:
main() → A.resume() → B.resume() → C.resume() → ...
With deep chains, this overflows the stack.
await_suspend can return a coroutine_handle<> instead of void:
std::coroutine_handle<> await_suspend(std::coroutine_handle<> caller) noexcept {
    // Rather than resuming the next coroutine with a nested resume() call,
    // return its handle; the compiler resumes it as a tail call.
    return next_coroutine_to_resume;
}
The compiler generates a tail-call: instead of stacking resume() calls, it jumps directly to the next coroutine. Stack depth stays O(1).
The compiler can elide the heap allocation for the coroutine frame if:
- The coroutine’s lifetime is fully enclosed by the caller
- The compiler can prove the frame size at compile time
- Optimization is enabled
This is similar to copy elision — the standard permits it but doesn’t require it.
Override operator new/operator delete in the promise type:
struct promise_type {
    void* operator new(std::size_t size) {
        return my_pool_allocator::allocate(size);
    }
    void operator delete(void* ptr) {
        my_pool_allocator::deallocate(ptr);
    }
};
Useful for real-time systems where heap allocation is forbidden after initialization.
Replace callback-based async I/O with linear code:
Task<Buffer> read_file(std::string path) {
    auto fd = co_await async_open(path);
    auto data = co_await async_read(fd);
    co_await async_close(fd);
    co_return data;
}
Chain transformations without materializing intermediate collections:
Generator<int> filter_even(Generator<int> source) {
    for (int val : source) {
        if (val % 2 == 0) co_yield val;
    }
}
Express complex state machines as sequential code with suspension points:
Task<void> connection_handler(Socket sock) {
    auto handshake = co_await read_handshake(sock);
    if (!validate(handshake)) co_return;
    while (true) {
        auto request = co_await read_request(sock);
        if (request.is_close()) break;
        auto response = process(request);
        co_await write_response(sock, response);
    }
}
A scheduler that round-robins between coroutines without threads:
// Each coroutine yields back to the scheduler
scheduler.spawn(task_a());
scheduler.spawn(task_b());
scheduler.run(); // Runs all tasks cooperatively
| Aspect | Coroutines | Threads |
|---|---|---|
| Scheduling | Cooperative (explicit co_await) | Preemptive (OS scheduler) |
| Context switch | ~nanoseconds (jump + restore registers) | ~microseconds (kernel transition) |
| Memory | ~200 bytes per coroutine frame | ~1-8 MB stack per thread |
| Concurrency | Single-threaded by default | True parallelism |
| Data races | No (single thread) | Yes (shared mutable state) |
| Scalability | Millions of coroutines | Thousands of threads |
| Blocking | One block stops everything | Only blocks that thread |
Coroutines excel for I/O-bound workloads with many concurrent operations. Threads excel for CPU-bound parallelism.
ROS2’s executor model dispatches callbacks (subscription, timer, service) in an event loop. Complex workflows require chaining callbacks, leading to “callback hell”:
// Current ROS2 callback chain:
void on_scan(LaserScan::SharedPtr msg) {
    auto result = process(msg);
    // Publish triggers another callback in a different node...
    publisher_->publish(result);
}
void on_result(Result::SharedPtr msg) {
    // Continue processing...
}
With coroutines, this could be linearized:
// Hypothetical coroutine-based ROS2 workflow:
Task<void> scan_pipeline(Node& node) {
    while (rclcpp::ok()) {
        auto scan = co_await node.next_message<LaserScan>("/scan");
        auto result = process(scan);
        node.publish("/result", result);
        auto ack = co_await node.next_message<Ack>("/ack");
        // All sequential, no callback spaghetti
    }
}
This is the direction ROS2 is heading — rclcpp may adopt coroutine-friendly executors in future releases.
The #1 coroutine bug. Parameters and references can become dangling after suspension:
Task<void> bad(const std::string& s) {
    co_await something();
    // s may be dangling! The caller's string could be destroyed.
    std::cout << s; // UNDEFINED BEHAVIOR
}
void caller() {
    auto task = bad(std::string("temporary")); // the temporary is destroyed at the end of this statement
    // Resuming task later reads the dangling reference s.
}
Fix: Take parameters by value, or ensure the referenced object outlives the coroutine.
Task<void> process() {
    expensive_async_operation(); // Oops — returned Task is discarded!
    // Use: co_await expensive_async_operation();
}
The Task object is created and immediately destroyed. Because the Task starts lazily, the operation never runs. Use [[nodiscard]] on Task to get compiler warnings.
The coroutine frame lives until destroy() is called. If nobody destroys it, it leaks:
void leak() {
    auto gen = fibonacci(); // Coroutine frame allocated
    // gen goes out of scope — destructor must call handle.destroy()
    // If Generator lacks a proper destructor, the frame leaks.
}
final_suspend() must be noexcept. If it throws, the program terminates.
Calling handle.destroy() on a coroutine that is currently executing (not suspended) is undefined behavior.
| Compiler | Version | Flag | Notes |
|---|---|---|---|
| GCC | 10+ | -std=c++20 -fcoroutines | -fcoroutines flag required even with C++20 mode |
| GCC | 11+ | -std=c++20 -fcoroutines | Better optimization, fewer bugs |
| GCC | 13+ | -std=c++20 | -fcoroutines no longer needed |
| Clang | 14+ | -std=c++20 | Full support, no extra flags needed |
| Clang | 16+ | -std=c++20 | Best optimization, symmetric transfer support |
| MSVC | 19.28+ | /std:c++20 | Full support |
This study module: requires GCC 10+ with -fcoroutines, or Clang 14+.
GCC 9 (Ubuntu 20.04 default) does NOT support coroutines at all. Install GCC 10+:
sudo apt install g++-10
# Then compile with:
g++-10 -std=c++20 -fcoroutines -o ex01 ex01_generator.cpp
Or install a newer Clang:
sudo apt install clang-14
clang++-14 -std=c++20 -o ex01 ex01_generator.cpp