Async Rust with Tokio Part 3: State Machines Under the Hood – What the Compiler Generates from async fn

Parts 1 and 2 established that async functions compile into state machines. This post goes one level deeper: what the compiler generates, why the generated structs can be large, and how understanding the transformation helps you write better async code.

The Transformation in Detail

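Every async fn compiles to a struct that implements Future. The struct stores all variables that are alive across any .await point, plus a discriminant that tracks which state the function is currently in. Each .await point becomes a state. Calling the function runs none of its body: it only creates the struct in an initial state that holds the arguments, and the code before the first .await runs on the first poll.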

// Source code:
async fn process(input: String) -> usize {
    let trimmed = input.trim().to_string(); // created before first await
    let result = fetch_length(&trimmed).await; // State 0 -> State 1
    let doubled = result * 2;                  // alive across the second await
    log_result(doubled).await;                 // State 1 -> State 2
    doubled
}

// Conceptual compiler output (simplified; not literally what rustc emits):
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Omitted for brevity: the real compiler also has an initial "unresumed"
// variant (holding only `input`) and "returned"/"panicked" variants.
enum ProcessState {
    // Suspended at the first await: holds variables alive across it
    State0 {
        input: String,
        trimmed: String,
        fetch_future: FetchLengthFuture, // borrows `trimmed` (self-referential)
    },
    // Suspended at the second await: holds variables alive across it.
    // The real compiler also keeps `input` and `trimmed` here, because they
    // are not dropped until the function returns.
    State1 {
        doubled: usize,
        log_future: LogResultFuture,
    },
    // Terminal state
    Done,
}

struct ProcessFuture {
    state: ProcessState,
}

impl Future for ProcessFuture {
    type Output = usize;

    // Pin::new(...) and `self.state = ...` only compile if the inner futures
    // are Unpin; the real compiler uses unsafe pin projection instead.
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        loop {
            match &mut self.state {
                ProcessState::State0 { fetch_future, .. } => {
                    match Pin::new(fetch_future).poll(cx) {
                        Poll::Ready(result) => {
                            let doubled = result * 2;
                            self.state = ProcessState::State1 {
                                doubled,
                                log_future: log_result(doubled),
                            };
                            // Loop continues to poll State1 immediately
                        }
                        Poll::Pending => return Poll::Pending,
                    }
                }
                ProcessState::State1 { log_future, doubled } => {
                    match Pin::new(log_future).poll(cx) {
                        Poll::Ready(()) => {
                            let result = *doubled;
                            self.state = ProcessState::Done;
                            return Poll::Ready(result);
                        }
                        Poll::Pending => return Poll::Pending,
                    }
                }
                ProcessState::Done => panic!("polled after completion"),
            }
        }
    }
}

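The real compiler output is generated from a coroutine in MIR, uses unsafe pin projection instead of Pin::new, and is more compact (variables that are never alive at the same time can share storage), but this captures the essential structure: every .await becomes a state, and every variable alive across an await is stored in that state's variant.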
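To make the conceptual output concrete, here is a sketch that compiles on stable Rust with only the standard library. It hand-writes the state machine for a simplified, hypothetical version of process: YieldOnce (Pending on its first poll, Ready on the second) stands in for fetch_length and log_result, and block_on is a minimal busy-polling executor with a no-op waker, not how Tokio schedules tasks. Every field is Unpin, so safe code can replace the state; the sketch also includes the initial Unresumed variant that the conceptual enum leaves out.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical leaf future: Pending on its first poll, Ready(value) on the second.
struct YieldOnce<T> {
    value: Option<T>,
    yielded: bool,
}

fn yield_once<T>(value: T) -> YieldOnce<T> {
    YieldOnce { value: Some(value), yielded: false }
}

impl<T: Unpin> Future for YieldOnce<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let this = self.get_mut();
        if this.yielded {
            Poll::Ready(this.value.take().expect("polled after completion"))
        } else {
            this.yielded = true;
            cx.waker().wake_by_ref(); // ask the executor to poll again
            Poll::Pending
        }
    }
}

// Hand-written equivalent of this hypothetical async fn:
//   async fn process(input: String) -> usize {
//       let len = yield_once(input.trim().len()).await;
//       let doubled = len * 2;
//       yield_once(()).await;
//       doubled
//   }
enum Process {
    Unresumed { input: String }, // created by the call; no body code has run yet
    State0 { fut: YieldOnce<usize> },
    State1 { doubled: usize, fut: YieldOnce<()> },
    Done,
}

fn process(input: String) -> Process {
    Process::Unresumed { input }
}

impl Future for Process {
    type Output = usize;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        // Every field is Unpin, so safe code can get a plain &mut.
        let this = self.get_mut();
        loop {
            match this {
                Process::Unresumed { input } => {
                    // First poll: run the body up to the first .await
                    let len = input.trim().len();
                    *this = Process::State0 { fut: yield_once(len) };
                }
                Process::State0 { fut } => match Pin::new(fut).poll(cx) {
                    Poll::Ready(len) => {
                        *this = Process::State1 { doubled: len * 2, fut: yield_once(()) };
                    }
                    Poll::Pending => return Poll::Pending,
                },
                Process::State1 { doubled, fut } => match Pin::new(fut).poll(cx) {
                    Poll::Ready(()) => {
                        let out = *doubled;
                        *this = Process::Done;
                        return Poll::Ready(out);
                    }
                    Poll::Pending => return Poll::Pending,
                },
                Process::Done => panic!("polled after completion"),
            }
        }
    }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal busy-polling executor: returns the output and how many polls it took.
fn block_on<F: Future + Unpin>(mut fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    let (out, polls) = block_on(process("  hello  ".to_string()));
    println!("output = {out}, polls = {polls}");
}
```

With the input "  hello  " the trimmed length is 5, so process returns 10 after three polls: one Pending per YieldOnce, plus the final poll that completes.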

Why State Machines Can Be Large

The size of the generated future is roughly the size of its largest state variant, plus the discriminant and any alignment padding. This means futures that hold large values across await points produce large structs:

async fn large_future() {
    let big_buffer = vec![0u8; 1_000_000]; // 1MB on the heap, 24 bytes inline
    some_async_call().await;               // big_buffer alive here = stored in state
    process(&big_buffer);
}
// Despite its name, this future is small: the state machine stores only the
// Vec header (pointer, capacity, length: 24 bytes on 64-bit targets).
// The 1MB of element data stays on the heap.

async fn also_large() {
    let arr = [0u8; 65536]; // 64KB - this IS stored inline in the state machine
    some_async_call().await; // arr is 64KB in the state struct
    use_array(&arr);
}
// This future is at least 64KB. Every time it is created or moved on the
// stack, all 64KB come along, which can overflow a small thread stack.

Practical implication: avoid holding large stack-allocated arrays across await points. Prefer heap-allocated types (Vec, Box) whose inline footprint is small and fixed (24 bytes for a Vec, 8 bytes for a Box of a sized type, on 64-bit targets) regardless of how much data they own.
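A std-only sketch for checking this on your own toolchain (yield_point and the three measured functions are hypothetical; black_box just keeps the data observably in use). It compares a 64 KiB array held across an await, a Vec of the same length held across an await, and the same array confined to a block that ends before the await. Creating a future runs none of its body, so size_of_val measures the state machine without executing anything. Exact sizes vary by compiler and target.

```rust
use std::hint::black_box;
use std::mem::size_of_val;

// Trivial async fn; awaiting it is still a suspension point.
async fn yield_point() {}

// 64 KiB array alive across the await: stored inline in the future.
async fn array_across_await() -> u8 {
    let arr = [1u8; 65536];
    yield_point().await;
    black_box(&arr)[0]
}

// Vec alive across the await: only its header (ptr, cap, len) is stored inline.
async fn vec_across_await() -> u8 {
    let v = vec![1u8; 65536];
    yield_point().await;
    black_box(&v)[0]
}

// Array confined to a block that ends before the await: not stored at all.
async fn array_scoped_before_await() -> u8 {
    let first = {
        let arr = [1u8; 65536];
        black_box(&arr)[0]
    };
    yield_point().await;
    first
}

// Creating the futures runs none of their bodies; we only inspect their size.
fn sizes() -> (usize, usize, usize) {
    (
        size_of_val(&array_across_await()),
        size_of_val(&vec_across_await()),
        size_of_val(&array_scoped_before_await()),
    )
}

fn main() {
    let (arr, vec, scoped) = sizes();
    println!("array across await:  {arr} bytes");
    println!("vec across await:    {vec} bytes");
    println!("array scoped before: {scoped} bytes");
}
```

Scoping a large temporary so it is dropped before the .await keeps it out of the state machine entirely, which can be cheaper than moving it to the heap.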

Visualizing the State Machine

stateDiagram-v2
    [*] --> State0: async fn called (future created),\nfirst poll() runs body up to first await
    State0 --> State0: poll() returns Pending\n(inner future not ready)
    State0 --> State1: poll() inner future returns Ready\n(transition, loop continues)
    State1 --> State1: poll() returns Pending\n(second inner future not ready)
    State1 --> Done: poll() inner future returns Ready\nreturn Poll::Ready(output)
    Done --> [*]

    note right of State0
        Stores: all variables live
        across first await point
    end note
    note right of State1
        Stores: all variables live
        across second await point
    end note

The Self-Referential Problem and Pin

The generated state machine can be self-referential. Consider:

async fn self_ref_example() {
    let data = String::from("hello");
    let reference = &data;         // borrows data, which lives inside the future
    some_async_call().await;       // both data and reference stored in State0
    println!("{}", reference);
}

// State0 stores:
// - data: String  (the String's pointer/capacity/length, inline in the struct)
// - reference: &String  (points at the `data` field of this same struct)
// - inner_future: ...
//
// If this struct moves in memory, data moves but reference still
// points to the old location - dangling pointer!
// The compiler cannot tell in general where a borrow points, so it treats
// every borrow held across an await this way.

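This is exactly why Future::poll takes Pin<&mut Self>. Once a future is pinned, safe code can no longer move it (unless its type is Unpin, which async fn futures never are), so internal references stay valid. Moving a future before its first poll is fine, because no internal references exist until its body starts running. Executors therefore pin a future before polling it for the first time and never move it afterward; tokio::spawn, for example, stores each task's future in a heap allocation.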
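A minimal sketch of pinning in practice, assuming only the standard library (Rust 1.68+ for std::pin::pin!). The future returned by a self-referential async fn cannot be polled through Pin::new, because async fn futures are !Unpin, but pin! fixes it in place on the stack so it can be polled. some_async_call is a hypothetical stand-in that completes immediately, and the busy-poll loop with a no-op waker stands in for a real runtime.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical stand-in for a real async call; completes immediately.
async fn some_async_call() {}

// Holds a borrow of a local across an .await, so the future stores a
// pointer into itself. (The compiler makes every async fn future !Unpin.)
async fn self_ref_example() -> usize {
    let data = String::from("hello");
    let reference = &data; // points at the String stored inside the future
    some_async_call().await;
    reference.len()
}

fn run() -> usize {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // Pin::new(&mut fut) would not compile here: the future is !Unpin.
    // pin! pins it in place on the stack; from now on it cannot be moved.
    let mut fut: Pin<&mut _> = pin!(self_ref_example());
    loop {
        if let Poll::Ready(n) = fut.as_mut().poll(&mut cx) {
            return n;
        }
    }
}

fn main() {
    println!("length = {}", run());
}
```

Box::pin(self_ref_example()) would work as well; it pins the future on the heap instead of the stack, at the cost of one allocation.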

Examining Real State Machine Sizes

use std::mem::size_of_val;

async fn small_future() -> u32 {
    tokio::time::sleep(std::time::Duration::from_millis(1)).await;
    42
}

async fn larger_future() -> Vec<u8> {
    let data = vec![0u8; 100]; // heap-allocated, 24 bytes inline (64-bit)
    tokio::time::sleep(std::time::Duration::from_millis(1)).await;
    data
}

async fn measure() {
    let f1 = small_future();
    let f2 = larger_future();

    println!("small_future size: {} bytes", size_of_val(&f1));
    println!("larger_future size: {} bytes", size_of_val(&f2));

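    // Drive both to completion (futures do nothing unless polled)
    let _ = tokio::join!(f1, f2);
}
// Exact numbers depend on the Tokio version, target, and compiler, so measure
// rather than assume. small_future is dominated by the size of
// tokio::time::Sleep (timer registration state); larger_future adds roughly
// the Vec's 24 bytes to carry across the await.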

The Loop Behavior in Poll

Notice the conceptual poll implementation above uses a loop. This matters: when an inner future returns Ready and triggers a state change, the outer future immediately polls the next state instead of returning Pending first. This avoids an unnecessary round-trip through the scheduler. The real generated code gets the same effect without an explicit outer loop: after an .await completes, execution simply continues to the next .await within the same poll call.

// Without the loop (naive implementation):
// State0 ready -> wake self, return Pending -> scheduler re-polls task -> State1 starts
// Extra round-trip through the scheduler

// With the loop (conceptual model; real generated code behaves the same):
// State0 ready -> immediately poll State1 in the same poll() call
// State1 pending -> return Pending
// No unnecessary scheduler round-trip

This continuation runs until either an inner future returns Pending (no progress possible right now) or the outer future completes. It is also the source of an important pitfall: if a future does a lot of work between .await points, or awaits inner futures that are always immediately Ready, a single poll call can run for a long time (forever, in an infinite loop) without yielding, blocking the worker thread. We cover this in Part 8 on common pitfalls.
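Both the continuation and the pitfall can be observed with a poll counter. In this std-only sketch, YieldOnce, NoopWaker, and block_on are hypothetical helpers written for the example: two_steps has two awaits that are each Pending once, and always_ready awaits std::future::ready 1000 times.

```rust
use std::future::{ready, Future};
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical leaf future: Pending on the first poll, Ready(value) on the second.
struct YieldOnce<T> {
    value: Option<T>,
    yielded: bool,
}

impl<T: Unpin> Future for YieldOnce<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let this = self.get_mut();
        if this.yielded {
            Poll::Ready(this.value.take().expect("polled after completion"))
        } else {
            this.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn yield_once<T>(value: T) -> YieldOnce<T> {
    YieldOnce { value: Some(value), yielded: false }
}

// Two awaits, each Pending once.
async fn two_steps() -> u32 {
    let a = yield_once(1u32).await;
    let b = yield_once(2u32).await; // reached in the same poll that finishes `a`
    a + b
}

// 1000 awaits on futures that are always Ready: never returns Pending.
async fn always_ready() -> u32 {
    let mut total = 0u32;
    for _ in 0..1000 {
        total += ready(1u32).await;
    }
    total
}

// Busy-polls a future to completion, counting poll() calls.
fn block_on<F: Future>(fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    println!("two_steps:    {:?}", block_on(two_steps()));
    println!("always_ready: {:?}", block_on(always_ready()));
}
```

two_steps completes in three polls rather than four, because the poll that finishes the first await runs straight on into the second. always_ready completes in a single poll: every inner future is immediately Ready, so it never hands control back to the executor, which is how a long-running future can hog a worker thread.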

Nested Futures and Composition

Futures compose recursively. When you .await an inner async function, its state machine is nested inside the outer state machine’s state variant:

async fn inner() -> u32 {
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    42
}

async fn outer() -> u32 {
    let x = inner().await; // InnerFuture stored in OuterFuture's State0
    x + 1
}

// OuterFuture::State0 contains:
//   inner_future: InnerFuture  (which itself contains timer state)
//
// Polling outer polls inner, which polls the timer
// The entire chain executes in a single thread with no allocation

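This recursive nesting is what makes async Rust zero-cost. There is no heap allocation per await point, no dynamic dispatch, and no garbage collection: the entire async call stack is a single struct stored wherever the outermost future lives (for a task started with tokio::spawn, one heap allocation per task). The flip side is that a recursive async fn would have to contain itself, so the recursive call must be boxed, for example with Box::pin.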
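A std-only sketch to observe the nesting (inner, outer, and yield_point are hypothetical; inner holds a 1 KiB array across an await so its size is easy to spot). The outer future is at least as large as the inner one, because the inner state machine is stored inline while outer is suspended on it.

```rust
use std::hint::black_box;
use std::mem::size_of_val;

// Trivial async fn; awaiting it is still a suspension point.
async fn yield_point() {}

// Hypothetical inner function that keeps a 1 KiB buffer alive across an await.
async fn inner() -> u32 {
    let buf = [7u8; 1024];
    yield_point().await;
    black_box(&buf)[0] as u32
}

async fn outer() -> u32 {
    let x = inner().await; // inner's entire state machine lives inside outer's
    x + 1
}

// Creating the futures runs none of their bodies; we only inspect their size.
fn sizes() -> (usize, usize) {
    (size_of_val(&inner()), size_of_val(&outer()))
}

fn main() {
    let (inner_size, outer_size) = sizes();
    println!("inner: {inner_size} bytes, outer: {outer_size} bytes");
}
```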

Using cargo-expand, MIR, and Size Checks to See Generated Code

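// Install cargo-expand:
// cargo install cargo-expand

// Then run:
// cargo expand --bin myapp 2>/dev/null | grep -A 50 "async fn my_function"
// Note: cargo-expand shows macro expansion (e.g. what #[tokio::main] becomes),
// not the state machine; async fn bodies come out unchanged. The state machine
// only exists in MIR (e.g. `cargo rustc -- --emit=mir`), and on nightly
// `cargo +nightly rustc -- -Zprint-type-sizes` lists the size of every future.

// For a simpler view, check state machine sizes at runtime:
fn check_sizes() {
    println!("Future sizes:");
    // async fn futures have unnameable types, so size_of::<T>() is unavailable;
    // measure an instance with size_of_val (creating it runs none of its body).
    let f = large_async_fn();
    println!("  large_async_fn: {} bytes", std::mem::size_of_val(&f));
}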

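// Practical tip: if a future is unexpectedly large,
// box it to move it to the heap:
async fn large_caller() {
    // Instead of embedding the callee's whole state machine inline:
    // let v = large_async_fn().await;

    // Use Box::pin to heap-allocate it; large_caller then stores only a pointer:
    let _v = Box::pin(large_async_fn()).await;
}

// Placeholder: imagine a function that holds large data across an .await
async fn large_async_fn() -> Vec<u8> { vec![] }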
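To confirm that boxing actually shrinks the caller, here is a std-only sketch built around a hypothetical big_callee that holds a 64 KiB array across an await (yield_point is also hypothetical). inline_caller embeds the callee's whole state machine, while boxed_caller stores only a Pin<Box<...>>. The trade-off is one heap allocation each time boxed_caller runs.

```rust
use std::hint::black_box;
use std::mem::size_of_val;

// Trivial async fn; awaiting it is still a suspension point.
async fn yield_point() {}

// Hypothetical callee that keeps 64 KiB alive across an await.
async fn big_callee() -> u8 {
    let arr = [1u8; 65536];
    yield_point().await;
    black_box(&arr)[0]
}

// Awaits the callee directly: its whole state machine is embedded inline.
async fn inline_caller() -> u8 {
    big_callee().await
}

// Awaits a boxed callee: only a Pin<Box<...>> (one pointer) is embedded.
async fn boxed_caller() -> u8 {
    Box::pin(big_callee()).await
}

// Creating the futures runs none of their bodies; we only inspect their size.
fn sizes() -> (usize, usize) {
    (size_of_val(&inline_caller()), size_of_val(&boxed_caller()))
}

fn main() {
    let (inline, boxed) = sizes();
    println!("inline caller: {inline} bytes, boxed caller: {boxed} bytes");
}
```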

What Comes Next

With the state machine model clear, Part 4 moves to tasks – how to spawn them, manage their lifetimes, handle their results, and cancel them. Understanding state machines makes task cancellation behavior much less surprising.
