Advanced Rust Series Part 8: The Self-Referential Struct Problem – Pin, Unpin, and Async State Machines

At the end of Part 5, we noted that self-referential structs are not possible with ordinary lifetime annotations. A struct whose fields point into other fields of the same struct would have dangling pointers the moment it moves in memory. This is the problem Pin was designed to solve – and it is also the reason async Rust works the way it does.

The Problem: Moving and Self-Referential Structs

In Rust, values can move. When you assign, pass to a function, or push into a Vec, the bytes of the value are copied to a new memory location. For most types this is fine. But consider a struct that holds a pointer into its own data:

// Conceptual illustration of the problem - this cannot be expressed safely
struct SelfRef {
    data: String,
    // This pointer would become dangling if SelfRef moves
    // because data's address changes when the struct moves
    ptr: *const String, // points to self.data
}

fn main() {
    let s = SelfRef { data: String::from("hello"), ptr: /* ??? */ };
    let s2 = s; // s moves to s2 - data is at a new address, ptr is now dangling
}

This is the core issue. Async functions compiled into state machines by the Rust compiler often produce exactly this pattern – a state machine that holds both data and references to that data, all in one struct. Without a way to prevent moves, these state machines would be unsound.
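The address change can be observed directly. The following is a minimal sketch, not part of the original series: the helper name `field_addresses_after_move` is made up for illustration, and the struct is moved from the stack into a Box so that the move is certain to change its address. The stale pointer is only compared as an integer and never dereferenced.

```rust
// Same shape as the conceptual struct above.
struct SelfRef {
    data: String,
    ptr: *const String, // intended to point at self.data
}

// Hypothetical helper: returns (address of data before the move,
// address of data after the move, address still stored in ptr).
fn field_addresses_after_move() -> (usize, usize, usize) {
    let mut s = SelfRef {
        data: String::from("hello"),
        ptr: std::ptr::null(),
    };
    // Record the address of the field while the struct lives on the stack.
    s.ptr = &s.data as *const String;
    let before = &s.data as *const String as usize;

    // Moving into a Box copies the struct's bytes to a heap allocation.
    let moved = Box::new(s);
    let after = &moved.data as *const String as usize;

    // ptr was copied bit for bit, so it still holds the old stack address.
    (before, after, moved.ptr as usize)
}
```

The stored pointer keeps the old address while the field itself now lives somewhere else, which is exactly the dangling-pointer situation described above.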

What Pin Does

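Pin<P> is a wrapper around a pointer type P (like &mut T or Box<T>) that guarantees the pointed-to value will not move again, unless its type implements Unpin. The guarantee comes from the API rather than from special compiler support: safe code cannot obtain a plain &mut T from a Pin<&mut T> when T is not Unpin, so it cannot move the value out or swap it with another: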

use std::pin::Pin;

fn pin_in_box() {
    let mut val = Box::pin(5i32); // val is pinned on the heap
    
    // You can get a pinned mutable reference
    let pinned: Pin<&mut i32> = val.as_mut();
    
    // You can modify through a pinned reference
    // but you cannot move the value out
}

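// What Pin prevents (T is generic here, so it is not known to be Unpin):
fn would_move<T>(mut pinned: Pin<&mut T>, value: T) {
    // let moved = *pinned; // ERROR: cannot move out of a dereference of Pin<&mut T>
    // std::mem::replace(&mut *pinned, value); // ERROR: &mut *pinned needs DerefMut, which requires T: Unpin
}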
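As a small illustration of what pinning does and does not restrict, here is a hedged sketch (the function names `pointee_addr`, `move_handle`, and `pinned_box_demo` are invented for this example). It shows that the Pin<Box<T>> handle itself can still be moved, that the pointee stays at the same heap address, and that Pin::set can overwrite the pointee in place.

```rust
use std::pin::Pin;

// Address of the value the pinned box points to.
fn pointee_addr(p: &Pin<Box<i32>>) -> usize {
    &**p as *const i32 as usize
}

// Moving the handle (the pointer) is always allowed.
fn move_handle(p: Pin<Box<i32>>) -> Pin<Box<i32>> {
    p
}

// Returns (address before, address after, final value).
fn pinned_box_demo() -> (usize, usize, i32) {
    let mut val: Pin<Box<i32>> = Box::pin(5);
    let before = pointee_addr(&val);

    // Pin::set assigns a new value in place; the pointee does not move.
    val.as_mut().set(7);

    // The handle moves through a function call, the heap allocation stays put.
    let val = move_handle(val);
    let after = pointee_addr(&val);

    (before, after, *val)
}
```

Pin restricts access to the pointee, not the pointer: the Box can travel freely while the value it owns keeps its address.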

Unpin: Opting Out of Pinning Guarantees

Most types in Rust are safe to move – they do not hold self-references. For these types, a pinning guarantee would add nothing. The Unpin trait marks them: if T: Unpin, then Pin<&mut T> provides no additional guarantees, and you can get a plain &mut T back and move the value freely.

use std::pin::Pin;

fn demonstrate_unpin() {
    let mut x = 5i32; // i32 is Unpin
    let pinned = Pin::new(&mut x); // Pin<&mut i32>; Pin::new requires the target to be Unpin
    
    // Because i32: Unpin, we can get a regular &mut i32 back
    let regular: &mut i32 = Pin::get_mut(pinned);
    *regular = 10;
    
    println!("{}", x); // 10
}

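Almost all types implement Unpin automatically, because Unpin is an auto trait. The notable exceptions are types generated by the compiler for async functions – the opaque Future state machines – and types that opt out explicitly with a PhantomPinned field. The compiler-generated futures do not implement Unpin because they may contain self-references.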
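Whether a type is Unpin can be checked at compile time with a trait bound. The sketch below is illustrative only: `assert_unpin`, `NotUnpin`, `swap_through_pin`, and `unpin_demo` are names introduced here, not standard library items.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Compiles only when T implements Unpin.
fn assert_unpin<T: Unpin>() {}

// A type that opts out of Unpin through a PhantomPinned field.
#[allow(dead_code)]
struct NotUnpin {
    _pin: PhantomPinned,
}

// For Unpin types, Pin::get_mut hands back a plain &mut T,
// so even swapping two "pinned" values is allowed.
fn swap_through_pin(a: Pin<&mut String>, b: Pin<&mut String>) {
    std::mem::swap(a.get_mut(), b.get_mut());
}

fn unpin_demo() -> (String, String) {
    assert_unpin::<i32>();
    assert_unpin::<String>();
    assert_unpin::<Vec<u8>>();
    // assert_unpin::<NotUnpin>(); // ERROR: NotUnpin does not implement Unpin

    let mut left = String::from("left");
    let mut right = String::from("right");
    swap_through_pin(Pin::new(&mut left), Pin::new(&mut right));
    (left, right)
}
```

Uncommenting the `NotUnpin` line turns the missing Unpin implementation into a compile error, which is a convenient way to confirm that a type really has opted out.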

flowchart TD
    A[T: Unpin] -->|Safe to move, Pin adds no restriction| B["Pin<&mut T> behaves like &mut T"]
    C[T: !Unpin] -->|May be self-referential, Pin prevents moves| D["Pin<&mut T> cannot be moved or replaced"]
    E[Most types: i32, String, Vec, structs] --> A
    F[async fn state machines, custom self-ref types] --> C

    style B fill:#27ae60,color:#fff
    style D fill:#c0392b,color:#fff

Pin in Async Rust

Every async fn in Rust compiles down to an anonymous type that implements the Future trait, conceptually an enum with one variant per suspension point. This type contains all local variables that exist across await points – because execution can suspend and resume, those variables need to live somewhere between suspension and resumption.

// This async function:
async fn example() {
    let data = String::from("hello");
    let reference = &data; // reference into data
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    println!("{}", reference); // reference still valid after await
}

// Compiles to something conceptually like:
enum ExampleStateMachine {
    Start,
    Suspended {
        data: String,
        reference: *const String, // points into data above
    },
    Done,
}
// This is why Future state machines cannot be moved after first poll
// Moving would invalidate the internal pointer

The Future trait’s poll method takes Pin<&mut Self> precisely because of this. By requiring the future to be pinned before it is polled, the signature ensures that the future’s memory address will not change between polls.

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

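// The Future trait as defined in the standard library - poll takes Pin<&mut Self>.
// Shown as a comment for reference; redefining it here would clash with the import above.
//
// pub trait Future {
//     type Output;
//     fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
// }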

// A simple manual Future implementation
struct MyFuture {
    value: i32,
    completed: bool,
}

impl Future for MyFuture {
    type Output = i32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.completed {
            Poll::Ready(self.value)
        } else {
            self.completed = true;
            // Request another poll; without a wake-up an executor may never call poll again
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
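To see the poll contract in action without an async runtime, a future can be driven by hand. This is a minimal sketch under a few assumptions: `NoopWake`, `YieldOnce`, and `poll_twice` are names invented for the example, the waker deliberately does nothing, and a real executor would instead schedule the task when woken. The pin! macro requires Rust 1.68 or later.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that ignores wake-ups; enough for polling manually.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Returns Pending on the first poll and Ready on the second.
struct YieldOnce {
    value: i32,
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = i32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i32> {
        if self.yielded {
            Poll::Ready(self.value)
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

// Polls an async block twice and returns both results.
fn poll_twice() -> (Poll<i32>, Poll<i32>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // The async block is !Unpin, so it must be pinned before it can be polled.
    let mut fut = pin!(async { YieldOnce { value: 42, yielded: false }.await + 1 });

    let first = fut.as_mut().poll(&mut cx);
    let second = fut.as_mut().poll(&mut cx);
    (first, second)
}
```

Each call goes through `fut.as_mut()`, which reborrows the pinned reference so the same future can be polled repeatedly at the same address.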

Pinning to the Stack vs Heap

You can pin values to the heap using Box::pin, or to the stack using the pin! macro (stable since Rust 1.68) or unsafe code:

use std::pin::{Pin, pin};

// Heap pinning - allocates
fn heap_pin() {
    let future = Box::pin(async { 42 });
    // future: Pin<Box<impl Future<Output = i32>>>
}

// Stack pinning - no allocation (Rust 1.68+)
fn stack_pin() {
    let future = pin!(async { 42 });
    // future: Pin<&mut impl Future>
    // the future lives on this stack frame and cannot outlive it
}

// The pin! macro is roughly comparable to this unsafe pattern:
fn manual_stack_pin() {
    let mut future = async { 42 };
    // Safety: shadowing the original binding means nothing can move the future afterwards
    let future = unsafe { Pin::new_unchecked(&mut future) };
}

Heap pinning is simpler and the common choice. Stack pinning avoids allocation but requires the pinned value to stay in scope, which can be limiting for futures you want to pass around.
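The "pass around" case is where heap pinning pays for its allocation. The sketch below assumes hypothetical names (`BoxFuture`, `NoopWake`, `make_future`, `poll_all`) and a do-nothing waker; it returns pinned futures from a function, stores them in a Vec, and polls them in place.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Type alias introduced for this example only.
type BoxFuture = Pin<Box<dyn Future<Output = i32>>>;

// A waker that ignores wake-ups; enough for polling manually.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// A heap-pinned future can be returned from the function that creates it.
fn make_future(n: i32) -> BoxFuture {
    Box::pin(async move { n * 2 })
}

// Pinned boxes can be stored in a collection and polled where they sit.
fn poll_all() -> Vec<Poll<i32>> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut futures: Vec<BoxFuture> = vec![make_future(1), make_future(2)];
    futures
        .iter_mut()
        .map(|f| f.as_mut().poll(&mut cx))
        .collect()
}
```

The Vec may reallocate and move the Pin<Box<...>> handles, but each future stays at its own heap address. A future pinned with pin! could not be returned from `make_future`, because it borrows from that function's stack frame.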

Writing a Safe Self-Referential Struct

If you need a self-referential struct in production code, a common pattern is to expose a safe constructor that pins the value first and constructs the self-reference only after pinning. The unsafe code stays confined to the implementation:

use std::pin::Pin;
use std::marker::PhantomPinned;

struct SelfReferential {
    data: String,
    ptr: *const String,
    _pin: PhantomPinned, // marks this type as !Unpin
}

impl SelfReferential {
    fn new(data: String) -> Pin<Box<Self>> {
        let mut boxed = Box::pin(SelfReferential {
            data,
            ptr: std::ptr::null(), // not yet initialized
            _pin: PhantomPinned,
        });

        // Now that it is pinned, set the self-reference
        // Safety: we only set ptr after pinning, and never move the value
        let ptr = &boxed.data as *const String;
        unsafe {
            let mut_ref = Pin::as_mut(&mut boxed);
            Pin::get_unchecked_mut(mut_ref).ptr = ptr;
        }

        boxed
    }

    fn data(self: Pin<&Self>) -> &str {
        // get_ref keeps the original lifetime; `&self.data` would borrow from the local `self`
        &self.get_ref().data
    }

    fn via_ptr(self: Pin<&Self>) -> &str {
        // Safety: ptr is valid as long as self is pinned
        unsafe { &*self.ptr }
    }
}

fn main() {
    let s = SelfReferential::new(String::from("hello, pinned world"));
    println!("{}", s.as_ref().data());    // "hello, pinned world"
    println!("{}", s.as_ref().via_ptr()); // "hello, pinned world"
}

PhantomPinned is a zero-sized marker type that opts the struct out of Unpin. Without it, the struct would implement Unpin automatically, so safe code could obtain a plain &mut reference through the Pin and move the value, defeating the purpose of pinning.

Practical Guidelines

  • If you are writing application code with async/await, you rarely need to think about Pin directly. The compiler and runtime handle it.
  • If you are writing an async executor or low-level async library, you will use Pin<&mut dyn Future> frequently.
  • If you need self-referential structs, consider whether an index-based design or a separate ownership structure can avoid the need entirely. Self-referential structs with Pin are correct but complex.
  • The pin-project crate provides safe projection through pinned structs and is a widely used tool for writing custom Future implementations without unsafe code.
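The index-based alternative mentioned in the list above can be sketched as follows. This is an illustrative design, not code from the series: `Token` and `first_word` are hypothetical names, and the struct stores a byte range into its own String instead of a pointer.

```rust
use std::ops::Range;

// Stores a position inside `text` rather than a pointer into it.
struct Token {
    text: String,
    word: Range<usize>, // byte range of the first word within text
}

impl Token {
    fn new(text: String) -> Self {
        // find returns the byte index of the first space, if any.
        let end = text.find(' ').unwrap_or(text.len());
        Token { text, word: 0..end }
    }

    // The slice is recomputed from the range on every call,
    // so it is valid wherever the struct currently lives.
    fn first_word(&self) -> &str {
        &self.text[self.word.clone()]
    }
}
```

Because the range is relative to the String's contents rather than to the struct's address, a `Token` can be moved, boxed, or pushed into a Vec without Pin, PhantomPinned, or unsafe code.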

What Comes Next

Part 9 shifts from these theoretical foundations to production patterns: the lifetime mistakes that appear in real codebases, how to diagnose borrow checker conflicts that have no obvious fix, and the refactoring strategies that resolve them cleanly.
