Idempotency on the server is only half the story. The other half is the client knowing when and how to retry safely. A client that retries too aggressively can turn a recovering service into a failed one. A client that never retries wastes the safety guarantees you just built.
This part covers the mechanics of retry logic: exponential backoff, jitter, retry budgets, and which errors are safe to retry at all.
Which Errors Are Safe to Retry
Not every error means “try again.” Retrying the wrong errors wastes resources or makes things worse.
| Condition | Retry? | Why |
|---|---|---|
| Network timeout | Yes | Request may not have reached server |
| Connection refused | Yes | Server temporarily unavailable |
| 5xx Server Error | Yes (with limits) | Server-side transient failure |
| 429 Too Many Requests | Yes (with Retry-After) | Rate limited, back off first |
| 408 Request Timeout | Yes | Server timed out waiting for the request; resend it |
| 400 Bad Request | No | Client error, retrying won’t help |
| 401 Unauthorized | No | Fix auth first |
| 404 Not Found | No | Resource does not exist |
| 422 Unprocessable | No | Logic error, not transient |
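The table above maps naturally onto a small predicate. Here is a minimal sketch using reqwest's `StatusCode` (the function name and exact policy are illustrative; adapt them to your own API's semantics):

```rust
use reqwest::StatusCode;

/// Rough encoding of the retry table: true means "worth retrying".
fn is_retryable(status: StatusCode) -> bool {
    match status {
        // Rate limiting and server-side request timeouts are transient
        StatusCode::TOO_MANY_REQUESTS | StatusCode::REQUEST_TIMEOUT => true,
        // 5xx is assumed transient, but only retry a bounded number of times
        s if s.is_server_error() => true,
        // Everything else (remaining 4xx and anything unexpected) goes back to the caller
        _ => false,
    }
}
```

Transport-level failures (network timeouts, connection refused) never produce a status code at all; the full client later in this part treats those as retryable too.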
Exponential Backoff
The simplest retry strategy — retrying immediately after failure — is also the most dangerous at scale. If 10,000 clients all fail simultaneously and all retry at once, they generate a thundering herd that can push a recovering server back into failure.
Exponential backoff spaces retries out by doubling the wait time between attempts. The delay after attempt n (counting attempts from zero) is roughly base_delay * 2^n: with a one-second base, the waits run 1s, 2s, 4s, 8s, and so on, usually capped at a maximum.
```mermaid
gantt
title Retry Timeline (base_delay = 1s, max = 30s)
dateFormat X
axisFormat %s
section Attempts
Attempt 1 (fail) :0, 1
Wait 1s :1, 2
Attempt 2 (fail) :2, 3
Wait 2s :3, 5
Attempt 3 (fail) :5, 6
Wait 4s :6, 10
Attempt 4 (fail) :10, 11
Wait 8s :11, 19
Attempt 5 (success) :19, 20
```
Adding Jitter
Backoff alone is not enough if all clients fail at the same time: they will still retry in sync at the same intervals, so the thundering herd simply moves later. Jitter adds randomness to the delay so clients desynchronize.
AWS recommends “full jitter”: pick a random value between zero and the calculated backoff. This spreads retries uniformly across the window and significantly reduces load spikes on recovering services.
```rust
use std::time::Duration;
use rand::Rng;
pub struct RetryConfig {
pub max_attempts: u32,
pub base_delay_ms: u64,
pub max_delay_ms: u64,
}
impl RetryConfig {
pub fn delay_for_attempt(&self, attempt: u32) -> Duration {
        // Exponential backoff: base * 2^attempt
        // (saturating math so a large attempt count cannot overflow u64)
        let exponential = self.base_delay_ms.saturating_mul(2u64.saturating_pow(attempt));
// Cap at max delay
let capped = exponential.min(self.max_delay_ms);
// Full jitter: random between 0 and capped
let jittered = rand::thread_rng().gen_range(0..=capped);
Duration::from_millis(jittered)
}
}
// Usage
let config = RetryConfig {
max_attempts: 5,
base_delay_ms: 200,
max_delay_ms: 30_000, // 30 seconds cap
};
```
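To get a feel for the spread, here is a small hypothetical demo (assuming the `RetryConfig` above); the exact values differ on every run because of the jitter:

```rust
fn main() {
    let config = RetryConfig {
        max_attempts: 5,
        base_delay_ms: 200,
        max_delay_ms: 30_000,
    };
    for attempt in 0..config.max_attempts {
        // Three samples per attempt to show how full jitter spreads the delay
        let samples: Vec<u128> = (0..3)
            .map(|_| config.delay_for_attempt(attempt).as_millis())
            .collect();
        println!("attempt {attempt}: {samples:?} ms");
    }
}
```

Each attempt's delay can land anywhere between zero and the capped exponential value, which is exactly what keeps a fleet of clients from retrying in lockstep.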
A Full Retry Client in Rust
Here is a complete retry wrapper using reqwest that handles idempotency key persistence across attempts and applies full jitter backoff.
```rust
use reqwest::{Client, Response, StatusCode};
use serde::Serialize;
use std::time::Duration;
use tokio::time::sleep;
use uuid::Uuid;
pub struct IdempotentClient {
inner: Client,
config: RetryConfig,
}
impl IdempotentClient {
pub fn new(config: RetryConfig) -> Self {
Self {
inner: Client::new(),
config,
}
}
pub async fn post_with_retry<T: Serialize>(
&self,
url: &str,
body: &T,
) -> anyhow::Result<Response> {
// Generate key once -- reused across all retries
let idempotency_key = Uuid::new_v4().to_string();
let body_bytes = serde_json::to_vec(body)?;
let mut last_error = None;
for attempt in 0..self.config.max_attempts {
let result = self
.inner
.post(url)
.header("Idempotency-Key", &idempotency_key)
.header("Content-Type", "application/json")
.body(body_bytes.clone())
.timeout(Duration::from_secs(30))
.send()
.await;
match result {
Ok(resp) => {
let status = resp.status();
// Success
if status.is_success() {
return Ok(resp);
}
// Do not retry client errors (4xx), except 408 and 429
if status.is_client_error()
&& status != StatusCode::REQUEST_TIMEOUT
&& status != StatusCode::TOO_MANY_REQUESTS
{
return Ok(resp); // Return to caller to handle
}
// For 429, respect Retry-After header if present
if status == StatusCode::TOO_MANY_REQUESTS {
if let Some(retry_after) = resp.headers().get("Retry-After") {
if let Ok(secs) = retry_after.to_str().unwrap_or("0").parse::<u64>() {
sleep(Duration::from_secs(secs)).await;
continue;
}
}
}
last_error = Some(anyhow::anyhow!("Server error: {}", status));
}
Err(e) if e.is_timeout() || e.is_connect() => {
last_error = Some(anyhow::anyhow!("Network error: {}", e));
}
Err(e) => {
return Err(anyhow::anyhow!("Non-retryable error: {}", e));
}
}
// Not the last attempt -- wait before retrying
if attempt + 1 < self.config.max_attempts {
let delay = self.config.delay_for_attempt(attempt);
tracing::warn!(
attempt = attempt + 1,
delay_ms = delay.as_millis(),
key = %idempotency_key,
"Retrying request"
);
sleep(delay).await;
}
}
Err(last_error.unwrap_or_else(|| anyhow::anyhow!("Max retries exceeded")))
}
}
```
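A sketch of how the wrapper might be called from an async context; the payload type, endpoint URL, and runtime setup here are illustrative assumptions, not part of the client itself:

```rust
#[derive(serde::Serialize)]
struct CreatePayment {
    amount_cents: u64,
    currency: String,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = IdempotentClient::new(RetryConfig {
        max_attempts: 5,
        base_delay_ms: 200,
        max_delay_ms: 30_000,
    });
    let payment = CreatePayment { amount_cents: 2_500, currency: "USD".into() };
    // The idempotency key is generated once inside post_with_retry and reused
    // on every attempt, so the server can deduplicate across retries.
    let resp = client
        .post_with_retry("https://api.example.com/payments", &payment)
        .await?;
    println!("final status: {}", resp.status());
    Ok(())
}
```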
Retry Budgets
Exponential backoff caps how long any single retry sequence runs. But in a microservice with many instances, each making its own retries, the aggregate retry load can still overwhelm a downstream service. A retry budget limits the fraction of total requests that can be retries at any given time.
The idea: track how many of your last N requests were retries. If retries exceed a threshold (say 10%), stop retrying and fail fast. This prevents a cascade where one struggling service causes all its callers to saturate it further with retry traffic.
```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
/// Tracks what fraction of requests have been retries.
/// This sketch keeps lifetime counters for brevity; a production budget would
/// use a sliding window or token bucket so that old traffic ages out.
pub struct RetryBudget {
total_requests: Arc<AtomicU32>,
retry_requests: Arc<AtomicU32>,
max_retry_fraction: f64, // e.g. 0.10 for 10%
}
impl RetryBudget {
pub fn can_retry(&self) -> bool {
let total = self.total_requests.load(Ordering::Relaxed) as f64;
let retries = self.retry_requests.load(Ordering::Relaxed) as f64;
if total == 0.0 {
return true;
}
(retries / total) < self.max_retry_fraction
}
pub fn record_attempt(&self, is_retry: bool) {
self.total_requests.fetch_add(1, Ordering::Relaxed);
if is_retry {
self.retry_requests.fetch_add(1, Ordering::Relaxed);
}
}
}
```
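A sketch of how the budget might gate a retry decision. The 10% threshold is the example figure from above, and the literal construction assumes this code sits in the same module as `RetryBudget`, since the fields are private:

```rust
// One budget, shared by every caller that talks to the same downstream service.
fn shared_budget() -> Arc<RetryBudget> {
    Arc::new(RetryBudget {
        total_requests: Arc::new(AtomicU32::new(0)),
        retry_requests: Arc::new(AtomicU32::new(0)),
        max_retry_fraction: 0.10, // at most 10% of traffic may be retries
    })
}

// Returns true if the attempt may proceed; retries are dropped once the budget is spent.
fn attempt_allowed(budget: &RetryBudget, is_retry: bool) -> bool {
    if is_retry && !budget.can_retry() {
        return false; // fail fast instead of piling onto a struggling service
    }
    budget.record_attempt(is_retry);
    true
}
```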
Non-Idempotent Operations: Do Not Retry
There is one hard rule: never retry a request without an idempotency key if the operation has side effects. Without a key, the server cannot deduplicate — and retrying is identical to sending two separate requests. The client code we built above always attaches an idempotency key for POST requests, which is exactly right. Remove the key and the retry logic becomes dangerous.
Summary
Retries are necessary for reliability. Naive retries cause thundering herds and duplicate side effects. Exponential backoff with full jitter spreads retry load. Retry budgets prevent cascading amplification. And none of this is safe without idempotency keys on the server side. In Part 5, we move to a different delivery mechanism entirely — message queues — where the retry semantics are controlled by the broker rather than the client.
