Reading the Supervision Tree Contract, Key by Key

A supervision tree is a precise contract. It's a list of child specs and a handful of flags that tell the runtime exactly who restarts whom, when, and how hard to try before giving up. Most people treat it as boilerplate copied from the last project; they use Supervisor, paste a children list, pick :one_for_one because it was already there, and never touch a single child spec key. That works fine right up until the day a crash loop takes down a subtree at 3am and the restart strategy turns out to be the actual bug. The contract was sitting there in the spec the whole time. Nobody read it.

I've spent enough time staring at supervisor logs to believe the root cause is usually that the tree was describing a system nobody had actually thought through. This post assumes you already buy "let it crash" (I've written about that philosophy separately) and you just want the exact APIs and the defaults that bite.

A child spec is a map, and you should know the shape

Every child under a supervisor is described by a child specification map. You almost never hand-write this map — child_spec/1 generates it for you — but knowing the shape is what makes your overrides readable later, when you actually need to change something.

%{
  id: MyApp.Worker,
  start: {MyApp.Worker, :start_link, [arg]},  # MFA tuple
  restart: :permanent,   # default
  shutdown: 5_000,       # default for :worker; :infinity for :supervisor
  type: :worker          # default
}

Five keys, and only two of them are required.

:id is any term the supervisor uses internally to identify this spec. In a generated spec it defaults to the module name. :start is the load-bearing one: a module/function/arguments tuple the supervisor invokes to spawn the child. It resolves to a real function call, here MyApp.Worker.start_link(arg). Everything else is defaulted.

Those defaults are where people get surprised. :restart defaults to :permanent, :type to :worker, and :shutdown to 5_000 milliseconds for a worker. The shutdown default is the one that bites, because it's different for supervisors; a child that is itself a supervisor defaults to :infinity, not 5 seconds.

You rarely see this map in real code because use GenServer and use Supervisor generate child_spec/1 automatically. That's why your children list is full of bare modules and {Module, arg} tuples instead of these five-key maps:

children = [
  MyApp.Repo,
  {MyApp.Cache, ttl: 60},
  MyApp.Worker
]

Supervisor.init(children, strategy: :one_for_one)

Each of those entries gets expanded into a child spec map like the one above by the corresponding module's generated child_spec/1. Convenient, until you try to start two of the same module. Both auto-generated specs share the same :id, the module name, and the supervisor refuses to start with a duplicate-id error. That's a real bite, and the fix is the next section.
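To make the expansion concrete, here is a minimal sketch that asks a module for its generated spec. Demo.Worker is a hypothetical module of mine, not something from a real app. Note that the generated map carries only :id and :start; the remaining keys are left for the supervisor to default.

```elixir
defmodule Demo.Worker do
  # Hypothetical worker. `use GenServer` generates child_spec/1 for us.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# What {Demo.Worker, :some_arg} in a children list expands to:
IO.inspect(Demo.Worker.child_spec(:some_arg))

# A bare module in a children list is shorthand for {Module, []},
# so start_link/1 receives an empty list as its argument:
IO.inspect(Supervisor.child_spec(Demo.Worker, []))
```

If you want different defaults baked into the module itself, `use GenServer, restart: :transient` merges those keys into the generated map.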

Running two of the same worker: `Supervisor.child_spec/2`

Supervisor.child_spec/2 builds a spec from a module or {module, arg} and applies overrides on top. It's the idiomatic way to give each instance of the same module a distinct :id without dropping down to a hand-written map.

children = [
  Supervisor.child_spec({MyWorker, :a}, id: :worker_a),
  Supervisor.child_spec({MyWorker, :b}, id: :worker_b)
]

Supervisor.init(children, strategy: :one_for_one)

The first argument is exactly what you'd normally drop into the children list. The second is a keyword list of overrides merged onto the generated spec. Two distinct :ids, one supervisor, two live processes — no collision.

The same call tunes any spec key, not just :id:

Supervisor.child_spec(MyWorker, id: {MyWorker, 1}, restart: :transient)

So this is the general-purpose knob for reaching into an auto-generated spec and changing one thing. One caveat worth holding onto: the override only changes the spec, never the process's own logic. If both MyWorker instances register the same name (say, a hard-coded name: option inside start_link), they still collide the moment the second one boots. A distinct :id fixes the supervisor's bookkeeping, not your process's name registration.
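Here is a small end-to-end sketch of the two-instance case that actually boots the supervisor and reads the children back. Demo.Labelled is a hypothetical stand-in for MyWorker (an Agent that only holds its label), and Demo.TwoWorkers.run/0 is just a wrapper I added so the demo cleans up after itself.

```elixir
defmodule Demo.Labelled do
  # Hypothetical stand-in for MyWorker: an unnamed Agent holding its label.
  use Agent

  def start_link(label), do: Agent.start_link(fn -> label end)
end

defmodule Demo.TwoWorkers do
  def run do
    children = [
      Supervisor.child_spec({Demo.Labelled, :a}, id: :worker_a),
      Supervisor.child_spec({Demo.Labelled, :b}, id: :worker_b)
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

    # which_children/1 returns {id, pid, type, modules} tuples.
    result =
      sup
      |> Supervisor.which_children()
      |> Enum.map(fn {id, pid, _type, _modules} -> {id, Agent.get(pid, & &1)} end)
      |> Enum.sort()

    Supervisor.stop(sup)
    result
  end
end

IO.inspect(Demo.TwoWorkers.run())
```

Both processes are live under distinct ids, and neither registers a name, so nothing collides.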

The three strategies are three blast radii

The strategy decides who gets restarted when one child dies. There are three, and the difference between them is entirely about how far a single crash spreads.

:one_for_one restarts only the dead child and leaves every sibling running. :one_for_all terminates and restarts every child, including the ones that were perfectly healthy. :rest_for_one restarts the dead child plus every child started after it in the list, while everything started before it is left untouched.

That last one is the interesting one, because the list order is doing real work:

# rest_for_one: if Repo crashes, Cache restarts too (it started after Repo),
# but Metrics (started before) is left alone.
children = [MyApp.Metrics, MyApp.Repo, MyApp.Cache]

Supervisor.init(children, strategy: :rest_for_one)

Walk a Repo crash through each strategy against that exact list. Under :one_for_one, the crash touches nothing but Repo. Under :one_for_all, it bounces Metrics, Repo, and Cache — all three. Under :rest_for_one, it bounces Repo and Cache, and Metrics keeps humming.

The detail people overlook: with :rest_for_one, the order of the list encodes your dependency graph. Children that depend on earlier children have to be listed after them, because that ordering is the only thing telling the supervisor which dependents to take down. The canonical case is a connection pool that a cache and some workers depend on. If the pool dies, its dependents are now holding references to stale state and should restart too; anything that started before the pool never depended on it and can be left alone.

:one_for_all is tempting as a "just restart everything to be safe" default. It widens the blast radius of every single crash, and it can turn one flaky child into a whole-subtree bounce. Reach for :rest_for_one when there's a genuine ordering dependency, and :one_for_one the rest of the time.
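If it helps to see the three blast radii as data, here is a toy model. It is not how OTP implements strategies, and Demo.BlastRadius is a name I made up; it only computes which ids get restarted for a given strategy, start order, and crashed child.

```elixir
defmodule Demo.BlastRadius do
  # Toy model, not OTP code: which children restart when `crashed` dies?
  # `children` is the list of ids in start order.
  def restarted(:one_for_one, _children, crashed), do: [crashed]
  def restarted(:one_for_all, children, _crashed), do: children

  def restarted(:rest_for_one, children, crashed) do
    # Everything from the crashed child onward; earlier children are untouched.
    Enum.drop_while(children, &(&1 != crashed))
  end
end

order = [:metrics, :repo, :cache]

for strategy <- [:one_for_one, :one_for_all, :rest_for_one] do
  IO.inspect({strategy, Demo.BlastRadius.restarted(strategy, order, :repo)})
end
```

Swapping :repo and :metrics in the list changes the :rest_for_one answer and nothing else, which is the whole point about list order.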

Restart type decides whether a child comes back at all

The :restart value decides whether a given child comes back at all, regardless of which strategy is in play. Three options, and they answer a different question than the strategy does.

# A run-once job that should retry on crash but stay down on clean completion:
Supervisor.child_spec({MyApp.Job, args}, restart: :transient)

# exit(:normal) / exit(:shutdown) / exit({:shutdown, term}) => stays down
# any other exit reason                                     => restarted

:permanent always restarts the child no matter how it died, and it's the default. :temporary never restarts it, regardless of strategy or exit reason. :transient restarts only on an abnormal exit.

That word "abnormal" is precise, and it's where :transient trips people. An abnormal exit is any reason other than :normal, :shutdown, or {:shutdown, term}. So a job that runs to completion and returns normally is a clean exit; under :transient it will not be restarted. A job that raises is an abnormal exit, and it will. That's exactly the behavior you want from a retryable one-shot task — crashes get retried and a clean exit stays down.

:permanent is right for long-lived services that should always be up: your repo, your endpoint, your background pollers. :temporary is for fire-and-forget children whose death is uninteresting; note that a :temporary child's exit also never triggers sibling restarts under :one_for_all or :rest_for_one, so it's genuinely inert as far as its supervisor is concerned.

The two failure modes here are mirror images. People put :transient on a worker and are baffled when it never comes back after a clean exit. Then they put :permanent on a one-shot job and create a restart loop, because the job finishes, exits normally, and the supervisor faithfully starts it again, every time. Pick the restart type from the process's lifecycle.
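The restart decision is small enough to write down as a truth table. This is a sketch of the rule as described above, not the supervisor's actual code, and Demo.RestartPolicy is a hypothetical name.

```elixir
defmodule Demo.RestartPolicy do
  # Toy model of the rule: does a child with this :restart value
  # come back after exiting with this reason?
  def restart?(:permanent, _reason), do: true
  def restart?(:temporary, _reason), do: false
  def restart?(:transient, reason), do: abnormal?(reason)

  # Only these three reasons count as a clean exit.
  defp abnormal?(:normal), do: false
  defp abnormal?(:shutdown), do: false
  defp abnormal?({:shutdown, _term}), do: false
  defp abnormal?(_other), do: true
end

for restart <- [:permanent, :transient, :temporary],
    reason <- [:normal, {:shutdown, :done}, :boom] do
  IO.inspect({restart, reason, Demo.RestartPolicy.restart?(restart, reason)})
end
```

Reading the :transient rows side by side is usually enough to settle which value a given job wants.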

Restart intensity is the circuit breaker

A supervisor caps how hard it will try. If more than max_restarts restarts happen within max_seconds, it gives up: it terminates all of its children and exits itself with reason :shutdown, escalating the failure to its own supervisor. In Elixir the defaults are three restarts in five seconds.

@impl true
def init(_init_arg) do
  children = [MyApp.Worker]

  Supervisor.init(children,
    strategy: :one_for_one,
    max_restarts: 10,
    max_seconds: 60
  )
  # more than 10 restarts in 60s => this supervisor exits with :shutdown
end

This stops a process from crash-looping forever. The supervisor doesn't retry infinitely. It has a budget, and when the budget blows, the failure propagates up.
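The budget is a sliding window, which is easy to get wrong in your head. Below is a simplified model of the bookkeeping, assuming whole-second timestamps; Demo.RestartBudget is a hypothetical module, and the real supervisor tracks this internally with monotonic time.

```elixir
defmodule Demo.RestartBudget do
  # Simplified model: record a restart at `now` (seconds) and report whether
  # the supervisor would keep going or give up.
  def record(restarts, now, max_restarts, max_seconds) do
    # Keep only restarts that still fall inside the window.
    recent = Enum.filter([now | restarts], fn t -> now - t <= max_seconds end)

    if length(recent) > max_restarts do
      {:give_up, recent}
    else
      {:ok, recent}
    end
  end
end

# Elixir defaults (3 in 5s): three quick restarts are tolerated...
{:ok, r} = Demo.RestartBudget.record([], 0, 3, 5)
{:ok, r} = Demo.RestartBudget.record(r, 1, 3, 5)
{:ok, r} = Demo.RestartBudget.record(r, 2, 3, 5)

# ...and the fourth inside the same window trips the breaker.
IO.inspect(Demo.RestartBudget.record(r, 3, 3, 5))
```

The same four restarts spread over a longer stretch never trip it, because old entries age out of the window.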

Follow the escalation precisely, because this is where the tree actually earns its name. The supervisor exits with :shutdown. Its parent sees that as a child death and handles it according to the parent's own strategy and intensity. If the parent also blows its budget, the failure climbs another level. That's how a localized crash storm becomes a larger restart, and eventually, if it keeps climbing, a node-level decision. Each level of the tree is a chance to either absorb the failure or pass it upward.

One naming note for when you read OTP source or write a supervisor in Erlang directly: the same two knobs are called intensity and period there, and the raw Erlang defaults are different.

%% Erlang supervisor flags map — same mechanism, different names + default:
SupFlags = #{strategy => one_for_one,
             intensity => 1,   %% MaxR  (Elixir max_restarts default is 3)
             period => 5}.     %% MaxT seconds

intensity is max_restarts and defaults to 1 at the Erlang layer; period is max_seconds and defaults to 5. Elixir's Supervisor.init/2 bumps the restart default up to 3.
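You can watch the renaming happen by calling Supervisor.init/2 directly and looking at what it hands to the Erlang layer. A quick sketch, assuming a reasonably recent Elixir where the flags come back as a map (newer versions may include extra keys, so I only match the three discussed here):

```elixir
# Supervisor.init/2 is a plain function; it returns {:ok, {flags, child_specs}}.
{:ok, {flags, []}} = Supervisor.init([], strategy: :one_for_one)

# Elixir's max_restarts / max_seconds show up under the Erlang names.
%{strategy: :one_for_one, intensity: 3, period: 5} = flags

{:ok, {tuned, []}} =
  Supervisor.init([], strategy: :one_for_one, max_restarts: 10, max_seconds: 60)

%{intensity: 10, period: 60} = tuned

IO.inspect(Map.take(tuned, [:strategy, :intensity, :period]))
```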

The pitfall is treating the circuit breaker as an annoyance to tune away. Set max_restarts: 1000 and you've defeated it: you've converted a fail-fast escalation into an invisible crash loop that burns CPU and floods your logs while never surfacing the real problem. When a child keeps hitting its restart limit, the fix is almost always upstream of the supervisor, not a bigger budget.

Shutdown: graceful drain vs. brutal kill

The :shutdown value controls how a child dies when the supervisor takes it down: during a deploy, a :one_for_all restart, or the supervisor's own exit. Unless the value is :brutal_kill, the supervisor first sends Process.exit(child, :shutdown). What happens next depends entirely on whether the child traps exits.

# Worker that flushes a buffer on shutdown: trap exits, then give it room.
def init(_) do
  Process.flag(:trap_exit, true)
  {:ok, %{}}
end

# Give it 10s to drain before a hard kill:
Supervisor.child_spec({Flusher, []}, shutdown: 10_000)

# Or kill immediately, no cleanup:
Supervisor.child_spec({Cache, []}, shutdown: :brutal_kill)

There are three forms the value can take. An integer is a grace period in milliseconds. :brutal_kill sends an immediate Process.exit(child, :kill) with no grace at all. :infinity waits indefinitely.

Tie that back to the defaults from the first section: 5_000 for workers, :infinity for supervisors. The supervisor default is :infinity on purpose — a child supervisor needs to drain its own workers, in order, before the parent reaps it, and you can't put a fixed timeout on "however long the subtree takes." Cutting that off mid-drain would defeat the point of having the subtree.

The grace period only does anything if the child both traps exits and implements terminate/2. A worker that doesn't trap exits dies on the first :shutdown signal and ignores the timeout completely. The 10 seconds in the example above only buys Flusher time because it called Process.flag(:trap_exit, true) in init/1; without that line, the timeout is ignored.
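The trap-exit dependency is easy to verify for yourself. In this sketch, Demo.Drainer is a hypothetical worker that reports back from terminate/2; one instance traps exits and one does not, and both are stopped by the same supervisor shutdown.

```elixir
defmodule Demo.Drainer do
  # Hypothetical worker: tells `owner` when terminate/2 runs.
  use GenServer

  def start_link({owner, label, trap?}),
    do: GenServer.start_link(__MODULE__, {owner, label, trap?})

  @impl true
  def init({owner, label, trap?}) do
    if trap?, do: Process.flag(:trap_exit, true)
    {:ok, {owner, label}}
  end

  @impl true
  def terminate(_reason, {owner, label}) do
    send(owner, {:drained, label})
    :ok
  end
end

defmodule Demo.ShutdownDemo do
  def run do
    children = [
      Supervisor.child_spec({Demo.Drainer, {self(), :trapping, true}}, id: :trapping),
      Supervisor.child_spec({Demo.Drainer, {self(), :plain, false}}, id: :plain)
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
    # Stopping the supervisor shuts down every child before it returns.
    :ok = Supervisor.stop(sup)
    collect([])
  end

  # Gather {:drained, label} messages until the mailbox stays quiet.
  defp collect(acc) do
    receive do
      {:drained, label} -> collect([label | acc])
    after
      200 -> Enum.reverse(acc)
    end
  end
end

IO.inspect(Demo.ShutdownDemo.run())
```

Only the trapping worker reports in. The other one had a terminate/2 too, and it never ran.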

A too-long finite worker timeout stalls the entire shutdown. During a deploy, the supervisor blocks waiting for one slow child to drain, and everything sequenced behind it waits too. Reserve long or :infinity shutdowns for children that genuinely must finish (a buffer flush, an in-flight transaction) and use :brutal_kill for stateless children where running terminate/2 would accomplish nothing anyway.

When the child list isn't fixed: `DynamicSupervisor`

A plain Supervisor starts a fixed list of children from init/1. That covers most of a real application's tree, but it doesn't cover the case where you don't know the children until runtime — one worker per connection, per job, per tenant. That's what DynamicSupervisor is for. It starts children on demand via start_child/2, and its only supported strategy is :one_for_one.

{:ok, _} =
  DynamicSupervisor.start_link(
    name: MyApp.WorkerSup,
    strategy: :one_for_one,
    max_children: 100
  )

# Spin up a worker per incoming connection:
{:ok, pid} = DynamicSupervisor.start_child(MyApp.WorkerSup, {MyWorker, conn})
# may also return {:error, :max_children}

start_child/2 takes the same child-spec shapes you already know — a module, a {module, arg} tuple, or a full map. So everything from the earlier sections still applies to each individual child.

Two things differ from a static Supervisor, and both trip people. First, the :id field is still required by the spec type but is completely ignored and need not be unique. If you're coming from Supervisor, where duplicate :ids are fatal, this feels wrong — under DynamicSupervisor it's irrelevant, because the supervisor identifies children by pid. Second, :max_children is back-pressure. It defaults to :infinity, and when you set a real ceiling and exceed it, start_child/2 returns {:error, :max_children} instead of starting a process.

Treat that error as an expected outcome under load, not a crash. A full pool returning {:error, :max_children} is a normal signal to shed load or queue the request, so handle it in the caller; don't let it bubble up as an exception. And do set a ceiling: the :infinity default means a runaway caller can spawn unbounded processes and exhaust the node. Anything driven by external input needs a real number there.
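A minimal sketch of handling the ceiling in the caller. Demo.Pool and its :overloaded return value are my own hypothetical naming; the children are plain Agents, which also shows three specs sharing the same :id without complaint.

```elixir
defmodule Demo.Pool do
  # Hypothetical wrapper: turn a full pool into a domain-level result
  # the caller can act on (shed load, queue, retry later).
  def checkout(sup, arg) do
    case DynamicSupervisor.start_child(sup, {Agent, fn -> arg end}) do
      {:ok, pid} -> {:ok, pid}
      {:error, :max_children} -> {:error, :overloaded}
      other -> other
    end
  end

  def run do
    {:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one, max_children: 2)
    # Two slots, three requests: the third is refused, not crashed.
    results = for n <- 1..3, do: checkout(sup, n)
    DynamicSupervisor.stop(sup)
    results
  end
end

IO.inspect(Demo.Pool.run())
```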

Module-based DynamicSupervisor: managing and observing the pool

For a long-lived pool you'll usually want a module-based DynamicSupervisor rather than a bare start_link. You define init/1 to return DynamicSupervisor.init(opts), then manage the population with terminate_child/2 and observe it with count_children/1.

@impl true
def init(_arg), do: DynamicSupervisor.init(strategy: :one_for_one)

# Introspect the pool:
DynamicSupervisor.count_children(MyApp.WorkerSup)
# => %{active: 3, specs: 3, supervisors: 0, workers: 3}

# Tear down a specific child by pid:
DynamicSupervisor.terminate_child(MyApp.WorkerSup, pid)
# => :ok | {:error, :not_found}

count_children/1 returns a map of specs, active, supervisors, and workers. That's genuinely useful — wire it into a health endpoint, emit it as a load metric, or assert on it in a test to prove the pool actually drained. terminate_child/2 takes a pid and returns :ok, or {:error, :not_found} if that child is already gone.
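Here is the "assert the pool drained" idea as a small sketch. Demo.PoolStats is a hypothetical helper; it starts two Agents, terminates one of them twice, and captures the counts on either side.

```elixir
defmodule Demo.PoolStats do
  def run do
    {:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
    {:ok, pid} = DynamicSupervisor.start_child(sup, {Agent, fn -> :state end})
    {:ok, _other} = DynamicSupervisor.start_child(sup, {Agent, fn -> :state end})

    before = DynamicSupervisor.count_children(sup)
    first = DynamicSupervisor.terminate_child(sup, pid)
    # Same pid again: the child is already gone.
    second = DynamicSupervisor.terminate_child(sup, pid)
    remaining = DynamicSupervisor.count_children(sup)

    DynamicSupervisor.stop(sup)
    %{before: before, first: first, second: second, remaining: remaining}
  end
end

IO.inspect(Demo.PoolStats.run())
```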

Because terminate_child/2 keys off a pid, you need to track the pids that start_child/2 hands back to know what to tear down later. A Registry or an ETS table mapping some domain key to the live pid is the usual pattern.

The {:error, :not_found} return is common under races, and worth handling calmly. The child may crash or finish on its own between the moment you grab its pid and the moment you call terminate_child/2. The child is already gone, which is the outcome you were after. Don't log it loudly and don't branch on it as if something went wrong; the goal was an absent process, and you have one.
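Putting the tracking and the calm handling together, here is one possible shape using a Registry to map a domain key to the live pid. Demo.Tracked, the registry names, and the "tenant-1" key are all hypothetical; treat it as a sketch of the pattern rather than the one right way to do it.

```elixir
defmodule Demo.Tracked do
  # Start an Agent registered under `key` so it can be found again later.
  def start_worker(sup, registry, key) do
    name = {:via, Registry, {registry, key}}
    # The :id is required by the spec shape but ignored by DynamicSupervisor.
    spec = %{id: :ignored, start: {Agent, :start_link, [fn -> key end, [name: name]]}}
    DynamicSupervisor.start_child(sup, spec)
  end

  # Stop the worker for `key`. An absent process counts as success.
  def stop_worker(sup, registry, key) do
    case Registry.lookup(registry, key) do
      [{pid, _value}] ->
        case DynamicSupervisor.terminate_child(sup, pid) do
          :ok -> :ok
          {:error, :not_found} -> :ok
        end

      [] ->
        :ok
    end
  end

  def run(registry) do
    {:ok, reg} = Registry.start_link(keys: :unique, name: registry)
    {:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

    {:ok, pid} = start_worker(sup, registry, "tenant-1")
    found? = Registry.lookup(registry, "tenant-1") == [{pid, nil}]
    first = stop_worker(sup, registry, "tenant-1")
    # Second stop races with registry cleanup; either branch returns :ok.
    second = stop_worker(sup, registry, "tenant-1")

    DynamicSupervisor.stop(sup)
    Supervisor.stop(reg)
    {found?, first, second}
  end
end

IO.inspect(Demo.Tracked.run(Demo.RegistryOne))
```

The caller never sees :not_found, which keeps the race out of your logs and out of your control flow.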

The whole supervision tree fits inside a child-spec map and a handful of flags, and every one of those keys is a decision about what your system does at its worst moment. The real design question is where in the tree you want a given failure to stop climbing. Restart intensity is the part that says a supervisor gives up and escalates, and I think that escalation is the most underused design tool in OTP, because deciding the right altitude for a failure to surface is genuinely hard to reason about.

So before you paste the next children list, decide deliberately at which level of the tree each class of failure should stop climbing, and set the strategy and intensity at that level to make it stop there.
