Pause, resume, abort, rollback¶
Every rollout has four manual controls. Understand the difference between them; the wrong choice during an incident can make recovery harder.
Pause¶
Effect: stops the rollout from advancing. In-flight pushes complete; new pushes do not start.
State: Paused.
Use when:
- you want to investigate an anomaly without committing one way or the other,
- a related system is in maintenance and you do not want to add load,
- you want a colleague's eyes on the change before continuing.
A paused rollout is reversible — Resume continues forward.
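The "in-flight pushes complete; new pushes do not start" rule can be pictured as a pause check that runs only between pushes. The sketch below is illustrative, not the product's implementation: `run_step`, `push`, and `pause_requested` are hypothetical names, and pushes are modeled as sequential for simplicity.

```python
from typing import Callable, List, Tuple


def run_step(agents: List[str],
             push: Callable[[str], None],
             pause_requested: Callable[[], bool]) -> Tuple[str, List[str]]:
    """Hypothetical sketch: push agents in order, honoring a pause between pushes."""
    pushed: List[str] = []
    for agent in agents:
        # The pause flag is checked only before a new push starts.
        if pause_requested():
            return "Paused", pushed
        # A push that has started always runs to completion.
        push(agent)
        pushed.append(agent)
    return "InProgress", pushed


# Demo: the operator hits Pause while agent-2 is being pushed.
flag = {"paused": False}


def demo_push(agent: str) -> None:
    if agent == "agent-2":
        flag["paused"] = True


state, pushed = run_step(["agent-1", "agent-2", "agent-3"],
                         demo_push, lambda: flag["paused"])
```

In the demo, the push to `agent-2` finishes even though the pause arrived mid-push, and `agent-3` is never started.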
Resume¶
Effect: re-enters InProgress from the next agent in the current step.
Use when:
- the cause of the pause is understood and accepted,
- a gate fired but the failures are explainable (stale agents, one-off network blip).
If you Resume after a gate fire and the gate fires again, the rollout re-pauses with the same data point. Resume is not "ignore the gate"; it is "take one more step and re-evaluate."
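A minimal sketch of "take one more step and re-evaluate", under stated assumptions: `resume_step`, `push`, and `gate_fires` are hypothetical names, and the gate is assumed to see the failure tally accumulated before the pause. The real gate logic may differ.

```python
from typing import Callable, List, Tuple


def resume_step(pending: List[str],
                push: Callable[[str], bool],
                gate_fires: Callable[[int, int], bool],
                failures: int = 0,
                attempted: int = 0) -> Tuple[str, int]:
    """Hypothetical sketch: push the rest of the step, re-checking the gate each time.

    Returns the resulting state and how many pending agents were pushed.
    """
    for i, agent in enumerate(pending):
        ok = push(agent)
        attempted += 1
        if not ok:
            failures += 1
        # The gate is re-evaluated after every push; Resume does not disable it.
        if gate_fires(failures, attempted):
            return "Paused", i + 1
    return "InProgress", len(pending)


def over_20_percent(failures: int, attempted: int) -> bool:
    """Assumed example gate: fire when more than 20% of pushes have failed."""
    return failures / attempted > 0.2


# 3 failures out of 10 pushes were recorded before the pause.
state, pushed_count = resume_step(["a", "b", "c"], lambda agent: True,
                                  over_20_percent, failures=3, attempted=10)
```

Here the first resumed push succeeds, but the tally is now 3 failures in 11 pushes, which is still above the assumed threshold, so the rollout pauses again after a single agent.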
Abort¶
Effect: stops the rollout permanently. Pushed agents stay on the new config; not-yet-pushed agents stay on whatever they had before.
State: Failed.
Use when:
- you accept the partial state ("the canary cohort is actually fine on the new version; we just changed our mind about rolling further"),
- the change is benign but your release window has closed.
Abort is not a rollback. The agents that already applied stay applied. There is no automatic restore.
Rollback¶
Effect: re-assigns the previous Published version to every agent the rollout has touched. Untouched agents are unaffected. The rollback runs as its own internal mini-rollout, using the same strategy and gates as the original rollout.
State: RolledBack.
Use when:
- the change is at fault — apply failures, downstream alerting fires, data shape regression,
- the pause was because of the change, not external circumstance.
Rollback is the safe default when in doubt. It is fully audited and reversible (you can roll forward again afterward).
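The difference between Abort and Rollback is easiest to see on the fleet's version assignments. The sketch below is a simplified model, not the product's API: the `abort` and `rollback` functions, the agent names, and the version labels are hypothetical, and the mini-rollout with its strategy and gates is omitted.

```python
from typing import Dict, Set


def abort(fleet: Dict[str, str]) -> Dict[str, str]:
    """Abort restores nothing: every agent keeps whatever version it has now."""
    return dict(fleet)


def rollback(fleet: Dict[str, str], touched: Set[str], previous: str) -> Dict[str, str]:
    """Rollback re-assigns the previous version to touched agents only."""
    return {
        agent: (previous if agent in touched else version)
        for agent, version in fleet.items()
    }


# a1 and a2 were pushed to v2; a3 and a4 were not reached by the rollout.
fleet = {"a1": "v2", "a2": "v2", "a3": "v1", "a4": "v0"}
touched = {"a1", "a2"}

after_abort = abort(fleet)
after_rollback = rollback(fleet, touched, previous="v1")
```

After the abort, `a1` and `a2` remain on `v2`. After the rollback they return to `v1`, while `a4`, which the rollout never touched, stays on `v0`.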
Quick decision tree¶
Did the change cause the problem?
├── Yes → Rollback
├── No, but I want to stop here → Abort
├── No, I want to keep going → Resume
└── If unsure → Pause longer
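The same tree can be written as a small function, for example in a runbook helper. This is a sketch only: `choose_action` and its parameters are hypothetical, and `None` is used here to stand for "unsure".

```python
from typing import Optional


def choose_action(change_caused_problem: Optional[bool],
                  keep_going: bool = False) -> str:
    """Hypothetical helper mirroring the decision tree above."""
    if change_caused_problem is None:
        # Unsure: stay paused and keep investigating.
        return "Pause"
    if change_caused_problem:
        return "Rollback"
    # The change is not at fault: either continue or stop where you are.
    return "Resume" if keep_going else "Abort"
```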
Cancelling a rollback¶
Rollbacks themselves can be paused, aborted, or advanced like any rollout. Cancelling a rollback leaves the fleet in a mixed state, with some agents on the new version and some on the old. Use with caution.
What is logged¶
Each of these actions writes an audit event with:
- the actor,
- the rollout ID,
- the rollout state before and after,
- the count of agents in each subset (pushed, pending, rolled back).
The event is the source of truth for "who pushed the button at 3:42 a.m.".
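As a rough picture of what such an event carries, here is a sketch of a record with the fields listed above. The class name, field names, and example values are assumptions made for illustration; the `action` and `timestamp` fields in particular are not listed above and are added only so the example can answer "who did what, and when". The actual event schema may differ.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class RolloutAuditEvent:
    """Hypothetical shape of one audit event; not the product's schema."""
    actor: str                    # who triggered the action
    rollout_id: str               # which rollout it applied to
    action: str                   # assumed field: pause, resume, abort, rollback
    state_before: str             # rollout state before the action
    state_after: str              # rollout state after the action
    agent_counts: Dict[str, int]  # agents per subset: pushed, pending, rolled back
    timestamp: datetime           # assumed field: when the action happened


event = RolloutAuditEvent(
    actor="alice@example.com",
    rollout_id="rollout-123",
    action="pause",
    state_before="InProgress",
    state_after="Paused",
    agent_counts={"pushed": 12, "pending": 38, "rolled_back": 0},
    timestamp=datetime(2024, 1, 1, 3, 42, tzinfo=timezone.utc),
)

record = asdict(event)  # plain dict, ready to serialize or log
```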