Operations

These pages cover day-2 operations once Ampora is installed and configured. Each page is self-contained, so you can read just the one runbook you need.

Page	When you need it
High availability	Running multiple Ampora instances behind a load balancer
Scaling out	Past a few hundred agents per instance
Backup & restore	Disaster preparation (and recovery)
Self-observability	Wiring up Ampora's own metrics, traces, logs
Upgrades	Rolling new versions, downgrade policy, breaking changes
Disaster recovery	RTO/RPO planning, full recovery procedure
Audit retention	Tuning hot/archive windows for compliance

On-call cheat sheet

When something is broken at 3am, check these in order:

1. /health/live: is the process alive?
2. /health/ready: is it ready to take traffic? If not, read the response body, which lists the probes that failed.
3. Pod / container logs: Ampora logs are structured JSON; pipe them through jq to make them readable.
4. Postgres: is the DB reachable? Check with: kubectl -n ampora exec deploy/ampora-web -- nc -z db.acme.svc 5432
5. The audit log itself: every incident leaves a trail in the audit table. Filtering audit_events by EntityType is often the fastest way to answer a "what happened?" question.
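
The first four checks above can be wrapped in a few shell functions. This is a minimal sketch, not an official Ampora tool. The base URL in AMPORA_URL, the function names, and the default --tail value are illustrative, and it assumes the health endpoints return a 2xx status when healthy. The namespace, deployment, and database host are copied from the Postgres check above.

```shell
#!/bin/sh
# Sketch only. AMPORA_URL is an assumed base URL; the other defaults are
# copied from the Postgres check in the cheat sheet.
AMPORA_URL="${AMPORA_URL:-http://localhost:8080}"
NAMESPACE="${NAMESPACE:-ampora}"
DEPLOY="${DEPLOY:-deploy/ampora-web}"
DB_HOST="${DB_HOST:-db.acme.svc}"
DB_PORT="${DB_PORT:-5432}"

# GET a health endpoint, print its body, and succeed only on a 2xx status
# (assumed here to mean "healthy"). Usage: check_health /health/ready
check_health() {
  # -w appends the HTTP status code on its own line after the body.
  health_out=$(curl -sS --max-time 5 -w '\n%{http_code}' "${AMPORA_URL}$1") || return 1
  health_code=$(printf '%s\n' "$health_out" | tail -n 1)
  printf '%s\n' "$health_out" | sed '$d'   # body: everything but the last line
  case "$health_code" in
    2??) return 0 ;;
    *)   return 1 ;;
  esac
}

# Pretty-print recent logs. Lines that are not valid JSON are printed as-is
# instead of aborting jq. Usage: recent_logs [number_of_lines]
recent_logs() {
  kubectl -n "$NAMESPACE" logs "$DEPLOY" --tail="${1:-200}" | jq -rR 'fromjson? // .'
}

# TCP reachability of Postgres, tested from inside the Ampora pod.
check_db() {
  kubectl -n "$NAMESPACE" exec "$DEPLOY" -- nc -z "$DB_HOST" "$DB_PORT"
}
```

After sourcing the file, run check_health /health/live, check_health /health/ready, recent_logs, and check_db in that order. Each check function returns a non-zero status on failure, so they can be chained with && or called one at a time.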

If you have not yet set up self-observability, do so before anything else: graphs and traces beat console-grepping every time.