Operations

Day-2 operations for Ampora once it is installed and configured. Each page is self-contained: you should be able to read just the one runbook you need.

Page                 When you need it
High availability    Running multiple Ampora instances behind a load balancer
Scaling out          Past a few hundred agents per instance
Backup & restore     Disaster preparation (and recovery)
Self-observability   Wiring up Ampora's own metrics, traces, logs
Upgrades             Rolling new versions, downgrade policy, breaking changes
Disaster recovery    RTO/RPO planning, full recovery procedure
Audit retention      Tuning hot/archive windows for compliance

On-call cheat sheet

When something is broken at 3am, check these in order:

  1. /health/live: is the process alive?
  2. /health/ready: is it ready to take traffic? If not, read the response body, which lists the probes that failed.
  3. Pod / container logs: Ampora logs are structured JSON; pipe them through jq to make them readable.
  4. Postgres: is the DB reachable? Test from inside the pod with: kubectl -n ampora exec deploy/ampora-web -- nc -z db.acme.svc 5432
  5. The audit log itself: every incident leaves traces in the audit table. Querying audit_events filtered by EntityType is often the fastest way to answer a "what happened?" question.
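
The endpoint and database checks (steps 1, 2, and 4) can be scripted so that they always run in the same order and stop at the first failure. The following is a minimal Python sketch under stated assumptions, not something shipped with Ampora: BASE_URL is a hypothetical address for the instance, the helper names (http_ok, tcp_ok, first_failure) are invented for illustration, and treating any 2xx response as healthy is an assumption about how the health endpoints report failure. The database host and port are taken from the nc command above, so the TCP check only works from a machine that can resolve and reach that address, for example from inside the cluster.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable, List, Optional, Tuple

# Assumption: replace with the address of the Ampora instance under test.
BASE_URL = "http://localhost:8080"
# Host and port copied from the nc example in step 4.
DB_HOST, DB_PORT = "db.acme.svc", 5432


def http_ok(path: str, timeout: float = 3.0) -> bool:
    """Return True if GET BASE_URL + path answers with a 2xx status."""
    try:
        with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def tcp_ok(host: str, port: int, timeout: float = 3.0) -> bool:
    """Rough equivalent of `nc -z host port`: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failure(checks: List[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run checks in order; return the name of the first that fails, else None."""
    for name, check in checks:
        if not check():
            return name
    return None


# Same order as the cheat sheet: liveness, readiness, then the database.
CHECKS: List[Tuple[str, Callable[[], bool]]] = [
    ("/health/live", lambda: http_ok("/health/live")),
    ("/health/ready", lambda: http_ok("/health/ready")),
    ("postgres", lambda: tcp_ok(DB_HOST, DB_PORT)),
]
```

Calling first_failure(CHECKS) returns the name of the first check that fails, or None if all three pass. Later checks are skipped once one fails, which matches the intent of the ordering: a dead process makes the readiness and database results uninformative.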

If you have not yet set up self-observability, do so before anything else: graphs and traces beat console-grepping every time.