Backup & restore¶
Ampora's persistent state is PostgreSQL plus a single application-level master key. Back both up — and test restoring from them — and you can recover from anything short of a region-wide datacentre loss.
What to back up¶
| Item | Where | Why |
|---|---|---|
| PostgreSQL | your DB host or managed service | The only authoritative state for fleet, configurations, audit |
| Master encryption key | your secret manager (Vault, AWS Secrets Manager, …) | Required to decrypt CA private keys, peer secrets, GitOps credentials at rest |
| Server TLS cert + key | your secret manager | Convenience: without it, browsers will see a certificate mismatch after a restore |
| Container images | your registry | Convenience: the Ampora version that last ran against your data must still be pullable |
You do not need to back up:
- the Ampora deployment manifests (they are in Git),
- the configmaps (they are in Git),
- per-pod ephemeral state (`/tmp`, the EF cache).
The audit log lives in PostgreSQL. Treat backup retention with the same seriousness as audit retention.
Recommended backup strategy¶
PostgreSQL¶
- Continuous WAL archiving to an off-cluster object store (`pgBackRest`, `wal-g`, or the managed offering's PITR feature).
- Daily full backup retained for at least 30 days.
- Pre-upgrade snapshot before every Ampora version upgrade — even if EF migrations are forward-only-safe, the snapshot is your "abort" button.
The dataset is small for most fleets (a few GB even for 10 000 agents), so daily fulls are cheap.
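As one way to implement the daily full backup, the sketch below writes a date-stamped custom-format dump and prunes dumps that have aged out. It assumes logical `pg_dump` backups to a local directory that is shipped off-cluster by other means. The `ampora` database name and the `ampora-YYYY-MM-DD.dump` naming follow the restore example on this page; `BACKUP_DIR`, `RETENTION_DAYS`, and the function names are hypothetical.

```shell
BACKUP_DIR="${BACKUP_DIR:-/var/backups/ampora}"   # hypothetical location
RETENTION_DAYS="${RETENTION_DAYS:-30}"            # keep at least 30 days

# Build the dump file name for a given date (defaults to today, UTC).
dump_name() {
  local day="${1:-$(date -u +%F)}"
  printf 'ampora-%s.dump\n' "$day"
}

# Take a full logical backup in pg_dump's custom format (readable by pg_restore).
# Connection settings come from the standard PG* environment variables.
daily_full_backup() {
  mkdir -p "$BACKUP_DIR" || return 1
  pg_dump --format=custom --file="$BACKUP_DIR/$(dump_name)" ampora
}

# Delete dumps last modified more than RETENTION_DAYS days ago.
prune_old_backups() {
  find "$BACKUP_DIR" -maxdepth 1 -type f -name 'ampora-*.dump' \
    -mtime +"$RETENTION_DAYS" -delete
}
```

Run `daily_full_backup && prune_old_backups` from a scheduler so that pruning only happens after a new dump has been written successfully.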
Master key¶
The KeyProtection:MasterKey value is a 32-byte CSPRNG key. Treat it like a root password:
- store in your secret manager (Vault, AWS Secrets Manager, GCP Secret Manager, AKV),
- replicate cross-region if your secret manager supports it,
- two-person retrieval policy if you can,
- include it in your DR drill — losing the key means encrypted-at-rest fields are forever unreadable, even with a perfect Postgres backup.
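A minimal sketch of generating and sanity-checking such a key from the shell. The base64 encoding is an assumption, since this page does not state how the value is serialised, so confirm the expected format for your deployment. Both function names are hypothetical.

```shell
# Generate 32 random bytes from the kernel CSPRNG and base64-encode them.
# Assumption: the key is stored base64-encoded.
generate_master_key() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

# Decode a candidate key and confirm it is exactly 32 bytes long.
is_valid_master_key() {
  local len
  len="$(printf '%s' "$1" | base64 -d 2>/dev/null | wc -c | tr -d ' ')"
  [ "$len" -eq 32 ]
}
```

Pipe the output of `generate_master_key` straight into your secret manager's CLI rather than echoing it to a terminal, and run `is_valid_master_key` on the value you read back during a DR drill.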
Server TLS¶
If you use cert-manager and ACME, the cert is reproducible from a recovered cluster — back up the certificate-issuer config, not the cert. If you use a corporate CA with manual issuance, back up the signed bundle.
Restore procedure¶
Same database, point in time¶
The easiest case: roll back within the WAL retention window using your managed offering's PITR or a pgBackRest time-targeted restore (`pgbackrest restore --type=time --target=<timestamp>`). Ampora picks up the rewound database on its next connection attempt, with no extra steps.
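For the pgBackRest route, the helper below prints a point-in-time restore command for review instead of running it. This is a sketch: the stanza name is whatever you configured (the `ampora` stanza in the usage line is hypothetical), and PostgreSQL must be stopped on the target host before the printed command is run.

```shell
# Print (rather than run) a pgBackRest point-in-time restore command.
# $1 = stanza name, $2 = recovery target timestamp.
pitr_restore_cmd() {
  local stanza="$1" target="$2"
  printf 'pgbackrest --stanza=%s --delta --type=time --target="%s" --target-action=promote restore\n' \
    "$stanza" "$target"
}

# Usage (hypothetical stanza and timestamp):
#   pitr_restore_cmd ampora "2026-04-01 11:55:00+00"
```

`--delta` reuses files already in the data directory that still match the backup, and `--target-action=promote` ends recovery once the target time is reached so the database accepts writes again.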
Different database, same data¶
```shell
# 1. Provision a fresh Postgres
createdb ampora -O ampora
# 2. Restore the dump
pg_restore --no-owner --role=ampora -d ampora ampora-2026-04-01.dump
# 3. Point Ampora at the new DB
kubectl -n ampora set env deploy/ampora-web \
  ConnectionStrings__Ampora="Host=newdb.acme.io;Database=ampora;..."
# 4. Restart so EF picks up any pending migrations
kubectl -n ampora rollout restart deploy/ampora-web
```
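Two optional guards around this procedure, sketched as shell functions with hypothetical names: check that the dump is readable before restoring it, and wait for the restarted deployment to become ready afterwards. The namespace and deployment name are taken from the commands above.

```shell
# List the dump's table of contents without touching any database;
# an unreadable or truncated archive makes pg_restore exit non-zero.
check_dump_readable() {
  pg_restore --list "$1" > /dev/null
}

# Block until the restarted deployment reports all replicas ready.
# $1 = optional timeout (defaults to 5m).
wait_for_ampora() {
  kubectl -n ampora rollout status deploy/ampora-web --timeout="${1:-5m}"
}

# Usage:
#   check_dump_readable ampora-2026-04-01.dump   # before step 2
#   wait_for_ampora                              # after step 4
```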
Full disaster — new cluster, restored DB¶
The shopping list, in order:
- Provision a new Kubernetes cluster (or VM, for binary deployments).
- Restore PostgreSQL from your latest known-good backup.
- Pull the master key from your secret manager into the new cluster's secret manager (or the `Secret` for the binary deployment).
- Re-issue server TLS for the public hostname.
- `kubectl apply -k deploy/kustomize/overlays/...` against the new cluster.
- Ampora connects, validates the master key against existing encrypted-at-rest blobs, and is fully functional.
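The master-key, apply, and readiness steps of the list above could be scripted roughly as follows for a Kubernetes deployment. This is a sketch under several assumptions: the Secret name `ampora-master-key`, its key `MasterKey`, and the `fetch_master_key` helper (which should print the key from your secret manager) are all hypothetical, and your overlay may already create the namespace or expect a different Secret layout.

```shell
# $1 = path to your kustomize overlay.
bootstrap_ampora() {
  local overlay="$1"
  kubectl create namespace ampora || return 1
  # Feed the key over stdin so it never appears in the process list.
  fetch_master_key | kubectl -n ampora create secret generic ampora-master-key \
    --from-file=MasterKey=/dev/stdin || return 1
  kubectl apply -k "$overlay" || return 1
  # Wait for the deployment to become ready before declaring success.
  kubectl -n ampora rollout status deploy/ampora-web --timeout=10m
}
```

Stopping at the first failed step keeps a half-provisioned cluster from looking healthy; the database restore and TLS re-issue happen outside this function.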
The first connecting agent will reattempt mTLS using its existing client cert — that cert is signed by your persisted CA, which lives in the restored DB. It still validates, the agent reconnects, and the fleet is back. The DR test is "do agents reconnect on a fresh cluster?" and the answer should be yes.
Failure modes and how to recover¶
| Symptom | Likely cause | Recovery |
|---|---|---|
| App starts but every encrypted-at-rest field reads garbled | Wrong master key | Point KeyProtection:MasterKey at the correct value; restart |
| App starts; agents can connect but not via mTLS | Old persisted CA was lost | Bootstrap a new CA, push new client certs to all agents (rolling rebootstrap) |
| App starts; audit log is empty | Restored an older snapshot | Restore from a newer backup or accept the loss |
| Agent reconnects but Effective Config is missing | Agent's state row was rewound past the last config push | The next OpAMP heartbeat refreshes Effective Config — wait it out |
DR drill¶
We strongly recommend a quarterly DR drill:
- Take a fresh backup.
- Spin up a parallel Ampora in a sandbox namespace pointing at a restored copy of the backup.
- Confirm: agents from the test environment can connect, configs are intact, audit log timestamps line up, and you can push a rollout.
- Tear it down.
Drills are how you discover that the master key in the secret manager is one rotation behind the master key actually in use, before you need to discover it on the worst day of your year.
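One way to catch that drift during a drill is to compare short fingerprints of the two copies instead of the keys themselves. The sketch below assumes `sha256sum` from GNU coreutils is available; the function names, and the two helpers in the usage line, are hypothetical.

```shell
# Print a short SHA-256 fingerprint of a key so two copies can be compared
# without either value appearing in logs or on a terminal.
key_fingerprint() {
  printf '%s' "$1" | sha256sum | cut -c1-16
}

# Succeed only if both copies of the key have the same fingerprint.
keys_match() {
  [ "$(key_fingerprint "$1")" = "$(key_fingerprint "$2")" ]
}

# Usage (both helpers are placeholders for your own tooling):
#   keys_match "$(key_from_secret_manager)" "$(key_in_running_deployment)" \
#     || echo "master key drift detected"
```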