Backup & restore¶

Ampora's persistent state is PostgreSQL plus a single application-level master key. Back both up — and test restoring from them — and you can recover from anything short of a region-wide datacentre loss.

What to back up¶

Item	Where	Why
PostgreSQL	your DB host or managed service	The only authoritative state for fleet, configurations, audit
Master encryption key	your secret manager (Vault, AWS Secrets Manager, …)	Required to decrypt CA private keys, peer secrets, GitOps credentials at rest
Server TLS cert + key	your secret manager	Convenience — without it, browsers will see a cert mismatch on restore
Container images	your registry	Convenience, as with the TLS cert; the Ampora version that wrote your data must still be pullable on restore

You do not need to back up:

the Ampora deployment manifests (they are in Git),
the configmaps (they are in Git),
per-pod ephemeral state (/tmp, the EF cache).

The audit log lives in PostgreSQL. Treat backup retention with the same seriousness as audit retention.

Recommended backup strategy¶

PostgreSQL¶

Continuous WAL archiving to an off-cluster object store (pgBackRest, wal-g, or the managed offering's PITR feature).
Daily full backup retained for at least 30 days.
Snapshot before every Ampora version upgrade. Even if the EF migrations are safe to apply forward, the snapshot is your "abort" button.

The dataset is small for most fleets (a few GB even for 10 000 agents), so daily fulls are cheap.
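
As a rough illustration of a daily full with 30-day retention, the sketch below takes a custom-format logical dump and prunes older files. It assumes a database named ampora reachable through the standard libpq environment variables. BACKUP_DIR, RETENTION_DAYS, and the function names are hypothetical, and a logical dump complements WAL archiving rather than replacing it.

```shell
# Hypothetical settings; adjust to your environment.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/ampora}"
RETENTION_DAYS="${RETENTION_DAYS:-30}"

# Name a dump after the given date (YYYY-MM-DD), defaulting to today.
dump_name() {
  printf 'ampora-%s.dump\n' "${1:-$(date +%F)}"
}

# Delete dumps in directory $1 last modified more than $2 days ago.
prune_old_dumps() {
  find "$1" -maxdepth 1 -type f -name 'ampora-*.dump' -mtime "+$2" -delete
}

# Take a custom-format dump (the format pg_restore reads), then prune.
# Pruning is skipped if the dump fails, so a bad night never shrinks history.
daily_backup() {
  mkdir -p "$BACKUP_DIR" || return 1
  pg_dump --format=custom --file="$BACKUP_DIR/$(dump_name)" ampora || return 1
  prune_old_dumps "$BACKUP_DIR" "$RETENTION_DAYS"
}
```

Run daily_backup from a daily scheduler (cron, a Kubernetes CronJob, or similar) and copy BACKUP_DIR to off-cluster storage.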

Master key¶

The KeyProtection:MasterKey value is a 32-byte key generated by a CSPRNG. Treat it like a root password:

store in your secret manager (Vault, AWS Secrets Manager, GCP Secret Manager, AKV),
replicate cross-region if your secret manager supports it,
enforce a two-person retrieval policy if you can,
include it in your DR drill — losing the key means encrypted-at-rest fields are forever unreadable, even with a perfect Postgres backup.
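
A key of this shape can be produced with standard tools. The sketch below is one way to do it, assuming the value is stored base64-encoded; the encoding Ampora expects is not stated here, so confirm it for your deployment. The function names are illustrative.

```shell
# Read 32 bytes from the kernel CSPRNG and base64-encode them (44 characters).
# Assumption: the key is stored base64-encoded; verify against your deployment.
gen_master_key() {
  head -c 32 /dev/urandom | base64
}

# Print the decoded length in bytes, to sanity-check a stored value.
master_key_len() {
  printf '%s' "$1" | base64 -d | wc -c | tr -d ' '
}
```

Checking that master_key_len reports 32 for the value held in the secret manager is a cheap way to catch truncated or double-encoded keys during a drill.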

Server TLS¶

If you use cert-manager and ACME, the cert is reproducible from a recovered cluster — back up the certificate-issuer config, not the cert. If you use a corporate CA with manual issuance, back up the signed bundle.

Restore procedure¶

Same database, point in time¶

The easiest case: roll back within the WAL retention window using your managed offering's PITR or a time-targeted pgBackRest restore (pgbackrest restore --type=time --target=<timestamp>). Ampora reconnects to the rewound database on its next connection attempt, with no extra steps.
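
For the pgBackRest path, a time target is expressed as --type=time together with --target. The helper below only prints the command so it can be reviewed before anyone runs it. The stanza name and timestamp are hypothetical, and PostgreSQL must be stopped before the restore itself is executed.

```shell
# Print (not run) a pgBackRest point-in-time restore command for review.
# $1 = stanza name (hypothetical), $2 = timestamp to rewind to.
# --delta reuses files already in the data directory; --target-action=promote
# makes the cluster accept writes once the target time is reached.
pitr_command() {
  printf 'pgbackrest --stanza=%s --delta --type=time --target="%s" --target-action=promote restore\n' "$1" "$2"
}

pitr_command ampora "2026-04-01 09:00:00+00"
```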

Different database, same data¶

# 1. Provision a fresh Postgres
createdb ampora -O ampora

# 2. Restore the dump
pg_restore --no-owner --role=ampora -d ampora ampora-2026-04-01.dump

# 3. Point Ampora at the new DB
kubectl -n ampora set env deploy/ampora-web \
  ConnectionStrings__Ampora="Host=newdb.acme.io;Database=ampora;..."

# 4. Restart so EF picks up any pending migrations
#    (set env already triggers a rollout when the value changes; this covers the unchanged case)
kubectl -n ampora rollout restart deploy/ampora-web
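
pg_restore reads only archive formats (custom, directory, tar), and a single-file custom-format dump starts with the magic bytes PGDMP. Before step 2 it can be worth confirming the file is what you expect. The helper name below is hypothetical.

```shell
# Succeeds if the file at $1 looks like a single-file custom-format pg_dump archive.
# Directory- and tar-format archives are not covered by this check.
is_custom_dump() {
  [ "$(head -c 5 "$1" 2>/dev/null)" = "PGDMP" ]
}

# Example: stop early when handed a plain SQL file by mistake.
# is_custom_dump ampora-2026-04-01.dump || { echo "not a custom-format dump" >&2; exit 1; }
```

A plain SQL dump fails this check and would need to be loaded with psql instead of pg_restore.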

Full disaster — new cluster, restored DB¶

The shopping list, in order:

Provision a new Kubernetes cluster (or VM, for binary deployments).
Restore PostgreSQL from your latest known-good backup.
Pull the master key from your secret manager into the new cluster's secret manager (or the Secret for the binary deployment).
Re-issue server TLS for the public hostname.
kubectl apply -k deploy/kustomize/overlays/... against the new cluster.
Ampora connects, validates the master key against existing encrypted-at-rest blobs, and is fully functional.

The first connecting agent will reattempt mTLS using its existing client cert — that cert is signed by your persisted CA, which lives in the restored DB. It still validates, the agent reconnects, and the fleet is back. The DR test is "do agents reconnect on a fresh cluster?" and the answer should be yes.

Failure modes and how to recover¶

Symptom	Likely cause	Recovery
App starts but every encrypted-at-rest field reads garbled	Wrong master key	Point `KeyProtection:MasterKey` at the correct value; restart
App starts; agents can connect but not via mTLS	Old persisted CA was lost	Bootstrap a new CA, push new client certs to all agents (rolling rebootstrap)
App starts; audit log is empty	Restored an older snapshot	Restore from a newer backup or accept the loss
Agent reconnects but Effective Config is missing	Agent's state row was rewound past the last config push	The next OpAMP heartbeat refreshes Effective Config — wait it out

DR drill¶

We strongly recommend a quarterly DR drill:

Take a fresh backup.
Spin up a parallel Ampora in a sandbox namespace pointing at a restored copy of the backup.
Confirm: agents from the test environment can connect, configs are intact, audit log timestamps line up, and you can push a rollout.
Tear it down.
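
The confirmation step is easier to repeat each quarter when the checks are scripted. The harness below is a minimal sketch, not part of Ampora: check and drill_passed are hypothetical helpers, and the commented commands assume a sandbox namespace called ampora-drill.

```shell
# Count of failed checks in this drill run.
FAILED=0

# Run one named check; report the result and keep going on failure.
# Usage: check "description" command [args...]
check() {
  name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $name"
  else
    echo "FAIL  $name"
    FAILED=$((FAILED + 1))
  fi
}

# Succeeds only if every check so far has passed.
drill_passed() {
  [ "$FAILED" -eq 0 ]
}

# Example checks (namespace and host variable are assumptions):
# check "web rollout complete" kubectl -n ampora-drill rollout status deploy/ampora-web --timeout=120s
# check "restored DB accepts connections" pg_isready -h "$DRILL_DB_HOST"
```

Running every check instead of stopping at the first failure gives a complete picture of what a real restore would have been missing.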

Drills are how you discover that the master key in the secret manager is one rotation behind the master key actually in use, before you need to discover it on the worst day of your year.
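
One way to catch that mismatch without printing either key is to compare short fingerprints. This is a sketch under assumptions: how you read the key from the secret manager and from the running deployment is left to you, the function names are hypothetical, and sha256sum is the GNU coreutils tool (use shasum -a 256 on macOS).

```shell
# Print the first 16 hex characters of the SHA-256 of a secret.
# The fingerprint identifies the key without exposing the key itself.
key_fingerprint() {
  printf '%s' "$1" | sha256sum | cut -c1-16
}

# Succeeds if two key values have the same fingerprint.
keys_match() {
  [ "$(key_fingerprint "$1")" = "$(key_fingerprint "$2")" ]
}

# Example, with both values already loaded into variables:
# keys_match "$KEY_FROM_SECRET_MANAGER" "$KEY_IN_USE" || echo "master key drift detected" >&2
```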