Hardening checklist¶
Run through this list before shipping Ampora to production. Each item is a single action; each links to the page that explains why and how.
Transport¶
- TLS on the public hostname (cert-manager + ACME, or your CA). Renew automation in place.
- HSTS sent by the reverse proxy (Strict-Transport-Security: max-age=31536000; includeSubDomains; preload).
- OpAmp:RequireMtls=true. After bootstrap, only mTLS connections accepted.
- OpAmp:BootstrapPlaintextAllowed=false (the default).
- WebSocket-friendly proxy timeouts: read/send ≥ 1 hour (HA wire-up).
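The HSTS item can be spot-checked mechanically. The following is a minimal sketch, not part of Ampora: the function names are hypothetical, and it assumes you have already captured the header value from a response served by your reverse proxy.

```python
def parse_hsts(value: str) -> dict:
    """Split a Strict-Transport-Security header value into its directives."""
    directives = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directive names are case-insensitive, so normalise to lower case.
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def hsts_meets_checklist(value: str, min_age: int = 31536000) -> bool:
    """True if max-age is at least min_age and both flags are present."""
    directives = parse_hsts(value)
    try:
        max_age = int(directives.get("max-age", ""))
    except ValueError:
        # Missing or non-numeric max-age fails the check.
        return False
    return (
        max_age >= min_age
        and "includesubdomains" in directives
        and "preload" in directives
    )
```

Feeding it the value from the checklist (max-age=31536000; includeSubDomains; preload) should pass, while a shorter max-age or a missing flag should fail.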
Authentication¶
- OIDC issuer pinned to your production IdP — never a dev Keycloak.
- OIDC client secret rotated since you cloned the example configs. Stored in the secret manager.
- Role mapping sourced from IdP groups, not from a per-user attribute (group changes propagate, attribute changes get stuck in caches).
- First-user-becomes-admin bootstrap consumed and the user has been replaced or downgraded if appropriate.
Secrets¶
- KeyProtection:MasterKey populated from the secret manager, never from a literal in source control.
- The placeholder Secret from deploy/kustomize/base/secret.yaml has been replaced, not edited in place; the ampora.io/placeholder=true annotation is gone.
- Postgres password rotated since clone. TLS to Postgres (SSL Mode=Require).
- OpenTelemetry export header (if vendor needs auth) lives in the Secret, not in the ConfigMap.
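One way to confirm the placeholder Secret is really gone is to look for the annotation on the deployed object. This is a rough sketch under assumptions: the helper name is hypothetical, and it expects the Secret manifest already loaded into a Python dict (for example from JSON output of your cluster tooling).

```python
PLACEHOLDER_ANNOTATION = "ampora.io/placeholder"


def is_placeholder_secret(manifest: dict) -> bool:
    """True if the manifest still carries ampora.io/placeholder=true."""
    # metadata or annotations may be absent or null in a real manifest.
    metadata = manifest.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    value = str(annotations.get(PLACEHOLDER_ANNOTATION, ""))
    return value.lower() == "true"
```

A deployment gate could refuse to proceed whenever this returns True for the Secret that Ampora mounts.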
PKI¶
- Active signing key is the one you meant to issue from. If you imported an external CA, retire the auto-bootstrapped one.
- Trust bundle has been distributed to agents — they pick it up on next bootstrap; confirm by spot-checking a few agents.
- CRL Distribution Point and OCSP responder URL point at externally reachable URLs (so revocation actually propagates).
- HSM / KMS adapter configured if you have one (HSM/KMS).
RBAC¶
- At least two Admin accounts (don't lock yourself out).
- Day-to-day work happens as Operator, not Admin.
- Viewer role exists for read-only stakeholders.
- Audit-log toggle for archived events is gated behind Admin; this is the default, but verify after any RBAC customisation.
Multi-tenancy (only if applicable)¶
- MultiTenant:Mode=HardIsolation if you co-locate multiple customers / business units.
- PostgreSQL RLS enabled (row_security = on at the DB level).
- OIDC tenant-discriminator claim verified: log in as a test user in each tenant, confirm the tenant in the user menu.
Data¶
- Postgres backup — daily fulls, continuous WAL archiving (Backup & restore).
- Master key backup in the secret manager, tested by recovery drill.
- Audit retention set to the window your compliance regime requires (Audit retention).
Network¶
- NetworkPolicy restricts egress to the intentional set (Postgres, OIDC, OTel collector, Git host, KMS, federation peers).
- No service exposed externally other than the reverse proxy. Prometheus scrape happens cluster-internally only.
- No 0.0.0.0/0 ingress rules anywhere on the agent path; agent connectivity should be allow-listed by region or VPN if your fleet allows it.
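The ingress item lends itself to a quick automated check. The sketch below is illustrative only: the function name is hypothetical, and it assumes you have already extracted the allowed source CIDRs from your ingress rules into a list of strings. It also flags the IPv6 equivalent (::/0), which is an extension beyond what the checklist names.

```python
import ipaddress


def open_to_world(cidrs):
    """Return the CIDRs that match every address (prefix length 0)."""
    flagged = []
    for cidr in cidrs:
        # strict=False tolerates entries with host bits set, e.g. 10.20.0.1/16.
        network = ipaddress.ip_network(cidr, strict=False)
        if network.prefixlen == 0:
            flagged.append(cidr)
    return flagged
```

An empty result means none of the listed sources is a catch-all; anything returned should be replaced with a region or VPN range.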
Operational hygiene¶
- Self-observability wired up (Self-observability). The Dashboard shows non-zero counters.
- Alerts configured on the headline metrics (Self-observability → alerts).
- Runbook for incident response in place; Ampora's Troubleshooting pages linked from your internal wiki.
- Quarterly DR drill scheduled (Disaster recovery).
Application-side¶
- Debug:AllowRolloutEndpoints=false (the default; confirm).
- Include Error Detail=false in the Postgres connection string for production.
- Logging level is Information, not Debug; debug logs leak payloads.
- Custom policies that block destructive operations are published and required (default-deny exporter swap, default-deny OTLP-without-TLS, etc.).
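The two connection-string items in this checklist (SSL Mode=Require and Include Error Detail=false) can be verified together. This is a minimal sketch, assuming the string is a plain semicolon-separated list of key=value pairs; the function names are hypothetical, and the naive split does not handle quoted values that contain semicolons. It treats a missing Include Error Detail key as a finding, since the checklist asks for it to be set explicitly.

```python
def parse_connection_string(conn: str) -> dict:
    """Parse 'Key=Value;Key=Value' into a dict with lower-cased keys."""
    settings = {}
    for pair in conn.split(";"):
        if "=" not in pair:
            continue
        key, _, value = pair.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def connection_string_findings(conn: str, allowed_ssl_modes=("require",)) -> list:
    """Return human-readable findings; an empty list means both checks pass."""
    settings = parse_connection_string(conn)
    findings = []
    # If your driver offers stricter modes, add them to allowed_ssl_modes.
    if settings.get("ssl mode", "").lower() not in allowed_ssl_modes:
        findings.append("SSL Mode is not Require")
    if settings.get("include error detail", "").lower() != "false":
        findings.append("Include Error Detail is not explicitly false")
    return findings
```

For example, a string such as Host=db.internal;SSL Mode=Require;Include Error Detail=false (the host name is made up) yields no findings, while one that omits both keys yields two.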
Supply chain¶
- Container image signed with cosign (the project's CI does this).
- Deployment pins by digest, not just tag.
- SBOM archived from the release artefacts.
- Container scanning (Trivy or equivalent) gates the deployment.
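Digest pinning is easy to check in a pre-deployment script. The following sketch assumes image references are available as plain strings and that digests use sha256; the function name and the example registry path are hypothetical.

```python
import re

# A pinned reference ends with @sha256: followed by 64 lowercase hex digits.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def pinned_by_digest(image_ref: str) -> bool:
    """True if the image reference ends in a full sha256 digest."""
    return DIGEST_RE.search(image_ref) is not None
```

A reference such as registry.example.com/ampora/server:1.4.2 fails this check because a tag alone can be moved to a different image; the same reference with a full @sha256: digest appended passes.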
What not to do¶
- Do not disable mTLS to make agent troubleshooting easier in production.
- Do not put bootstrap tokens in a shared password manager — they are single-use; the audit trail of "who issued, who redeemed" matters.
- Do not export the OIDC client secret to an env var on developer laptops.
- Do not leave Debug:AllowRolloutEndpoints=true on in production; it is for the integration test suite.