
Hardening checklist

Run through this list before shipping Ampora to production. Each item is a single action; each links to the page that explains why and how.

Transport

  • TLS on the public hostname (cert-manager + ACME, or your CA). Renewal automation in place.
  • HSTS sent by the reverse proxy (Strict-Transport-Security: max-age=31536000; includeSubDomains; preload).
  • OpAmp:RequireMtls=true. After bootstrap, only mTLS connections are accepted.
  • OpAmp:BootstrapPlaintextAllowed=false (the default).
  • WebSocket-friendly proxy timeouts — read/send ≥ 1 hour (HA wire-up).
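
The transport items lend themselves to a scripted spot-check. The sketch below assumes you have captured the raw Strict-Transport-Security header from the reverse proxy and exported Ampora's effective settings as a flat dictionary keyed by the colon-separated names used above. parse_hsts and transport_problems are hypothetical helpers, not part of Ampora.

```python
# Hypothetical pre-flight helpers, not part of Ampora.
# `settings` is a flat dict of effective config keys (colon-separated, as in the
# checklist); `hsts` is the raw Strict-Transport-Security value from the proxy.


def parse_hsts(value: str) -> dict:
    """Split an HSTS header into {lowercased directive: argument}."""
    directives = {}
    for part in value.split(";"):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def transport_problems(settings: dict, hsts: str) -> list:
    """Return human-readable findings; an empty list means all checks passed."""
    problems = []
    directives = parse_hsts(hsts)
    try:
        max_age = int(directives.get("max-age", "0"))
    except ValueError:
        max_age = 0
    if max_age < 31536000:  # one year, matching the checklist value
        problems.append("HSTS max-age below one year")
    for flag in ("includesubdomains", "preload"):
        if flag not in directives:
            problems.append(f"HSTS missing {flag}")
    if str(settings.get("OpAmp:RequireMtls", "false")).lower() != "true":
        problems.append("OpAmp:RequireMtls is not true")
    # An absent BootstrapPlaintextAllowed key is fine: the documented default is false.
    if str(settings.get("OpAmp:BootstrapPlaintextAllowed", "false")).lower() != "false":
        problems.append("OpAmp:BootstrapPlaintextAllowed is not false")
    return problems
```

For example, transport_problems({"OpAmp:RequireMtls": "true"}, "max-age=31536000; includeSubDomains; preload") returns an empty list. The proxy timeout item is not covered because it depends on which proxy you run.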

Authentication

  • OIDC issuer pinned to your production IdP — never a dev Keycloak.
  • OIDC client secret rotated since you cloned the example configs, and stored in the secret manager.
  • Role mapping sourced from IdP groups, not from a per-user attribute (group changes propagate, attribute changes get stuck in caches).
  • The first-user-becomes-admin bootstrap has been consumed, and that user has been replaced or downgraded if appropriate.
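
A minimal sketch of group-based role mapping, assuming the IdP puts group names in a groups claim and that the three group names below exist in your directory. Both are assumptions, and Ampora's own mapping configuration may look different. In this sketch the role is recomputed from the token at every login, so a group change in the IdP takes effect at the next sign-in, and a user with no mapped group gets no role rather than a default one.

```python
from typing import Optional

# Hypothetical mapping: the group names and the "groups" claim are assumptions.
GROUP_TO_ROLE = {
    "ampora-admins": "Admin",
    "ampora-operators": "Operator",
    "ampora-viewers": "Viewer",
}
ROLE_RANK = {"Viewer": 1, "Operator": 2, "Admin": 3}


def role_from_claims(claims: dict) -> Optional[str]:
    """Pick the highest role granted by the token's group claim, or None."""
    roles = [GROUP_TO_ROLE[g] for g in claims.get("groups", []) if g in GROUP_TO_ROLE]
    if not roles:
        return None  # no mapped group: grant nothing rather than a default role
    return max(roles, key=ROLE_RANK.__getitem__)
```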

Secrets

  • KeyProtection:MasterKey populated from the secret manager, never from a literal in source control.
  • The placeholder Secret from deploy/kustomize/base/secret.yaml has been replaced, not edited in place — the ampora.io/placeholder=true annotation is gone.
  • Postgres password rotated since you cloned the example configs. TLS to Postgres enforced (SSL Mode=Require).
  • OpenTelemetry export headers (if your vendor needs auth) live in the Secret, not in the ConfigMap.
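
Two of the secrets items can be checked mechanically. This sketch assumes the deployed Secret has been fetched as JSON (for example with kubectl get secret <name> -o json) and parsed into a dict, and that the Postgres connection string uses Npgsql-style Key=Value pairs, which is what the SSL Mode keyword suggests. secret_problems is a hypothetical helper.

```python
# Hypothetical secrets check, not part of Ampora.
# Npgsql also accepts the stricter VerifyCA / VerifyFull modes, so allow those too.
ACCEPTABLE_SSL_MODES = {"require", "verifyca", "verifyfull"}


def parse_conn_string(conn: str) -> dict:
    """Parse 'Key=Value;Key=Value' into a dict with normalized keys."""
    out = {}
    for part in conn.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            # Normalize "SSL Mode", "SslMode" and "sslmode" to one key.
            out[key.replace(" ", "").lower()] = value.strip()
    return out


def secret_problems(secret: dict, conn: str) -> list:
    problems = []
    annotations = secret.get("metadata", {}).get("annotations") or {}
    if annotations.get("ampora.io/placeholder") == "true":
        problems.append("placeholder Secret still deployed")
    ssl_mode = parse_conn_string(conn).get("sslmode", "").replace(" ", "").lower()
    if ssl_mode not in ACCEPTABLE_SSL_MODES:
        problems.append(f"SSL Mode is {ssl_mode or 'unset'}, expected Require or stricter")
    return problems
```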

PKI

  • Active signing key is the one you meant to issue from. If you imported an external CA, retire the auto-bootstrapped one.
  • Trust bundle has been distributed to agents. They pick it up on their next bootstrap; confirm by spot-checking a few agents.
  • CRL Distribution Point and OCSP responder URL point at externally reachable URLs (so revocation actually propagates).
  • HSM / KMS adapter configured if you have one (HSM/KMS).
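
Whether the CRL Distribution Point and OCSP URLs are reachable is best tested from outside the cluster, but a cheap first pass catches the common mistake of publishing a cluster-internal address. The heuristic below flags localhost, private or link-local IPs, single-label hostnames and a few internal DNS suffixes. The suffix list is an assumption to adjust for your network, and passing the check does not prove reachability.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed internal DNS suffixes; ".local" also covers ".svc.cluster.local".
INTERNAL_SUFFIXES = (".svc", ".local", ".internal")


def looks_internal(url: str) -> bool:
    """Heuristic: True if the URL's host is unlikely to resolve outside the cluster."""
    host = urlsplit(url).hostname or ""
    if host == "localhost" or host.endswith(INTERNAL_SUFFIXES):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Single-label names (or no host at all) only
        # resolve inside a cluster or LAN.
        return "." not in host
    return ip.is_private or ip.is_loopback or ip.is_link_local
```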

RBAC

  • At least two Admin accounts (don't lock yourself out).
  • Day-to-day work happens as Operator, not Admin.
  • Viewer role exists for read-only stakeholders.
  • Audit-log toggle for archived events is gated behind Admin — this is the default but verify after any RBAC customisation.
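
The lockout guard is easy to automate against an export of user-to-role assignments. How you produce that export (UI, API or database query) is not covered here; rbac_problems is a hypothetical helper.

```python
from collections import Counter


def rbac_problems(assignments: dict) -> list:
    """assignments maps user name -> role ("Admin", "Operator" or "Viewer")."""
    admins = Counter(assignments.values())["Admin"]  # Counter returns 0 for absent keys
    if admins < 2:
        return [f"only {admins} Admin account(s); keep at least two"]
    return []
```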

Multi-tenancy (only if applicable)

  • MultiTenant:Mode=HardIsolation if you co-locate multiple customers / business units.
  • PostgreSQL RLS enabled (row_security = on at the DB level).
  • OIDC tenant-discriminator claim verified — log in as a test user in each tenant, confirm the tenant in the user menu.

Data

  • Postgres backup — daily fulls, continuous WAL archiving (Backup & restore).
  • Master key backup in the secret manager, verified by a recovery drill.
  • Audit retention set to the window your compliance regime requires (Audit retention).
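
A freshness check for the backup items, as a sketch. It assumes you can read the time of the last full backup from your backup tool, and the last_archived_time and last_failed_time columns from PostgreSQL's pg_stat_archiver view. The 26-hour threshold (a day plus slack) is an assumption, not an Ampora default.

```python
from datetime import datetime, timedelta
from typing import Optional


def backup_problems(last_full: datetime, last_archived: Optional[datetime],
                    last_failed: Optional[datetime], now: datetime,
                    max_full_age: timedelta = timedelta(hours=26)) -> list:
    """Flag a stale full backup or a failing WAL archiver.

    last_archived / last_failed mirror pg_stat_archiver.last_archived_time and
    last_failed_time, which are NULL (None) until the first success or failure.
    """
    problems = []
    if now - last_full > max_full_age:
        problems.append(f"last full backup is older than {max_full_age}")
    if last_archived is None:
        problems.append("WAL archiving has never succeeded")
    elif last_failed is not None and last_failed > last_archived:
        problems.append("latest WAL archive attempt failed")
    return problems
```

A failure newer than the last success is a more reliable signal than the age of last_archived_time alone, since an idle database may legitimately archive WAL infrequently.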

Network

  • NetworkPolicy restricts egress to the intentional set (Postgres, OIDC, OTel collector, Git host, KMS, federation peers).
  • No service exposed externally other than the reverse proxy. Prometheus scrape happens cluster-internally only.
  • No 0.0.0.0/0 ingress rules anywhere on the agent path — agent connectivity should be allow-listed by region or VPN if your fleet allows it.
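
The 0.0.0.0/0 rule can be scanned for in NetworkPolicy manifests. This sketch walks spec.ingress[].from[].ipBlock.cidr in a manifest parsed into a dict and reports any CIDR with a zero-length prefix (0.0.0.0/0 or ::/0). It does not cover cloud security groups or load-balancer source ranges, which need their own check.

```python
import ipaddress


def open_ingress_rules(policy: dict) -> list:
    """Return the CIDRs in a NetworkPolicy's ingress rules that match every address."""
    wide = []
    for rule in policy.get("spec", {}).get("ingress") or []:
        for peer in rule.get("from") or []:
            cidr = (peer.get("ipBlock") or {}).get("cidr")
            # A /0 prefix matches the whole IPv4 or IPv6 address space.
            if cidr and ipaddress.ip_network(cidr, strict=False).prefixlen == 0:
                wide.append(cidr)
    return wide
```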

Operational hygiene

Application-side

  • Debug:AllowRolloutEndpoints=false (the default — confirm).
  • Include Error Detail=false in the Postgres connection string for production.
  • Logging level is Information, not Debug — debug logs leak payloads.
  • Custom policies that block destructive operations are published and required (default-deny exporter swap, default-deny OTLP-without-TLS, etc.).
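
The first three application-side items can be verified against the effective configuration. This sketch takes the settings as flat colon-separated keys plus the Postgres connection string. The key Logging:LogLevel:Default follows the ASP.NET Core convention and is an assumption about how Ampora names its log level; app_problems is a hypothetical helper.

```python
# Hypothetical application-side check, not part of Ampora.


def app_problems(settings: dict, conn: str) -> list:
    problems = []
    # An absent key is fine: the documented default is false.
    if str(settings.get("Debug:AllowRolloutEndpoints", "false")).lower() != "false":
        problems.append("Debug:AllowRolloutEndpoints must be false")
    # Assumed key name; Debug and the more verbose Trace both leak payloads.
    level = str(settings.get("Logging:LogLevel:Default", "Information"))
    if level.lower() in ("debug", "trace"):
        problems.append(f"log level {level} leaks payloads")
    # Npgsql-style connection string: keys are case-insensitive, spaces optional.
    keys = {}
    for part in conn.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            keys[key.replace(" ", "").lower()] = value.strip().lower()
    if keys.get("includeerrordetail", "false") != "false":
        problems.append("Include Error Detail is enabled")
    return problems
```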

Supply chain

  • Container image signed with cosign (the project's CI does this).
  • Deployment pins by digest, not just tag.
  • SBOM archived from the release artefacts.
  • Container scanning (Trivy or equivalent) gates the deployment.
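
Digest pinning can be verified on the rendered pod spec, i.e. spec.template.spec of a Deployment manifest parsed into a dict. This sketch treats an image reference as pinned only if it ends in @sha256: followed by 64 hex characters; any image names in test data are placeholders.

```python
import re

# Pinned reference: ends in "@sha256:" plus 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(pod_spec: dict) -> list:
    """Return image references in a pod spec that are not pinned by digest."""
    containers = (pod_spec.get("containers") or []) + (pod_spec.get("initContainers") or [])
    return [c.get("image", "") for c in containers
            if not DIGEST_RE.search(c.get("image", ""))]
```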

What not to do

  • Do not disable mTLS to make agent troubleshooting easier in production.
  • Do not put bootstrap tokens in a shared password manager — they are single-use; the audit trail of "who issued, who redeemed" matters.
  • Do not export the OIDC client secret to an env var on developer laptops.
  • Do not leave Debug:AllowRolloutEndpoints=true in production; it exists for the integration test suite.