Skip to content

Database problems

Ampora's persistence is PostgreSQL. When something here breaks, it breaks loudly — /health/ready flips red and every UI page errors.

1. Cannot connect

Check the obvious:

  • Postgres host resolvable from the pod / VM? nc -z -v db.acme.io 5432.
  • Connection string up to date? Username, password, database, SSL setting. The most common production error is SSL Mode=Require;Trust Server Certificate=true left over from development — Trust Server Certificate=true defeats validation.
  • Postgres max_connections not exceeded? See SELECT * FROM pg_stat_activity WHERE state != 'idle'; on the DB.

The pod logs show the Npgsql exception message verbatim — read it literally. "Password authentication failed", "no pg_hba.conf entry", "could not connect to server" each have one obvious cause.

2. Cannot migrate

Symptoms: pod logs show Microsoft.EntityFrameworkCore.Migrations.HistoryRepository: Applying... and then an error.

Error Cause
permission denied for schema public The ampora role does not own DDL. Run GRANT ALL ON DATABASE ampora TO ampora;
relation "..." already exists Manual schema modification has drifted from EF migrations. Restore from backup or drop the offending object
pg_advisory_lock(...) blocked Two pods trying to migrate concurrently — the advisory lock works, but if the holder crashed leaving the lock, restart Postgres or wait for the connection to reset
out of memory during a JSONB rewrite Bump Postgres work_mem for the migration; see release notes for the migration's memory profile

EF migrations run in a transaction. A failed migration leaves the schema at the previous successful one — there is no half-migrated state to clean up.

3. Slow queries

Check ampora_db_query_duration_seconds_bucket (p95). If it is over 100 ms consistently:

  • Top offenders: EXPLAIN ANALYZE from pg_stat_statements.
  • Common culprits: GIN index missing on a JSONB column you started querying, missing index on a new column added in a migration that did not declare one, table bloat needing a VACUUM ANALYZE.
  • Connection pool exhaustion: if Npgsql cannot get a connection it queues, and the duration includes the wait. MaxPoolSize in the connection string is the knob.

4. Connection pool exhausted

Npgsql.NpgsqlException: The connection pool has been exhausted.

Three possible causes:

  1. Too many app instances per Postgres max_connections. With 5 Ampora pods at default MaxPoolSize=100, you need 500 connections on the DB. Reduce per-pod MaxPoolSize or scale the DB.
  2. Connections leaked. Rare but possible if a custom hosted service forgot to dispose. Open an issue with the leaked-connection stack trace from the logs.
  3. Long-running queries holding connections. See step 3.

5. Replica lag (when using read replicas)

If you wire ConnectionStrings:AmporaRead to a replica:

  • The replica's data lags the primary by some milliseconds-to-seconds.
  • Dashboard reads on the replica may show "just-yesterday" values for a brief window after a write.
  • For that reason, the rollout state machine and audit writes always go to the primary.

If you see "just published a config but it does not appear in the list", check replica lag with SELECT now() - pg_last_xact_replay_timestamp(); on the replica.

6. Disk full

PostgreSQL stops accepting writes when disk is exhausted. The applications gets disk full errors on every write.

  • Free space (du, pg_total_relation_size(...) to find the largest tables).
  • Bump retention if audit_event_archive is the largest.
  • Bump WAL volume size if WAL is the largest.

7. Locked / wedged

A long-running transaction can lock a row that the live application needs:

SELECT
  blocked_locks.pid AS blocked_pid,
  blocked_activity.usename AS blocked_user,
  blocking_locks.pid AS blocking_pid,
  blocking_activity.usename AS blocking_user,
  blocked_activity.query AS blocked_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
  ON blocking_locks.locktype = blocked_locks.locktype
  AND blocking_locks.pid != blocked_locks.pid
  AND blocking_locks.granted
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

The query that holds the lock is shown. If it is an Ampora query that should not be long-running, capture the trace and open an issue. If it is some external session (a psql left open), terminate it.

8. Restoring a backup that does not match the running version

If you restore a Postgres dump from version 1.4.x and run Ampora 1.5.x, EF migrations apply the missing migrations on startup. The reverse — 1.5.x dump on 1.4.x binary — is not supported: the binary cannot read the newer schema. Either upgrade the binary or restore from an older dump.

9. RLS lock-out (multi-tenant Hard Isolation)

If you accidentally connect with a non-tenant-scoped role and run a query that the RLS policies would refuse, you get an empty result (rather than a permission error). This can mask "data is missing" issues.

  • Confirm with SHOW ROLE.
  • Confirm with SELECT current_setting('app.tenant_id').
  • The Ampora pod always sets app.tenant_id per request via SET LOCAL. An out-of-band psql session does not, so RLS hides every row.