Database problems¶
Ampora's persistence is PostgreSQL. When something here breaks, it breaks loudly — /health/ready flips red and every UI page errors.
1. Cannot connect¶
Check the obvious:
- Postgres host resolvable from the pod / VM?
nc -z -v db.acme.io 5432. - Connection string up to date? Username, password, database, SSL setting. The most common production error is
SSL Mode=Require;Trust Server Certificate=trueleft over from development —Trust Server Certificate=truedefeats validation. - Postgres
max_connectionsnot exceeded? SeeSELECT * FROM pg_stat_activity WHERE state != 'idle';on the DB.
The pod logs show the Npgsql exception message verbatim — read it literally. "Password authentication failed", "no pg_hba.conf entry", "could not connect to server" each have one obvious cause.
2. Cannot migrate¶
Symptoms: pod logs show Microsoft.EntityFrameworkCore.Migrations.HistoryRepository: Applying... and then an error.
| Error | Cause |
|---|---|
permission denied for schema public | The ampora role does not own DDL. Run GRANT ALL ON DATABASE ampora TO ampora; |
relation "..." already exists | Manual schema modification has drifted from EF migrations. Restore from backup or drop the offending object |
pg_advisory_lock(...) blocked | Two pods trying to migrate concurrently — the advisory lock works, but if the holder crashed leaving the lock, restart Postgres or wait for the connection to reset |
out of memory during a JSONB rewrite | Bump Postgres work_mem for the migration; see release notes for the migration's memory profile |
EF migrations run in a transaction. A failed migration leaves the schema at the previous successful one — there is no half-migrated state to clean up.
3. Slow queries¶
Check ampora_db_query_duration_seconds_bucket (p95). If it is over 100 ms consistently:
- Top offenders:
EXPLAIN ANALYZEfrompg_stat_statements. - Common culprits: GIN index missing on a JSONB column you started querying, missing index on a new column added in a migration that did not declare one, table bloat needing a
VACUUM ANALYZE. - Connection pool exhaustion: if Npgsql cannot get a connection it queues, and the duration includes the wait.
MaxPoolSizein the connection string is the knob.
4. Connection pool exhausted¶
Npgsql.NpgsqlException: The connection pool has been exhausted.
Three possible causes:
- Too many app instances per Postgres
max_connections. With 5 Ampora pods at defaultMaxPoolSize=100, you need 500 connections on the DB. Reduce per-podMaxPoolSizeor scale the DB. - Connections leaked. Rare but possible if a custom hosted service forgot to dispose. Open an issue with the leaked-connection stack trace from the logs.
- Long-running queries holding connections. See step 3.
5. Replica lag (when using read replicas)¶
If you wire ConnectionStrings:AmporaRead to a replica:
- The replica's data lags the primary by some milliseconds-to-seconds.
- Dashboard reads on the replica may show "just-yesterday" values for a brief window after a write.
- For that reason, the rollout state machine and audit writes always go to the primary.
If you see "just published a config but it does not appear in the list", check replica lag with SELECT now() - pg_last_xact_replay_timestamp(); on the replica.
6. Disk full¶
PostgreSQL stops accepting writes when disk is exhausted. The applications gets disk full errors on every write.
- Free space (
du,pg_total_relation_size(...)to find the largest tables). - Bump retention if
audit_event_archiveis the largest. - Bump WAL volume size if WAL is the largest.
7. Locked / wedged¶
A long-running transaction can lock a row that the live application needs:
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.pid != blocked_locks.pid
AND blocking_locks.granted
JOIN pg_catalog.pg_stat_activity blocking_activity
ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
The query that holds the lock is shown. If it is an Ampora query that should not be long-running, capture the trace and open an issue. If it is some external session (a psql left open), terminate it.
8. Restoring a backup that does not match the running version¶
If you restore a Postgres dump from version 1.4.x and run Ampora 1.5.x, EF migrations apply the missing migrations on startup. The reverse — 1.5.x dump on 1.4.x binary — is not supported: the binary cannot read the newer schema. Either upgrade the binary or restore from an older dump.
9. RLS lock-out (multi-tenant Hard Isolation)¶
If you accidentally connect with a non-tenant-scoped role and run a query that the RLS policies would refuse, you get an empty result (rather than a permission error). This can mask "data is missing" issues.
- Confirm with
SHOW ROLE. - Confirm with
SELECT current_setting('app.tenant_id'). - The Ampora pod always sets
app.tenant_idper request viaSET LOCAL. An out-of-bandpsqlsession does not, so RLS hides every row.