Production Checklist

Before launching pgStack in production, review the following checklist.

Security

Secrets

POSTGRES_PASSWORD is a strong random value generated with openssl rand -base64 32
JWT_SECRET is 32+ characters, generated randomly
JWT_SECRET does not contain dev-only — with PGSTACK_ENV=production (set by docker-compose.prod.yml) the proxy refuses to start if it does; outside production it only warns
ANON_KEY is a strong random value
SERVICE_ROLE_KEY is a strong random value — treat it like a database root password
.env file is in .gitignore and never committed to version control
SITE_URL is set to your production URL (required for OAuth redirects and email links)
No development defaults (dev-only-*) remain in your production environment

Network

Database

POSTGRES_HOST_AUTH_METHOD is scram-sha-256 (not trust)
ENABLE_SQL_ENDPOINT is false
ENABLE_DEMO_ENDPOINT is false or unset
SECRETS_ENCRYPTION_KEY is set to a strong random 32-byte value if any of these are in use: edge function env vars, webhook signing secrets. Without it, CreateWebhook refuses non-empty secrets and existing encrypted rows are dropped from the delivery cache (fail-closed).
Existing deployments have run migrations/20260515210000_hash_tokens_encrypt_envvars.sql (refresh + email tokens are stored as SHA-256 hashes; env vars are AES-256-GCM encrypted) and migrations/20260516120000_encrypt_webhook_secrets.sql (webhook signing secrets moved from plaintext to secret_encrypted — webhooks must be re-created with new secrets after the migration)
RLS is enabled on all user-facing tables
anon and authenticated roles have only the minimum necessary permissions
Internal auth tables (pgstack.refresh_tokens, pgstack.oauth_sessions, pgstack.email_tokens) have RLS enabled

Reliability

Backups

Automated daily backups configured (pg_dump to S3, Backblaze B2, etc.)
Backup restoration tested at least once
Backup retention policy defined (e.g., keep 7 daily, 4 weekly)

Health checks

The production compose file includes health checks. Verify they work:

docker compose -f docker-compose.prod.yml ps
# Both services should show "healthy"

# Proxy health endpoint
curl https://your-app.example.com/health
# {"status":"ok","pg_connected":true}

Container restart policy

The production compose file sets restart: unless-stopped. Verify containers automatically restart after a reboot:

# Reboot test
sudo reboot
# After reboot:
docker compose -f docker-compose.prod.yml ps

Subscription recovery after a PostgreSQL restart

Live-query state lives in shared memory and is wiped by every PostgreSQL restart (crash, docker compose restart, host reboot, image update). The extension keeps a durable catalog — pgr.persisted_subscriptions, a LOGGED table maintained automatically by pgr.subscribe / pgr.unsubscribe — which is replayed into shared memory by pgr.restore_subscriptions().

This is automatic. Both the all-in-one image and the two-container pgstack-postgres image run the restore on every boot (their entrypoint waits for PostgreSQL to accept connections, then calls pgr.restore_subscriptions() for each database with the extension installed, before the stack starts serving). You do not need a manual restore step.

If you run PostgreSQL with a custom entrypoint that does not include this step, replay it yourself after every restart:

docker compose -f docker-compose.prod.yml exec db \
  psql -U postgres -d app -c "SELECT pgr.restore_subscriptions();"

The function returns the number of subscriptions restored and has EXECUTE revoked from PUBLIC (superuser or an explicitly granted role only).

Resource limits

Memory limits are appropriate for your workload (see deploy.resources.limits)
max_connections in PostgreSQL matches your expected connection count
pg_reactive.max_subscriptions is appropriate for your WebSocket client count

Performance

PostgreSQL configuration

For a server with 4 GB RAM:

command: >
  postgres
    -c shared_preload_libraries=pg_reactive
    -c shared_buffers=1GB
    -c effective_cache_size=3GB
    -c work_mem=16MB
    -c maintenance_work_mem=256MB
    -c max_connections=200
    -c pg_reactive.max_subscriptions=2048
    -c log_min_messages=warning
    -c log_statement=none
    -c random_page_cost=1.1
    -c checkpoint_completion_target=0.9
    -c wal_buffers=64MB

Indexes

Ensure all frequently queried columns are indexed, especially columns used in RLS policies and live query filters:

-- Index columns used in RLS policies
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_todos_user_id ON todos(user_id);

-- Index columns used in live query WHERE clauses
CREATE INDEX idx_orders_status ON orders(status) WHERE status != 'completed';

Connection pooling

For high traffic, add PgBouncer in front of PostgreSQL:

# Add to docker-compose.prod.yml
pgbouncer:
  image: bitnami/pgbouncer:latest
  environment:
    PGBOUNCER_DATABASE: app
    POSTGRESQL_HOST: db
    POSTGRESQL_USERNAME: postgres
    POSTGRESQL_PASSWORD: ${POSTGRES_PASSWORD}
    POSTGRESQL_DATABASE: app
    PGBOUNCER_POOL_MODE: transaction
    PGBOUNCER_MAX_CLIENT_CONN: 1000
    PGBOUNCER_DEFAULT_POOL_SIZE: 20
  ports:
    - "127.0.0.1:6432:6432"

Then point DATABASE_URL at PgBouncer instead of PostgreSQL directly.

Tune the proxy's built-in connection pool before reaching for PgBouncer:

PG_POOL_MAX_CONNS (default 50) — raise for high-concurrency workloads
PG_POOL_MIN_CONNS (default 5) — keeps connections warm to avoid cold-start latency

For advanced pooling (transaction mode), add PgBouncer between the proxy and PostgreSQL as shown above.

Proxy configuration

Compression — Gzip compression is enabled by default (GZIP_LEVEL=5). Set to 0 to disable if your reverse proxy handles compression.
Timeouts — Adjust READ_TIMEOUT (default 30s) and WRITE_TIMEOUT (default 60s) based on expected query complexity. Long-running queries may need higher write timeouts.

Monitoring

Log aggregation

Forward Docker logs to a log aggregation service:

# In docker-compose.prod.yml, add to each service:
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

Or use the fluentd or loki Docker log drivers to ship logs to your logging stack.

Metrics

pgStack has no built-in /metrics endpoint. /health (status + pg_connected) is the only built-in observability endpoint; scrape it or use Docker/PostgreSQL exporters for metrics.

Alerting

Set up alerts for:

Container health check failures
PostgreSQL disk usage > 80%
Slow queries (> 1 second average)
Error rate spikes in proxy logs

Maintenance

PostgreSQL VACUUM

PostgreSQL autovacuum handles routine maintenance automatically. For tables with high churn (frequent inserts/updates/deletes), monitor bloat:

SELECT
  relname AS table,
  n_dead_tup AS dead_rows,
  n_live_tup AS live_rows,
  round(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_pct DESC;

If bloat is significant, run VACUUM manually during low-traffic periods:

VACUUM ANALYZE orders;

pg_reactive snapshot tables

Live query snapshot tables (pgr._snap_*) are UNLOGGED — ordinary on-disk tables that skip WAL. They are created on subscribe and dropped on unsubscribe (or LRU eviction / DDL auto-unsubscribe) and require no maintenance.

Upgrade procedure

Apply schema migrations to the database before the new proxy starts — a new proxy may depend on schema or roles a migration creates, and an existing data volume never re-runs docker/init.sql. Bring up the database alone first, then migrate, then swap the rest:

Review the changelog for breaking changes
Test the upgrade in a staging environment
Take a database backup
Pull new images: docker compose -f docker-compose.prod.yml pull
Start only the database on the new image: docker compose -f docker-compose.prod.yml up -d db
Run new migrations against it: pgstack migration run
Apply any one-time role/credential steps the changelog calls out (see below)
Bring up the rest: docker compose -f docker-compose.prod.yml up -d
Verify the health endpoint and spot-check API responses

One-time: upgrading to the non-superuser proxy connection

Releases that move the proxy off the PostgreSQL superuser onto the authenticator role (gemini audit P2) need an extra step on an existing volume — init.sql and docker/20-authenticator-login.sh create and credential that role only at first init, so they never fire on a populated volume:

# After step 6 (migration 20260613120000 created the authenticator role NOLOGIN
# plus the service_role grants), give it the login password the proxy will use.
# It MUST equal AUTHENTICATOR_PASSWORD in your .env.
docker compose -f docker-compose.prod.yml exec -T db \
  psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" \
  -v apw="$AUTHENTICATOR_PASSWORD" -c "ALTER ROLE authenticator LOGIN PASSWORD :'apw'"

Only then bring up the proxy (step 8) — its DATABASE_URL now points at authenticator, which finally exists and can log in. Skipping this leaves the proxy unable to connect.

To also split the admin plane onto its own login role (so a data-plane compromise cannot reach service_role — gemini audit finding 6), after the above and after migration 20260613140000 created authenticator_admin:

# 1. Give the admin role a login password (= AUTHENTICATOR_ADMIN_PASSWORD in .env)
docker compose -f docker-compose.prod.yml exec -T db \
  psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" \
  -v apw="$AUTHENTICATOR_ADMIN_PASSWORD" -c "ALTER ROLE authenticator_admin LOGIN PASSWORD :'apw'"
# 2. Set ADMIN_DATABASE_URL=postgres://authenticator_admin:<pw>@db:5432/<db> in .env,
#    then recreate the proxy so its admin pool uses it.
# 3. ONLY THEN drop the data role's escalation path:
docker compose -f docker-compose.prod.yml exec -T db \
  psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "REVOKE service_role FROM authenticator"

Doing step 3 before step 2 breaks the admin pool (it would still be assuming service_role through authenticator).

Security​

Secrets​

Network​

Database​

Reliability​

Backups​

Health checks​

Container restart policy​

Subscription recovery after a PostgreSQL restart​

Resource limits​

Performance​

PostgreSQL configuration​

Indexes​

Connection pooling​

Proxy configuration​

Monitoring​

Log aggregation​

Metrics​

Alerting​

Maintenance​

PostgreSQL VACUUM​

pg_reactive snapshot tables​

Upgrade procedure​

One-time: upgrading to the non-superuser proxy connection​

Two-login-role split (admin plane)​