Skip to main content

Production Checklist

Before launching pgStack in production, review the following checklist.

Security

Secrets

  • POSTGRES_PASSWORD is a strong random value generated with openssl rand -base64 32
  • JWT_SECRET is 32+ characters, generated randomly
  • JWT_SECRET does not contain dev-only — with PGSTACK_ENV=production (set by docker-compose.prod.yml) the proxy refuses to start if it does; outside production it only warns
  • ANON_KEY is a strong random value
  • SERVICE_ROLE_KEY is a strong random value — treat it like a database root password
  • .env file is in .gitignore and never committed to version control
  • SITE_URL is set to your production URL (required for OAuth redirects and email links)
  • No development defaults (dev-only-*) remain in your production environment

Network

  • PostgreSQL port 5432 is NOT exposed to the internet (only accessible within Docker network)
  • pgStack proxy port 8080 is behind a reverse proxy (Caddy, nginx, Traefik)
  • HTTPS is enforced — HTTP redirects to HTTPS
  • ALLOWED_ORIGINS is set to your specific domain(s), not empty
  • ALLOW_PERMISSIVE_CORS is false or unset (strict CORS by default since 2026-05)
  • Studio (/studio/) is restricted to trusted IPs or protected with additional auth
  • Configure TLS_MODE=auto for automatic HTTPS, or use a reverse proxy with TLS_MODE=off (default)
  • TRUSTED_PROXIES is set to the IP(s) of your reverse proxy — prevents rate limit bypass via X-Forwarded-For spoofing
  • HSTS is automatically added when TLS is active (via TLS_MODE=auto or TLS_MODE=manual)
  • WebSocket connections validate Origin header against ALLOWED_ORIGINS
  • REQUIRE_AUTHENTICATED_WS=true — required in production: the PGSTACK_ENV=production validator refuses to start with false, and docker-compose.prod.yml sets it to true (rejects anon-key WS subscriptions; requires JWT)

Database

  • POSTGRES_HOST_AUTH_METHOD is scram-sha-256 (not trust)
  • ENABLE_SQL_ENDPOINT is false
  • ENABLE_DEMO_ENDPOINT is false or unset
  • SECRETS_ENCRYPTION_KEY is set to a strong random 32-byte value if any of these are in use: edge function env vars, webhook signing secrets. Without it, CreateWebhook refuses non-empty secrets and existing encrypted rows are dropped from the delivery cache (fail-closed).
  • Existing deployments have run migrations/20260515210000_hash_tokens_encrypt_envvars.sql (refresh + email tokens are stored as SHA-256 hashes; env vars are AES-256-GCM encrypted) and migrations/20260516120000_encrypt_webhook_secrets.sql (webhook signing secrets moved from plaintext to secret_encrypted — webhooks must be re-created with new secrets after the migration)
  • RLS is enabled on all user-facing tables
  • anon and authenticated roles have only the minimum necessary permissions
  • Internal auth tables (pgstack.refresh_tokens, pgstack.oauth_sessions, pgstack.email_tokens) have RLS enabled

Reliability

Backups

  • Automated daily backups configured (pg_dump to S3, Backblaze B2, etc.)
  • Backup restoration tested at least once
  • Backup retention policy defined (e.g., keep 7 daily, 4 weekly)

Health checks

The production compose file includes health checks. Verify they work:

docker compose -f docker-compose.prod.yml ps
# Both services should show "healthy"

# Proxy health endpoint
curl https://your-app.example.com/health
# {"status":"ok","pg_connected":true}

Container restart policy

The production compose file sets restart: unless-stopped. Verify containers automatically restart after a reboot:

# Reboot test
sudo reboot
# After reboot:
docker compose -f docker-compose.prod.yml ps

Subscription recovery after a PostgreSQL restart

Live-query state lives in shared memory and is wiped by every PostgreSQL restart (crash, docker compose restart, host reboot, image update). The extension keeps a durable catalog — pgr.persisted_subscriptions, a LOGGED table maintained automatically by pgr.subscribe / pgr.unsubscribe — which is replayed into shared memory by pgr.restore_subscriptions().

This is automatic. Both the all-in-one image and the two-container pgstack-postgres image run the restore on every boot (their entrypoint waits for PostgreSQL to accept connections, then calls pgr.restore_subscriptions() for each database with the extension installed, before the stack starts serving). You do not need a manual restore step.

If you run PostgreSQL with a custom entrypoint that does not include this step, replay it yourself after every restart:

docker compose -f docker-compose.prod.yml exec db \
psql -U postgres -d app -c "SELECT pgr.restore_subscriptions();"

The function returns the number of subscriptions restored and has EXECUTE revoked from PUBLIC (superuser or an explicitly granted role only).

Resource limits

  • Memory limits are appropriate for your workload (see deploy.resources.limits)
  • max_connections in PostgreSQL matches your expected connection count
  • pg_reactive.max_subscriptions is appropriate for your WebSocket client count

Performance

PostgreSQL configuration

For a server with 4 GB RAM:

command: >
postgres
-c shared_preload_libraries=pg_reactive
-c shared_buffers=1GB
-c effective_cache_size=3GB
-c work_mem=16MB
-c maintenance_work_mem=256MB
-c max_connections=200
-c pg_reactive.max_subscriptions=2048
-c log_min_messages=warning
-c log_statement=none
-c random_page_cost=1.1
-c checkpoint_completion_target=0.9
-c wal_buffers=64MB

Indexes

Ensure all frequently queried columns are indexed, especially columns used in RLS policies and live query filters:

-- Index columns used in RLS policies
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_todos_user_id ON todos(user_id);

-- Index columns used in live query WHERE clauses
CREATE INDEX idx_orders_status ON orders(status) WHERE status != 'completed';

Connection pooling

For high traffic, add PgBouncer in front of PostgreSQL:

# Add to docker-compose.prod.yml
pgbouncer:
image: bitnami/pgbouncer:latest
environment:
PGBOUNCER_DATABASE: app
POSTGRESQL_HOST: db
POSTGRESQL_USERNAME: postgres
POSTGRESQL_PASSWORD: ${POSTGRES_PASSWORD}
POSTGRESQL_DATABASE: app
PGBOUNCER_POOL_MODE: transaction
PGBOUNCER_MAX_CLIENT_CONN: 1000
PGBOUNCER_DEFAULT_POOL_SIZE: 20
ports:
- "127.0.0.1:6432:6432"

Then point DATABASE_URL at PgBouncer instead of PostgreSQL directly.

Tune the proxy's built-in connection pool before reaching for PgBouncer:

  • PG_POOL_MAX_CONNS (default 50) — raise for high-concurrency workloads
  • PG_POOL_MIN_CONNS (default 5) — keeps connections warm to avoid cold-start latency

For advanced pooling (transaction mode), add PgBouncer between the proxy and PostgreSQL as shown above.

Proxy configuration

  • Compression — Gzip compression is enabled by default (GZIP_LEVEL=5). Set to 0 to disable if your reverse proxy handles compression.
  • Timeouts — Adjust READ_TIMEOUT (default 30s) and WRITE_TIMEOUT (default 60s) based on expected query complexity. Long-running queries may need higher write timeouts.

Monitoring

Log aggregation

Forward Docker logs to a log aggregation service:

# In docker-compose.prod.yml, add to each service:
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"

Or use the fluentd or loki Docker log drivers to ship logs to your logging stack.

Metrics

pgStack has no built-in /metrics endpoint. /health (status + pg_connected) is the only built-in observability endpoint; scrape it or use Docker/PostgreSQL exporters for metrics.

Alerting

Set up alerts for:

  • Container health check failures
  • PostgreSQL disk usage > 80%
  • Slow queries (> 1 second average)
  • Error rate spikes in proxy logs

Maintenance

PostgreSQL VACUUM

PostgreSQL autovacuum handles routine maintenance automatically. For tables with high churn (frequent inserts/updates/deletes), monitor bloat:

SELECT
relname AS table,
n_dead_tup AS dead_rows,
n_live_tup AS live_rows,
round(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_pct DESC;

If bloat is significant, run VACUUM manually during low-traffic periods:

VACUUM ANALYZE orders;

pg_reactive snapshot tables

Live query snapshot tables (pgr._snap_*) are UNLOGGED — ordinary on-disk tables that skip WAL. They are created on subscribe and dropped on unsubscribe (or LRU eviction / DDL auto-unsubscribe) and require no maintenance.

Upgrade procedure

Apply schema migrations to the database before the new proxy starts — a new proxy may depend on schema or roles a migration creates, and an existing data volume never re-runs docker/init.sql. Bring up the database alone first, then migrate, then swap the rest:

  1. Review the changelog for breaking changes
  2. Test the upgrade in a staging environment
  3. Take a database backup
  4. Pull new images: docker compose -f docker-compose.prod.yml pull
  5. Start only the database on the new image: docker compose -f docker-compose.prod.yml up -d db
  6. Run new migrations against it: pgstack migration run
  7. Apply any one-time role/credential steps the changelog calls out (see below)
  8. Bring up the rest: docker compose -f docker-compose.prod.yml up -d
  9. Verify the health endpoint and spot-check API responses

One-time: upgrading to the non-superuser proxy connection

Releases that move the proxy off the PostgreSQL superuser onto the authenticator role (gemini audit P2) need an extra step on an existing volume — init.sql and docker/20-authenticator-login.sh create and credential that role only at first init, so they never fire on a populated volume:

# After step 6 (migration 20260613120000 created the authenticator role NOLOGIN
# plus the service_role grants), give it the login password the proxy will use.
# It MUST equal AUTHENTICATOR_PASSWORD in your .env.
docker compose -f docker-compose.prod.yml exec -T db \
psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" \
-v apw="$AUTHENTICATOR_PASSWORD" -c "ALTER ROLE authenticator LOGIN PASSWORD :'apw'"

Only then bring up the proxy (step 8) — its DATABASE_URL now points at authenticator, which finally exists and can log in. Skipping this leaves the proxy unable to connect.

Two-login-role split (admin plane)

To also split the admin plane onto its own login role (so a data-plane compromise cannot reach service_role — gemini audit finding 6), after the above and after migration 20260613140000 created authenticator_admin:

# 1. Give the admin role a login password (= AUTHENTICATOR_ADMIN_PASSWORD in .env)
docker compose -f docker-compose.prod.yml exec -T db \
psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" \
-v apw="$AUTHENTICATOR_ADMIN_PASSWORD" -c "ALTER ROLE authenticator_admin LOGIN PASSWORD :'apw'"
# 2. Set ADMIN_DATABASE_URL=postgres://authenticator_admin:<pw>@db:5432/<db> in .env,
# then recreate the proxy so its admin pool uses it.
# 3. ONLY THEN drop the data role's escalation path:
docker compose -f docker-compose.prod.yml exec -T db \
psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "REVOKE service_role FROM authenticator"

Doing step 3 before step 2 breaks the admin pool (it would still be assuming service_role through authenticator).