ADR-029: Secrets & Encryption — Swarm Secrets + pgsodium AEAD + .env retirement
Status: 🟡 Proposed, pending Legal sign-off on determinism note + architect sign-off on secret-rotation runbook
Impacts: `docs/architecture/secrets-management.md` (new), `.github/workflows/deploy.yml`, `.github/workflows/secrets-sync.yml` (new), `apps/api/migrations/2026xxxx_tenant_credentials_encryption.sql` (new), `apps/api/src/config/SecretsLoader.ts` (new), Hasura permissions
Context
Today, production infrastructure credentials flow through a plaintext .env.production file that the deploy.yml workflow writes to the Swarm manager over SSH. Tenant-level credentials (payment provider keys, WhatsApp tokens) are stored as plain text columns in Postgres and selected over Hasura with no masking layer. Both patterns would fail a security review:
- `.env` files on the manager are readable by anyone with SSH access and appear in `docker inspect` output for every service that loads them.
- Tenant credentials stored as plain text can be exfiltrated in full through a single Hasura read-permission bug.
We need a layered model: Swarm Secrets for infra creds, pgsodium for tenant creds, and a masking view so Hasura never exposes raw ciphertext or plaintext.
Options Evaluated
| # | Option | Pros | Cons |
|---|---|---|---|
| 1 | HashiCorp Vault cluster | Industry standard | Another stateful cluster to operate; overkill at our scale |
| 2 | AWS Secrets Manager | Managed | Requires AWS integration; data leaves Hetzner |
| 3 | Swarm Secrets (infra) + pgsodium AEAD (tenant data) + Hasura masking view (chosen) | No new vendor; Swarm handles rotation; masked reads safe to expose | Deterministic AEAD gotcha for suffix indexing |
Decision
Option 3. Layered policy:
Layer A — Infrastructure Secrets
- All infra credentials (DB URIs, provider API keys, webhook secrets) are Swarm Secrets (`docker secret create …`). They are injected into containers at `/run/secrets/<name>`.
- `.env.production` is retired from `deploy.yml`. The deploy workflow assembles Swarm Secrets from GitHub encrypted secrets and attaches them with `docker service update --secret-add`.
- A dedicated `secrets-sync.yml` workflow reads Terraform outputs (e.g. Ubicloud URIs) and syncs them into Swarm Secrets + GitHub secrets, closing the loop automatically.
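As an illustration of how the planned `SecretsLoader.ts` might consume the `/run/secrets/<name>` mount, here is a minimal sketch. The function name `loadSecret`, the secret name `database_url`, and the trailing-newline handling are assumptions made for this example, not decisions recorded in this ADR.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Mount point where Swarm exposes secrets inside a Linux container.
const DEFAULT_SECRETS_DIR = "/run/secrets";

/**
 * Read one Swarm Secret from its mounted file (hypothetical helper).
 * Throws when the file is absent so a misconfigured service fails at boot.
 */
function loadSecret(name: string, dir: string = DEFAULT_SECRETS_DIR): string {
  // Only accept plain names; this rejects "..", slashes and empty strings.
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(name)) {
    throw new Error(`Invalid secret name: ${name}`);
  }
  try {
    // Assumption: a single trailing newline is an artifact of how the
    // secret was created, so it is stripped before use.
    return readFileSync(join(dir, name), "utf8").replace(/\r?\n$/, "");
  } catch {
    throw new Error(`Secret "${name}" not found under ${dir}`);
  }
}

// Usage sketch: a temporary directory stands in for /run/secrets.
const demoDir = mkdtempSync(join(tmpdir(), "secrets-"));
writeFileSync(join(demoDir, "database_url"), "postgres://example\n");
const databaseUrl = loadSecret("database_url", demoDir);
```

Failing fast on a missing file is one possible design choice; it surfaces a forgotten `--secret-add` at container start rather than at first use.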
Layer B — Tenant Credentials
- Encrypted at the database boundary via `pgsodium` AEAD with a per-tenant key derived from a master key ID stored in `pgsodium.key`.
- Deterministic AEAD is required (not randomized) so we can support suffix indexing / partial-match lookups like `credentials WHERE provider = 'mollie' AND suffix_key(api_key_enc) = 'M0ll'`. Legal must sign off on the determinism note in this ADR, since deterministic AEAD has weaker cryptographic properties than randomized AEAD: identical plaintexts encrypted under the same key and associated data produce identical ciphertexts, so equality between stored values is observable.
- A Hasura masking view exposes only `****${SUFFIX}` strings (last 4 chars) to tenant admins; full plaintext is available only to a privileged `tenant_admin_raw` role used by a NestJS service account, never by a Hasura user role.
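The masking rule itself is small enough to sketch. In this ADR the masking lives in a Hasura-facing database view; the TypeScript below only mirrors the `****${SUFFIX}` rule, for example as a reference for tests on the API side. The name `maskCredential`, the sample value, and the handling of values of four characters or fewer are assumptions.

```typescript
// Number of trailing characters the masking view exposes (last 4 chars).
const VISIBLE_SUFFIX_LENGTH = 4;

/** Hypothetical helper mirroring the masking view's output format. */
function maskCredential(plaintext: string): string {
  // Assumption: a value no longer than the suffix would be revealed in
  // full, so it is hidden entirely instead.
  if (plaintext.length <= VISIBLE_SUFFIX_LENGTH) {
    return "****";
  }
  return `****${plaintext.slice(-VISIBLE_SUFFIX_LENGTH)}`;
}

// Usage sketch with a made-up credential value.
const masked = maskCredential("live_abcdef1234M0ll"); // "****M0ll"
```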
Layer C — Rotation & Drift
- `docs/protocols/secrets-rotation-runbook.md` codifies: quarterly rotation for provider keys; incident rotation (compromise); drift detection via a scheduled diff between Swarm Secrets and the Terraform-expected set.
- Rollback: every Swarm Secret is versioned; `docker service update --secret-rm <old> --secret-add <new>` is the only sanctioned flip mechanism.
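The scheduled drift check reduces to a set difference between two lists of secret names. A minimal sketch follows, assuming the expected names come from Terraform outputs and the actual names from something like `docker secret ls --format '{{.Name}}'`. The function name `detectDrift` and all secret names are hypothetical.

```typescript
interface SecretDrift {
  missing: string[]; // expected by Terraform, absent from Swarm
  unexpected: string[]; // present in Swarm, not expected by Terraform
}

/** Compare the Terraform-expected secret names with those Swarm holds. */
function detectDrift(
  expected: readonly string[],
  actual: readonly string[],
): SecretDrift {
  const expectedSet = new Set(expected);
  const actualSet = new Set(actual);
  return {
    missing: Array.from(expectedSet).filter((name) => !actualSet.has(name)).sort(),
    unexpected: Array.from(actualSet).filter((name) => !expectedSet.has(name)).sort(),
  };
}

// Usage sketch with made-up names.
const drift = detectDrift(
  ["database_url", "mollie_api_key", "whatsapp_token"],
  ["database_url", "whatsapp_token", "legacy_smtp_password"],
);
// A scheduled job could fail (or alert) whenever either list is non-empty.
const hasDrift = drift.missing.length > 0 || drift.unexpected.length > 0;
```

This compares names only. Detecting a stale value behind a correct name would need a version or checksum convention, which this ADR does not specify.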
Consequences
Positive:
- `.env` plaintext is gone from production.
- Tenant credentials are protected at rest even against a Postgres read credential; Hasura cannot leak plaintext even with a misconfigured permission.
- Rotation is workflow-driven, not a memorised operator procedure.
Negative:
- Deterministic AEAD is a deliberate cryptographic compromise (weaker than randomized) justified by the suffix-indexing requirement. Legal must sign off.
- Adding `pgsodium` to the migration chain is a `shared_preload_libraries` change. Ubicloud supports it out of the box on `standard-2`; MVP containerized Postgres requires a one-time image rebuild.
Neutral:
- Swarm Secret rotation cadence is documented in the runbook and carries no breaking-API implication (the mount path is stable).