Busflow Docs

Internal documentation portal

ADR-029: Secrets & Encryption — Swarm Secrets + pgsodium AEAD + .env retirement

Status: 🟡 Proposed, pending Legal sign-off on the determinism note and architect sign-off on the secret-rotation runbook

Impacts: docs/architecture/secrets-management.md (new), .github/workflows/deploy.yml, .github/workflows/secrets-sync.yml (new), apps/api/migrations/2026xxxx_tenant_credentials_encryption.sql (new), apps/api/src/config/SecretsLoader.ts (new), Hasura permissions


Context

Today, production infrastructure credentials flow through a plaintext .env.production file that the deploy.yml workflow writes to the Swarm manager over SSH. Tenant-level credentials (payment provider keys, WhatsApp tokens) are stored in plaintext columns in Postgres and queried through Hasura with no masking layer. Both patterns would fail a security review:

  1. .env files on the manager are readable by anyone with SSH access, and their values appear as environment variables in docker inspect output for every service that loads them.
  2. Tenant credentials stored as plaintext can be exfiltrated in full through a single Hasura read-permission bug.

We need a layered model: Swarm Secrets for infra creds, pgsodium for tenant creds, and a masking view so Hasura never exposes raw ciphertext or plaintext.

Options Evaluated

| # | Option | Pros | Cons |
|---|--------|------|------|
| 1 | HashiCorp Vault cluster | Industry standard | Another stateful cluster to operate; overkill at our scale |
| 2 | AWS Secrets Manager | Managed | Requires AWS integration; data leaves Hetzner |
| 3 | Swarm Secrets (infra) + pgsodium AEAD (tenant data) + Hasura masking view (chosen) | No new vendor; Swarm handles rotation; masked reads safe to expose | Deterministic AEAD gotcha for suffix indexing |

Decision

Option 3. Layered policy:

Layer A — Infrastructure Secrets

  • All infra credentials (DB URIs, provider API keys, webhook secrets) are Swarm Secrets (docker secret create …). Injected into containers via /run/secrets/<name>.
  • .env.production is retired from deploy.yml. The deploy workflow assembles Swarm Secrets from GitHub encrypted secrets and attaches them to services with docker service update --secret-add.
  • A dedicated secrets-sync.yml workflow reads Terraform outputs (e.g. Ubicloud URIs) and syncs them into Swarm Secrets + GitHub secrets, closing the loop automatically.
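
To illustrate the /run/secrets/<name> injection described above, here is a minimal sketch of what the loader in apps/api/src/config/SecretsLoader.ts might look like. The function name loadSecret, the FileReader type, the injectable reader, and the name validation are assumptions made for illustration; the ADR only fixes the file name and the mount path.

```typescript
import { readFileSync } from "node:fs";
import { posix } from "node:path";

// Hypothetical helper type: lets tests swap the filesystem for a fake reader.
export type FileReader = (path: string) => string;

/**
 * Reads one Swarm Secret from its mount point.
 * Assumption: secrets are mounted at the default /run/secrets/<name> path.
 */
export function loadSecret(
  name: string,
  baseDir: string = "/run/secrets",
  read: FileReader = (p) => readFileSync(p, "utf8"),
): string {
  // Reject names that could escape the mount directory.
  if (!/^[A-Za-z0-9_.-]+$/.test(name) || name === "." || name === "..") {
    throw new Error(`Invalid secret name: ${name}`);
  }
  let raw: string;
  try {
    raw = read(posix.join(baseDir, name));
  } catch {
    throw new Error(`Secret "${name}" is not mounted under ${baseDir}`);
  }
  // Secrets created from files or stdin often end with a newline; strip one.
  const value = raw.replace(/\r?\n$/, "");
  if (value.length === 0) {
    throw new Error(`Secret "${name}" is empty`);
  }
  return value;
}
```

A caller would then use something like loadSecret("db_uri") at startup, where db_uri is a placeholder secret name. Failing fast on a missing or empty secret keeps a misconfigured service from starting with a silent fallback value.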

Layer B — Tenant Credentials

  • Encrypted at the database boundary via pgsodium AEAD with a per-tenant key derived from a master key ID stored in pgsodium.key.
  • Deterministic AEAD is required (not randomized) so we can support suffix indexing / partial-match lookups like credentials WHERE provider = 'mollie' AND suffix_key(api_key_enc) = 'M0ll'. Legal must sign off on the determinism note in this ADR, because deterministic AEAD has weaker security properties than randomized AEAD: identical plaintexts under the same key produce identical ciphertexts, so equality between stored values is visible to anyone who can read the ciphertext.
  • A Hasura masking view exposes only ****${SUFFIX} strings (last 4 chars) to tenant admins; full plaintext is available only to a privileged tenant_admin_raw role used by a NestJS service account, never by a Hasura user role.
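
The masking rule above can be sketched as a small pure function. In the ADR the masking lives in the Hasura-facing view inside Postgres; this TypeScript version only mirrors the rule, for example for unit tests in the NestJS service. The function name maskCredential and the choice to fully mask values of four characters or fewer are assumptions, not part of the decision.

```typescript
/**
 * Returns the masked form shown to tenant admins: "****" plus the last
 * `visible` characters of the credential.
 * Assumption: values no longer than `visible` are fully masked, so a short
 * credential is never revealed in full.
 */
export function maskCredential(plaintext: string, visible: number = 4): string {
  if (visible <= 0 || plaintext.length <= visible) {
    return "****";
  }
  return "****" + plaintext.slice(-visible);
}

// Example with a made-up key: only the suffix survives masking.
const masked = maskCredential("live_abcdM0ll");
console.log(masked); // prints "****M0ll"
```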

Layer C — Rotation & Drift

  • docs/protocols/secrets-rotation-runbook.md codifies: quarterly rotation for provider keys; incident rotation (compromise); drift detection via a scheduled diff between Swarm Secrets and the Terraform-expected set.
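  • Rollback: every Swarm Secret is versioned (Swarm Secrets are immutable, so each rotation creates a new secret rather than editing the old one); docker service update --secret-rm <old> --secret-add <new> is the only sanctioned flip mechanism.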
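
The drift check in the runbook bullet above amounts to a set difference. The sketch below assumes the scheduled job has already collected two lists of secret names: the Terraform-expected set and the names currently present in Swarm (for instance from docker secret ls). The names diffSecrets and DriftReport are hypothetical.

```typescript
// Hypothetical report shape for the scheduled drift job.
export interface DriftReport {
  missing: string[];    // expected by Terraform, absent from Swarm
  unexpected: string[]; // present in Swarm, not expected by Terraform
}

/** Compares expected and actual secret names; both outputs are sorted. */
export function diffSecrets(
  expected: Iterable<string>,
  actual: Iterable<string>,
): DriftReport {
  const expectedSet = new Set(expected);
  const actualSet = new Set(actual);
  const missing = Array.from(expectedSet)
    .filter((name) => !actualSet.has(name))
    .sort();
  const unexpected = Array.from(actualSet)
    .filter((name) => !expectedSet.has(name))
    .sort();
  return { missing, unexpected };
}

/** True when the two sets match exactly, i.e. no drift was detected. */
export function isInSync(report: DriftReport): boolean {
  return report.missing.length === 0 && report.unexpected.length === 0;
}
```

Comparing names only, never values, means the drift job does not need read access to any secret content.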

Consequences

Positive:

  • .env plaintext is gone from production.
  • Tenant credentials are protected at rest even against a Postgres read credential; Hasura cannot leak plaintext even with a misconfigured permission.
  • Rotation is workflow-driven, not a memorised operator procedure.

Negative:

  • Deterministic AEAD is a deliberate cryptographic compromise (weaker than randomized) justified by the suffix-indexing requirement. Legal must sign off.
  • Adding pgsodium to the migration chain requires a shared_preload_libraries change, which takes effect only after a Postgres restart. Ubicloud supports it out of the box on standard-2; MVP containerized Postgres requires a one-time image rebuild.

Neutral:

  • Swarm Secret rotation cadence is documented in the runbook and carries no breaking-API implication (the mount path is stable).
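
One way the mount path can stay stable across rotations is the source=...,target=... form of --secret-add, which mounts a newly named secret under the old file name. The sketch below builds the argument list for the sanctioned flip. The <name>_v<N> naming convention and the function name rotationArgs are assumptions for illustration; the runbook may define a different scheme.

```typescript
/**
 * Builds the docker CLI arguments for flipping a service from one secret
 * version to the next while keeping /run/secrets/<logicalName> unchanged.
 * Assumption: versioned secrets are named `${logicalName}_v${version}`.
 */
export function rotationArgs(
  service: string,
  logicalName: string,
  oldVersion: number,
  newVersion: number,
): string[] {
  if (newVersion === oldVersion) {
    throw new Error("new version must differ from old version");
  }
  const oldSecret = `${logicalName}_v${oldVersion}`;
  const newSecret = `${logicalName}_v${newVersion}`;
  return [
    "service", "update",
    "--secret-rm", oldSecret,
    // target= keeps the in-container file name stable across versions.
    "--secret-add", `source=${newSecret},target=${logicalName}`,
    service,
  ];
}
```

Rolling back is the same call with the version numbers swapped, which is why the old secret should be retained until the new one is confirmed healthy.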
