ADR-022: Ubicloud Managed PostgreSQL Cutover
Status: 🟡 Proposed, pending Finance sign-off (cost envelope) + Legal/DPO sign-off (EU residency clause)
Impacts: infrastructure.md §1, §6.2; gdpr-strategy.md §1; terraform/modules/postgres-ubicloud/; terraform/environments/production/; docs/protocols/postgres-cutover.md; deploy.yml + secrets-sync.yml
Context
The MVP runs PostgreSQL (with pgvector) as a containerized service on the Swarm manager node, backed by a Hetzner Cloud Volume. This is operationally simple but carries two unbounded risks at scale:
- Single-manager SPOF for the stateful tier. Volume-backed containers on the manager node cannot be rescheduled without manual Hetzner API intervention; a failed manager takes Postgres with it.
- Operator fatigue on backups/PITR. Nightly pg_dump plus 14 daily + 4 weekly rotations inside the Swarm works at low tenant count but does not scale to a 99.5 % uptime SLA with a tenant-hour recovery objective.
Ubicloud Managed PostgreSQL runs on Hetzner bare metal in eu-central-h1 (Falkenstein), offers standard-2 HA-async and automated PITR (7-day retention), and exposes a Terraform provider (ubicloud/ubicloud ~> 0.3). It is the smallest operational footprint that lifts the stateful tier off the Swarm without introducing a US-or-hyperscaler dependency that would break GDPR residency.
The trigger threshold at which we pay the cutover cost: tenant count > 50 OR SLA contracts require > 99.5 %, whichever comes first.
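The trigger can be expressed as a small predicate. The sketch below is illustrative only: the function name and its inputs are hypothetical, and it assumes the caller supplies the current tenant count and the highest uptime percentage promised in any signed SLA.

```python
# Thresholds taken from this ADR.
TENANT_THRESHOLD = 50      # cutover when tenant count is strictly above this
SLA_THRESHOLD_PCT = 99.5   # cutover when a contract requires strictly more than this


def cutover_triggered(tenant_count: int, max_contracted_sla_pct: float) -> bool:
    """Return True as soon as either trigger condition holds (hypothetical helper)."""
    return (
        tenant_count > TENANT_THRESHOLD
        or max_contracted_sla_pct > SLA_THRESHOLD_PCT
    )
```

Both comparisons are strict, so exactly 50 tenants with a 99.5 % contract does not yet trigger the cutover.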
Options Evaluated
| # | Option | Pros | Cons |
|---|---|---|---|
| 1 | Stay on containerized Postgres indefinitely | Zero migration cost | Unbounded operational risk past ~50 tenants; SLA ceiling ~99 % |
| 2 | Self-managed Postgres on a dedicated Hetzner VM with streaming replication | Full control, no vendor lock-in | We become the DBA team; PITR + failover drift into our on-call surface |
| 3 | Ubicloud Managed Postgres on Hetzner bare metal (chosen) | EU residency preserved; PITR + HA handled by provider; Terraform provider available; pg_cron/pgsodium/pgvector supported on standard-2 | New Terraform provider dependency (~v0.3); 3-day parallel-run cost during cutover window |
| 4 | AWS RDS Postgres | Mature, battle-tested | Data leaves the EU unless explicitly regioned, and adds a hyperscaler bill we cannot justify pre-Series A |
Decision
Option 3: adopt Ubicloud Managed Postgres via the ubicloud/ubicloud Terraform provider, co-located in fsn1/eu-central-h1.
- A new Terraform module terraform/modules/postgres-ubicloud wraps ubicloud_postgres.primary, ubicloud_postgres_firewall_rule.swarm, and ubicloud_postgres_metric_destination.grafana.
- The swarm module already exports swarm_cidrs (from module.network). These feed var.allowed_cidrs, not a hard-coded 10.0.0.0/16.
- The module is instantiated behind var.enable_ubicloud_postgres = false in terraform/environments/production/main.tf. Flipping the flag provisions the instance; Commit 5 of the rollout wires the outputs (connection_uri_writer, connection_uri_reader, ca_certificate, instance_fqdn) into Swarm Secrets via secrets-sync.yml.
- The cutover runbook (docs/protocols/postgres-cutover.md) is the single operational authority for the flip. Writer + reader Swarm Secrets update in one docker service update --secret-rm … --secret-add … call to prevent phantom-row reads during the transition.
- pg_cron is confirmed available on standard-2 out of the box; no pre-flight support ticket is needed.
- Scope exclusions during cutover: Redis (BullMQ broker), MinIO/Object Storage, and the Hasura JWT secret are NOT rotated.
- 30-day pre-cutover dump retention: the final pre-cutover pg_dump lives in Hetzner Object Storage for 30 days (not the usual 14) to cover the Ubicloud PITR dark-zone for any transaction landed immediately before the secret flip.
- Terraform concurrency guard: production applies run via workflow_dispatch with concurrency: { group: terraform-prod, cancel-in-progress: false }.
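To illustrate the single-call secret swap from the runbook bullet, the sketch below assembles one docker service update invocation that removes the old writer and reader secrets and adds their replacements together. The service name, the secret names, and the versioned-name convention are all hypothetical; the runbook remains the authority for the real values. Only the --secret-rm and --secret-add source=…,target=… flags come from the Docker CLI itself.

```python
import shlex
from typing import List, Tuple


def build_secret_swap_command(
    service: str, swaps: List[Tuple[str, str, str]]
) -> List[str]:
    """Build ONE `docker service update` call that swaps every secret at once.

    Each swap is (old_secret_name, new_secret_name, target_file_name).
    Keeping the target name stable means the application keeps reading the
    same file under /run/secrets/ after the swap.
    """
    cmd = ["docker", "service", "update"]
    for old_name, new_name, target in swaps:
        cmd += ["--secret-rm", old_name]
        cmd += ["--secret-add", f"source={new_name},target={target}"]
    cmd.append(service)
    return cmd


# Hypothetical service and secret names, for illustration only.
command = build_secret_swap_command(
    "api",
    [
        ("pg_writer_uri_v1", "pg_writer_uri_v2", "pg_writer_uri"),
        ("pg_reader_uri_v1", "pg_reader_uri_v2", "pg_reader_uri"),
    ],
)
print(shlex.join(command))
```

Because both swaps travel in one update, no task should start with a new writer endpoint paired with an old reader endpoint, which is the mixed state the runbook is trying to avoid.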
Consequences
Positive:
- Stateful tier moves off the Swarm; the manager becomes a stateless control plane again.
- PITR + HA replication are handled by Ubicloud; our on-call paging load decreases.
- EU residency is cited by clause; DPIA audit trail is cleaner.
- pg_cron/pgsodium/pgvector remain available for GDPR scrub functions (ADR-028) and LLM workloads.
Negative:
- New Terraform provider at ~> 0.3: minor-version churn risk. Mitigated by pinning and by reading release notes before any terraform apply.
- 72-hour parallel-run during cutover adds a Finance line item (monthly Ubicloud tier × 3/30); requires a Finance sign-off checkbox in this ADR.
- fsn1/eu-central-h1 co-location coupling: a Falkenstein regional outage now takes both the Swarm and the DB offline. Accepted; cross-region expansion is deferred past Series A.
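The parallel-run line item follows directly from the monthly × 3/30 formula. The sketch below works it through with a placeholder price: the monthly figure is hypothetical, since the real standard-2 HA price is still to be confirmed with Ubicloud sales.

```python
from decimal import Decimal, ROUND_HALF_UP


def parallel_run_cost(
    monthly_eur: Decimal, run_days: int = 3, month_days: int = 30
) -> Decimal:
    """Pro-rate the monthly tier price over the parallel-run window, in EUR."""
    cost = monthly_eur * run_days / month_days
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Placeholder price only; NOT a quoted Ubicloud figure.
example_monthly = Decimal("250.00")
example_cost = parallel_run_cost(example_monthly)
```

With the placeholder above, the 72 h window comes to one tenth of the monthly price.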
Neutral:
- Rollback criterion (p99 API latency > 2× baseline for >10 min in the 72 h window) is straightforward; the runbook flips the Secrets back to the containerized endpoint.
- Observability integration: ubicloud_postgres_metric_destination.grafana ships directly to the Mimir endpoint from observability.md; no pgbouncer-to-exporter bridge is required unless Ubicloud restricts direct prometheus-postgres-exporter connections (pending verification).
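The rollback criterion above can be checked mechanically. The following is a minimal sketch, assuming p99 samples arrive as time-ordered (timestamp in seconds, latency in ms) pairs already restricted to the 72 h window; the function name and the sample format are hypothetical, not taken from the runbook.

```python
from typing import Iterable, Optional, Tuple


def should_roll_back(
    samples: Iterable[Tuple[float, float]],
    baseline_p99_ms: float,
    factor: float = 2.0,
    min_breach_seconds: float = 600.0,
) -> bool:
    """Return True if p99 stays above factor x baseline for more than 10 minutes.

    `samples` must be sorted by timestamp. A single sample at or below the
    threshold resets the breach timer.
    """
    threshold = factor * baseline_p99_ms
    breach_start: Optional[float] = None
    for timestamp, p99_ms in samples:
        if p99_ms > threshold:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start > min_breach_seconds:
                return True
        else:
            breach_start = None
    return False
```

The check requires a continuous breach, so a brief recovery inside the window restarts the 10-minute clock. Whether the runbook intends continuous or cumulative breach time is an assumption worth confirming there.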
Finance sign-off
- [ ] Monthly Ubicloud tier price for standard-2 HA: €/mo (to be confirmed with Ubicloud sales before any production apply)
- [ ] Approved 72 h parallel-run cost (= monthly × 3/30): €
- [ ] Signed: __________________________ Date: ____________
Legal/DPO sign-off
- [ ] Ubicloud EU-residency SLA §(EU residency) clause attached as appendix: _____________
- [ ] gdpr-strategy.md §1 cites the clause: ____________
- [ ] Signed: __________________________ Date: ____________