Busflow Docs

Internal documentation portal

ADR-028: GDPR TTL retention per entity + legal-hold override + UTC-schedule policy

Status: 🟡 Proposed, pending Legal/DPO sign-off on the per-entity table and the communications.messages separation-of-concerns position

Impacts: gdpr-strategy.md §4, docs/schemas/schema-commerce.md, docs/schemas/schema-backoffice.md, apps/api/migrations/2026xxxx_gdpr_ttl.sql, apps/api/src/workers/pii-backfill.worker.ts, apps/api/src/workers/cron-health.worker.ts, new runbook docs/protocols/legal-hold-runbook.md


Context

The Level-2 spec described GDPR data scrubbing as a uniform 3-year pg_cron sweep over passengers and cascaded children. Three problems with that framing surfaced in the architect loop:

  1. Tax law conflicts with GDPR uniformity. German GoBD (§ 147 AO, § 14b UStG) mandates 10-year retention on tax-relevant artefacts (invoices, receipts, payment ledgers). A blanket 3-year wipe on invoices.recipient_snapshot breaches tax law.
  2. Idempotency cannot rely on text sentinels. A guard like first_name <> '[REDACTED]' fails when a real traveller's legal first name is [REDACTED]. The probability is low but non-zero, and when it happens the unscrubbed row is indistinguishable from a scrubbed one.
  3. Text-level references to PII in communications.messages.rendered_content are a Legal/DPO policy question, not a purely technical one. We need a written position.

In addition, no DDL existed for the backoffice.legal_holds table that the scrub functions need to read.
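
Problem 2 can be made concrete with a small sketch. It contrasts a text-sentinel guard with a timestamp-column guard of the kind adopted in the Decision below. The row shape and both function names are hypothetical and exist only for illustration; only first_name and pii_redacted_at come from this ADR.

```typescript
// Hypothetical row shape; only first_name and pii_redacted_at appear in the ADR.
interface PassengerRow {
  id: string;
  first_name: string;
  pii_redacted_at: Date | null;
}

const SENTINEL = "[REDACTED]";

// Text-sentinel guard: any row whose name equals the sentinel is treated as scrubbed.
function needsScrubBySentinel(row: PassengerRow): boolean {
  return row.first_name !== SENTINEL;
}

// Column guard: a row needs scrubbing exactly when no redaction timestamp is recorded.
function needsScrubByColumn(row: PassengerRow): boolean {
  return row.pii_redacted_at === null;
}

// A never-scrubbed passenger whose legal first name happens to equal the sentinel.
const edgeCase: PassengerRow = {
  id: "p-1",
  first_name: "[REDACTED]",
  pii_redacted_at: null,
};

console.log(needsScrubBySentinel(edgeCase)); // false: the row is wrongly skipped
console.log(needsScrubByColumn(edgeCase)); // true: the row is still picked up
```

The sentinel guard cannot tell this row apart from one that was already scrubbed, while the column guard can, because the name itself plays no part in the decision.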

Decision

  1. Per-entity retention windows. Legal-blessed:

    | Entity | Window | Trigger column | Legal basis |
    | --- | --- | --- | --- |
    | commerce.passengers | 3 years | last_booking_at | BDSG § 35 |
    | backoffice.resellers | 2 years | last_active_at | BDSG § 35 |
    | commerce.invoices (PII in recipient_snapshot) | 10 years | issued_at | GoBD § 147 AO, § 14b UStG |
    | Chat transcripts (Loki log streams) | 14 days | log rotation | 30-day GDPR window |

  2. Idempotency via a dedicated column. Add passengers.pii_redacted_at TIMESTAMPTZ NULL. Scrub functions guard on pii_redacted_at IS NULL: NULL means "not yet redacted"; a set timestamp is the proof of redaction.

  3. Tombstone rows, do not delete. payments.refund_passenger_id retains the UUID after scrub; only PII columns are wiped. Hasura is granted read-only permissions on tombstoned rows for financial reconciliation.

  4. Online back-fill for last_booking_at. A NestJS worker (pii-backfill.worker.ts) processes rows in chunks of 10 000 with a 1 s delay between chunks. The column stays nullable: NULL means "unknown, skip in scrub".

  5. Legal-hold override. backoffice.legal_holds (id UUID PRIMARY KEY, tenant_id UUID, passenger_id UUID NULL, reason TEXT, until TIMESTAMPTZ NULL, created_by UUID, created_at TIMESTAMPTZ DEFAULT NOW()). Each per-entity redaction function reads this table first. A subject on an active hold is skipped and logged as SKIPPED_LEGAL_HOLD in tenant_scrub_logs. Operations runbook: docs/protocols/legal-hold-runbook.md.

  6. UTC schedules + local reporting. All pg_cron schedules are UTC. tenant_scrub_logs.scrubbed_at is UTC. Operator-local reports derive display time from tenants.tz at render. Auditors don't need to reconcile two timezones.

  7. Append-only audit table. tenant_scrub_logs is append-only; REVOKE UPDATE, DELETE from every role except postgres. Each record holds the UUID, entity type, timestamp, and skip reason, never the redacted data.

  8. Cron-health probe. A NestJS worker (cron-health.worker.ts) runs at 03:30 UTC (30 min after the scrub), queries cron.job_run_details WHERE jobname LIKE 'gdpr_%' via an admin role, and posts failures to Slack #ops, because Ubicloud may not expose cron.job_run_details to external Prometheus scrapers.

  9. FILLFACTOR = 80 on passengers and resellers, applied directly in CREATE TABLE DDL (greenfield). Nightly VACUUM ANALYZE at 02:00 UTC; pg_stat_user_tables.n_dead_tup monitored in Mimir.

  10. communications.messages position (pending DPO sign-off): the table stays in Postgres as canonical conversation record. Rendered names ("Dear Jane, your booking…") are treated as derived data. Once the source commerce.passengers row is redacted, rendered_content contains an orphaned name reference that can no longer re-identify anyone without the source row. The Loki 14-day TTL covers any PII that leaks into log streams. If DPO rejects this position, a gated Stage 9 cascading text-replace (passenger → booking → conversation linkage) is added as a follow-up decision.

  11. Ubicloud pg_cron dependency: Ubicloud confirms pg_cron on standard-2; shared_preload_libraries ships with pg_cron. No support ticket required.
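
Decisions 1, 2, 4, and 5 combine into one per-subject check. The sketch below shows one possible ordering of that check as a pure function; it is not the SQL that the migration will contain. Assumptions: the function and type names are hypothetical; every decision value other than SKIPPED_LEGAL_HOLD is invented for illustration; a hold with until = NULL is read as open-ended; the hold list is already filtered to the subject; and the pii_redacted_at guard is applied to every entity, although the ADR only defines it on passengers. Chat transcripts are left out because log rotation handles them.

```typescript
type Entity = "commerce.passengers" | "backoffice.resellers" | "commerce.invoices";

// Retention windows in years, copied from the per-entity table.
const RETENTION_YEARS: Record<Entity, number> = {
  "commerce.passengers": 3,
  "backoffice.resellers": 2,
  "commerce.invoices": 10,
};

interface Subject {
  entity: Entity;
  triggerAt: Date | null; // last_booking_at, last_active_at, or issued_at
  piiRedactedAt: Date | null; // pii_redacted_at
}

interface LegalHold {
  until: Date | null; // assumption: NULL means the hold is open-ended
}

// Only SKIPPED_LEGAL_HOLD is named in the ADR; the other values are hypothetical.
type ScrubDecision =
  | "SCRUB"
  | "SKIPPED_LEGAL_HOLD"
  | "SKIPPED_ALREADY_REDACTED"
  | "SKIPPED_UNKNOWN_TRIGGER"
  | "SKIPPED_WITHIN_WINDOW";

function decideScrub(subject: Subject, holds: LegalHold[], now: Date): ScrubDecision {
  // The legal-hold table is consulted first, as decision 5 requires.
  const onHold = holds.some((h) => h.until === null || h.until.getTime() > now.getTime());
  if (onHold) return "SKIPPED_LEGAL_HOLD";

  // Idempotency guard: a set timestamp is the proof of redaction.
  if (subject.piiRedactedAt !== null) return "SKIPPED_ALREADY_REDACTED";

  // A NULL trigger column means "unknown, skip in scrub".
  if (subject.triggerAt === null) return "SKIPPED_UNKNOWN_TRIGGER";

  // Compute the cutoff in UTC by subtracting the entity's window from now.
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - RETENTION_YEARS[subject.entity]);

  return subject.triggerAt.getTime() <= cutoff.getTime() ? "SCRUB" : "SKIPPED_WITHIN_WINDOW";
}
```

Checking the hold before the idempotency guard means that a held subject is always logged as held, even if it was redacted earlier. Whether that ordering is wanted for the audit trail is a question for the runbook.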

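The chunked back-fill in decision 4 can be sketched as a loop over an injected port. This is a minimal illustration, not the contents of pii-backfill.worker.ts: BackfillPort, backfillChunk, and runBackfill are hypothetical names, and the sketch assumes the port only touches rows whose last_booking_at can actually be derived, so that rows left NULL are not retried forever.

```typescript
// Hypothetical port; a real worker would back this with a SQL UPDATE.
interface BackfillPort {
  // Back-fills at most `limit` rows whose last_booking_at is NULL but derivable,
  // and resolves to the number of rows it updated.
  backfillChunk(limit: number): Promise<number>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs chunks until a short chunk signals that no work is left.
// Defaults mirror the ADR: 10 000 rows per chunk, 1 s between chunks.
async function runBackfill(
  port: BackfillPort,
  chunkSize = 10000,
  delayMs = 1000,
): Promise<number> {
  let total = 0;
  for (;;) {
    const updated = await port.backfillChunk(chunkSize);
    total += updated;
    if (updated < chunkSize) return total; // short chunk: nothing left to do
    await sleep(delayMs); // pause between chunks to limit load on the database
  }
}
```

With 25 000 eligible rows and the default settings, this loop makes three chunk calls and pauses twice. If the row count is an exact multiple of the chunk size, one extra call returns 0 and ends the loop.
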
Consequences

Positive:

  • GoBD compliance is explicit, not an assumption in a comment.
  • Scrub idempotency is proof-based (pii_redacted_at) not text-based.
  • Legal-hold policy is enforceable in SQL, not in an eng process.

Negative:

  • Legal/DPO sign-off is a real-world gate before Stage 5 (cron schedules go live).
  • A rejected communications.messages position forces a Stage 9 cascading text-replace; the architecture is drafted but not built.

Neutral:

  • Retention windows could be made data-controller-configurable per tenant in a follow-up ADR if operators demand per-tenant variation.
