GDPR TTL Retention — Migration Plan
Status: 📋 Ready to apply once the target schemas (
commerce.passengers,backoffice.resellers,commerce.invoices) exist. Scope: Concrete DDL + function bodies +pg_cronschedules implementing ADR-028. Why this file exists: This plan was originally drafted as a Hasura migration (apps/hasura/migrations/default/20260420120000_gdpr_ttl_retention/). The migration was removed because the backoffice database is still empty and the target tables do not exist yet — a migration applied today would fail on everyALTER TABLEreferenced below. This document preserves the concrete SQL so nothing is lost when the schema lands.
Preconditions
commerce.passengersexists with at least(id UUID, tenant_id UUID, first_name TEXT, last_name TEXT, email TEXT, phone TEXT).backoffice.resellersexists with at least(id UUID, tenant_id UUID, contact_name TEXT, contact_email TEXT, contact_phone TEXT).commerce.invoicesexists with at least(id UUID, tenant_id UUID, recipient_snapshot JSONB).- Postgres 15+ with
shared_preload_libraries = 'pg_cron,pgsodium'. On Ubicloudstandard-2both extensions are pre-enabled (confirmed in ADR-022 §Consequences). backoffice.*schema exists and the migration role canCREATEwithin it.
If any of these are not true, land them as prerequisite migrations first — do not weaken the DDL below to work around missing columns.
Ordering Contract
The migration must be applied top-to-bottom in exactly this order. Later sections read tables/columns created by earlier sections.
- Extensions
- Audit ledger (
backoffice.tenant_scrub_logs) — must exist before scrub functions canINSERT - Legal-hold table (
backoffice.legal_holds) — must exist before scrub functions canSELECT - Idempotency columns on PII entities (
pii_redacted_at+ per-entity "last activity" column) - Scrub functions (per entity)
pg_cronschedulesfillfactortuning
1. Extensions
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE EXTENSION IF NOT EXISTS pg_cron;pgcrypto provides gen_random_uuid(). pg_cron provides the scheduler.
2. Append-Only Audit Ledger
tenant_scrub_logs is the only proof-of-action surface for auditors/DPO. It must reject UPDATE and DELETE at the role level — a superuser slip then requires an explicit GRANT, not just a typo.
CREATE TABLE IF NOT EXISTS backoffice.tenant_scrub_logs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
entity_type TEXT NOT NULL CHECK (entity_type IN ('passenger','reseller','invoice','conversation')),
entity_id UUID NOT NULL,
scrubbed_at TIMESTAMPTZ NOT NULL DEFAULT now() AT TIME ZONE 'UTC',
action TEXT NOT NULL CHECK (action IN ('REDACTED','SKIPPED_LEGAL_HOLD','SKIPPED_NULL_TRIGGER')),
skip_reason TEXT,
job_run_id UUID
);
COMMENT ON TABLE backoffice.tenant_scrub_logs IS
'Append-only GDPR scrub ledger. Does NOT store redacted values — ADR-028.';
CREATE INDEX IF NOT EXISTS idx_scrub_logs_tenant_time
ON backoffice.tenant_scrub_logs (tenant_id, scrubbed_at DESC);
REVOKE UPDATE, DELETE ON backoffice.tenant_scrub_logs FROM PUBLIC;Why a dedicated job_run_id: one scheduled run may produce thousands of log rows; correlating them back to a single cron.job_run_details record is done via this UUID.
3. Legal-Hold Table
Rows in legal_holds pin a specific entity against redaction. The partial index keeps open-hold lookups O(1) regardless of historical volume.
CREATE TABLE IF NOT EXISTS backoffice.legal_holds (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
entity_type TEXT NOT NULL CHECK (entity_type IN ('passenger','reseller','invoice','conversation')),
entity_id UUID NOT NULL,
reason TEXT NOT NULL,
until TIMESTAMPTZ,
created_by UUID NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now() AT TIME ZONE 'UTC',
closed_at TIMESTAMPTZ,
closed_by UUID
);
CREATE INDEX IF NOT EXISTS idx_legal_holds_open_by_entity
ON backoffice.legal_holds (tenant_id, entity_type, entity_id)
WHERE closed_at IS NULL;Operational procedures live in docs/protocols/legal-hold-runbook.md.
4. Idempotency Columns on PII Entities
Every scrub-eligible entity gets two additions:
- A trigger column (
last_booking_at,last_active_at,issued_at) — nullable forever;NULLmeans "unknown, skip in scrub". - A proof-of-redaction column (
pii_redacted_at TIMESTAMPTZ) —NULLmeans "not yet redacted". This replaces the fragile text-sentinel pattern (first_name = '[REDACTED]') that fails against real travellers whose legal names happen to match the sentinel. See ADR-028 §Decision #2.
ALTER TABLE commerce.passengers
ADD COLUMN IF NOT EXISTS last_booking_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS pii_redacted_at TIMESTAMPTZ;
ALTER TABLE backoffice.resellers
ADD COLUMN IF NOT EXISTS last_active_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS pii_redacted_at TIMESTAMPTZ;
ALTER TABLE commerce.invoices
ADD COLUMN IF NOT EXISTS issued_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS pii_redacted_at TIMESTAMPTZ;The online back-fill for last_booking_at is handled by apps/api/src/workers/pii-backfill.worker.ts (to be written) in 10 000-row chunks with a 1 s delay — see ADR-028 §Decision #4.
5. Scrub Functions
Every scrub function follows the same CTE-structured pipeline:
candidates— rows where the trigger column crosses the per-entity retention window andpii_redacted_at IS NULL.held— subset of candidates with an openlegal_holdsrow. These are not redacted.redact—UPDATEthat tombstones PII columns (set toNULLor sentinel JSONB) and stampspii_redacted_at = now().logged_redact/logged_hold— dualINSERTintotenant_scrub_logsso both actions (REDACTEDandSKIPPED_LEGAL_HOLD) are recorded in the same transaction.
All functions are SECURITY DEFINER so they can write to tenant_scrub_logs without granting each scheduled cron role broad INSERT rights on the audit table.
5.1 scrub_passengers — 3-year window (BDSG § 35)
CREATE OR REPLACE FUNCTION backoffice.scrub_passengers(p_run_id UUID DEFAULT gen_random_uuid())
RETURNS INTEGER
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
v_rows INTEGER := 0;
BEGIN
WITH candidates AS (
SELECT p.id, p.tenant_id
FROM commerce.passengers p
WHERE p.pii_redacted_at IS NULL
AND p.last_booking_at IS NOT NULL
AND p.last_booking_at < (now() AT TIME ZONE 'UTC') - INTERVAL '3 years'
),
held AS (
SELECT c.id, c.tenant_id
FROM candidates c
JOIN backoffice.legal_holds h
ON h.tenant_id = c.tenant_id
AND h.entity_type = 'passenger'
AND h.entity_id = c.id
AND h.closed_at IS NULL
),
redact AS (
UPDATE commerce.passengers p
SET first_name = NULL,
last_name = NULL,
email = NULL,
phone = NULL,
pii_redacted_at = now() AT TIME ZONE 'UTC'
FROM candidates c
WHERE p.id = c.id
AND NOT EXISTS (SELECT 1 FROM held h WHERE h.id = c.id)
RETURNING p.id, p.tenant_id
),
logged_redact AS (
INSERT INTO backoffice.tenant_scrub_logs (tenant_id, entity_type, entity_id, action, job_run_id)
SELECT tenant_id, 'passenger', id, 'REDACTED', p_run_id FROM redact
RETURNING 1
),
logged_hold AS (
INSERT INTO backoffice.tenant_scrub_logs (tenant_id, entity_type, entity_id, action, skip_reason, job_run_id)
SELECT tenant_id, 'passenger', id, 'SKIPPED_LEGAL_HOLD', 'active_hold', p_run_id FROM held
RETURNING 1
)
SELECT COUNT(*) INTO v_rows FROM logged_redact;
RETURN v_rows;
END;
$$;5.2 scrub_resellers — 2-year window (BDSG § 35)
CREATE OR REPLACE FUNCTION backoffice.scrub_resellers(p_run_id UUID DEFAULT gen_random_uuid())
RETURNS INTEGER
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE v_rows INTEGER := 0;
BEGIN
WITH candidates AS (
SELECT r.id, r.tenant_id
FROM backoffice.resellers r
WHERE r.pii_redacted_at IS NULL
AND r.last_active_at IS NOT NULL
AND r.last_active_at < (now() AT TIME ZONE 'UTC') - INTERVAL '2 years'
),
held AS (
SELECT c.id, c.tenant_id FROM candidates c
JOIN backoffice.legal_holds h
ON h.tenant_id = c.tenant_id AND h.entity_type='reseller'
AND h.entity_id = c.id AND h.closed_at IS NULL
),
redact AS (
UPDATE backoffice.resellers r
SET contact_name = NULL, contact_email = NULL, contact_phone = NULL,
pii_redacted_at = now() AT TIME ZONE 'UTC'
FROM candidates c
WHERE r.id = c.id AND NOT EXISTS (SELECT 1 FROM held h WHERE h.id = c.id)
RETURNING r.id, r.tenant_id
)
INSERT INTO backoffice.tenant_scrub_logs (tenant_id, entity_type, entity_id, action, job_run_id)
SELECT tenant_id, 'reseller', id, 'REDACTED', p_run_id FROM redact;
GET DIAGNOSTICS v_rows = ROW_COUNT;
RETURN v_rows;
END;
$$;5.3 scrub_invoices — 10-year window (GoBD § 147 AO / § 14b UStG)
Invoice PII is embedded in recipient_snapshot (JSONB). Redaction replaces the whole object with a sentinel rather than NULL-ing individual keys — auditors need to see that the row was processed, not that it was empty.
CREATE OR REPLACE FUNCTION backoffice.scrub_invoices(p_run_id UUID DEFAULT gen_random_uuid())
RETURNS INTEGER
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE v_rows INTEGER := 0;
BEGIN
WITH candidates AS (
SELECT i.id, i.tenant_id
FROM commerce.invoices i
WHERE i.pii_redacted_at IS NULL
AND i.issued_at IS NOT NULL
AND i.issued_at < (now() AT TIME ZONE 'UTC') - INTERVAL '10 years'
),
held AS (
SELECT c.id, c.tenant_id FROM candidates c
JOIN backoffice.legal_holds h
ON h.tenant_id=c.tenant_id AND h.entity_type='invoice'
AND h.entity_id=c.id AND h.closed_at IS NULL
),
redact AS (
UPDATE commerce.invoices i
SET recipient_snapshot = jsonb_build_object('redacted', true),
pii_redacted_at = now() AT TIME ZONE 'UTC'
FROM candidates c
WHERE i.id = c.id AND NOT EXISTS (SELECT 1 FROM held h WHERE h.id = c.id)
RETURNING i.id, i.tenant_id
)
INSERT INTO backoffice.tenant_scrub_logs (tenant_id, entity_type, entity_id, action, job_run_id)
SELECT tenant_id, 'invoice', id, 'REDACTED', p_run_id FROM redact;
GET DIAGNOSTICS v_rows = ROW_COUNT;
RETURN v_rows;
END;
$$;5.4 Conversations / communications.messages
Deliberately not implemented as a database scrub. Per ADR-028 §Decision #10, rendered names in communications.messages.rendered_content are treated as derived from commerce.passengers: once the source passenger is scrubbed, the rendered message contains an orphaned name that can no longer be re-identified. Chat transcripts in Loki log streams expire under the 14-day compactor retention (ADR-027). If the DPO rejects this position, the follow-up work is a cascading text-replace function gated on a new ADR — do not add it here.
6. pg_cron Schedules
All schedules are UTC. Off-peak for DACH; 15-minute offsets prevent three scrub functions from contending for the same buffer pool. 03:00 UTC aligns with the backup-verify cadence in backup-verify-runbook.md.
SELECT cron.schedule('busflow_scrub_passengers', '0 3 * * *', $$SELECT backoffice.scrub_passengers();$$);
SELECT cron.schedule('busflow_scrub_resellers', '15 3 * * *', $$SELECT backoffice.scrub_resellers();$$);
SELECT cron.schedule('busflow_scrub_invoices', '30 3 * * *', $$SELECT backoffice.scrub_invoices();$$);The NestJS cron-health.worker.ts (ADR-028 §Decision #8) runs at 03:30 UTC and posts failures from cron.job_run_details to Slack #ops — Ubicloud does not expose cron.job_run_details to external Prometheus scrapers.
7. fillfactor Tuning
The scrub functions tombstone many rows per night, which churns HOT update chains. A greenfield fillfactor = 80 leaves 20 % free space per page for HOT updates, cutting index bloat. Set at migration time — not later — because ALTER TABLE ... SET (fillfactor) on a populated table doesn't retroactively rewrite pages.
ALTER TABLE commerce.passengers SET (fillfactor = 80);
ALTER TABLE backoffice.resellers SET (fillfactor = 80);
ALTER TABLE commerce.invoices SET (fillfactor = 80);Down Migration
SELECT cron.unschedule('busflow_scrub_passengers');
SELECT cron.unschedule('busflow_scrub_resellers');
SELECT cron.unschedule('busflow_scrub_invoices');
DROP FUNCTION IF EXISTS backoffice.scrub_invoices(UUID);
DROP FUNCTION IF EXISTS backoffice.scrub_resellers(UUID);
DROP FUNCTION IF EXISTS backoffice.scrub_passengers(UUID);
ALTER TABLE commerce.invoices DROP COLUMN IF EXISTS pii_redacted_at;
ALTER TABLE commerce.invoices DROP COLUMN IF EXISTS issued_at;
ALTER TABLE backoffice.resellers DROP COLUMN IF EXISTS pii_redacted_at;
ALTER TABLE backoffice.resellers DROP COLUMN IF EXISTS last_active_at;
ALTER TABLE commerce.passengers DROP COLUMN IF EXISTS pii_redacted_at;
ALTER TABLE commerce.passengers DROP COLUMN IF EXISTS last_booking_at;
DROP INDEX IF EXISTS backoffice.idx_legal_holds_open_by_entity;
DROP TABLE IF EXISTS backoffice.legal_holds;
DROP INDEX IF EXISTS backoffice.idx_scrub_logs_tenant_time;
DROP TABLE IF EXISTS backoffice.tenant_scrub_logs;The down migration does not restore redacted PII. It exists for structural rollback during migration development only. Any real post-scrub recovery must come from a PITR restore per postgres-cutover.md §PITR.
When to Land This Migration
- After
commerce.passengers,backoffice.resellers, andcommerce.invoicesexist with the columns listed in §Preconditions. - After Legal/DPO has signed off on the retention table and the
communications.messagesseparation-of-concerns position in ADR-028. - Before the first production tenant row is written — the trigger columns need to be in place so
pii_backfill.worker.tscan populatelast_booking_atin-line rather than via a costly table rewrite later.
Related
- ADR-028 — decision record.
gdpr-strategy.md§4 — per-entity retention table.legal-hold-runbook.md— operating procedure for opening/closing holds.- ADR-022 — confirms
pg_cronavailability on Ubicloudstandard-2.