Busflow Docs

Internal documentation portal

Skip to content
Reviewed 02 May 2026

GDPR TTL Retention β€” Migration Plan ​

Status: πŸ“‹ Ready to apply once the target schemas (commerce.passengers, backoffice.resellers, commerce.invoices) exist. Scope: Concrete DDL + function bodies + pg_cron schedules implementing ADR-028. Why this file exists: This plan was originally drafted as a Hasura migration (apps/hasura/migrations/default/20260420120000_gdpr_ttl_retention/). The migration was removed because the backoffice database is still empty and the target tables do not exist yet β€” a migration applied today would fail on every ALTER TABLE referenced below. This document preserves the concrete SQL so nothing is lost when the schema lands.


Preconditions ​

  1. commerce.passengers exists with at least (id UUID, tenant_id UUID, first_name TEXT, last_name TEXT, email TEXT, phone TEXT).
  2. backoffice.resellers exists with at least (id UUID, tenant_id UUID, contact_name TEXT, contact_email TEXT, contact_phone TEXT).
  3. commerce.invoices exists with at least (id UUID, tenant_id UUID, recipient_snapshot JSONB).
  4. Postgres 15+ with shared_preload_libraries = 'pg_cron,pgsodium'. On Ubicloud standard-2 both extensions are pre-enabled (confirmed in ADR-022 Β§Consequences).
  5. backoffice.* schema exists and the migration role can CREATE within it.

If any of these are not true, land them as prerequisite migrations first β€” do not weaken the DDL below to work around missing columns.


Ordering Contract ​

The migration must be applied top-to-bottom in exactly this order. Later sections read tables/columns created by earlier sections.

  1. Extensions
  2. Audit ledger (backoffice.tenant_scrub_logs) β€” must exist before scrub functions can INSERT
  3. Legal-hold table (backoffice.legal_holds) β€” must exist before scrub functions can SELECT
  4. Idempotency columns on PII entities (pii_redacted_at + per-entity "last activity" column)
  5. Scrub functions (per entity)
  6. pg_cron schedules
  7. fillfactor tuning

1. Extensions ​

sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE EXTENSION IF NOT EXISTS pg_cron;

pgcrypto provides gen_random_uuid(). pg_cron provides the scheduler.


2. Append-Only Audit Ledger ​

tenant_scrub_logs is the only proof-of-action surface for auditors/DPO. It must reject UPDATE and DELETE at the role level β€” a superuser slip then requires an explicit GRANT, not just a typo.

sql
CREATE TABLE IF NOT EXISTS backoffice.tenant_scrub_logs (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id     UUID NOT NULL,
  entity_type   TEXT NOT NULL CHECK (entity_type IN ('passenger','reseller','invoice','conversation')),
  entity_id     UUID NOT NULL,
  scrubbed_at   TIMESTAMPTZ NOT NULL DEFAULT now() AT TIME ZONE 'UTC',
  action        TEXT NOT NULL CHECK (action IN ('REDACTED','SKIPPED_LEGAL_HOLD','SKIPPED_NULL_TRIGGER')),
  skip_reason   TEXT,
  job_run_id    UUID
);

COMMENT ON TABLE backoffice.tenant_scrub_logs IS
  'Append-only GDPR scrub ledger. Does NOT store redacted values β€” ADR-028.';

CREATE INDEX IF NOT EXISTS idx_scrub_logs_tenant_time
  ON backoffice.tenant_scrub_logs (tenant_id, scrubbed_at DESC);

REVOKE UPDATE, DELETE ON backoffice.tenant_scrub_logs FROM PUBLIC;

Why a dedicated job_run_id: one scheduled run may produce thousands of log rows; correlating them back to a single cron.job_run_details record is done via this UUID.


Rows in legal_holds pin a specific entity against redaction. The partial index keeps open-hold lookups O(1) regardless of historical volume.

sql
CREATE TABLE IF NOT EXISTS backoffice.legal_holds (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id   UUID NOT NULL,
  entity_type TEXT NOT NULL CHECK (entity_type IN ('passenger','reseller','invoice','conversation')),
  entity_id   UUID NOT NULL,
  reason      TEXT NOT NULL,
  until       TIMESTAMPTZ,
  created_by  UUID NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now() AT TIME ZONE 'UTC',
  closed_at   TIMESTAMPTZ,
  closed_by   UUID
);

CREATE INDEX IF NOT EXISTS idx_legal_holds_open_by_entity
  ON backoffice.legal_holds (tenant_id, entity_type, entity_id)
  WHERE closed_at IS NULL;

Operational procedures live in docs/protocols/legal-hold-runbook.md.


4. Idempotency Columns on PII Entities ​

Every scrub-eligible entity gets two additions:

  • A trigger column (last_booking_at, last_active_at, issued_at) β€” nullable forever; NULL means "unknown, skip in scrub".
  • A proof-of-redaction column (pii_redacted_at TIMESTAMPTZ) β€” NULL means "not yet redacted". This replaces the fragile text-sentinel pattern (first_name = '[REDACTED]') that fails against real travellers whose legal names happen to match the sentinel. See ADR-028 Β§Decision #2.
sql
ALTER TABLE commerce.passengers
  ADD COLUMN IF NOT EXISTS last_booking_at   TIMESTAMPTZ,
  ADD COLUMN IF NOT EXISTS pii_redacted_at   TIMESTAMPTZ;

ALTER TABLE backoffice.resellers
  ADD COLUMN IF NOT EXISTS last_active_at    TIMESTAMPTZ,
  ADD COLUMN IF NOT EXISTS pii_redacted_at   TIMESTAMPTZ;

ALTER TABLE commerce.invoices
  ADD COLUMN IF NOT EXISTS issued_at         TIMESTAMPTZ,
  ADD COLUMN IF NOT EXISTS pii_redacted_at   TIMESTAMPTZ;

The online back-fill for last_booking_at is handled by apps/api/src/workers/pii-backfill.worker.ts (to be written) in 10 000-row chunks with a 1 s delay β€” see ADR-028 Β§Decision #4.


5. Scrub Functions ​

Every scrub function follows the same CTE-structured pipeline:

  1. candidates β€” rows where the trigger column crosses the per-entity retention window and pii_redacted_at IS NULL.
  2. held β€” subset of candidates with an open legal_holds row. These are not redacted.
  3. redact β€” UPDATE that tombstones PII columns (set to NULL or sentinel JSONB) and stamps pii_redacted_at = now().
  4. logged_redact / logged_hold β€” dual INSERT into tenant_scrub_logs so both actions (REDACTED and SKIPPED_LEGAL_HOLD) are recorded in the same transaction.

All functions are SECURITY DEFINER so they can write to tenant_scrub_logs without granting each scheduled cron role broad INSERT rights on the audit table.

5.1 scrub_passengers β€” 3-year window (BDSG Β§ 35) ​

sql
CREATE OR REPLACE FUNCTION backoffice.scrub_passengers(p_run_id UUID DEFAULT gen_random_uuid())
RETURNS INTEGER
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE
  v_rows INTEGER := 0;
BEGIN
  WITH candidates AS (
    SELECT p.id, p.tenant_id
      FROM commerce.passengers p
     WHERE p.pii_redacted_at IS NULL
       AND p.last_booking_at IS NOT NULL
       AND p.last_booking_at < (now() AT TIME ZONE 'UTC') - INTERVAL '3 years'
  ),
  held AS (
    SELECT c.id, c.tenant_id
      FROM candidates c
      JOIN backoffice.legal_holds h
        ON h.tenant_id = c.tenant_id
       AND h.entity_type = 'passenger'
       AND h.entity_id   = c.id
       AND h.closed_at   IS NULL
  ),
  redact AS (
    UPDATE commerce.passengers p
       SET first_name = NULL,
           last_name  = NULL,
           email      = NULL,
           phone      = NULL,
           date_of_birth   = NULL,
           document_number = NULL,
           nationality     = NULL,
           pii_redacted_at = now() AT TIME ZONE 'UTC'
      FROM candidates c
     WHERE p.id = c.id
       AND NOT EXISTS (SELECT 1 FROM held h WHERE h.id = c.id)
     RETURNING p.id, p.tenant_id
  ),
  logged_redact AS (
    INSERT INTO backoffice.tenant_scrub_logs (tenant_id, entity_type, entity_id, action, job_run_id)
    SELECT tenant_id, 'passenger', id, 'REDACTED', p_run_id FROM redact
    RETURNING 1
  ),
  logged_hold AS (
    INSERT INTO backoffice.tenant_scrub_logs (tenant_id, entity_type, entity_id, action, skip_reason, job_run_id)
    SELECT tenant_id, 'passenger', id, 'SKIPPED_LEGAL_HOLD', 'active_hold', p_run_id FROM held
    RETURNING 1
  )
  SELECT COUNT(*) INTO v_rows FROM logged_redact;

  RETURN v_rows;
END;
$$;

5.2 scrub_resellers β€” 2-year window (BDSG Β§ 35) ​

sql
CREATE OR REPLACE FUNCTION backoffice.scrub_resellers(p_run_id UUID DEFAULT gen_random_uuid())
RETURNS INTEGER
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE v_rows INTEGER := 0;
BEGIN
  WITH candidates AS (
    SELECT r.id, r.tenant_id
      FROM backoffice.resellers r
     WHERE r.pii_redacted_at IS NULL
       AND r.last_active_at IS NOT NULL
       AND r.last_active_at < (now() AT TIME ZONE 'UTC') - INTERVAL '2 years'
  ),
  held AS (
    SELECT c.id, c.tenant_id FROM candidates c
      JOIN backoffice.legal_holds h
        ON h.tenant_id = c.tenant_id AND h.entity_type='reseller'
       AND h.entity_id = c.id AND h.closed_at IS NULL
  ),
  redact AS (
    UPDATE backoffice.resellers r
       SET contact_name = NULL, contact_email = NULL, contact_phone = NULL,
           pii_redacted_at = now() AT TIME ZONE 'UTC'
      FROM candidates c
     WHERE r.id = c.id AND NOT EXISTS (SELECT 1 FROM held h WHERE h.id = c.id)
     RETURNING r.id, r.tenant_id
  )
  INSERT INTO backoffice.tenant_scrub_logs (tenant_id, entity_type, entity_id, action, job_run_id)
  SELECT tenant_id, 'reseller', id, 'REDACTED', p_run_id FROM redact;

  GET DIAGNOSTICS v_rows = ROW_COUNT;
  RETURN v_rows;
END;
$$;

5.3 scrub_invoices β€” 10-year window (GoBD Β§ 147 AO / Β§ 14b UStG) ​

Invoice PII is embedded in recipient_snapshot (JSONB). Redaction replaces the whole object with a sentinel rather than NULL-ing individual keys β€” auditors need to see that the row was processed, not that it was empty.

sql
CREATE OR REPLACE FUNCTION backoffice.scrub_invoices(p_run_id UUID DEFAULT gen_random_uuid())
RETURNS INTEGER
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
DECLARE v_rows INTEGER := 0;
BEGIN
  WITH candidates AS (
    SELECT i.id, i.tenant_id
      FROM commerce.invoices i
     WHERE i.pii_redacted_at IS NULL
       AND i.issued_at IS NOT NULL
       AND i.issued_at < (now() AT TIME ZONE 'UTC') - INTERVAL '10 years'
  ),
  held AS (
    SELECT c.id, c.tenant_id FROM candidates c
      JOIN backoffice.legal_holds h
        ON h.tenant_id=c.tenant_id AND h.entity_type='invoice'
       AND h.entity_id=c.id AND h.closed_at IS NULL
  ),
  redact AS (
    UPDATE commerce.invoices i
       SET recipient_snapshot = jsonb_build_object('redacted', true),
           pii_redacted_at = now() AT TIME ZONE 'UTC'
      FROM candidates c
     WHERE i.id = c.id AND NOT EXISTS (SELECT 1 FROM held h WHERE h.id = c.id)
     RETURNING i.id, i.tenant_id
  )
  INSERT INTO backoffice.tenant_scrub_logs (tenant_id, entity_type, entity_id, action, job_run_id)
  SELECT tenant_id, 'invoice', id, 'REDACTED', p_run_id FROM redact;

  GET DIAGNOSTICS v_rows = ROW_COUNT;
  RETURN v_rows;
END;
$$;

5.4 Conversations / communications.messages ​

Deliberately not implemented as a database scrub. Per ADR-028 Β§Decision #10, rendered names in communications.messages.rendered_content are treated as derived from commerce.passengers: once the source passenger is scrubbed, the rendered message contains an orphaned name that can no longer be re-identified. Chat transcripts in Loki log streams expire under the 14-day compactor retention (ADR-027). If the DPO rejects this position, the follow-up work is a cascading text-replace function gated on a new ADR β€” do not add it here.


6. pg_cron Schedules ​

All schedules are UTC. Off-peak for DACH; 15-minute offsets prevent three scrub functions from contending for the same buffer pool. 03:00 UTC aligns with the backup-verify cadence in backup-verify-runbook.md.

sql
SELECT cron.schedule('busflow_scrub_passengers', '0 3 * * *',  $$SELECT backoffice.scrub_passengers();$$);
SELECT cron.schedule('busflow_scrub_resellers',  '15 3 * * *', $$SELECT backoffice.scrub_resellers();$$);
SELECT cron.schedule('busflow_scrub_invoices',   '30 3 * * *', $$SELECT backoffice.scrub_invoices();$$);

The NestJS cron-health.worker.ts (ADR-028 Β§Decision #8) runs at 03:30 UTC and posts failures from cron.job_run_details to Slack #ops β€” Ubicloud does not expose cron.job_run_details to external Prometheus scrapers.


7. fillfactor Tuning ​

The scrub functions tombstone many rows per night, which churns HOT update chains. A greenfield fillfactor = 80 leaves 20 % free space per page for HOT updates, cutting index bloat. Set at migration time β€” not later β€” because ALTER TABLE ... SET (fillfactor) on a populated table doesn't retroactively rewrite pages.

sql
ALTER TABLE commerce.passengers    SET (fillfactor = 80);
ALTER TABLE backoffice.resellers   SET (fillfactor = 80);
ALTER TABLE commerce.invoices      SET (fillfactor = 80);

Down Migration ​

sql
SELECT cron.unschedule('busflow_scrub_passengers');
SELECT cron.unschedule('busflow_scrub_resellers');
SELECT cron.unschedule('busflow_scrub_invoices');

DROP FUNCTION IF EXISTS backoffice.scrub_invoices(UUID);
DROP FUNCTION IF EXISTS backoffice.scrub_resellers(UUID);
DROP FUNCTION IF EXISTS backoffice.scrub_passengers(UUID);

ALTER TABLE commerce.invoices    DROP COLUMN IF EXISTS pii_redacted_at;
ALTER TABLE commerce.invoices    DROP COLUMN IF EXISTS issued_at;
ALTER TABLE backoffice.resellers DROP COLUMN IF EXISTS pii_redacted_at;
ALTER TABLE backoffice.resellers DROP COLUMN IF EXISTS last_active_at;
ALTER TABLE commerce.passengers  DROP COLUMN IF EXISTS pii_redacted_at;
ALTER TABLE commerce.passengers  DROP COLUMN IF EXISTS last_booking_at;

DROP INDEX IF EXISTS backoffice.idx_legal_holds_open_by_entity;
DROP TABLE IF EXISTS backoffice.legal_holds;

DROP INDEX IF EXISTS backoffice.idx_scrub_logs_tenant_time;
DROP TABLE IF EXISTS backoffice.tenant_scrub_logs;

The down migration does not restore redacted PII. It exists for structural rollback during migration development only. Any real post-scrub recovery must come from a PITR restore per postgres-cutover.md Β§PITR.


When to Land This Migration ​

  1. After commerce.passengers, backoffice.resellers, and commerce.invoices exist with the columns listed in Β§Preconditions.
  2. After Legal/DPO has signed off on the retention table and the communications.messages separation-of-concerns position in ADR-028.
  3. Before the first production tenant row is written β€” the trigger columns need to be in place so pii_backfill.worker.ts can populate last_booking_at in-line rather than via a costly table rewrite later.

Internal documentation β€” Busflow