ADR-017: Offline Sync Protocol for Driver Hub
Status: 🟢 Accepted Triggered by: Level 2 Audit — OnboardSale & ExpenseReceipt Syncing (Findings SYNC-1, CE-1, CE-2)
Context
The domain model specifies "Server Wins (Delta Driven)" with IndexedDB/RxDB but provides no implementation-level protocol. The driver creates multiple entities (OnboardSale, ExpenseReceipt, BoardingEvent, Incident, IssueReport) offline, and the system syncs them when connectivity returns. The sync protocol must guarantee:
- No data loss on partial connectivity
- GoBD-compliant audit trail via
change_events - Idempotent replay on retry
- Dedup of records created both offline and server-side
Decision
1. Client Queue Structure
The Driver Hub PWA uses IndexedDB (via RxDB) to maintain a pending_mutations collection. Each mutation carries:
interface PendingMutation {
id: string; // UUIDv4 — local record ID
entity_type: string; // e.g., 'onboard_sale', 'expense_receipt', 'boarding_event'
entity_id: string; // UUIDv4 — the target entity's ID
action: 'CREATE' | 'UPDATE' | 'DELETE';
payload: Record<string, unknown>; // The mutation data
created_at_client: string; // ISO 8601 timestamp (client clock)
idempotency_key: string; // UUIDv4 — unique per mutation, generated client-side
}The client appends mutations to the queue in order of creation. The queue is durable — surviving app restarts and browser crashes via IndexedDB persistence.
2. Sync API
Endpoint: POST /api/sync/batch
The client sends an ordered array of pending mutations:
// Request
interface SyncBatchRequest {
device_id: string; // Stable device identifier
sync_batch_id: string; // UUIDv4 — unique per sync session
mutations: PendingMutation[];
}
// Response
interface SyncBatchResponse {
synced: string[]; // Array of idempotency_keys successfully processed
failed: SyncFailure[]; // Array of per-record failures
server_state: ServerEntity[]; // Current server state for synced entities (conflict resolution)
}
interface SyncFailure {
idempotency_key: string;
error: string;
retryable: boolean;
}2a. NestJS Handler Contract
Module: SyncModule (apps/api/src/sync/sync.module.ts) Controller: SyncController → POST /api/sync/batch
Authentication:
- Guard:
@UseGuards(NhostAuthGuard)— validates the Nhost JWT. tenant_idanduser_idextracted from JWT claims (x-hasura-tenant-id,x-hasura-user-id). Not sent in request body.device_idin request body serves traceability purposes only.
Validation:
@UsePipes(new ValidationPipe())with class-validator DTOs.mutations[]max length: 200 per request. The client must chunk larger queues into sequential requests (each with its ownsync_batch_id).entity_typemust be in:['onboard_sale', 'expense_receipt', 'boarding_event', 'incident', 'issue_report'].
HTTP Status Codes:
| Code | Meaning |
|---|---|
200 OK | All mutations processed (check failed[] for per-record errors). |
401 Unauthorized | JWT missing or invalid. |
403 Forbidden | tenant_id mismatch. |
413 Payload Too Large | mutations.length > 200. |
422 Unprocessable Entity | Request schema validation failed. |
500 Internal Server Error | Unrecoverable server error. Client retries entire batch. |
Rate Limiting: Per-device_id, max 10 requests/minute (@nestjs/throttler).
2b. Per-Entity Mutation Handlers
SyncService dispatches each mutation to an entity-specific handler. Template per mutation:
- Check
sync_idempotency_log→ ifidempotency_keyexists, skip (return insynced[]). - Begin transaction.
- For UPDATE: capture
old_values+ row lock viaAuditTrailService.captureAndRecord()(see §5). - Apply mutation (INSERT / UPDATE).
- Generate
change_eventrow (correct FK + scope). - Insert into
sync_idempotency_log. - Commit.
- Post-commit: domain events fire via Hasura Event Triggers on the underlying table mutations.
Entity-specific logic:
| entity_type | action | Scope | Post-Commit Side Effects |
|---|---|---|---|
onboard_sale | CREATE | GOBD | If payment_status=PAID AND sale_status=ACTIVE → OnboardSaleRecorded fires. If payment_method=PAYMENT_LINK → post-commit: call MolliePaymentService.createOnboardPaymentLink() → populate checkout_session_id via a separate UPDATE. See schema-operations.md §OnboardSale. Blast radius: The Commerce API call runs post-commit (per §2b step 8), not inside the DB transaction — no connection pool pressure from slow Commerce responses. If Commerce is unavailable, the OnboardSale row is already persisted (with checkout_session_id = NULL). The mutation fails with retryable: true in failed[]; on retry, the handler detects checkout_session_id IS NULL and reattempts the Commerce call. All other mutations in the batch proceed independently per §3. |
onboard_sale | UPDATE | GOBD | Restricted to voiding — see voidOnboardSale Hasura Action. No offline UPDATE for onboard_sales. |
expense_receipt | CREATE | GOBD | Hasura Event Trigger on INSERT → NestJS webhook → BullMQ expense-ocr-parse. |
boarding_event | CREATE | GENERAL | Server re-validates qr_hash against live manifest. May retroactively set check_in_status = INVALID → BoardingConflictDetected. |
incident | CREATE | GENERAL | Dedup: (service_leg_id, type, occurred_at ± 5min). If duplicate, skip. Otherwise IncidentCreated fires. Guard: service_leg.status IN (ACTIVE, DELAYED, COMPLETED) AND (if COMPLETED: actual_end + 72h > now()). |
incident | UPDATE | GENERAL | Only severity, description updatable offline. Status transitions beyond OPEN require connectivity. |
issue_report | CREATE | GENERAL | IssueReportCreated. If maintenance_urgency = IMMEDIATE → VehicleMaintenanceRequired. |
issue_report | UPDATE | GENERAL | Only description, attachments, category updatable. |
DELETE: No entity supports client-side DELETE. Handler rejects with non-retryable error.
3. Batch Semantics — Per-Record Transactional
Each mutation in the batch runs as an independent database transaction. Partial failures do not roll back successfully synced records.
- The server processes mutations in order, each within its own transaction.
- The client removes successfully synced mutations from the queue immediately.
- Failed mutations remain in the queue and retry with exponential backoff (max 5 attempts).
- Response:
{ synced: [idempotency_key, ...], failed: [{ idempotency_key, error, retryable }] }.
NOTE
Why per-record, not all-or-nothing? The entities in a sync batch (OnboardSale, ExpenseReceipt, BoardingEvent, Incident, IssueReport) are independent — they have no transactional dependency on each other. A validation error on one expense receipt should not block 20 perfectly valid boarding events and cash sales from syncing. Per-record semantics also minimize the GoBD time window where financial data exists only on the untrusted client. Batch size bounds the complexity cost: the client tracks synced[] vs failed[] from the response, which RxDB's replication plugin already supports.
4. Deduplication
The server checks idempotency_key uniqueness against a dedicated sync_idempotency_log table:
CREATE TABLE operations.sync_idempotency_log (
idempotency_key UUID PRIMARY KEY,
processed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
entity_type VARCHAR NOT NULL,
entity_id UUID NOT NULL
);- Duplicate
idempotency_key→ skip silently, return success. - Do not use a UNIQUE constraint on
change_events.client_event_id— a single client action can generate multiplechange_eventsacross different entities, which would crash the second insert.
5. Change Event Generation
The server generates change_events within the same transaction as the entity mutation, using the shared AuditTrailService (ADR-019):
- The client sends raw mutations (never
change_events). - For UPDATE: call
AuditTrailService.captureAndRecord(tx, ...)— acquires row lock viaSELECT ... FOR UPDATE, captures pre-mutation state asold_values. Post-mutation state →new_values. - For CREATE: call
AuditTrailService.record(tx, { oldValues: null, ... })—new_values = full row. - Server sets:
created_at = now(),client_event_id = idempotency_key,device_id,sync_batch_id,user_id+tenant_idfrom JWT. - FK mapping:
entity_type→ polymorphic reference (entity_type+entity_id) per ADR-019. - Scope mapping:
onboard_sale/expense_receipt→GOBD;boarding_event/incident/issue_report→GENERAL. - Idempotency log INSERT executes inside the same transaction — crash-safe: either both the entity mutation and the log entry commit, or neither does.
- The client never generates
change_eventsdirectly — this preserves the trust boundary for GoBD.
6. Conflict Resolution
Strategy: "Server Wins"
- If the server already has a newer version of a record (determined by
updated_atcomparison), the server version persists. - The client receives the server state in the
server_statefield of the sync response. - The client overwrites its local copy with the server state.
- No merge logic — the server version is authoritative.
7. Connectivity Detection
- Service Worker
online/offlineevent listeners — primary mechanism. - Periodic background sync — via PWA Background Sync API (where supported by browser).
- Manual "Force Sync" button — fallback for unreliable connectivity detection (e.g., captive portals, degraded mobile signal).
Sync triggers on online event + periodic background sync. The UI displays a sync status indicator showing queued mutation count and last successful sync timestamp.
8. Retry Strategy
- Exponential backoff: 1s, 2s, 4s, 8s, 16s, max 30s.
- Max retries: 5 per mutation.
- Jitter: Random ±20% to prevent thundering herd on reconnect.
- After max retries exhausted, the mutation remains in the local queue and surfaces a user-facing error notification with a "Retry" button.
9. Edge States
E1: Partial Response / Connection Drop Mid-Sync
Client does not know which mutations succeeded. On reconnect, retries entire batch with same sync_batch_id and idempotency_key values. Server's dedup ensures already-processed mutations skip silently → returned in synced[]. Safe replay.
E3: Large Batch (100+ Records)
Client-side chunking: if queue > 200 mutations, split into sequential batches of ≤200 (each with unique sync_batch_id). Sent sequentially to preserve causal ordering. Server rejects > 200 with HTTP 413. Server timeout: 30s per batch.
E4: Clock Skew
created_at_client is diagnostic only — never used for conflict resolution. change_events.created_at = server now() (GoBD authoritative). Client clock > 1 hour off → warning toast.
E5: Concurrent Server-Side Mutation
SELECT ... FOR UPDATE acquires row lock. If mutation.created_at_client < row.updated_at (server version newer), mutation skipped → failed[] with CONFLICT_SERVER_WINS (non-retryable). Server state included in server_state[].
E6: IndexedDB Quota Exceeded
On QuotaExceededError: persistent banner "Storage full — sync before creating new records." New mutations blocked. Existing queue preserved. Immediate sync attempted. Proactive: check navigator.storage.estimate() at 80% → warning toast.
E7: Service Worker Registration Failure
Fallback: navigator.onLine polling every 30s. Background Sync unavailable. Manual "Force Sync" button remains. Info banner: "Limited offline support — use Sync button." Core ops unaffected.
Consequences
- New table:
sync_idempotency_login theoperationsschema. - New API endpoint:
POST /api/sync/batchin NestJS API. change_eventsgains:client_event_id(non-unique),device_id, andsync_batch_idcolumns.- RxDB replication plugin handles client-side queue management; custom sync endpoint replaces the default CouchDB replication.
- This protocol applies to all offline-capable entities, not just OnboardSale/ExpenseReceipt. It is a cross-cutting concern for the entire Driver Hub.
Applies To
| Entity | Offline CREATE | Offline UPDATE | Notes |
|---|---|---|---|
OnboardSale | ✅ (cash only) | ❌ | Void/refund require connectivity (Hasura Actions). PAYMENT_LINK creation requires connectivity. |
ExpenseReceipt | ✅ | ❌ | Created offline, OCR processing is server-side |
BoardingEvent | ✅ | ❌ | QR scan results stored locally |
Incident | ✅ | ✅ | Driver can update severity/notes offline |
IssueReport | ✅ | ✅ | Crew can add photos/details offline |