# ADR-017: Offline Sync Protocol for Driver Hub
Status: 🟢 Accepted
Triggered by: Level 2 Audit: OnboardSale & ExpenseReceipt Syncing (Findings SYNC-1, CE-1, CE-2)
## Context
The domain model specifies "Server Wins (Delta Driven)" with IndexedDB/RxDB but provides no implementation-level protocol. The driver creates multiple entities (OnboardSale, ExpenseReceipt, BoardingEvent, Incident, IssueReport) offline, and the system syncs them when connectivity returns. The sync protocol must guarantee:
- No data loss on partial connectivity
- GoBD-compliant audit trail via `change_events`
- Idempotent replay on retry
- Dedup of records created both offline and server-side
## Decision
### 1. Client Queue Structure

The Driver Hub PWA uses IndexedDB (via RxDB) to maintain a `pending_mutations` collection. Each mutation carries:
```typescript
interface PendingMutation {
  id: string;                       // UUIDv4: local record ID
  entity_type: string;              // e.g., 'onboard_sale', 'expense_receipt', 'boarding_event'
  entity_id: string;                // UUIDv4: the target entity's ID
  action: 'CREATE' | 'UPDATE' | 'DELETE';
  payload: Record<string, unknown>; // The mutation data
  created_at_client: string;        // ISO 8601 timestamp (client clock)
  idempotency_key: string;          // UUIDv4: unique per mutation, generated client-side
}
```

The client appends mutations to the queue in order of creation. The queue is durable, surviving app restarts and browser crashes via IndexedDB persistence.
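A minimal sketch of how the client side might construct and enqueue such a mutation. The in-memory `queue` array stands in for the durable RxDB `pending_mutations` collection, and `enqueueMutation` is a hypothetical helper, not part of the spec:

```typescript
import { randomUUID } from "crypto";

interface PendingMutation {
  id: string;
  entity_type: string;
  entity_id: string;
  action: "CREATE" | "UPDATE" | "DELETE";
  payload: Record<string, unknown>;
  created_at_client: string;
  idempotency_key: string;
}

// Stand-in for the durable RxDB pending_mutations collection.
const queue: PendingMutation[] = [];

// Build a mutation with a fresh idempotency_key and a client-clock
// timestamp, then append it to the queue in creation order.
function enqueueMutation(
  entity_type: string,
  entity_id: string,
  action: PendingMutation["action"],
  payload: Record<string, unknown>,
): PendingMutation {
  const mutation: PendingMutation = {
    id: randomUUID(),
    entity_type,
    entity_id,
    action,
    payload,
    created_at_client: new Date().toISOString(),
    idempotency_key: randomUUID(), // unique per mutation, generated client-side
  };
  queue.push(mutation);
  return mutation;
}

const sale = enqueueMutation("onboard_sale", randomUUID(), "CREATE", {
  payment_method: "CASH",
  amount_cents: 450,
});
```

In the real app the `queue.push` would be an RxDB insert, which is what gives the queue its crash durability.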
### 2. Sync API

Endpoint: `POST /api/sync/batch`

The client sends an ordered array of pending mutations:
```typescript
// Request
interface SyncBatchRequest {
  device_id: string;     // Stable device identifier
  sync_batch_id: string; // UUIDv4: unique per sync session
  mutations: PendingMutation[];
}

// Response
interface SyncBatchResponse {
  synced: string[];             // idempotency_keys successfully processed
  failed: SyncFailure[];        // Per-record failures
  server_state: ServerEntity[]; // Current server state for synced entities (conflict resolution)
}

interface SyncFailure {
  idempotency_key: string;
  error: string;
  retryable: boolean;
}
```

#### 2a. NestJS Handler Contract
Module: `SyncModule` (`apps/api/src/sync/sync.module.ts`)
Controller: `SyncController`, `POST /api/sync/batch`
Authentication:
- Guard: `@UseGuards(NhostAuthGuard)` validates the Nhost JWT. `tenant_id` and `user_id` are extracted from JWT claims (`x-hasura-tenant-id`, `x-hasura-user-id`) and are never sent in the request body. `device_id` in the request body serves traceability purposes only.
Validation:
- `@UsePipes(new ValidationPipe())` with class-validator DTOs.
- `mutations[]` max length: 200 per request. The client must chunk larger queues into sequential requests (each with its own `sync_batch_id`).
- `entity_type` must be one of: `['onboard_sale', 'expense_receipt', 'boarding_event', 'incident', 'issue_report']`.
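The body-level rules can be sketched as a plain function. This is illustrative only — the real service uses class-validator DTOs as stated above, and `ALLOWED_ENTITY_TYPES` / `validateBatch` are hypothetical names:

```typescript
const ALLOWED_ENTITY_TYPES = new Set([
  "onboard_sale",
  "expense_receipt",
  "boarding_event",
  "incident",
  "issue_report",
]);

const MAX_MUTATIONS_PER_REQUEST = 200;

interface ValidationResult {
  status: number; // HTTP status the controller would return
  error?: string;
}

// Mirrors the DTO rules: batch size cap (413) and entity_type allowlist (422).
function validateBatch(mutations: { entity_type: string }[]): ValidationResult {
  if (mutations.length > MAX_MUTATIONS_PER_REQUEST) {
    return { status: 413, error: "mutations.length > 200" };
  }
  for (const m of mutations) {
    if (!ALLOWED_ENTITY_TYPES.has(m.entity_type)) {
      return { status: 422, error: `unknown entity_type: ${m.entity_type}` };
    }
  }
  return { status: 200 };
}
```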
HTTP Status Codes:

| Code | Meaning |
|---|---|
| 200 OK | All mutations processed (check `failed[]` for per-record errors). |
| 401 Unauthorized | JWT missing or invalid. |
| 403 Forbidden | `tenant_id` mismatch. |
| 413 Payload Too Large | `mutations.length` > 200. |
| 422 Unprocessable Entity | Request schema validation failed. |
| 500 Internal Server Error | Unrecoverable server error. Client retries the entire batch. |
Rate Limiting: per `device_id`, max 10 requests/minute (`@nestjs/throttler`).
#### 2b. Per-Entity Mutation Handlers

`SyncService` dispatches each mutation to an entity-specific handler. Template per mutation:

1. Check `sync_idempotency_log`: if the `idempotency_key` exists, skip (return in `synced[]`).
2. Begin transaction.
3. For UPDATE: capture `old_values` and acquire a row lock via `AuditTrailService.captureAndRecord()` (see §5).
4. Apply the mutation (INSERT / UPDATE).
5. Generate the `change_event` row (correct FK + scope).
6. Insert into `sync_idempotency_log`.
7. Commit.
8. Post-commit: domain events fire via Hasura Event Triggers on the underlying table mutations.
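The template above can be sketched with an in-memory stand-in for `sync_idempotency_log` (hypothetical names; in the real handler, steps 2–7 run inside a single database transaction rather than the plain function call shown here):

```typescript
interface Mutation {
  idempotency_key: string;
  entity_type: string;
  apply: () => void; // stands in for the INSERT/UPDATE + change_event steps
}

interface BatchResult {
  synced: string[];
  failed: { idempotency_key: string; error: string; retryable: boolean }[];
}

// Stand-in for operations.sync_idempotency_log.
const idempotencyLog = new Set<string>();

// Process mutations in order. Duplicates skip silently but still count
// as synced, which is what makes blind replay of a whole batch safe.
function processBatch(mutations: Mutation[]): BatchResult {
  const result: BatchResult = { synced: [], failed: [] };
  for (const m of mutations) {
    if (idempotencyLog.has(m.idempotency_key)) {
      result.synced.push(m.idempotency_key); // step 1: already processed
      continue;
    }
    try {
      // Steps 2-7 would run in one DB transaction: apply the mutation,
      // write the change_event, insert the idempotency log row, commit.
      m.apply();
      idempotencyLog.add(m.idempotency_key);
      result.synced.push(m.idempotency_key);
    } catch (e) {
      result.failed.push({
        idempotency_key: m.idempotency_key,
        error: String(e),
        retryable: true,
      });
    }
  }
  return result;
}
```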
Entity-specific logic:
| entity_type | action | Scope | Post-Commit Side Effects |
|---|---|---|---|
| onboard_sale | CREATE | GOBD | If `payment_status=PAID` AND `sale_status=ACTIVE`, `OnboardSaleRecorded` fires. If `payment_method=PAYMENT_LINK`, post-commit: call `MolliePaymentService.createOnboardPaymentLink()` and populate `checkout_session_id` via a separate UPDATE. See schema-operations.md §OnboardSale. Blast radius: the Commerce API call runs post-commit (per §2b step 8), not inside the DB transaction, so slow Commerce responses create no connection pool pressure. If Commerce is unavailable, the OnboardSale row is already persisted (with `checkout_session_id = NULL`). The mutation fails with `retryable: true` in `failed[]`; on retry, the handler detects `checkout_session_id IS NULL` and reattempts the Commerce call. All other mutations in the batch proceed independently per §3. |
| onboard_sale | UPDATE | GOBD | Restricted to voiding; see the `voidOnboardSale` Hasura Action. No offline UPDATE for `onboard_sales`. |
| expense_receipt | CREATE | GOBD | Hasura Event Trigger on INSERT → NestJS webhook → BullMQ `expense-ocr-parse`. |
| boarding_event | CREATE | GENERAL | Server re-validates `qr_hash` against the live manifest. May retroactively set `check_in_status = INVALID` → `BoardingConflictDetected`. |
| incident | CREATE | GENERAL | Dedup: `(service_leg_id, type, occurred_at ± 5min)`. If duplicate, skip. Otherwise `IncidentCreated` fires. Guard: `service_leg.status IN (ACTIVE, DELAYED, COMPLETED)` AND (if COMPLETED: `actual_end + 72h > now()`). |
| incident | UPDATE | GENERAL | Only `severity`, `description` updatable offline. Status transitions beyond OPEN require connectivity. |
| issue_report | CREATE | GENERAL | `IssueReportCreated`. If `maintenance_urgency = IMMEDIATE` → `VehicleMaintenanceRequired`. |
| issue_report | UPDATE | GENERAL | Only `description`, `attachments`, `category` updatable. |

DELETE: No entity supports client-side DELETE. The handler rejects it with a non-retryable error.
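The incident dedup rule from the table can be written as a small predicate (hypothetical `isDuplicateIncident` helper; the ±5-minute window applies to `occurred_at`):

```typescript
interface IncidentKey {
  service_leg_id: string;
  type: string;
  occurred_at: string; // ISO 8601
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// True when an existing incident matches on (service_leg_id, type)
// and its occurred_at is within ±5 minutes of the incoming one.
function isDuplicateIncident(incoming: IncidentKey, existing: IncidentKey[]): boolean {
  const t = Date.parse(incoming.occurred_at);
  return existing.some(
    (e) =>
      e.service_leg_id === incoming.service_leg_id &&
      e.type === incoming.type &&
      Math.abs(Date.parse(e.occurred_at) - t) <= FIVE_MINUTES_MS,
  );
}
```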
### 3. Batch Semantics: Per-Record Transactional
Each mutation in the batch runs as an independent database transaction. Partial failures do not roll back successfully synced records.
- The server processes mutations in order, each within its own transaction.
- The client removes successfully synced mutations from the queue immediately.
- Failed mutations remain in the queue and retry with exponential backoff (max 5 attempts).
- Response: `{ synced: [idempotency_key, ...], failed: [{ idempotency_key, error, retryable }] }`.
> **Note:** Why per-record, not all-or-nothing? The entities in a sync batch (OnboardSale, ExpenseReceipt, BoardingEvent, Incident, IssueReport) are independent: they have no transactional dependency on each other. A validation error on one expense receipt should not block 20 perfectly valid boarding events and cash sales from syncing. Per-record semantics also minimize the GoBD time window in which financial data exists only on the untrusted client. Batch size bounds the complexity cost: the client tracks `synced[]` vs `failed[]` from the response, which RxDB's replication plugin already supports.
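One plausible client-side policy for consuming the `synced[]` / `failed[]` response might look like this sketch (`reconcileQueue` and its drop-non-retryable behavior are assumptions, not specified above; it assumes every queued mutation was in the batch):

```typescript
interface QueuedMutation {
  idempotency_key: string;
  attempts: number;
}

interface BatchFailure {
  idempotency_key: string;
  error: string;
  retryable: boolean;
}

const MAX_ATTEMPTS = 5;

// - synced mutations leave the durable queue immediately;
// - non-retryable failures (e.g. a server-wins conflict) are dropped,
//   since the server_state overwrite makes the local mutation moot;
// - retryable failures stay queued with an incremented attempt counter,
//   surfacing a user-facing error once the retry budget is spent.
function reconcileQueue(
  queue: QueuedMutation[],
  synced: string[],
  failed: BatchFailure[],
): { queue: QueuedMutation[]; surfaced: string[] } {
  const syncedSet = new Set(synced);
  const nonRetryable = new Set(
    failed.filter((f) => !f.retryable).map((f) => f.idempotency_key),
  );
  const nextQueue: QueuedMutation[] = [];
  const surfaced: string[] = [];
  for (const m of queue) {
    if (syncedSet.has(m.idempotency_key)) continue;
    if (nonRetryable.has(m.idempotency_key)) {
      surfaced.push(m.idempotency_key);
      continue;
    }
    const attempts = m.attempts + 1;
    if (attempts >= MAX_ATTEMPTS) surfaced.push(m.idempotency_key);
    nextQueue.push({ idempotency_key: m.idempotency_key, attempts });
  }
  return { queue: nextQueue, surfaced };
}
```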
### 4. Deduplication

The server checks `idempotency_key` uniqueness against a dedicated `sync_idempotency_log` table:
```sql
CREATE TABLE operations.sync_idempotency_log (
  idempotency_key UUID PRIMARY KEY,
  processed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  entity_type VARCHAR NOT NULL,
  entity_id UUID NOT NULL
);
```

- Duplicate `idempotency_key`: skip silently, return success.
- Do not use a UNIQUE constraint on `change_events.client_event_id`: a single client action can generate multiple `change_events` across different entities, which would crash the second insert.
### 5. Change Event Generation

The server generates `change_events` within the same transaction as the entity mutation, using the shared `AuditTrailService` (ADR-019):
- The client sends raw mutations (never `change_events`).
- For UPDATE: call `AuditTrailService.captureAndRecord(tx, ...)`. It acquires a row lock via `SELECT ... FOR UPDATE` and captures the pre-mutation state as `old_values`; the post-mutation state becomes `new_values`.
- For CREATE: call `AuditTrailService.record(tx, { oldValues: null, ... })`; `new_values` is the full row.
- Server sets: `created_at = now()`, `client_event_id = idempotency_key`, `device_id`, `sync_batch_id`, plus `user_id` and `tenant_id` from the JWT.
- FK mapping: `entity_type` uses a polymorphic reference (`entity_type` + `entity_id`) per ADR-019.
- Scope mapping: `onboard_sale` / `expense_receipt` → GOBD; `boarding_event` / `incident` / `issue_report` → GENERAL.
- The idempotency log INSERT executes inside the same transaction, making it crash-safe: either both the entity mutation and the log entry commit, or neither does.
- The client never generates `change_events` directly; this preserves the trust boundary for GoBD.
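The field population above can be sketched as a pure function (`buildChangeEvent` is hypothetical; the real work happens inside `AuditTrailService` per ADR-019):

```typescript
interface JwtClaims {
  tenant_id: string;
  user_id: string;
}

interface ChangeEvent {
  entity_type: string;
  entity_id: string;
  scope: "GOBD" | "GENERAL";
  old_values: Record<string, unknown> | null; // null for CREATE
  new_values: Record<string, unknown>;
  created_at: string;      // server clock, GoBD authoritative
  client_event_id: string; // = idempotency_key (non-unique column)
  device_id: string;
  sync_batch_id: string;
  user_id: string;
  tenant_id: string;
}

const GOBD_ENTITIES = new Set(["onboard_sale", "expense_receipt"]);

function buildChangeEvent(
  mutation: { entity_type: string; entity_id: string; idempotency_key: string },
  oldValues: Record<string, unknown> | null,
  newValues: Record<string, unknown>,
  ctx: { claims: JwtClaims; device_id: string; sync_batch_id: string },
): ChangeEvent {
  return {
    entity_type: mutation.entity_type,
    entity_id: mutation.entity_id,
    scope: GOBD_ENTITIES.has(mutation.entity_type) ? "GOBD" : "GENERAL",
    old_values: oldValues,
    new_values: newValues,
    created_at: new Date().toISOString(), // server now(), never the client clock
    client_event_id: mutation.idempotency_key,
    device_id: ctx.device_id,
    sync_batch_id: ctx.sync_batch_id,
    user_id: ctx.claims.user_id,          // from the JWT, never the request body
    tenant_id: ctx.claims.tenant_id,
  };
}
```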
### 6. Conflict Resolution

Strategy: "Server Wins"
- If the server already has a newer version of a record (determined by `updated_at` comparison), the server version persists.
- The client receives the server state in the `server_state` field of the sync response.
- The client overwrites its local copy with the server state.
- No merge logic; the server version is authoritative.
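The server-wins decision reduces to a timestamp comparison, sketched here as a pure function (`resolve` is a hypothetical helper; the comparison follows the E5 rule in §9):

```typescript
type Resolution = "APPLY_CLIENT" | "SERVER_WINS";

// Server wins: if the server row was updated after the client created the
// mutation, the server version persists and the client must overwrite its
// local copy from server_state.
function resolve(
  mutationCreatedAtClient: string,
  serverUpdatedAt: string | null,
): Resolution {
  if (serverUpdatedAt === null) return "APPLY_CLIENT"; // no server version yet
  return Date.parse(mutationCreatedAtClient) < Date.parse(serverUpdatedAt)
    ? "SERVER_WINS"
    : "APPLY_CLIENT";
}
```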
### 7. Connectivity Detection

- Service Worker `online`/`offline` event listeners (primary mechanism).
- Periodic background sync via the PWA Background Sync API (where supported by the browser).
- Manual "Force Sync" button as a fallback for unreliable connectivity detection (e.g., captive portals, degraded mobile signal).

Sync triggers on the `online` event plus periodic background sync. The UI displays a sync status indicator showing the queued mutation count and the last successful sync timestamp.
### 8. Retry Strategy
- Exponential backoff: 1s, 2s, 4s, 8s, 16s, max 30s.
- Max retries: 5 per mutation.
- Jitter: Random Β±20% to prevent thundering herd on reconnect.
- After max retries exhausted, the mutation remains in the local queue and surfaces a user-facing error notification with a "Retry" button.
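The delay schedule above can be computed as follows (sketch; `backoffDelayMs` is a hypothetical helper):

```typescript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;
const JITTER_RATIO = 0.2;

// attempt 0 -> ~1s, 1 -> ~2s, 2 -> ~4s, 3 -> ~8s, 4 -> ~16s, capped at 30s,
// with +/-20% random jitter to avoid a thundering herd on reconnect.
function backoffDelayMs(attempt: number): number {
  const base = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
  const jitter = (Math.random() * 2 - 1) * JITTER_RATIO * base; // in [-20%, +20%]
  return base + jitter;
}
```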
### 9. Edge States
#### E1: Partial Response / Connection Drop Mid-Sync

The client does not know which mutations succeeded. On reconnect, it retries the entire batch with the same `sync_batch_id` and `idempotency_key` values. The server's dedup ensures already-processed mutations are skipped silently and returned in `synced[]`. Replay is safe.
#### E3: Large Batch (100+ Records)

Client-side chunking: if the queue exceeds 200 mutations, split it into sequential batches of at most 200 (each with a unique `sync_batch_id`), sent one after another to preserve causal ordering. The server rejects more than 200 with HTTP 413. Server timeout: 30s per batch.
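The chunking step can be sketched as a generic split (hypothetical `chunkQueue` helper; each resulting chunk would be sent under a fresh `sync_batch_id`):

```typescript
const MAX_BATCH_SIZE = 200;

// Split the pending queue into ordered chunks of at most 200 mutations.
// The caller sends them sequentially to preserve causal ordering.
function chunkQueue<T>(queue: T[], size: number = MAX_BATCH_SIZE): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < queue.length; i += size) {
    chunks.push(queue.slice(i, i + size));
  }
  return chunks;
}
```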
#### E4: Clock Skew

`created_at_client` is diagnostic only and never becomes an audit timestamp: `change_events.created_at` is always the server's `now()` (GoBD authoritative). If the client clock is more than 1 hour off, the UI shows a warning toast.
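The skew check behind the warning toast is a one-liner (sketch; `clockSkewExceeded` is a hypothetical helper, with the server time taken from a response header or sync response):

```typescript
const MAX_SKEW_MS = 60 * 60 * 1000; // 1 hour

// True when the client clock differs from the server clock by more than
// one hour, in either direction; the UI would then show a warning toast.
function clockSkewExceeded(clientNowIso: string, serverNowIso: string): boolean {
  return Math.abs(Date.parse(clientNowIso) - Date.parse(serverNowIso)) > MAX_SKEW_MS;
}
```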
#### E5: Concurrent Server-Side Mutation

`SELECT ... FOR UPDATE` acquires the row lock. If `mutation.created_at_client < row.updated_at` (the server version is newer), the mutation is skipped and returned in `failed[]` with `CONFLICT_SERVER_WINS` (non-retryable). The server state is included in `server_state[]`.
#### E6: IndexedDB Quota Exceeded

On `QuotaExceededError`: show a persistent banner ("Storage full. Sync before creating new records."), block new mutations, preserve the existing queue, and attempt an immediate sync. Proactive: check `navigator.storage.estimate()` and show a warning toast at 80% usage.
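The proactive threshold check, decoupled from the browser API for testability (sketch; `shouldWarnStorage` is a hypothetical helper fed by the `usage` and `quota` fields that `navigator.storage.estimate()` resolves to):

```typescript
const WARN_RATIO = 0.8;

// Given the usage/quota values from navigator.storage.estimate(),
// decide whether to show the proactive 80% storage warning toast.
function shouldWarnStorage(usageBytes: number, quotaBytes: number): boolean {
  return quotaBytes > 0 && usageBytes / quotaBytes >= WARN_RATIO;
}
```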
#### E7: Service Worker Registration Failure

Fallback: poll `navigator.onLine` every 30s. Background Sync is unavailable, but the manual "Force Sync" button remains. Info banner: "Limited offline support. Use the Sync button." Core operations are unaffected.
## Consequences
- New table: `sync_idempotency_log` in the `operations` schema.
- New API endpoint: `POST /api/sync/batch` in the NestJS API.
- `change_events` gains `client_event_id` (non-unique), `device_id`, and `sync_batch_id` columns.
- The RxDB replication plugin handles client-side queue management; the custom sync endpoint replaces the default CouchDB replication.
- This protocol applies to all offline-capable entities, not just OnboardSale/ExpenseReceipt. It is a cross-cutting concern for the entire Driver Hub.
## Applies To

| Entity | Offline CREATE | Offline UPDATE | Notes |
|---|---|---|---|
| OnboardSale | ✅ (cash only) | ❌ | Void/refund require connectivity (Hasura Actions). PAYMENT_LINK creation requires connectivity. |
| ExpenseReceipt | ✅ | ❌ | Created offline; OCR processing is server-side. |
| BoardingEvent | ✅ | ❌ | QR scan results stored locally. |
| Incident | ✅ | ✅ | Driver can update severity/notes offline. |
| IssueReport | ✅ | ✅ | Crew can add photos/details offline. |