ADR-017: Offline Sync Protocol for Driver Hub

Status: 🟢 Accepted
Triggered by: Level 2 Audit – OnboardSale & ExpenseReceipt Syncing (Findings SYNC-1, CE-1, CE-2)


Context

The domain model specifies "Server Wins (Delta Driven)" with IndexedDB/RxDB but provides no implementation-level protocol. The driver creates multiple entities (OnboardSale, ExpenseReceipt, BoardingEvent, Incident, IssueReport) offline, and the system syncs them when connectivity returns. The sync protocol must guarantee:

  1. No data loss on partial connectivity
  2. GoBD-compliant audit trail via change_events
  3. Idempotent replay on retry
  4. Dedup of records created both offline and server-side

Decision

1. Client Queue Structure

The Driver Hub PWA uses IndexedDB (via RxDB) to maintain a pending_mutations collection. Each mutation carries:

```typescript
interface PendingMutation {
  id: string;                // UUIDv4 – local record ID
  entity_type: string;       // e.g., 'onboard_sale', 'expense_receipt', 'boarding_event'
  entity_id: string;         // UUIDv4 – the target entity's ID
  action: 'CREATE' | 'UPDATE' | 'DELETE';
  payload: Record<string, unknown>;  // The mutation data
  created_at_client: string; // ISO 8601 timestamp (client clock)
  idempotency_key: string;   // UUIDv4 – unique per mutation, generated client-side
}
```

The client appends mutations to the queue in order of creation. The queue is durable: it survives app restarts and browser crashes via IndexedDB persistence.
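
As an illustration, a minimal enqueue helper. The collection handle is a stand-in for the RxDB pending_mutations collection; schema and database setup are omitted, and the helper name is not prescribed by this ADR:

```typescript
// Minimal enqueue sketch. `pendingMutations` stands in for the RxDB
// collection described above; RxDB's collection.insert() persists the
// document to IndexedDB, which is what makes the queue durable.
async function enqueueMutation(
  pendingMutations: { insert(doc: PendingMutation): Promise<unknown> },
  entityType: string,
  entityId: string,
  action: 'CREATE' | 'UPDATE' | 'DELETE',
  payload: Record<string, unknown>,
): Promise<void> {
  await pendingMutations.insert({
    id: crypto.randomUUID(),
    entity_type: entityType,
    entity_id: entityId,
    action,
    payload,
    created_at_client: new Date().toISOString(), // client clock, diagnostic only (see E4)
    idempotency_key: crypto.randomUUID(), // generated once, reused verbatim on every retry
  });
}
```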

2. Sync API

Endpoint: POST /api/sync/batch

The client sends an ordered array of pending mutations:

```typescript
// Request
interface SyncBatchRequest {
  device_id: string;            // Stable device identifier
  sync_batch_id: string;        // UUIDv4 – unique per sync session
  mutations: PendingMutation[];
}

// Response
interface SyncBatchResponse {
  synced: string[];             // Array of idempotency_keys successfully processed
  failed: SyncFailure[];        // Array of per-record failures
  server_state: ServerEntity[]; // Current server state for synced entities (conflict resolution)
}

interface SyncFailure {
  idempotency_key: string;
  error: string;
  retryable: boolean;
}
```
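
A sketch of one sync round against this contract, transport only, with queue reconciliation deferred to §3. The helper name is illustrative; the batch ID is passed in so a replayed batch can reuse it verbatim (see §9 E1):

```typescript
// One sync round: POST the ordered mutations, return the parsed response.
async function syncBatch(
  deviceId: string,
  syncBatchId: string, // reused verbatim when a batch is replayed (§9 E1)
  mutations: PendingMutation[],
  jwt: string,
): Promise<SyncBatchResponse> {
  const request: SyncBatchRequest = {
    device_id: deviceId,
    sync_batch_id: syncBatchId,
    mutations,
  };
  const res = await fetch('/api/sync/batch', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${jwt}`, // Nhost JWT; tenant/user come from its claims
    },
    body: JSON.stringify(request),
  });
  if (!res.ok) {
    // e.g. 500: the whole batch is retried later with the same IDs (§8, §9 E1).
    throw new Error(`Sync failed: HTTP ${res.status}`);
  }
  return (await res.json()) as SyncBatchResponse;
}
```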

2a. NestJS Handler Contract

  • Module: SyncModule (apps/api/src/sync/sync.module.ts)
  • Controller: SyncController → POST /api/sync/batch

Authentication:

  • Guard: @UseGuards(NhostAuthGuard) – validates the Nhost JWT.
  • tenant_id and user_id are extracted from JWT claims (x-hasura-tenant-id, x-hasura-user-id), never sent in the request body.
  • device_id in the request body serves traceability purposes only.

Validation:

  • @UsePipes(new ValidationPipe()) with class-validator DTOs.
  • mutations[] max length: 200 per request. The client must chunk larger queues into sequential requests (each with its own sync_batch_id).
  • entity_type must be in: ['onboard_sale', 'expense_receipt', 'boarding_event', 'incident', 'issue_report'].
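
A sketch of request DTOs under these rules, assuming class-validator plus class-transformer for nested validation; the DTO names are illustrative, not prescribed:

```typescript
// Request DTO sketch for the ValidationPipe. Decorators mirror the
// rules above: UUID fields, the entity_type whitelist, and the
// 200-mutation cap.
import { Type } from 'class-transformer';
import {
  ArrayMaxSize,
  IsArray,
  IsIn,
  IsObject,
  IsString,
  IsUUID,
  ValidateNested,
} from 'class-validator';

const ENTITY_TYPES = [
  'onboard_sale',
  'expense_receipt',
  'boarding_event',
  'incident',
  'issue_report',
];

export class PendingMutationDto {
  @IsUUID('4') id!: string;
  @IsIn(ENTITY_TYPES) entity_type!: string;
  @IsUUID('4') entity_id!: string;
  @IsIn(['CREATE', 'UPDATE', 'DELETE']) action!: string;
  @IsObject() payload!: Record<string, unknown>;
  @IsString() created_at_client!: string;
  @IsUUID('4') idempotency_key!: string;
}

export class SyncBatchRequestDto {
  @IsString() device_id!: string;
  @IsUUID('4') sync_batch_id!: string;
  @IsArray()
  @ArrayMaxSize(200) // larger queues are chunked client-side (§9 E3)
  @ValidateNested({ each: true })
  @Type(() => PendingMutationDto)
  mutations!: PendingMutationDto[];
}
```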

HTTP Status Codes:

| Code | Meaning |
| --- | --- |
| 200 OK | All mutations processed (check failed[] for per-record errors). |
| 401 Unauthorized | JWT missing or invalid. |
| 403 Forbidden | tenant_id mismatch. |
| 413 Payload Too Large | mutations.length > 200. |
| 422 Unprocessable Entity | Request schema validation failed. |
| 500 Internal Server Error | Unrecoverable server error. Client retries the entire batch. |

Rate Limiting: Per-device_id, max 10 requests/minute (@nestjs/throttler).

2b. Per-Entity Mutation Handlers

SyncService dispatches each mutation to an entity-specific handler. Template per mutation (a sketch follows the steps):

  1. Check sync_idempotency_log → if idempotency_key exists, skip (return in synced[]).
  2. Begin transaction.
  3. For UPDATE: capture old_values + row lock via AuditTrailService.captureAndRecord() (see §5).
  4. Apply mutation (INSERT / UPDATE).
  5. Generate change_event row (correct FK + scope).
  6. Insert into sync_idempotency_log.
  7. Commit.
  8. Post-commit: domain events fire via Hasura Event Triggers on the underlying table mutations.
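
A sketch of this template as a dispatch loop. Tx, EntityHandler, and the transaction wrapper are illustrative stand-ins; the real audit-trail calls follow the ADR-019 contract described in §5:

```typescript
// Per-mutation processing sketch. The entity handler performs steps
// 3-5 (lock, apply, change_event); this loop adds steps 1, 6, and 7.
interface Tx {
  query(sql: string, params: unknown[]): Promise<{ rowCount: number }>;
}
interface EntityHandler {
  apply(tx: Tx, m: PendingMutation): Promise<void>; // steps 3-5, per §5
}

async function processMutation(
  m: PendingMutation,
  runInTransaction: <T>(fn: (tx: Tx) => Promise<T>) => Promise<T>,
  handlers: Record<string, EntityHandler>,
): Promise<'synced' | 'duplicate'> {
  return runInTransaction(async (tx) => {
    // Step 1: idempotent replay check.
    const seen = await tx.query(
      'SELECT 1 FROM operations.sync_idempotency_log WHERE idempotency_key = $1',
      [m.idempotency_key],
    );
    if (seen.rowCount > 0) return 'duplicate'; // still reported in synced[]

    // Steps 3-5: lock, apply the mutation, record the change_event.
    await handlers[m.entity_type].apply(tx, m);

    // Step 6: the idempotency log commits atomically with the mutation.
    await tx.query(
      `INSERT INTO operations.sync_idempotency_log
         (idempotency_key, entity_type, entity_id)
       VALUES ($1, $2, $3)`,
      [m.idempotency_key, m.entity_type, m.entity_id],
    );
    return 'synced';
    // Step 8 (post-commit side effects) fires via Hasura Event Triggers,
    // outside this transaction.
  });
}
```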

Entity-specific logic:

| entity_type | action | Scope | Post-Commit Side Effects |
| --- | --- | --- | --- |
| onboard_sale | CREATE | GOBD | If payment_status=PAID AND sale_status=ACTIVE → OnboardSaleRecorded fires. If payment_method=PAYMENT_LINK → post-commit: call MolliePaymentService.createOnboardPaymentLink() → populate checkout_session_id via a separate UPDATE. See schema-operations.md §OnboardSale. Blast radius: the Commerce API call runs post-commit (per §2b step 8), not inside the DB transaction – no connection pool pressure from slow Commerce responses. If Commerce is unavailable, the OnboardSale row is already persisted (with checkout_session_id = NULL); the mutation fails with retryable: true in failed[], and on retry the handler detects checkout_session_id IS NULL and reattempts the Commerce call. All other mutations in the batch proceed independently per §3. |
| onboard_sale | UPDATE | GOBD | Restricted to voiding – see the voidOnboardSale Hasura Action. No offline UPDATE for onboard_sales. |
| expense_receipt | CREATE | GOBD | Hasura Event Trigger on INSERT → NestJS webhook → BullMQ expense-ocr-parse. |
| boarding_event | CREATE | GENERAL | Server re-validates qr_hash against the live manifest. May retroactively set check_in_status = INVALID → BoardingConflictDetected. |
| incident | CREATE | GENERAL | Dedup: (service_leg_id, type, occurred_at ± 5 min). If duplicate, skip; otherwise IncidentCreated fires. Guard: service_leg.status IN (ACTIVE, DELAYED, COMPLETED) AND (if COMPLETED: actual_end + 72h > now()). |
| incident | UPDATE | GENERAL | Only severity, description updatable offline. Status transitions beyond OPEN require connectivity. |
| issue_report | CREATE | GENERAL | IssueReportCreated. If maintenance_urgency = IMMEDIATE → VehicleMaintenanceRequired. |
| issue_report | UPDATE | GENERAL | Only description, attachments, category updatable. |

DELETE: No entity supports client-side DELETE. The handler rejects it with a non-retryable error.

3. Batch Semantics – Per-Record Transactional

Each mutation in the batch runs as an independent database transaction. Partial failures do not roll back successfully synced records.

  • The server processes mutations in order, each within its own transaction.
  • The client removes successfully synced mutations from the queue immediately.
  • Failed mutations remain in the queue and retry with exponential backoff (max 5 attempts).
  • Response: { synced: [idempotency_key, ...], failed: [{ idempotency_key, error, retryable }] }.

NOTE

Why per-record, not all-or-nothing? The entities in a sync batch (OnboardSale, ExpenseReceipt, BoardingEvent, Incident, IssueReport) are independent – they have no transactional dependency on each other. A validation error on one expense receipt should not block 20 perfectly valid boarding events and cash sales from syncing. Per-record semantics also minimize the GoBD time window where financial data exists only on the untrusted client. Batch size bounds the complexity cost: the client tracks synced[] vs failed[] from the response, which RxDB's replication plugin already supports.
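
A sketch of the client-side reconciliation these semantics imply. The queue method names are illustrative; RxDB specifics are omitted:

```typescript
// Reconcile the local queue against a SyncBatchResponse.
interface LocalQueue {
  remove(idempotencyKey: string): Promise<void>;
  scheduleRetry(idempotencyKey: string): Promise<void>; // backoff per §8
  parkWithError(idempotencyKey: string, error: string): Promise<void>;
}

async function reconcile(queue: LocalQueue, res: SyncBatchResponse): Promise<void> {
  // Successfully synced mutations leave the durable queue immediately.
  for (const key of res.synced) {
    await queue.remove(key);
  }
  for (const f of res.failed) {
    if (f.retryable) {
      await queue.scheduleRetry(f.idempotency_key);
    } else {
      // e.g. CONFLICT_SERVER_WINS: stop retrying and let server_state
      // overwrite the local copy (§6).
      await queue.parkWithError(f.idempotency_key, f.error);
    }
  }
}
```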

4. Deduplication

The server checks idempotency_key uniqueness against a dedicated sync_idempotency_log table:

```sql
CREATE TABLE operations.sync_idempotency_log (
  idempotency_key UUID PRIMARY KEY,
  processed_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
  entity_type     VARCHAR NOT NULL,
  entity_id       UUID NOT NULL
);
```

  • Duplicate idempotency_key → skip silently, return success.
  • Do not put a UNIQUE constraint on change_events.client_event_id – a single client action can generate multiple change_events across different entities, so the second insert would violate the constraint.

5. Change Event Generation

The server generates change_events within the same transaction as the entity mutation, using the shared AuditTrailService (ADR-019):

  1. The client sends raw mutations (never change_events).
  2. For UPDATE: call AuditTrailService.captureAndRecord(tx, ...) – acquires a row lock via SELECT ... FOR UPDATE and captures pre-mutation state as old_values; post-mutation state becomes new_values.
  3. For CREATE: call AuditTrailService.record(tx, { oldValues: null, ... }) – new_values = full row.
  4. Server sets: created_at = now(), client_event_id = idempotency_key, device_id, sync_batch_id, user_id + tenant_id from JWT.
  5. FK mapping: entity_type → polymorphic reference (entity_type + entity_id) per ADR-019.
  6. Scope mapping: onboard_sale/expense_receipt → GOBD; boarding_event/incident/issue_report → GENERAL.
  7. The idempotency log INSERT executes inside the same transaction – crash-safe: either both the entity mutation and the log entry commit, or neither does.
  8. The client never generates change_events directly – this preserves the trust boundary for GoBD.
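
A sketch of the server-side field assembly from steps 4–6. The column names follow the Consequences section below; the record shape itself is an assumption on the ADR-019 contract:

```typescript
// Assemble the server-controlled change_event columns for one mutation.
interface JwtClaims {
  userId: string;   // x-hasura-user-id
  tenantId: string; // x-hasura-tenant-id
}

const SCOPE_BY_ENTITY: Record<string, 'GOBD' | 'GENERAL'> = {
  onboard_sale: 'GOBD',
  expense_receipt: 'GOBD',
  boarding_event: 'GENERAL',
  incident: 'GENERAL',
  issue_report: 'GENERAL',
};

function buildChangeEventColumns(
  m: PendingMutation,
  req: SyncBatchRequest,
  claims: JwtClaims,
) {
  return {
    entity_type: m.entity_type, // polymorphic reference per ADR-019
    entity_id: m.entity_id,
    scope: SCOPE_BY_ENTITY[m.entity_type],
    client_event_id: m.idempotency_key, // deliberately non-unique (§4)
    device_id: req.device_id,
    sync_batch_id: req.sync_batch_id,
    user_id: claims.userId,
    tenant_id: claims.tenantId,
    // created_at is set by the database (now()); the client clock is never trusted.
  };
}
```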

6. Conflict Resolution

Strategy: "Server Wins"

  • If the server already has a newer version of a record (determined by updated_at comparison), the server version persists.
  • The client receives the server state in the server_state field of the sync response.
  • The client overwrites its local copy with the server state.
  • No merge logic β€” the server version is authoritative.
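
A sketch of this decision as a pure function, assuming row.updated_at was read under the row lock from §5 (compare E5 below):

```typescript
// Server-wins check: a mutation older than the current row is skipped.
function resolveConflict(
  m: PendingMutation,
  row: { updated_at: Date },
): 'apply' | 'skip_server_wins' {
  // The client clock is untrusted (§9 E4); this comparison only decides
  // whether to skip a stale mutation. The server version stays authoritative.
  const clientTime = new Date(m.created_at_client).getTime();
  return clientTime < row.updated_at.getTime() ? 'skip_server_wins' : 'apply';
}
```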

7. Connectivity Detection

  1. Service Worker online/offline event listeners – primary mechanism.
  2. Periodic background sync – via the PWA Background Sync API (where supported by the browser).
  3. Manual "Force Sync" button – fallback for unreliable connectivity detection (e.g., captive portals, degraded mobile signal).

Sync triggers on online event + periodic background sync. The UI displays a sync status indicator showing queued mutation count and last successful sync timestamp.
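
A sketch of the trigger wiring. The Background Sync tag name is illustrative, and runSync() stands in for the batch logic from §2:

```typescript
// Provided elsewhere: drains the queue via POST /api/sync/batch (§2).
declare function runSync(): Promise<void>;

// Primary trigger: the browser's online event.
window.addEventListener('online', () => {
  void runSync();
});

// Secondary trigger: one-shot Background Sync, guarded because support varies.
async function registerBackgroundSync(): Promise<void> {
  const registration = await navigator.serviceWorker.ready;
  // TypeScript's DOM lib may not type SyncManager, hence the cast.
  const sync = (registration as unknown as {
    sync?: { register(tag: string): Promise<void> };
  }).sync;
  if (sync) {
    await sync.register('driver-hub-sync');
  }
}
```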

8. Retry Strategy

  • Exponential backoff: 1s, 2s, 4s, 8s, 16s, max 30s.
  • Max retries: 5 per mutation.
  • Jitter: random ±20% to prevent a thundering herd on reconnect.
  • After max retries are exhausted, the mutation remains in the local queue and surfaces a user-facing error notification with a "Retry" button.
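
The schedule as a sketch, with delays in milliseconds:

```typescript
// Backoff per the bullets above: 1s, 2s, 4s, 8s, 16s; capped at 30s;
// ±20% jitter; 5 attempts total.
function retryDelayMs(attempt: number): number | null {
  if (attempt >= 5) return null; // exhausted: keep queued, show "Retry" button
  const base = Math.min(1000 * 2 ** attempt, 30_000);
  const jitter = base * 0.2 * (Math.random() * 2 - 1); // uniform ±20%
  return Math.round(base + jitter);
}
```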

9. Edge States

E1: Partial Response / Connection Drop Mid-Sync

Client does not know which mutations succeeded. On reconnect, retries entire batch with same sync_batch_id and idempotency_key values. Server's dedup ensures already-processed mutations skip silently and are returned in synced[]. Safe replay.

E3: Large Batch (100+ Records)

Client-side chunking: if queue > 200 mutations, split into batches of ≤ 200 (each with a unique sync_batch_id), sent sequentially to preserve causal ordering. Server rejects > 200 with HTTP 413. Server timeout: 30s per batch.
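
A sketch of the chunking loop, reusing the syncBatch() helper from §2. Persisting each chunk's sync_batch_id so a replayed chunk reuses it (E1) is omitted here:

```typescript
// Sequential chunking: later chunks are never sent before earlier ones
// succeed, preserving causal ordering.
const MAX_BATCH_SIZE = 200;

async function syncQueue(
  deviceId: string,
  queue: PendingMutation[],
  jwt: string,
): Promise<void> {
  for (let offset = 0; offset < queue.length; offset += MAX_BATCH_SIZE) {
    const chunk = queue.slice(offset, offset + MAX_BATCH_SIZE);
    const response = await syncBatch(deviceId, crypto.randomUUID(), chunk, jwt);
    // If syncBatch throws, the remaining chunks stay queued; the dedup
    // log (§4) makes replaying this chunk safe.
    void response; // reconciliation per §3 omitted in this sketch
  }
}
```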

E4: Clock Skew

created_at_client is diagnostic only – it never becomes an authoritative timestamp. change_events.created_at = server now() (GoBD authoritative). Client clock more than 1 hour off → warning toast.

E5: Concurrent Server-Side Mutation

SELECT ... FOR UPDATE acquires the row lock. If mutation.created_at_client < row.updated_at (server version newer), the mutation is skipped → failed[] with CONFLICT_SERVER_WINS (non-retryable). Server state included in server_state[].

E6: IndexedDB Quota Exceeded

On QuotaExceededError: persistent banner "Storage full – sync before creating new records." New mutations blocked. Existing queue preserved. Immediate sync attempted. Proactive: check navigator.storage.estimate() at 80% usage → warning toast.
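
A sketch of the proactive check; the 0.8 threshold mirrors the 80% warning above:

```typescript
// Returns the fraction of the origin's storage quota in use, or null
// when the Storage API is unavailable or reports no estimate.
async function storageUsageRatio(): Promise<number | null> {
  if (!('storage' in navigator) || !navigator.storage.estimate) {
    return null; // unsupported: fall back to reactive QuotaExceededError handling
  }
  const { usage, quota } = await navigator.storage.estimate();
  if (usage === undefined || quota === undefined || quota === 0) return null;
  return usage / quota; // caller warns at >= 0.8 and triggers an immediate sync
}
```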

E7: Service Worker Registration Failure

Fallback: navigator.onLine polling every 30s. Background Sync unavailable. Manual "Force Sync" button remains. Info banner: "Limited offline support – use Sync button." Core ops unaffected.


Consequences

  • New table: sync_idempotency_log in the operations schema.
  • New API endpoint: POST /api/sync/batch in NestJS API.
  • change_events gains: client_event_id (non-unique), device_id, and sync_batch_id columns.
  • RxDB replication plugin handles client-side queue management; custom sync endpoint replaces the default CouchDB replication.
  • This protocol applies to all offline-capable entities, not just OnboardSale/ExpenseReceipt. It is a cross-cutting concern for the entire Driver Hub.

Applies To

| Entity | Offline CREATE | Offline UPDATE | Notes |
| --- | --- | --- | --- |
| OnboardSale | ✅ (cash only) | ❌ | Void/refund require connectivity (Hasura Actions). PAYMENT_LINK creation requires connectivity. |
| ExpenseReceipt | ✅ | ❌ | Created offline; OCR processing is server-side. |
| BoardingEvent | ✅ | ❌ | QR scan results stored locally. |
| Incident | ✅ | ✅ | Driver can update severity/notes offline. |
| IssueReport | ✅ | ✅ | Crew can add photos/details offline. |
