Communications Architecture β
Overview β
The Communications module is a Shared Core Domain operating as a standalone microservice, decoupled from the Backoffice, Commerce, and Operations contexts. It provides two capabilities:
- Omnichannel Inbox β a unified agent workspace for managing customer, driver, and vendor conversations across WhatsApp, Email, Webchat, and SMS.
- Trigger-Based Automated Messaging β reactive and scheduled notifications orchestrated via Hasura Event/Cron Triggers, without the domain pillars having any awareness of messaging protocols.
For domain entities (ChannelAccount, Contact, Conversation, Message) and full architectural decisions (API Aggregator model, contextual linking, nullable FK enforcement), see PRODUCT_domain-model.md.
Integration with Workflow Orchestration β
Communications plugs into the existing four-layer workflow stack (see workflow-orchestration.md):
| Layer | Communications Role |
|---|---|
| Hasura Event Triggers | Domain mutations (e.g., trip status β DELAYED) fire webhooks to the Communications service |
| Hasura Cron Triggers | Scheduled workflows (e.g., T-24h pre-trip reminders) invoke the Communications service to query upcoming events and dispatch notifications |
| n8n | Multi-channel message dispatch chains (WhatsApp + Email fallback), template rendering, delivery tracking |
| BullMQ | Batch message fan-out, retry queues for failed deliveries, rate-limited sending |
CPaaS Provider Strategy β
The Communications architecture utilizes an API Aggregator Model to keep domain logic provider-agnostic. While we choose specific providers for initial execution, the backend abstracts all external dispatch operations.
Selected Default Providers:
- WhatsApp Business: Meta Cloud API directly (avoids third-party middleman markup from Twilio/360dialog).
- Email: Amazon SES.
- SMS: Amazon SNS.
The aggregator ensures all domain events map to a generic SendMessageCommand, which only translates to Meta/AWS specific JSON payloads at the very edge of the network.
Channel Provisioning & Multi-Tenancy β
Every tenant (Operator) provisions their own channels to maintain brand identity (white-labeled messaging).
- Onboarding: Operator registers their Meta/Facebook Business account and verifies their domain for Amazon SES.
- Configuration Storage: The system saves the generated API keys, Webhook Verification Tokens, and Sender IDs in the
ChannelAccount.provider_configJSONB column. - Webhook Routing: Each tenant receives a per-WABA webhook endpoint (e.g.,
POST /api/webhooks/meta/{tenant_id}) registered via Meta's WABA-level override API (POST /<WABA_ID>/subscribed_apps). The handler resolvestenant_idfrom the URL path. SES/SNS webhooks are account-level β the handler resolvestenant_idfrom theMessagerow viaexternal_message_id. For details, see notification-pipeline-protocol.md Β§8 and channel-provisioning-protocol.md Β§7.
For registration flows (Meta Embedded Signup, SES domain verification, platform-managed SMS), ChannelAccount lifecycle state machine, Hasura Action contracts, and the operator_integrations sync contract, see channel-provisioning-protocol.md.
Message Delivery Pipeline β
The delivery pipeline operates asynchronously to ensure external network latency never blocks core domain transactions.
- Trigger: A domain event occurs (e.g.,
Incidentcreated) and triggers a Hasura Event Trigger webhook pointing to NestJS, which enriches the payload and forwards to n8n (per workflow-orchestration.md Β§Routing Rule). - Template Resolution: The n8n workflow identifies the target
passenger_profile_id, resolves the relevantNotificationTemplate, and interpolates variables. - Queueing: n8n creates a
Messagerecord withstatus: QUEUEDand pushes a genericDispatchMessageJobonto the BullMQ execution queue. - Dispatch: The
DispatchWorker(a NestJS BullMQ processor) consumes the job:- Decrypts the tenant's
provider_configsecrets viaEncryptionService. - Selects the appropriate provider adapter (WhatsApp, Email, or SMS) via
CpaasAdapterFactory. - Acquires a rate limit token from the per-tenant
RateLimiterService(tier-aware for WhatsApp). - Invokes the adapter's
dispatch()method, which constructs the provider-specific API request (Meta Cloud API, SES, or SNS), executes it, and extracts theexternal_message_idfrom the response. - Updates
Message.external_message_idand transitions status toSENT. - On permanent failure, triggers the fallback chain (if configured) by creating a new Message for the next channel and re-enqueing.
- Decrypts the tenant's
- Delivery Tracking: Meta/AWS fire delivery callbacks (Delivered, Read, Failed) to the app-level webhook endpoints. NestJS CPaaS webhook handlers validate signatures, extract the
external_message_id, query the Message row, and updatestatus,delivered_at/read_at/failed_reason. Hasura subscriptions push changes to the inbox instantly.
For the full NotificationRequest interface, DispatchMessageJob/BatchDispatchJob BullMQ specs, template resolution algorithm, multi-channel decision matrix, full trigger_event registry (22 external entries), adapter class contracts, error classification per provider, rate limiter configuration, and CPaaS webhook specification, see notification-pipeline-protocol.md.
Inbox Architecture β
The Omnichannel Inbox operates in real-time using Hasura GraphQL Subscriptions β no custom WebSocket layer required. Hasura multiplexes all subscriptions over a single WebSocket per client, handles reconnection, and integrates with the existing Nhost infrastructure.
Core subscriptions: Three Hasura subscriptions power the inbox. (1) Conversation list (sidebar, sorted by last_message_at DESC). (2) Message thread (live messages for the open conversation). (3) Global unread badge (aggregate unread-conversation count).
Agent routing: Phase 1 uses manual claim-based routing. Unassigned conversations (assigned_to = NULL) appear in a shared queue. Any dispatcher can claim a conversation via the claimConversation Hasura Action, which uses SELECT ... FOR UPDATE for concurrent-claim safety (matching the takeOverIncident pattern from Operations). Admins can reassign conversations via reassignConversation.
Unread tracking: A conversation_read_cursors table stores per-agent, per-conversation timestamp cursors. A Hasura computed field (conversation_unread_count) counts INBOUND messages with sent_at > last_read_at for the session user. Only inbound messages count as unread β outbound agent/system messages do not.
Conversation lifecycle: OPEN β RESOLVED / SNOOZED. Dispatchers trigger transitions via resolveConversation and snoozeConversation Hasura Actions. A new inbound message from the same Contact automatically reopens RESOLVED or SNOOZED conversations. A Hasura Cron Trigger (conversation_snooze_wakeup) reopens snoozed conversations past their expiry. See schema-communications.md Β§Conversation Lifecycle State Machine.
Inbound message path: CPaaS webhook (Meta Cloud API / SES / SNS) β NestJS InboundMessageHandler β Contact resolution (contacts.identifiers JSONB lookup) β Conversation resolution (match by Contact + open status) β messages INSERT (direction = INBOUND, status = DELIVERED) β Hasura subscription push. Inbound messages skip the BullMQ dispatch pipeline entirely β that pipeline handles outbound only.
Agent replies: Dispatchers send replies via the sendAgentMessage Hasura Action, which enters the same outbound pipeline as automated messages: NestJS handler creates a Message (status = QUEUED) β enqueues DispatchMessageJob on BullMQ β CPaaS delivery.
Cross-context sidebar: The inbox conversation detail panel displays booking, trip, or invoice context data via Hasura manual object relationships on the conversations table's soft FK columns (booking_id, trip_id, invoice_id). These resolve as standard SQL JOINs within the same Postgres instance (read-side coupling, permitted per domain-driven-design.md Β§7.2).
For full subscription queries, Hasura Action contracts, resolution algorithms, and computed field SQL, see inbox-protocol.md.
Observability β
To prevent silent communication failures:
- Hasura Event Logs: Track failed webhook deliveries from the database to n8n.
- BullMQ Dashboards: Expose active, delayed, and stalled queues. All failed dispatch attempts utilize an exponential backoff retry strategy.
- Status Rollups: A background cron job scans the
Messagetable for items stuck inQUEUEDfor >5 minutes, triggering high-severity alerts in the observability stack (Grafana).