Architecture Guidelines: Modular Monolith & Domain-Driven Design
1. Domain & Bounded Contexts
Our system operates within the Bus Tourism domain (with future expansion planned for broader Mobility Services). The architecture is a Modular Monolith utilizing NestJS, Hasura (GraphQL), and PostgreSQL.
We define four primary Bounded Contexts (Pillars), strictly segregated to prevent tightly coupled code:
- Backoffice Context: Acts as the master record for business configuration, operational staff, abstract product definitions, and financial planning.
- Commerce Context: Handles ticketing, sales, B2C/B2B conversions, capacity holds, and accounting/tax actuals.
- Operations Context: Manages real-world execution, dispatching, fleet telemetry, incident reporting, and driver logistics (formerly "Driver Context").
- Communications Context: A shared core domain providing omnichannel inbox capabilities and automated messaging to all other contexts.
- Customer Intelligence Context [future, Phase 3]: An event-sourced analytics domain that consumes activity signals from all four operational contexts and produces behavioral aggregates, customer segmentation, and personalized recommendations. It enables the 360° Customer Profile vision. See ADR-021.
2. Database Boundary Enforcement
To maintain strict context boundaries within a shared PostgreSQL database exposed via Hasura:
- PostgreSQL Schemas: Isolate tables into context-specific schemas (e.g., `commerce.tour_offerings`, `operations.service_legs`, `backoffice.operators`). Do not use the `public` schema for domain entities.
- No Cross-Schema Foreign Keys: Bounded contexts must not enforce foreign keys against each other.
- Reference by ID: Aggregate roots in one context must only store the ID (UUID/String) of entities in another context (e.g., `commerce.tour_offerings` stores `tour_departure_id` as a plain UUID column, with no FK constraint to `backoffice.tour_departures`).
- Read-Only SQL Views: When a context requires data owned by another context, use schema-bound SQL views (e.g., the dispatch availability view joining `backoffice.crew_members` with `operations.leg_assignments`; see §9.2).
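In application code, the Reference by ID rule can be made explicit with branded ID types. The sketch below is a minimal illustration, assuming TypeScript domain models; `Brand`, `TourDepartureId`, `TourOfferingId`, `parseUuid`, and `createTourOffering` are hypothetical names, not existing project code.

```typescript
// Hypothetical branding helper: the __brand property exists only at compile time.
type Brand<T, B extends string> = T & { readonly __brand: B };

// IDs owned by different contexts get different brands, so they cannot be mixed up.
type TourDepartureId = Brand<string, "backoffice.TourDepartureId">;
type TourOfferingId = Brand<string, "commerce.TourOfferingId">;

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Checks format only. Existence cannot be checked here: Commerce holds no FK
// into Backoffice, so downstream code must tolerate a dangling ID.
function parseUuid<Id extends Brand<string, string>>(raw: string): Id {
  if (!UUID_RE.test(raw)) throw new Error(`Not a UUID: ${raw}`);
  return raw.toLowerCase() as Id;
}

// Commerce aggregate root: the Backoffice departure is held as a plain value,
// mirroring the tour_departure_id column that has no FK constraint.
interface TourOffering {
  readonly id: TourOfferingId;
  readonly tenantId: string;
  readonly tourDepartureId: TourDepartureId;
}

function createTourOffering(id: string, tenantId: string, departureId: string): TourOffering {
  return {
    id: parseUuid<TourOfferingId>(id),
    tenantId,
    tourDepartureId: parseUuid<TourDepartureId>(departureId),
  };
}

// const wrong: TourDepartureId = someOffering.id; // compile error: the brands differ
```

The brand costs nothing at runtime; its only job is to stop a Commerce ID from being passed where a Backoffice ID is expected.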
2.1 Multi-Tenant Data Isolation
Every domain table carries `tenant_id UUID NOT NULL` referencing `backoffice.operators`. Two layers enforce isolation. See tenant-isolation-strategy ADR.
Primary: Hasura Permission Rules. Every table's select, update, and delete permissions include a row filter (and its insert permissions an equivalent check) matching the JWT claim:
```yaml
# Example: backoffice.tour_templates, role: dispatcher
select_permissions:
  - role: dispatcher
    permission:
      columns: [id, tenant_id, title, status, ...]
      filter:
        tenant_id: { _eq: "x-hasura-tenant-id" }
```

Secondary: Postgres RLS (defense in depth). Each tenant-scoped table has a Row Level Security policy as a safeguard against Hasura Action bypasses or direct SQL access:

```sql
ALTER TABLE backoffice.tour_templates ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON backoffice.tour_templates
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
```

NestJS sets `app.current_tenant_id` via `SET LOCAL` at the start of each transaction. The setting reverts at commit or rollback, so a pooled connection never carries one tenant's context into the next request. Table owners bypass RLS unless the table also has `FORCE ROW LEVEL SECURITY`, and superusers or roles with `BYPASSRLS` always bypass it, so this safeguard only holds for connections made with a non-privileged role.
Global reference tables (e.g., `countries`, `currencies`, `vehicle_types`) are exempt: they carry no `tenant_id` and have no RLS policy.
Busflow staff use the elevated Hasura role `busflow_staff` with unrestricted select permissions (no `tenant_id` filter). The system uses this role exclusively for cross-tenant analytics, support, and tenant provisioning.
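The transaction-scoped tenant setting that the RLS policy reads can be sketched as a small wrapper. This is an illustration under stated assumptions: `Queryable` and `withTenant` are hypothetical names, and the client shape merely resembles node-postgres's `query(text, values)`. It uses `set_config(name, value, true)`, the function form of `SET LOCAL`, because `SET` itself cannot take a bind parameter.

```typescript
// Hypothetical minimal client surface; node-postgres's PoolClient exposes a
// compatible query(text, values) method, but this sketch does not depend on it.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Runs `work` inside one transaction with app.current_tenant_id set for that
// transaction only (is_local = true, equivalent to SET LOCAL).
async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (tx: Queryable) => Promise<T>,
): Promise<T> {
  if (!UUID_RE.test(tenantId)) throw new Error("tenantId must be a UUID");
  await client.query("BEGIN");
  try {
    // Bind parameter instead of string interpolation: no SQL injection via the claim.
    await client.query("SELECT set_config('app.current_tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    // The transaction-local setting is discarded together with the rollback.
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is transaction-local, returning the connection to a pool after `COMMIT` or `ROLLBACK` leaves no tenant context behind.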
2.2 Physical Schema Index
Detailed physical schemas, Entity-Relationship Diagrams (ERDs), and table definitions reside individually per pillar:
- Backoffice Schema (`schema-backoffice.md`):
- Scope: Configuration, operational staff, abstract product definitions (`TourTemplate`), concrete scheduled departures (`TourDeparture`), third-party inventory (`Allotment`), CRM, and financial planning (`CostingSheet`).
- Commerce & Finance Schema (`schema-commerce.md`):
- Scope: Conversion and accounting engine, handling `TourOfferings`, B2C/B2B `Bookings`, `Payments`, ticketing, and actual margin taxation (`FinancialLedger`).
- Operations Schema (`schema-operations.md`):
- Scope: Execution layer managing `ServiceLegs`, dispatching (`LegAssignment`), IoT fleet telemetry, OCR expense scanning, and offline app sync.
- Communications Schema (`schema-communications.md`):
- Scope: Shared core domain providing omnichannel inbox capabilities (`Conversations`, `Messages`) and trigger-based automated messaging.
2.3 Aggregate-Level FK Policy
Hard foreign key constraints follow aggregate boundaries. See ADR-036 for the full decision rationale.
| Relationship | FK Type | Rationale |
|---|---|---|
| Intra-aggregate (root → child) | Hard FK | Shared transactional boundary. DB-level integrity enforces containment. |
| Cross-aggregate, lifecycle conflict | Soft ID reference | Independent status machines, unbounded collections, or snapshot-based independence. |
| Cross-aggregate, stable context reference | Hard FK | Target entity never deleted (status lifecycle only). No cascade risk. |
| Cross-schema | Soft ID reference | Existing §2 rule. |
| Self-referential version chain | Hard FK | Intra-aggregate version linking (e.g., `price_matrices.superseded_by`). |
A cross-aggregate reference has a lifecycle conflict when any of the following applies:
- The aggregates transition through independent status machines
- The child collection is unbounded
- An ADR explicitly mandates independent lifecycles
- The referencing entity snapshots data at creation (no live reference needed)
- Archival of the referenced entity should not cascade
For aggregate definitions per context, see the spoke files in domain-model.md.
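The FK table can be read as a small decision procedure. The sketch below is illustrative only: the `Relationship` descriptor and `decideFk` are hypothetical, it treats any single lifecycle-conflict condition as sufficient, and the final fallback (no row of the table applies) defaulting to a soft reference is an assumption, not a rule stated above.

```typescript
type FkDecision = "HARD_FK" | "SOFT_ID";

// Hypothetical descriptor of one reference between two tables.
interface Relationship {
  crossSchema: boolean;
  intraAggregate: boolean; // root -> child, or a self-referential version chain
  independentStatusMachines: boolean;
  unboundedChildCollection: boolean;
  adrMandatesIndependentLifecycle: boolean;
  snapshotsAtCreation: boolean;
  archivalMustNotCascade: boolean;
  targetNeverDeleted: boolean; // status lifecycle only
}

// Assumption: any one condition from the §2.3 list is enough to count as a conflict.
function hasLifecycleConflict(r: Relationship): boolean {
  return (
    r.independentStatusMachines ||
    r.unboundedChildCollection ||
    r.adrMandatesIndependentLifecycle ||
    r.snapshotsAtCreation ||
    r.archivalMustNotCascade
  );
}

function decideFk(r: Relationship): FkDecision {
  if (r.crossSchema) return "SOFT_ID"; // §2: never FK across schemas
  if (r.intraAggregate) return "HARD_FK"; // shared transactional boundary
  if (hasLifecycleConflict(r)) return "SOFT_ID"; // cross-aggregate, lifecycle conflict
  if (r.targetNeverDeleted) return "HARD_FK"; // cross-aggregate, stable context reference
  return "SOFT_ID"; // not covered by the table; soft by default (assumption)
}
```

The order of the checks matters: the cross-schema rule wins over everything else, so a stable target in another schema still gets a soft reference.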
3. Primary Database System
Intent & Business Context
The primary database must deliver enterprise-grade stability for transactions, high schema flexibility for variable travel itineraries, and a native foundation for AI capabilities (Magic Upload, Copilot).
- Data Taxonomy:
- Transactional: Bookings, payments, passenger lists, fleet inventory (Requires strict ACID compliance).
- Variable: Itineraries, AI parser outputs (Requires semi-structured schema support).
- Network: Geographical routes, stop topologies.
- Semantic: Vector arrays for AI similarity mapping.
Decision: PostgreSQL
We selected PostgreSQL as the unified primary data store. This choice consolidates relational, document, and vector databases into a single sovereign system, aligning with the "Do More With Less" operational pillar.
High-Level Storage Strategy
- Core Relational Storage:
- Intent: Enforce absolute data integrity for high-stakes financial and operational truth.
- Target Entities: `Users`, `Operators`, `Bookings`, `Payments`, `Vehicles`.
- Variable Document Storage (`JSONB`):
- Intent: Prevent schema bloat (empty columns, complex joins) while accommodating inherently unpredictable trip structures and dynamic multi-tenant configurations.
- Target Entities: `CostingSheet` (Price Matrices, Cost Components), `Vehicle` (Seat Map Layouts), `Itineraries`.
- AI & Semantic Storage (`pgvector`):
- Intent: Keep AI context adjacent to operational data so that retrieval needs no round trip to a separate vector store.
- Target Entities: Text embeddings generated from AI-assisted PDF pipelines (`TourTemplate`); Copilot conversational histories.
Rejected Alternatives
- MongoDB:
- Reasoning: Discarded due to insufficient strict relational constraints. Unsuitable as a primary source of truth for seat maps, digital tickets, and financial ledgers.
- Neo4j:
- Reasoning: Discarded as overly complex for primary CRUD operations (Apple Pay processing, user profiles).
- Future Consideration: We may adopt it later as an isolated microservice specifically handling complex geographical routing and fleet optimization.
4. Validation & Type Safety
High-Level Strategy: Defense in Depth
The Busflow monorepo divides validation and type-safety responsibilities across three distinct layers. This approach prevents duplication of effort while ensuring robust runtime and compile-time safety across frontend, AI workers, and the database.
4.1 PostgreSQL (Data Integrity)
PostgreSQL is the absolute source of truth for structural data integrity and persistence.
- Responsibility: Facts that never change and safeguards for relational integrity.
- Rules Enforced: Data types (`INTEGER`, `TEXT`), nullability (`NOT NULL`), relationships (foreign keys), uniqueness (`UNIQUE`), and absolute baseline constraints (e.g., `price >= 0`). See also §5 for the CQRS decision framework on when to place constraints here vs. Hasura vs. NestJS.
- Why: Protects the database from malicious or erroneous access, even if someone bypasses the application layer. Rule changes here are expensive because they require migrations.
4.2 Hasura & GraphQL (Network Boundary)
Hasura exposes the Postgres database as a GraphQL API, providing basic structural type enforcement.
- Responsibility: Network boundary structural typing and row-level authorization.
- Workflow: `graphql-codegen` generates strict TypeScript types (e.g., mutation inputs) derived 1:1 from the database structure.
- Limitations: `graphql-codegen` types are compile-time only. They disappear at runtime and cannot validate dynamic payloads.
4.3 Valibot (Domain & Runtime Boundary)
Valibot acts as the Single Source of Truth for Domain Types within the isomorphic packages/types workspace.
- Responsibility: Runtime validation, conditional domain constraints, and mutable business logic.
- Rules Enforced: String formatting (emails, regex), conditional cross-field logic (e.g., maximum passengers based on vehicle type), and UI-specific limits.
- Why Valibot:
- AI/LLM Safety: Validates untyped, unpredictable JSON payloads returned by OpenAI to NestJS workers before they reach the database.
- Isomorphic UI Validation: Powers real-time frontend form validation in Nuxt using the exact same schema the backend trusts.
- Bundle Size: Highly tree-shakable architecture optimizes performance for public-facing B2C apps.
- Domain Decoupling: Acts as an anti-corruption layer, allowing transformation of database shapes (e.g., `user_id`) to cleaner domain models (`userId`) at the edge.
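The responsibilities above (format checks, a cross-field rule, and snake_case to camelCase mapping for an untyped LLM payload) can be sketched without the library. With Valibot the same rules would be expressed declaratively (e.g., `v.object`, `v.pipe`, `v.safeParse`); this dependency-free stand-in only keeps the example runnable. `parseBookingRequest`, the vehicle types, and the capacity limits are hypothetical.

```typescript
type VehicleType = "MINIBUS" | "COACH";

// Hypothetical capacity limits, for illustration only.
const MAX_PASSENGERS: Record<VehicleType, number> = { MINIBUS: 19, COACH: 50 };

// Domain model (camelCase), decoupled from the snake_case payload.
interface BookingRequest {
  contactEmail: string;
  vehicleType: VehicleType;
  passengerCount: number;
}

// Loosely mirrors the success/output/issues shape of a Valibot safeParse result.
type ParseResult<T> = { success: true; output: T } | { success: false; issues: string[] };

const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // deliberately simple format check

function isRecord(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

function isVehicleType(x: unknown): x is VehicleType {
  return x === "MINIBUS" || x === "COACH";
}

function parseBookingRequest(input: unknown): ParseResult<BookingRequest> {
  if (!isRecord(input)) return { success: false, issues: ["payload: expected an object"] };

  const rawEmail = input["contact_email"];
  const rawVehicle = input["vehicle_type"];
  const rawCount = input["passenger_count"];

  // Field-level checks: collect every issue instead of stopping at the first.
  const email = typeof rawEmail === "string" && EMAIL_RE.test(rawEmail) ? rawEmail : undefined;
  const vehicleType = isVehicleType(rawVehicle) ? rawVehicle : undefined;
  const passengerCount =
    typeof rawCount === "number" && Number.isInteger(rawCount) && rawCount >= 1 ? rawCount : undefined;

  const issues: string[] = [];
  if (email === undefined) issues.push("contact_email: invalid email");
  if (vehicleType === undefined) issues.push("vehicle_type: unknown vehicle type");
  if (passengerCount === undefined) issues.push("passenger_count: expected a positive integer");
  if (email === undefined || vehicleType === undefined || passengerCount === undefined) {
    return { success: false, issues };
  }

  // Cross-field rule: the passenger limit depends on the vehicle type.
  const limit = MAX_PASSENGERS[vehicleType];
  if (passengerCount > limit) {
    return { success: false, issues: [`passenger_count: ${vehicleType} allows at most ${limit}`] };
  }

  // Anti-corruption mapping: snake_case payload -> camelCase domain model.
  return { success: true, output: { contactEmail: email, vehicleType, passengerCount } };
}
```

Because the same function can run in Nuxt and in a NestJS worker, the form and the backend reject exactly the same payloads.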
4.4 Preventing Schema Drift
Because we write Valibot domain schemas manually alongside auto-generated GraphQL structural types, the system requires strict synchronization to prevent silent failures.
- Rule: A Valibot schema cannot exist in isolation. It must be explicitly bound to the database structure.
- Mechanism: We must use TypeScript utilities (such as the `satisfies` operator or strict type inference) to link the Valibot `InferOutput` type to the `graphql-codegen` types. If someone adds or modifies a database column via Hasura, the Valibot schema must trigger a compile-time TypeScript error until they update it to match.
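A minimal illustration of the `satisfies` binding, assuming a hand-written stand-in for the generated type: `GeneratedTourTemplateInsertInput`, `tourTemplateChecks`, `failingFields`, and the status values are hypothetical. It binds a plain rule map rather than a Valibot schema so that it runs without dependencies. It catches added or removed columns; catching a changed column type would additionally need a type-level comparison, for example of `v.InferOutput<typeof Schema>` against the generated type.

```typescript
// Stand-in for a graphql-codegen insert-input type (hypothetical name and shape).
// In the repo this would be imported from the generated file, not hand-written.
type GeneratedTourTemplateInsertInput = {
  tenant_id?: string | null;
  title?: string | null;
  status?: string | null;
};

type FieldCheck = (value: unknown) => boolean;

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// `satisfies` ties the hand-written rules to the generated type: a column added
// to the generated input makes this literal fail to compile until a rule is added,
// and a misspelled or removed key is rejected as an excess property.
const tourTemplateChecks = {
  tenant_id: (v) => typeof v === "string" && UUID_RE.test(v),
  title: (v) => typeof v === "string" && v.trim().length > 0,
  status: (v) => v === "DRAFT" || v === "PUBLISHED" || v === "ARCHIVED", // hypothetical statuses
} satisfies Record<keyof GeneratedTourTemplateInsertInput, FieldCheck>;

// Runtime use of the same rules: returns the names of the failing fields.
function failingFields<K extends string>(checks: Record<K, FieldCheck>, input: Record<string, unknown>): K[] {
  return (Object.keys(checks) as K[]).filter((key) => !checks[key](input[key]));
}
```

Unlike a type annotation, `satisfies` keeps the literal's own inferred type, so the rule map stays precise while still being checked against the generated shape.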
5. Data Mutation & CQRS Strategy
To balance the performance of GraphQL with the safety of strict business logic, we adopt a pragmatic Command Query Responsibility Segregation (CQRS) approach:
- Reads (Queries): The system executes these directly via Hasura GraphQL. Frontend applications consume data using Hasura's Role-Based Access Control (RBAC).
- Simple CRUD (Writes): Non-domain state changes (e.g., updating a phone number) bypass NestJS and hit Hasura mutations directly, governed by RBAC.
- Fundamental State Constraints: Absolute domain rules (e.g., seats cannot be negative) are PostgreSQL `CHECK` constraints.
- Simple Domain Validations: Hasura Input Validations (pre-insert webhooks) handle gateway checks and format validations.
- Complex Domain Logic: Operations requiring calculations, data transformation, or cross-table orchestration MUST route through Hasura Actions to custom NestJS command handlers.
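The Hasura Action path can be sketched as a thin routing layer in front of command handlers. This assumes the request body Hasura sends to an action handler (`action.name`, `input`, `session_variables`) and its convention of surfacing a non-2xx response's `message` as a GraphQL error. `handleAction`, `commandHandlers`, `DomainError`, and the `holdSeats` command are hypothetical; in the real service they would be NestJS controllers and providers.

```typescript
// Subset of the request body Hasura sends to an action handler.
interface HasuraActionPayload {
  action: { name: string };
  input: Record<string, unknown>;
  session_variables: Record<string, string | undefined>;
}

interface ActionResponse {
  status: number;
  body: unknown;
}

class DomainError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DomainError";
  }
}

type CommandHandler = (input: Record<string, unknown>, tenantId: string) => unknown;

// Hypothetical command; a real handler would load the aggregate, enforce its
// invariants, and persist the result in one transaction.
const commandHandlers: Record<string, CommandHandler | undefined> = {
  holdSeats: (input, tenantId) => {
    const seats = input["seats"];
    if (typeof seats !== "number" || !Number.isInteger(seats) || seats < 1) {
      throw new DomainError("seats must be a positive integer");
    }
    return { tenantId, heldSeats: seats, status: "HELD" };
  },
};

function handleAction(payload: HasuraActionPayload): ActionResponse {
  const handler = commandHandlers[payload.action.name];
  if (!handler) return { status: 404, body: { message: `Unknown action: ${payload.action.name}` } };

  // Assumes lowercase session-variable keys; the tenant always comes from the JWT, never from input.
  const tenantId = payload.session_variables["x-hasura-tenant-id"];
  if (!tenantId) return { status: 401, body: { message: "Missing tenant claim" } };

  try {
    return { status: 200, body: handler(payload.input, tenantId) };
  } catch (err) {
    // Domain rule violations become client errors; anything else propagates as a 5xx.
    if (err instanceof Error && err.name === "DomainError") {
      return { status: 400, body: { message: err.message } };
    }
    throw err;
  }
}
```

Reads and simple CRUD never reach this code; only operations that need orchestration pay for the extra hop through NestJS.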
6. Cross-Context Event Communication
Contexts must communicate asynchronously or via strict event contracts to avoid tight coupling.
- Guaranteed / Database-Backed Events: Use Hasura Event Triggers for durable, retryable events that must fire after a transaction commits. We define triggers as TypeScript decorators in NestJS via `@golevelup/nestjs-hasura`, making the application code the source of truth (see `workflow-orchestration.md`). Note: Although Hasura Event Triggers act as a fire-after-commit webhook, they differ from a true Transactional Outbox pattern: they lack custom retry queues and strict ordered delivery control.
- Internal / Synchronous Routing: Use NestJS `@nestjs/event-emitter` for lightweight, in-memory event routing triggered by webhook handlers.
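Because retries can redeliver an event and ordering is not guaranteed, webhook handlers should be idempotent and should react to state transitions rather than to raw deliveries. The sketch below assumes the event-trigger body shape (`id`, `trigger`, `table`, `event.op`, `event.data.old/new`); `InMemoryBus` is a tiny stand-in for `@nestjs/event-emitter`, and `BookingEventsWebhook` and the event name are hypothetical.

```typescript
// Subset of the JSON body Hasura POSTs to an event-trigger webhook.
interface HasuraEventPayload<Row> {
  id: string; // unique per event; a redelivery reuses it
  trigger: { name: string };
  table: { schema: string; name: string };
  event: {
    op: "INSERT" | "UPDATE" | "DELETE" | "MANUAL";
    data: { old: Row | null; new: Row | null };
  };
}

interface BookingRow {
  id: string;
  tenant_id: string;
  status: string;
}

type Listener = (payload: unknown) => void;

// Synchronous in-memory bus standing in for @nestjs/event-emitter.
class InMemoryBus {
  private readonly listeners = new Map<string, Listener[]>();
  on(name: string, fn: Listener): void {
    this.listeners.set(name, [...(this.listeners.get(name) ?? []), fn]);
  }
  emit(name: string, payload: unknown): void {
    for (const fn of this.listeners.get(name) ?? []) fn(payload);
  }
}

// Dedupes on the event id. A real handler would persist processed ids, not hold them in memory.
class BookingEventsWebhook {
  private readonly processed = new Set<string>();
  private readonly bus: InMemoryBus;
  constructor(bus: InMemoryBus) {
    this.bus = bus;
  }

  handle(payload: HasuraEventPayload<BookingRow>): "processed" | "duplicate" | "ignored" {
    if (this.processed.has(payload.id)) return "duplicate";
    const after = payload.event.data.new;
    const before = payload.event.data.old;
    let outcome: "processed" | "ignored" = "ignored";
    // React only to the transition into CONFIRMED (covers INSERT and UPDATE; DELETE has no new row).
    if (after !== null && after.status === "CONFIRMED" && before?.status !== "CONFIRMED") {
      this.bus.emit("commerce.booking.confirmed", { bookingId: after.id, tenantId: after.tenant_id });
      outcome = "processed";
    }
    // Marked only after routing succeeded: a throwing listener fails the delivery so it can be retried.
    this.processed.add(payload.id);
    return outcome;
  }
}
```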
7. Edge Operations & Asynchronous Processing
The system requires specific architectural patterns to support field clients, data streaming, and resource-heavy tasks:
- Offline-First & Eventual Consistency: Mobile/field clients operating in low-connectivity environments must employ local-first storage. State changes and operational logs sync to the backend once connectivity returns. A future `offline-sync-strategy.md` will document the detailed technical specification for this sync protocol.
- High-Frequency Ingestion: High-volume data streams (e.g., vehicle telemetry) must bypass standard NestJS CQRS routing where possible, using direct Hasura mutations or a dedicated fast-ingest path to avoid database lock contention.
- Heavy Asynchronous Processing: Handle resource-intensive tasks (e.g., AI parsing, media processing) asynchronously. The initial request creates a `PENDING` record, and background NestJS workers process the payload and update the database state upon completion.
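The `PENDING` record pattern amounts to a small state machine plus a worker loop. In this sketch an in-memory repository stands in for a jobs table; `ParseJob`, `InMemoryJobRepo`, `submitForParsing`, and `runWorkerOnce` are hypothetical names, and the `FOR UPDATE SKIP LOCKED` claim query in the comment is one common PostgreSQL approach, not a documented project decision.

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

interface ParseJob {
  id: string;
  status: JobStatus;
  input: string;
  result?: string;
  error?: string;
}

// In-memory stand-in for a jobs table. With PostgreSQL, claimNext could be
// UPDATE ... WHERE id = (SELECT id ... WHERE status = 'PENDING' LIMIT 1 FOR UPDATE SKIP LOCKED)
// so that concurrent workers never claim the same row.
class InMemoryJobRepo {
  private readonly jobs = new Map<string, ParseJob>();
  private seq = 0;

  create(input: string): ParseJob {
    const job: ParseJob = { id: `job-${++this.seq}`, status: "PENDING", input };
    this.jobs.set(job.id, job);
    return job;
  }

  get(id: string): ParseJob | undefined {
    return this.jobs.get(id);
  }

  claimNext(): ParseJob | undefined {
    for (const job of this.jobs.values()) {
      if (job.status === "PENDING") {
        job.status = "PROCESSING";
        return job;
      }
    }
    return undefined;
  }

  finish(id: string, outcome: { result: string } | { error: string }): void {
    const job = this.jobs.get(id);
    if (!job || job.status !== "PROCESSING") throw new Error(`Job ${id} is not in progress`);
    if ("result" in outcome) {
      job.status = "COMPLETED";
      job.result = outcome.result;
    } else {
      job.status = "FAILED";
      job.error = outcome.error;
    }
  }
}

// Request path: persist a PENDING record and return its id immediately.
function submitForParsing(repo: InMemoryJobRepo, input: string): string {
  return repo.create(input).id;
}

// Worker path: claim one job, run the heavy task, write the final state back.
async function runWorkerOnce(repo: InMemoryJobRepo, task: (input: string) => Promise<string>): Promise<boolean> {
  const job = repo.claimNext();
  if (!job) return false;
  try {
    repo.finish(job.id, { result: await task(job.input) });
  } catch (err) {
    repo.finish(job.id, { error: err instanceof Error ? err.message : String(err) });
  }
  return true;
}
```

Clients can poll the record, or subscribe to it through Hasura, instead of holding a request open for the duration of the task.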
8. Internationalization & Business Rules
Do not hardcode regional compliance (e.g., taxes, driving regulations) into core entities or frontends.
- Market Context: "Market" or "Jurisdiction" is an explicit domain concept. Associate every relevant transaction or operation with a tenant/market.
- Strategy Pattern (Policies): We abstract calculation rules (e.g., tax logic) into Policies (`ITaxCalculationPolicy`). The application dynamically injects the appropriate regional implementation. The generated `TaxRule` value objects embedded within a `CostingSheet` represent the concrete executed outcome of this policy.
- Specification Pattern: Standalone Specification objects encapsulate legal validations (e.g., driving hours compliance), evaluating payloads and returning pass/fail compliance states.
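Both patterns fit in a few lines. Only the `ITaxCalculationPolicy` name comes from this document; its method shape, `FlatRateTaxPolicy`, `TaxPolicyRegistry`, `DailyDrivingLimitSpec`, and the market codes are assumptions, and any rates used with them are placeholders rather than real tax law. The driving-time specification is deliberately simplified to the standard 9-hour daily limit of Regulation (EC) No 561/2006.

```typescript
interface TaxLine {
  label: string;
  rateBasisPoints: number;
  amountMinor: number; // integer minor units (e.g., cents)
}

// Interface name from this document; the method shape is an assumption.
interface ITaxCalculationPolicy {
  readonly market: string;
  calculate(netMinor: number): TaxLine[];
}

// One illustrative implementation; the rate is supplied by configuration.
class FlatRateTaxPolicy implements ITaxCalculationPolicy {
  readonly market: string;
  private readonly rateBasisPoints: number;
  constructor(market: string, rateBasisPoints: number) {
    this.market = market;
    this.rateBasisPoints = rateBasisPoints;
  }
  calculate(netMinor: number): TaxLine[] {
    // Math.round on a positive amount rounds half up to the nearest minor unit.
    const amountMinor = Math.round((netMinor * this.rateBasisPoints) / 10_000);
    return [{ label: "TAX", rateBasisPoints: this.rateBasisPoints, amountMinor }];
  }
}

// Resolves the policy for a market at runtime; in NestJS this lookup could sit
// behind a factory provider so callers only ever depend on ITaxCalculationPolicy.
class TaxPolicyRegistry {
  private readonly byMarket = new Map<string, ITaxCalculationPolicy>();
  register(policy: ITaxCalculationPolicy): this {
    this.byMarket.set(policy.market, policy);
    return this;
  }
  forMarket(market: string): ITaxCalculationPolicy {
    const policy = this.byMarket.get(market);
    if (!policy) throw new Error(`No tax policy registered for market ${market}`);
    return policy;
  }
}

interface SpecResult {
  ok: boolean;
  reason?: string;
}

interface Specification<T> {
  isSatisfiedBy(candidate: T): SpecResult;
}

interface DutyDay {
  drivingMinutes: number;
}

// Simplified: standard daily driving limit only; extended days, breaks,
// and weekly limits are deliberately left out.
class DailyDrivingLimitSpec implements Specification<DutyDay> {
  private readonly maxMinutes = 9 * 60;
  isSatisfiedBy(day: DutyDay): SpecResult {
    return day.drivingMinutes <= this.maxMinutes
      ? { ok: true }
      : { ok: false, reason: `Driving time ${day.drivingMinutes} min exceeds ${this.maxMinutes} min` };
  }
}
```

Adding a market then means registering another policy; neither the `CostingSheet` code nor the frontend changes.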
9. Cross-Context Interaction Patterns
When the same real-world concept spans multiple bounded contexts, three distinct patterns govern how contexts collaborate without violating their boundaries.
9.1 Context Mapping: One Concept, Separate Entities
A single real-world thing (e.g., a pickup location) must exist as a separate model in each context that uses it. We shape each model by that context's ubiquitous language.
CAUTION
Anti-Pattern (Shared Entity): Adding fields from one context onto another context's entity to "avoid duplication" (e.g., adding `is_bookable` or `display_name` to an Operations entity so Commerce can use it). This creates conceptual coupling disguised as pragmatism.
- The authoritative context owns the entity and emits domain events when it changes.
- Consuming contexts maintain local projections (read models or value objects) synced via those events.
- Each projection carries only the fields that context needs โ nothing more.
TIP
A useful litmus test: if two contexts would extend the same entity in conflicting directions (Commerce wants a display_name, Operations wants gps_waypoints), they need separate models.
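A consuming context's projection can be sketched as an event handler that copies only the fields it needs and keeps its own fields separate. The event shape, the per-location `version` counter, and the names `PickupLocationUpdated`, `BoardingPoint`, and `BoardingPointProjection` are assumptions for illustration.

```typescript
// Event published by the authoritative context (Operations). Hypothetical shape;
// `version` is assumed to increase monotonically per location.
interface PickupLocationUpdated {
  locationId: string;
  version: number;
  name: string;
  gpsWaypoints: Array<[number, number]>; // Operations-only concern
}

// Commerce's local projection: only the fields Commerce needs, in its own language.
interface BoardingPoint {
  locationId: string;
  sourceVersion: number;
  displayName: string;
  isBookable: boolean; // owned by Commerce, never written back to Operations
}

class BoardingPointProjection {
  private readonly points = new Map<string, BoardingPoint>();

  apply(event: PickupLocationUpdated): void {
    const current = this.points.get(event.locationId);
    if (current && current.sourceVersion >= event.version) return; // stale or duplicate event
    this.points.set(event.locationId, {
      locationId: event.locationId,
      sourceVersion: event.version,
      displayName: event.name, // gpsWaypoints are intentionally not copied
      isBookable: current?.isBookable ?? false, // preserve Commerce-owned state
    });
  }

  setBookable(locationId: string, bookable: boolean): void {
    const current = this.points.get(locationId);
    if (!current) throw new Error(`Unknown boarding point ${locationId}`);
    this.points.set(locationId, { ...current, isBookable: bookable });
  }

  get(locationId: string): BoardingPoint | undefined {
    return this.points.get(locationId);
  }
}
```

The version check makes the projection tolerant of redelivered or out-of-order events, which matters when events arrive via Hasura Event Triggers.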
9.2 Cross-Context Reads: CQRS Read Models
When a UI needs to display data owned by multiple contexts (e.g., a dispatcher dashboard showing boarding points and passenger counts), use a dedicated read model outside both contexts.
- Mechanism: Read-only SQL views or Hasura-computed fields joining across schemas.
- Ownership: The read model belongs to the application/UI layer, not to any bounded context.
- Consistency: Acceptable to be eventually consistent for display; never used for write-side decisions.
IMPORTANT
Write-side coupling (context A mutates context B's data) is always prohibited. Read-side coupling (a view joins A and B for display) is explicitly allowed. This is a fundamental asymmetry of CQRS.
Do not sync volatile, high-frequency data (e.g., booking counts) into the consuming context via events. This duplicates Commerce's state inside Backoffice for no domain reason. Reserve event-driven projections for stable master data needed at transaction time (e.g., syncing a product catalog into Commerce for checkout).
The dispatch board requires a compound availability check. It combines data from Backoffice and Operations. We model this as an application-layer SQL view joining across schemas.
A crew member and vehicle are available for a target time window only when all of the following hold:
- `backoffice.crew_members.status = 'ACTIVE'`
- No `APPROVED` entry in `backoffice.crew_absences` overlapping the target date
- All required entries in `backoffice.crew_qualifications` are `VALID`
- `backoffice.vehicles.status = 'ACTIVE'`
- No `vehicle_inspections` with `blocks_dispatch = true`
- No conflicting `operations.leg_assignments` for the target time window
- Sufficient rest time per `operations.crew_duty_logs` (EU 561/2006 evaluation)
See dispatch-availability-engine.md for SQL view definitions, GraphQL contracts, conflict detection rules, and edge states.
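The authoritative definition is the SQL view in `dispatch-availability-engine.md`. As a reading aid, the same compound check can be sketched as an in-memory predicate that returns every blocking reason. The snapshot types and `availabilityIssues` are hypothetical, absences are compared against the time window rather than a calendar date, and the rest rule is simplified to the 11-hour regular daily rest of Regulation (EC) No 561/2006.

```typescript
const HOUR_MS = 60 * 60 * 1000;
// Simplified: regular daily rest only; reduced, split, and weekly rest are out of scope.
const MIN_DAILY_REST_MS = 11 * HOUR_MS;

interface Interval {
  startMs: number; // half-open [startMs, endMs)
  endMs: number;
}

const overlaps = (a: Interval, b: Interval): boolean => a.startMs < b.endMs && b.startMs < a.endMs;

// Hypothetical in-memory snapshots of the rows the SQL view joins.
interface CrewSnapshot {
  status: string; // backoffice.crew_members.status
  approvedAbsences: Interval[]; // APPROVED backoffice.crew_absences
  qualifications: Array<{ code: string; status: string }>; // backoffice.crew_qualifications
  legAssignments: Interval[]; // operations.leg_assignments
  lastDutyEndMs: number | null; // derived from operations.crew_duty_logs
}

interface VehicleSnapshot {
  status: string; // backoffice.vehicles.status
  inspections: Array<{ blocksDispatch: boolean }>; // vehicle_inspections
}

// Returns every blocking reason; an empty array means the pair is dispatchable.
function availabilityIssues(
  crew: CrewSnapshot,
  vehicle: VehicleSnapshot,
  requiredQualifications: string[],
  slot: Interval,
): string[] {
  const issues: string[] = [];
  if (crew.status !== "ACTIVE") issues.push("crew member is not ACTIVE");
  if (crew.approvedAbsences.some((a) => overlaps(a, slot))) issues.push("approved absence overlaps the window");
  for (const code of requiredQualifications) {
    if (!crew.qualifications.some((q) => q.code === code && q.status === "VALID")) {
      issues.push(`qualification ${code} missing or not VALID`);
    }
  }
  if (vehicle.status !== "ACTIVE") issues.push("vehicle is not ACTIVE");
  if (vehicle.inspections.some((i) => i.blocksDispatch)) issues.push("vehicle has a dispatch-blocking inspection");
  if (crew.legAssignments.some((l) => overlaps(l, slot))) issues.push("conflicting leg assignment");
  if (crew.lastDutyEndMs !== null && slot.startMs - crew.lastDutyEndMs < MIN_DAILY_REST_MS) {
    issues.push("insufficient daily rest");
  }
  return issues;
}
```

Returning all reasons rather than a boolean lets the dispatch board explain why a driver or vehicle is greyed out.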
9.3 Cross-Context Writes: Saga Coordination
When a write operation in one context has consequences in another (e.g., deleting a record that another context references), follow a request → assess → confirm → execute sequence:
- Request: The initiating context marks the record as `PENDING_REMOVAL` (or equivalent) and emits a request event.
- Assess: The affected context assesses the impact against its own data and responds with approval or rejection, including actionable data such as affected counts and suggested alternatives.
- Confirm: If the operation is destructive, a human confirms it: the UI presents the impact and asks the dispatcher to decide.
- Execute: The affected context executes the migration (e.g., reassigning references) and confirms completion.
- Complete: The initiating context completes the operation (e.g., hard delete) and emits a final cleanup event.
NOTE
In our modular monolith, we can implement sagas as synchronous domain services in the application layer, calling each module's public API in sequence, instead of requiring async message queues. Both approaches run the same request → assess → confirm → execute steps; choose based on latency requirements.
Key rule: Each context only mutates its own data. The saga coordinator orchestrates the sequence but never reaches into a context's internals.
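The synchronous variant from the note above can be sketched as one application-layer function that calls only the modules' public APIs. Everything here is hypothetical: the two API interfaces, `removePickupLocation`, treating `affectedCount > 0` as the "destructive" case that needs confirmation, and rejecting when no alternative is offered.

```typescript
interface ImpactAssessment {
  approved: boolean;
  affectedCount: number; // e.g., bookings referencing the location
  suggestedAlternativeId?: string;
  reason?: string;
}

// Hypothetical public module APIs; the coordinator sees nothing else.
interface CommerceBoardingApi {
  assessRemoval(locationId: string): ImpactAssessment;
  migrateReferences(fromId: string, toId: string): void;
}

interface OperationsLocationApi {
  markPendingRemoval(locationId: string): void;
  revertPendingRemoval(locationId: string): void;
  completeRemoval(locationId: string): void; // hard delete plus the final cleanup event
}

type SagaOutcome = "REMOVED" | "REJECTED" | "CANCELLED";

function removePickupLocation(
  locationId: string,
  ops: OperationsLocationApi,
  commerce: CommerceBoardingApi,
  confirm: (impact: ImpactAssessment) => boolean, // dispatcher decision from the UI
): SagaOutcome {
  ops.markPendingRemoval(locationId); // request
  const impact = commerce.assessRemoval(locationId); // assess
  const target = impact.suggestedAlternativeId;
  if (!impact.approved || (impact.affectedCount > 0 && target === undefined)) {
    ops.revertPendingRemoval(locationId);
    return "REJECTED";
  }
  if (impact.affectedCount > 0 && !confirm(impact)) {
    ops.revertPendingRemoval(locationId); // confirm step declined
    return "CANCELLED";
  }
  try {
    if (impact.affectedCount > 0 && target !== undefined) {
      commerce.migrateReferences(locationId, target); // execute, inside Commerce's own module
    }
  } catch (err) {
    ops.revertPendingRemoval(locationId); // compensate, then surface the failure
    throw err;
  }
  ops.completeRemoval(locationId); // complete
  return "REMOVED";
}
```

Each mutation happens behind a module boundary; the coordinator only sequences the calls and runs the compensating step when one fails.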
10. Context Map
The concrete context map (upstream/downstream relationships, relationship types, sync mechanisms, and the cross-boundary soft FK reference map) lives in the domain model hub:
→ `domain-model.md`: bounded context map, integration surface, and spoke index.
See adr-001-boarding-point-strategy.md for a concrete application of all three patterns.