Busflow Docs

Architecture Guidelines: Modular Monolith & Domain-Driven Design

1. Domain & Bounded Contexts

Our system operates within the Bus Tourism domain (with future expansion planned for broader Mobility Services). The architecture is a Modular Monolith utilizing NestJS, Hasura (GraphQL), and PostgreSQL.

We define four primary Bounded Contexts (Pillars), plus a fifth planned for a later phase, strictly segregated to prevent tight coupling:

Backoffice Context: Acts as the master record for business configuration, operational staff, abstract product definitions, and financial planning.
Commerce Context: Handles ticketing, sales, B2C/B2B conversions, capacity holds, and accounting/tax actuals.
Operations Context: Manages real-world execution, dispatching, fleet telemetry, incident reporting, and driver logistics (formerly "Driver Context").
Communications Context: A shared core domain providing omnichannel inbox capabilities and automated messaging to all other contexts.
Customer Intelligence Context: [future — Phase 3] An event-sourced analytics domain that consumes activity signals from all four operational contexts and produces behavioral aggregates, customer segmentation, and personalized recommendations. Enables the 360° Customer Profile vision. See ADR-021.

2. Database Boundary Enforcement

To maintain strict context boundaries within a shared PostgreSQL database exposed via Hasura:

PostgreSQL Schemas: Isolate tables into context-specific schemas (e.g., commerce.tour_offerings, operations.service_legs, backoffice.operators). Do not use the public schema for domain entities.
No Cross-Schema Foreign Keys: Bounded contexts must not enforce foreign keys against each other.
Reference by ID: Aggregate roots in one context must only store the ID (UUID/String) of entities in another context (e.g., commerce.tour_offerings stores tour_departure_id as a plain UUID column — no FK constraint to backoffice.tour_departures).
Read-Only SQL Views: When a context requires data owned by another context, utilize schema-bound SQL Views (e.g., the dispatch availability view joining backoffice.crew_members with operations.leg_assignments — see §9.2).
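The "Reference by ID" rule can also be mirrored in application code. The following is a minimal sketch, assuming a branded-type convention that is not taken from the Busflow codebase; `Brand`, `toTourDepartureId`, and the `TourOffering` shape are hypothetical names used for illustration only.

```typescript
// Branded ID type: a hypothetical convention that stops IDs from different
// contexts being mixed up at compile time.
type Brand<T, B extends string> = T & { readonly __brand: B };

// ID of an entity owned by the Backoffice context.
type TourDepartureId = Brand<string, "TourDepartureId">;

const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Parse an untrusted string into a typed ID; throws on malformed input.
function toTourDepartureId(raw: string): TourDepartureId {
  if (!UUID_PATTERN.test(raw)) {
    throw new Error(`Not a UUID: ${raw}`);
  }
  return raw.toLowerCase() as TourDepartureId;
}

// Commerce-side aggregate (illustrative shape): it stores only the ID of the
// Backoffice entity, never the entity itself, and no FK constraint backs it.
interface TourOffering {
  id: string;
  tenantId: string;
  tourDepartureId: TourDepartureId;
}

const offering: TourOffering = {
  id: "6fa459ea-ee8a-3ca4-894e-db77e160355e",
  tenantId: "16fd2706-8baf-433b-82eb-8c7fada847da",
  tourDepartureId: toTourDepartureId("3F2504E0-4F89-11D3-9A0C-0305E82C3301"),
};
```

Because no database constraint guarantees that the referenced row exists, code that resolves such an ID should treat "not found" as a normal outcome.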

2.1 Multi-Tenant Data Isolation

Every domain table carries tenant_id UUID NOT NULL referencing backoffice.operators. Two layers enforce isolation. See tenant-isolation-strategy ADR.

Primary: Hasura Permission Rules. Every table's select, insert, update, and delete permissions include a row filter (a check, for inserts) matching the JWT claim:

```yaml
# Example: backoffice.tour_templates — role: dispatcher
select_permissions:
  filter:
    tenant_id: { _eq: "x-hasura-tenant-id" }
  columns: [id, tenant_id, title, status, ...]
```

Secondary: Postgres RLS (defense-in-depth). Each tenant-scoped table has a Row Level Security policy as a safeguard against Hasura Action bypasses or direct SQL access:

```sql
ALTER TABLE backoffice.tour_templates ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON backoffice.tour_templates
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
```

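NestJS sets app.current_tenant_id via SET LOCAL at the start of each database transaction. SET LOCAL is transaction-scoped, so the value is discarded at commit or rollback and cannot leak to the next request that reuses a pooled connection.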
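A minimal sketch of how the tenant setting might be applied around a unit of work. The `Queryable` interface and the `withTenant` helper are hypothetical, not the actual Busflow implementation. The sketch uses PostgreSQL's `set_config(name, value, is_local)` function, which with `is_local = true` has the same transaction scope as SET LOCAL but accepts a bind parameter.

```typescript
// Minimal structural type for a database client (hypothetical). Any client
// exposing a promise-based query(text, values) method would fit this shape.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Run `work` inside a transaction whose RLS policies see `tenantId`.
async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Transaction-scoped, like SET LOCAL, but parameterized so the tenant ID
    // is never concatenated into the SQL string.
    await client.query(
      "SELECT set_config('app.current_tenant_id', $1, true)",
      [tenantId],
    );
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (error) {
    // The setting is discarded together with the transaction.
    await client.query("ROLLBACK");
    throw error;
  }
}
```

Every statement issued through `work` then runs with the RLS policy evaluating against the given tenant, and nothing survives on the connection once the transaction ends.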

Global reference tables (e.g., countries, currencies, vehicle_types) are exempt — they carry no tenant_id and have no RLS policy.

Busflow Staff use the elevated Hasura role busflow_staff with unrestricted select permissions (no tenant_id filter). The system uses this role exclusively for cross-tenant analytics, support, and tenant provisioning.

2.2 Physical Schema Index

Detailed physical schemas, Entity-Relationship Diagrams (ERDs), and table definitions reside individually per pillar:

Backoffice Schema (schema-backoffice.md):
- Scope: Configuration, operational staff, abstract product definitions (TourTemplate), concrete scheduled departures (TourDeparture), third-party inventory (Allotment), CRM, and financial planning (CostingSheet).
Commerce & Finance Schema (schema-commerce.md):
- Scope: Conversion and accounting engine, handling TourOfferings, B2C/B2B Bookings, Payments, ticketing, and actual margin taxation (FinancialLedger).
Operations Schema (schema-operations.md):
- Scope: Execution layer managing ServiceLegs, dispatching (LegAssignment), IoT fleet telemetry, OCR expense scanning, and offline app sync.
Communications Schema (schema-communications.md):
- Scope: Shared Core Domain providing omnichannel inbox capabilities (Conversations, Messages) and trigger-based automated messaging.

2.3 Aggregate-Level FK Policy

Hard foreign key constraints follow aggregate boundaries. See ADR-036 for the full decision rationale.

| Relationship | FK Type | Rationale |
| --- | --- | --- |
| Intra-aggregate (root ↔ child) | Hard FK | Shared transactional boundary. DB-level integrity enforces containment. |
| Cross-aggregate, lifecycle conflict | Soft ID reference | Independent status machines, unbounded collections, or snapshot-based independence. |
| Cross-aggregate, stable context reference | Hard FK | Target entity never deleted (status lifecycle only). No cascade risk. |
| Cross-schema | Soft ID reference | Existing §2 rule. |
| Self-referential version chain | Hard FK | Intra-aggregate version linking (e.g., `price_matrices.superseded_by`). |

A cross-aggregate reference has a lifecycle conflict when:

The aggregates transition through independent status machines
The child collection is unbounded
An ADR explicitly mandates independent lifecycles
The referencing entity snapshots data at creation (no live reference needed)
Archival of the referenced entity should not cascade
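The policy table and the lifecycle-conflict criteria can be read as a small decision procedure. The sketch below is one possible encoding, not a tool that exists in the repository; the `Relationship` fields and function names are hypothetical, and the final fallback to a soft reference is an assumption, since the table does not say what to do when no row applies.

```typescript
type FkType = "HARD_FK" | "SOFT_ID";

// Facts about a proposed reference between two tables (hypothetical shape).
interface Relationship {
  crossSchema: boolean;
  sameAggregate: boolean; // root ↔ child, or a self-referential version chain
  independentStatusMachines: boolean;
  unboundedChildCollection: boolean;
  adrMandatesIndependence: boolean;
  snapshotsAtCreation: boolean;
  archivalMustNotCascade: boolean;
  targetNeverDeleted: boolean;
}

// Any single criterion from the list is enough to count as a conflict.
function hasLifecycleConflict(r: Relationship): boolean {
  return (
    r.independentStatusMachines ||
    r.unboundedChildCollection ||
    r.adrMandatesIndependence ||
    r.snapshotsAtCreation ||
    r.archivalMustNotCascade
  );
}

function chooseFkType(r: Relationship): FkType {
  if (r.crossSchema) return "SOFT_ID"; // the §2 rule takes precedence
  if (r.sameAggregate) return "HARD_FK";
  if (hasLifecycleConflict(r)) return "SOFT_ID";
  if (r.targetNeverDeleted) return "HARD_FK"; // stable context reference
  // Assumption: default to the looser coupling when no table row applies.
  return "SOFT_ID";
}
```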

For aggregate definitions per context, see the spoke files in domain-model.md.

3. Primary Database System

Intent & Business Context

The primary database must deliver enterprise-grade stability for transactions, high schema flexibility for variable travel itineraries, and a native foundation for AI capabilities (Magic Upload, Copilot).

Data Taxonomy:
- Transactional: Bookings, payments, passenger lists, fleet inventory (Requires strict ACID compliance).
- Variable: Itineraries, AI parser outputs (Requires semi-structured schema support).
- Network: Geographical routes, stop topologies.
- Semantic: Vector arrays for AI similarity mapping.

Decision: PostgreSQL

We selected PostgreSQL as the unified primary data store. This choice consolidates relational, document, and vector databases into a single sovereign system, aligning with the "Do More With Less" operational pillar.

High-Level Storage Strategy

Core Relational Storage:
- Intent: Enforce absolute data integrity for high-stakes financial and operational truth.
- Target Entities: Users, Operators, Bookings, Payments, Vehicles.
Variable Document Storage (JSONB):
- Intent: Prevent schema bloat (empty columns/complex joins) while accommodating inherently unpredictable trip structures and dynamic multi-tenant configurations.
- Target Entities: CostingSheet (Price Matrices, Cost Components), Vehicle (Seat Map Layouts), Itineraries.
AI & Semantic Storage (pgvector):
- Intent: Keep AI context adjacent to operational data so that retrieval needs no network hop to a separate vector store.
- Target Entities: Text embeddings generated from AI-assisted PDF pipelines (TourTemplate); Copilot conversational histories.

Rejected Alternatives

MongoDB:
- Reasoning: Discarded due to insufficient strict relational constraints. Unsuitable as a primary source of truth for seat maps, digital tickets, and financial ledgers.
Neo4j:
- Reasoning: Discarded as overly complex for primary CRUD operations (Apple Pay processing, user profiles).
- Future Consideration: We may adopt it later as an isolated microservice specifically handling complex geographical routing and fleet optimization.

4. Validation & Type Safety

High-Level Strategy: Defense in Depth

The Busflow monorepo divides validation and type-safety responsibilities across three distinct layers. This approach prevents duplication of effort while ensuring robust runtime and compile-time safety across frontend, AI workers, and the database.

4.1 PostgreSQL (Data Integrity)

PostgreSQL is the absolute source of truth for structural data integrity and persistence.

Responsibility: Facts that never change and safeguards for relational integrity.
Rules Enforced: Data types (INTEGER, TEXT), nullability (NOT NULL), relationships (Foreign Keys), uniqueness (UNIQUE), and absolute baseline constraints (e.g., price >= 0). See also §5 for the CQRS decision framework on when to place constraints here vs. Hasura vs. NestJS.
Why: Protects the database from malicious or erroneous access, even if someone bypasses the application layer. Rule changes here are expensive (require migrations).

4.2 Hasura & GraphQL (Network Boundary)

Hasura exposes the Postgres database as a GraphQL API, providing basic structural type enforcement.

Responsibility: Network boundary structural typing and row-level authorization.
Workflow: graphql-codegen generates strict TypeScript types (e.g., mutation inputs) derived 1:1 from the database structure.
Limitations: graphql-codegen types are compile-time only. They disappear at runtime and cannot validate dynamic payloads.

4.3 Valibot (Domain & Runtime Boundary)

Valibot acts as the Single Source of Truth for Domain Types within the isomorphic packages/types workspace.

Responsibility: Runtime validation, conditional domain constraints, and mutable business logic.
Rules Enforced: String formatting (emails, regex), conditional cross-field logic (e.g., maximum passengers based on vehicle type), and UI-specific limits.
Why Valibot:
- AI/LLM Safety: Validates untyped, unpredictable JSON payloads from OpenAI/NestJS workers before they reach the database.
- Isomorphic UI Validation: Powers real-time frontend form validation in Nuxt using the exact same schema the backend trusts.
- Bundle Size: Highly tree-shakable architecture optimizes performance for public-facing B2C apps.
- Domain Decoupling: Acts as an anti-corruption layer, allowing transformation of database shapes (e.g., user_id) to cleaner domain models (userId) at the edge.
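To make the runtime-boundary role concrete, here is a dependency-free sketch of the kind of check such a schema performs. It is deliberately hand-written rather than using Valibot's API, so that it runs without any package installed; in the real workspace a Valibot schema would play this role. The field names, the `BookingDraft` shape, and the passenger limits are illustrative assumptions.

```typescript
type VehicleType = "MINIBUS" | "COACH";

// Illustrative limits only; real capacities would come from configuration.
const MAX_PASSENGERS: Record<VehicleType, number> = { MINIBUS: 19, COACH: 52 };

// Domain model (camelCase), decoupled from the database shape (snake_case).
interface BookingDraft {
  userId: string;
  vehicleType: VehicleType;
  passengers: number;
}

type ValidationResult =
  | { ok: true; value: BookingDraft }
  | { ok: false; issues: string[] };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validate an untyped payload (e.g. LLM output) and map it to the domain model.
function parseBookingDraft(input: unknown): ValidationResult {
  if (!isRecord(input)) {
    return { ok: false, issues: ["payload must be an object"] };
  }
  const userId = input["user_id"];
  const vehicleType = input["vehicle_type"];
  const passengers = input["passengers"];

  if (
    typeof userId === "string" &&
    userId.length > 0 &&
    (vehicleType === "MINIBUS" || vehicleType === "COACH") &&
    typeof passengers === "number" &&
    Number.isInteger(passengers) &&
    passengers > 0
  ) {
    // Conditional cross-field rule: the limit depends on the vehicle type.
    if (passengers > MAX_PASSENGERS[vehicleType]) {
      return { ok: false, issues: [`too many passengers for ${vehicleType}`] };
    }
    return { ok: true, value: { userId, vehicleType, passengers } };
  }
  return { ok: false, issues: ["user_id, vehicle_type, or passengers is invalid"] };
}
```

The same function could serve a frontend form and a backend worker, which is the isomorphic property the list above describes.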

4.4 Preventing Schema Drift

Because we write Valibot domain schemas manually alongside auto-generated GraphQL structural types, the system requires strict synchronization to prevent silent failures.

Rule: A Valibot schema cannot exist in isolation. It must be explicitly bound to the database structure.
Mechanism: We must use TypeScript utilities (such as the satisfies operator or strict type inferences) to link the Valibot InferOutput to the graphql-codegen types. If someone adds or modifies a database column via Hasura, the Valibot schema must trigger a compile-time TypeScript error until they update it to match.
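One way to get such a compile-time tripwire is a type-level equality check. The sketch below uses two hand-written stand-in interfaces; in practice the first would be a graphql-codegen type and the second the type inferred from the Valibot schema. All names here are hypothetical.

```typescript
// Stand-in for a generated GraphQL type (hypothetical shape).
interface GeneratedTourTemplate {
  id: string;
  tenant_id: string;
  title: string;
  status: string;
}

// Stand-in for the type inferred from the hand-written domain schema.
interface SchemaTourTemplate {
  id: string;
  tenant_id: string;
  title: string;
  status: string;
}

// Resolves to `true` only when A and B are the same type, and to `false`
// as soon as a field is added, removed, or retyped on either side.
type Equals<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2)
    ? true
    : false;

// If the two types drift apart, `Equals<...>` becomes `false` and assigning
// `true` to it is a compile error that points at this line.
const tourTemplateInSync: Equals<SchemaTourTemplate, GeneratedTourTemplate> = true;
```

The check costs nothing at runtime. The `satisfies` operator mentioned above is a lighter alternative when only assignability in one direction needs to be enforced.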

5. Data Mutation & CQRS Strategy

To balance the performance of GraphQL with the safety of strict business logic, we adopt a pragmatic Command Query Responsibility Segregation (CQRS) approach:

Reads (Queries): The system executes these directly via Hasura GraphQL. Frontend applications consume data using Hasura's Role-Based Access Control (RBAC).
Simple CRUD (Writes): Non-domain state changes (e.g., updating a phone number) bypass NestJS and hit Hasura mutations directly, governed by RBAC.
Fundamental State Constraints: Absolute domain rules (e.g., seats cannot be negative) are PostgreSQL CHECK constraints.
Simple Domain Validations: Hasura Input Validations (Pre-insert Webhooks) handle gateway checks and format validations.
Complex Domain Logic: Operations requiring calculations, data transformation, or cross-table orchestration MUST route through Hasura Actions to custom NestJS command handlers.

6. Cross-Context Event Communication

Contexts must communicate asynchronously or via strict event contracts to avoid tight coupling.

Guaranteed / Database-Backed Events: Use Hasura Event Triggers for durable, retryable events that must occur after the system commits a transaction. We define triggers as TypeScript decorators in NestJS via @golevelup/nestjs-hasura, making the application code the source of truth (see workflow-orchestration.md). Note: While acting as a fire-after-commit webhook, Hasura Event Triggers differ from a true Transactional Outbox pattern as they lack custom retry queues or strict ordered delivery control.
Internal / Synchronous Routing: Use NestJS @nestjs/event-emitter for lightweight, in-memory event routing triggered by webhook handlers.

7. Edge Operations & Asynchronous Processing

The system requires specific architectural patterns to support field clients, data streaming, and resource-heavy tasks:

Offline-First & Eventual Consistency: Mobile/field clients operating in low-connectivity environments must employ local-first storage. State changes and operational logs sync to the backend once connectivity is restored. (A planned document, [future] offline-sync-strategy.md, will specify this sync protocol in detail.)
High-Frequency Ingestion: High-volume data streams (e.g., vehicle telemetry) must bypass standard NestJS CQRS routing where possible, utilizing direct Hasura mutations or a dedicated fast-ingest path to prevent database locking.
Heavy Asynchronous Processing: Handle resource-intensive tasks (e.g., AI parsing, media processing) asynchronously. Initial requests create a PENDING record, while background NestJS workers process the payload and update the database state upon completion.
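The PENDING-record pattern for heavy tasks can be sketched with an in-memory store standing in for the database table. `JobStore`, `ParseJob`, and `runWorker` are hypothetical names, and the COMPLETED and FAILED statuses are assumptions; the text above only names PENDING.

```typescript
type JobStatus = "PENDING" | "COMPLETED" | "FAILED";

interface ParseJob {
  id: number;
  status: JobStatus;
  payload: string;
  result?: string;
  error?: string;
}

// In-memory stand-in for a database table (hypothetical).
class JobStore {
  private readonly jobs = new Map<number, ParseJob>();
  private nextId = 1;

  // Request path: record the work and return immediately.
  enqueue(payload: string): ParseJob {
    const job: ParseJob = { id: this.nextId++, status: "PENDING", payload };
    this.jobs.set(job.id, job);
    return job;
  }

  get(id: number): ParseJob | undefined {
    return this.jobs.get(id);
  }

  pending(): ParseJob[] {
    return Array.from(this.jobs.values()).filter((job) => job.status === "PENDING");
  }
}

// Worker path: process every pending job and persist the outcome, so a
// failure is visible as state instead of being lost in a log.
function runWorker(store: JobStore, handler: (payload: string) => string): void {
  for (const job of store.pending()) {
    try {
      job.result = handler(job.payload);
      job.status = "COMPLETED";
    } catch (error) {
      job.error = error instanceof Error ? error.message : String(error);
      job.status = "FAILED";
    }
  }
}
```

The caller only ever sees the job ID and polls or subscribes for the status change, which keeps the initial request fast regardless of how long the worker takes.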

8. Internationalization & Business Rules

Do not hardcode regional compliance (e.g., taxes, driving regulations) into core entities or frontends.

Market Context: "Market" or "Jurisdiction" is an explicit domain concept. Associate every relevant transaction or operation with a tenant/market.
Strategy Pattern (Policies): We abstract calculation rules (e.g., tax logic) into Policies (ITaxCalculationPolicy). The application dynamically injects the appropriate regional implementation. The generated TaxRule value objects embedded within a CostingSheet represent the concrete executed outcome of this policy.
Specification Pattern: Standalone Specification objects encapsulate legal validations (e.g., driving hours compliance), evaluating payloads and returning pass/fail compliance states.
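A compact sketch of both patterns follows. Only the name `ITaxCalculationPolicy` comes from the text above; the method signature, the `FlatRatePolicy` class, the market codes, the rates, and the driving-time limit are placeholders invented for illustration and do not reflect any real jurisdiction.

```typescript
interface TaxResult {
  taxCents: number;
  rateBasisPoints: number;
}

// Amounts are integer minor units (cents) to avoid floating-point drift.
interface ITaxCalculationPolicy {
  calculate(netCents: number): TaxResult;
}

class FlatRatePolicy implements ITaxCalculationPolicy {
  private readonly rateBasisPoints: number;

  constructor(rateBasisPoints: number) {
    this.rateBasisPoints = rateBasisPoints;
  }

  calculate(netCents: number): TaxResult {
    const taxCents = Math.round((netCents * this.rateBasisPoints) / 10000);
    return { taxCents, rateBasisPoints: this.rateBasisPoints };
  }
}

// Placeholder markets and rates; a real registry would be configuration-driven.
const POLICIES: Record<string, ITaxCalculationPolicy> = {
  MARKET_A: new FlatRatePolicy(1900),
  MARKET_B: new FlatRatePolicy(1000),
};

// Strategy selection: the caller never branches on the market itself.
function resolveTaxPolicy(market: string): ITaxCalculationPolicy {
  const policy = POLICIES[market];
  if (policy === undefined) {
    throw new Error(`No tax policy registered for market ${market}`);
  }
  return policy;
}

// Specification: a standalone, reusable pass/fail rule.
interface Specification<T> {
  isSatisfiedBy(candidate: T): boolean;
}

class MaxDailyDrivingSpec implements Specification<{ drivingMinutes: number }> {
  private readonly limitMinutes: number;

  constructor(limitMinutes: number) {
    this.limitMinutes = limitMinutes;
  }

  isSatisfiedBy(candidate: { drivingMinutes: number }): boolean {
    return candidate.drivingMinutes <= this.limitMinutes;
  }
}
```

Adding a market then means registering another policy or specification instance, with no change to the entities or frontends that consume the result.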

9. Cross-Context Interaction Patterns

When the same real-world concept spans multiple bounded contexts, three distinct patterns govern how contexts collaborate without violating their boundaries.

9.1 Context Mapping: One Concept, Separate Entities

A single real-world thing (e.g., a pickup location) must exist as a separate model in each context that uses it. We shape each model by that context's ubiquitous language.

CAUTION

Anti-Pattern — Shared Entity: Adding fields from one context onto another context's entity to "avoid duplication" (e.g., adding is_bookable or display_name to an Operations entity so Commerce can use it). This creates conceptual coupling disguised as pragmatism.

The authoritative context owns the entity and emits domain events when it changes.
Consuming contexts maintain local projections (read models or value objects) synced via those events.
Each projection carries only the fields that context needs — nothing more.
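The three rules above might look as follows for a pickup location. This is a sketch under assumptions: the event names, their fields, and the `CommercePickupPoint` shape are hypothetical and do not describe the actual event contract.

```typescript
// Events emitted by the owning context (hypothetical contract).
type BoardingPointEvent =
  | {
      type: "BoardingPointChanged";
      id: string;
      name: string;
      gpsWaypoints: Array<[number, number]>;
    }
  | { type: "BoardingPointRetired"; id: string };

// Commerce's local projection: only what checkout needs, nothing more.
interface CommercePickupPoint {
  id: string;
  displayName: string;
}

// Apply one event to the consuming context's read model.
function applyToCommerceProjection(
  projection: Map<string, CommercePickupPoint>,
  event: BoardingPointEvent,
): void {
  switch (event.type) {
    case "BoardingPointChanged":
      // gpsWaypoints is intentionally dropped: Commerce has no use for it.
      projection.set(event.id, { id: event.id, displayName: event.name });
      break;
    case "BoardingPointRetired":
      projection.delete(event.id);
      break;
  }
}
```

Because the handler is idempotent for a given event, redelivery by a retrying event trigger leaves the projection unchanged.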

TIP

A useful litmus test: if two contexts would extend the same entity in conflicting directions (Commerce wants a display_name, Operations wants gps_waypoints), they need separate models.

9.2 Cross-Context Reads: CQRS Read Models

When a UI needs to display data owned by multiple contexts (e.g., a dispatcher dashboard showing boarding points and passenger counts), use a dedicated read model outside both contexts.

Mechanism: Read-only SQL views or Hasura-computed fields joining across schemas.
Ownership: The read model belongs to the application/UI layer, not to any bounded context.
Consistency: Acceptable to be eventually consistent for display; never used for write-side decisions.

IMPORTANT

Write-side coupling (context A mutates context B's data) is always prohibited. Read-side coupling (a view joins A and B for display) is explicitly allowed — this is a fundamental asymmetry of CQRS.

Do not sync volatile, high-frequency data (e.g., booking counts) into the consuming context via events. This duplicates Commerce's state inside Backoffice for no domain reason. Reserve event-driven projections for stable master data needed at transaction time (e.g., syncing a product catalog into Commerce for checkout).

The dispatch board requires a compound availability check. It combines data from Backoffice and Operations. We model this as an application-layer SQL view joining across schemas.

backoffice.crew_members.status = 'ACTIVE'
No APPROVED entry in backoffice.crew_absences overlapping the target date
All required entries in backoffice.crew_qualifications are VALID
backoffice.vehicles.status = 'ACTIVE'
No vehicle_inspections with blocks_dispatch = true
No conflicting operations.leg_assignments for the target time window
Sufficient rest time per operations.crew_duty_logs (EU-561/2006 evaluation)
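The conditions above form a conjunction: a crew member and vehicle pairing is dispatchable only if every check passes. The real check is modeled as a SQL view; the TypeScript predicate below merely mirrors its logic for illustration, and the `DispatchSnapshot` shape and blocker messages are hypothetical.

```typescript
// One row of pre-joined facts for a crew member, vehicle, and time window
// (hypothetical shape; each field corresponds to one condition in the list).
interface DispatchSnapshot {
  crewStatus: string;
  hasApprovedAbsenceOnDate: boolean;
  requiredQualificationsValid: boolean;
  vehicleStatus: string;
  hasBlockingInspection: boolean;
  hasConflictingAssignment: boolean;
  hasSufficientRest: boolean;
}

// Return every reason that blocks dispatch, so the UI can explain a refusal.
function dispatchBlockers(s: DispatchSnapshot): string[] {
  const blockers: string[] = [];
  if (s.crewStatus !== "ACTIVE") blockers.push("crew member is not ACTIVE");
  if (s.hasApprovedAbsenceOnDate) blockers.push("approved absence overlaps the target date");
  if (!s.requiredQualificationsValid) blockers.push("a required qualification is not VALID");
  if (s.vehicleStatus !== "ACTIVE") blockers.push("vehicle is not ACTIVE");
  if (s.hasBlockingInspection) blockers.push("an inspection blocks dispatch");
  if (s.hasConflictingAssignment) blockers.push("conflicting leg assignment in the time window");
  if (!s.hasSufficientRest) blockers.push("insufficient rest time");
  return blockers;
}

function isDispatchable(s: DispatchSnapshot): boolean {
  return dispatchBlockers(s).length === 0;
}
```

Collecting all blockers rather than stopping at the first one is a design choice made here for readability of the dispatcher's feedback; the SQL view may expose the result differently.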

See dispatch-availability-engine.md for SQL view definitions, GraphQL contracts, conflict detection rules, and edge states.

9.3 Cross-Context Writes: Saga Coordination

When a write operation in one context has consequences in another (e.g., deleting a record that another context references), use a request → assess → confirm → execute sequence:

Initiating context marks the record as PENDING_REMOVAL (or equivalent) and emits a request event.
Affected context assesses the impact against its own data and responds with approval or rejection (including actionable data like affected counts and suggested alternatives).
Human confirmation if the operation is destructive — the UI presents the impact and asks the dispatcher to decide.
Affected context executes the migration (e.g., reassigning references) and confirms completion.
Initiating context completes the operation (e.g., hard delete) and emits a final cleanup event.

NOTE

In our modular monolith, we can implement sagas as synchronous domain services in the application layer — calling each module's public API in sequence — instead of requiring async message queues. Both approaches are semantically equivalent; choose based on latency requirements.

Key rule: Each context only mutates its own data. The saga coordinator orchestrates the sequence but never reaches into a context's internals.
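A minimal sketch of the synchronous variant, in which a coordinator calls each module's public API in sequence. The interface and method names are hypothetical, and reverting the PENDING_REMOVAL mark on rejection is an assumption; the steps above do not say what happens when the assessment or the human confirmation fails.

```typescript
interface ImpactAssessment {
  approved: boolean;
  affectedCount: number;
}

// Public API of the context that owns the record (hypothetical names).
interface InitiatingContext {
  markPendingRemoval(id: string): void;
  revertPendingRemoval(id: string): void;
  completeRemoval(id: string): void;
}

// Public API of the context that references the record (hypothetical names).
interface AffectedContext {
  assessRemoval(id: string): ImpactAssessment;
  migrateReferences(id: string): void;
}

type SagaOutcome = "COMPLETED" | "REJECTED" | "DECLINED";

// The coordinator sequences the steps but only ever calls public APIs;
// each context mutates its own data.
function runRemovalSaga(
  id: string,
  initiating: InitiatingContext,
  affected: AffectedContext,
  confirm: (impact: ImpactAssessment) => boolean,
): SagaOutcome {
  initiating.markPendingRemoval(id); // 1. request
  const impact = affected.assessRemoval(id); // 2. assess
  if (!impact.approved) {
    initiating.revertPendingRemoval(id); // assumption: undo the mark
    return "REJECTED";
  }
  if (!confirm(impact)) { // 3. human confirmation
    initiating.revertPendingRemoval(id);
    return "DECLINED";
  }
  affected.migrateReferences(id); // 4. execute the migration
  initiating.completeRemoval(id); // 5. complete
  return "COMPLETED";
}
```

In an asynchronous implementation each call would become an event and a handler, but the order of steps and the ownership of each mutation stay the same.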

10. Context Map

The concrete context map — upstream/downstream relationships, relationship types, sync mechanisms, and the cross-boundary soft FK reference map — lives in the domain model hub:

→ domain-model.md — bounded context map, integration surface, and spoke index.

See adr-001-boarding-point-strategy.md for a concrete application of all three patterns.

Related Documents