Documentation TODOs
This document tracks identified gaps in the project documentation. Completed items have been moved to completed-todos.md.
1. Product & Domain Specifications
[High Priority] Pricing Engine & Margin Rules:
- Context: An important gap requiring further research and customer interviews.
- Action: Document the exact mathematical rules for the "Dynamic Margin Calculator", discount priorities (e.g., early-bird vs. group), and DACH-specific tax handling (e.g., VAT rules for international travel).
[High Priority] Copilot & AI Assistant Specifications:
- Context: `docs/architecture/ai.md` details the infrastructure, but there are no concrete product specifications or user flows for how the user actually interacts with these AI features in the UI.
- Action: Define UX/UI flows, trigger points, and failure states for specific AI features (e.g., Customer Support Copilot, Dispatch Assistant).
[High Priority] Permissions and Roles Matrix (RBAC):
- Context: `docs/architecture/roles.md` exists, but there is no detailed product specification outlining exact permission tiers.
- Action: Document a detailed RBAC matrix specifying read/write access for different roles (e.g., Dispatcher, Manager, Accounting) within the `workspace` app.
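The requested matrix maps naturally onto role × resource → access level, with write implying read. The sketch below only illustrates that shape: the resources and the levels assigned to each role are placeholders rather than the specification, and deny-by-default for unknown roles or resources is an assumption.

```python
from enum import IntEnum


class Access(IntEnum):
    # Ordered so that a higher level implies every lower one (WRITE implies READ).
    NONE = 0
    READ = 1
    WRITE = 2


# Placeholder matrix: roles from the TODO, resources and levels hypothetical.
RBAC_MATRIX: dict[str, dict[str, Access]] = {
    "dispatcher": {"bookings": Access.WRITE, "vehicles": Access.WRITE, "invoices": Access.NONE},
    "manager": {"bookings": Access.WRITE, "vehicles": Access.WRITE, "invoices": Access.READ},
    "accounting": {"bookings": Access.READ, "vehicles": Access.NONE, "invoices": Access.WRITE},
}


def can(role: str, resource: str, needed: Access) -> bool:
    """True if `role` holds at least `needed` on `resource`; unknown entries are denied."""
    granted = RBAC_MATRIX.get(role, {}).get(resource, Access.NONE)
    return granted >= needed
```

Keeping the matrix as data makes it straightforward to render into the documentation table and to test against whatever enforcement layer is eventually chosen.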
[Medium Priority] Post-Booking Passenger Modifications:
- Context: The B2C passenger journey covers the initial booking, but the logic for modifying an existing booking is not documented.
- Action: Document the user flows and business rules for post-booking modifications (e.g., adding luggage, changing a boarding point, cancellations) via the passenger portal.
[Medium Priority] SaaS Subscription & Pricing Tiers Logic:
- Context: The Lago two-layer billing architecture and metered dimensions are specified in `PRODUCT_payments-and-billing.md` §3. However, the exact tier names, the features restricted per tier, and € pricing remain undefined.
- Action: Create `PRODUCT_pricing-tiers.md` to document the specific subscription tiers (e.g., Free, Pro, Enterprise), feature-gating mappings, and Free Trial logic.
2. Technical & Architecture Details
[High Priority] Authentication & "Magic Link" Flows:
- Context: Specific auth flows for passengers (Magic Link), B2B agents, and internal Dispatchers need definition.
- Action: Create a centralized authentication concept document. This document should remain general, linking out to specific implementation areas and external docs (like Nhost Auth), rather than repeating low-level details.
[High Priority] Domain 5 / L3 Infra Wiring Follow-ups (before first production deploy):
Items below require secrets or GitHub-side configuration that cannot be checked into the repo. They must land before the two new workflows (`.github/workflows/backup-verify.yml`, `.github/workflows/secrets-sync.yml`) fire against production.
A. `backup-verify.yml` open items (triage reference: `backup-verify-runbook.md`)
- GitHub Secrets required:
  - `RCLONE_CONFIG`: the full `rclone.conf` body with both remotes:
    - `hetzner-object-storage:` (primary; the `pg_dump` producer target)
    - `minio-mirror:` (fallback; used when the `workflow_dispatch` input `backup_source=minio-fallback` is selected)
  - `SLACK_OPS_WEBHOOK`: incoming-webhook URL for `#ops`. The workflow auto-detects 4xx responses from the webhook and warns the operator to rotate it.
- Stub implementation to replace: `scripts/verify_soft_fks.py` currently returns 0 unconditionally. It needs a real implementation that reads `config/backup-verify/soft-fk-allowlist.yaml`, sweeps the restored database for orphaned rows in the declared soft-FK pairs (`messages.booking_id`, `payments.refund_passenger_id`, `service_leg_overrides.leg_id`), and fails with a non-zero exit code when any pair exceeds its `max_orphans` tolerance.
- Cron-until-secrets decision: the workflow is scheduled on `0 3 * * *` (daily at 03:00 UTC). Options before the secrets land:
  - (a) comment out the `schedule:` block and rely on `workflow_dispatch` only until the first dump exists, or
  - (b) leave the cron enabled so the failure surfaces loudly as a Slack alert on day 1 and forces the operator to configure the secrets.
  Pick one explicitly; do not let it drift.
- Producer contract (separate repo wiring): `docker/backup/producer.env` declares the assumed `pg_dump` flags (`--format=custom`, `--jobs=4`), extensions (`pgcrypto`, `pg_cron`, `vector`, `pgsodium`), and settings (`SIZE_MIN_GB=10`, `SIZE_MAX_FACTOR=2.0`, `PGSSLMODE=verify-full`). The producer is not yet deployed. When it lands (on the Swarm data-tier node running `pg_dump` against Ubicloud), verify that the flags match, or adjust the workflow's `SIZE_MIN_GB` / size guard to reality. Note that `pg_dump` supports `--jobs` only with `--format=directory`, so the `--format=custom` + `--jobs=4` combination must be reconciled before the producer ships.
- Trailing-median size guard: currently advisory (see the workflow step `Size guard`). Hydrate the upper bound from Mimir once the backup-producer job emits `backup_bytes_total{job="pg_dump_producer"}`; runbook F1 has the metrics query.
B. `secrets-sync.yml` open items (ADR reference: ADR-029; triage reference: `secrets-rotation-runbook.md`)
- GitHub Environment required: create an Environment called `production` with a required-reviewer protection rule (Julian + one other founder). The `jobs.sync.environment.name` expression resolves to `production` when the dispatch input selects it; without the Environment protection rule, anyone with write access to the repo can push secrets into the live Swarm.
- GitHub Secrets required (both Environments):
  - `PRODUCTION_SWARM_MANAGER_HOSTS`: whitespace-separated list of manager public IPs (or DNS names), populated from the Terraform output of the swarm module. A plain list; one per line is fine.
  - `PRODUCTION_MANAGER_HOST_KEYS`: concatenated `~/.ssh/known_hosts` content covering every manager node, built from the ADR-025 per-manager SSH host keys Terraform output (`terraform output -raw manager_host_keys`). Never replace this with a `StrictHostKeyChecking=no` workaround; the CI lint in the repo enforces ADR-025.
  - `DEPLOY_SSH_KEY`: ED25519 private key authorised in `hcloud_ssh_key.nodes` for `root@` SSH onto the managers.
  - `SLACK_OPS_WEBHOOK`: same as above (the workflow posts only the new `_v<n+1>` logical name, never the value).
- First-use / bootstrap sequence: the first real run should dispatch with `secret_name = pgexporter_dsn` and `source_github_secret = PGEXPORTER_DSN_V1` to create `pgexporter_dsn_v1` on the managers. Once that exists, lift the `replicas: 0` gate on `postgres-exporter` in `docker/docker-compose.observability.yml` per alert A8; the placeholder gate exists specifically because `pgexporter_dsn` is not populated yet.
- Cross-repo secrets to sync next: `busflow_db_writer`, `mollie_api_key`, `whatsapp_360dialog_token`. Each dispatch creates a new versioned `_v<n+1>` secret; the operator flips services over one at a time per the rotation runbook.
- Rotation cadence: ADR-029 commits to quarterly rotation for provider keys. Schedule the calendar reminders once the first rotation completes, so the cadence starts from a known-good baseline rather than from "workflow exists."
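Docker Swarm secrets cannot be updated in place, which is why each dispatch creates a new `_v<n+1>` name. A minimal sketch of deriving that name from the names already present on the managers (for example the output of `docker secret ls --format '{{.Name}}'`); the function name is hypothetical and does not describe how `secrets-sync.yml` currently computes it.

```python
import re


def next_versioned_name(logical_name: str, existing_names: list[str]) -> str:
    """Return '<logical_name>_v<n+1>', where n is the highest existing version (0 if none)."""
    pattern = re.compile(rf"^{re.escape(logical_name)}_v(\d+)$")
    versions = [
        int(match.group(1))
        for name in existing_names
        if (match := pattern.match(name))
    ]
    # Compare integers, not strings, so that v10 correctly follows v9.
    return f"{logical_name}_v{max(versions, default=0) + 1}"
```

With no existing versions this yields `pgexporter_dsn_v1`, which matches the bootstrap run above.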
- Context: The tools currently listed (Stripe, Klarna, HanseMerkur, 360dialog/Twilio, DATEV) were brainstormed and may change (e.g., Stripe and Twilio will likely be dropped).
- Action: Review and update the documentation to reflect the finalized stack of third-party integrations, and define the webhooks and API contracts for the ones that remain.
[Medium Priority] PricingRule & CapacityRule Value Object Specification:
- Context: `domain-model.md` mentions `PricingRule` and `CapacityRule` as "Value Objects", but they are not shown in the ER diagram, and their storage location is not specified (JSONB on `TourOffering`? On `CostingSheet`? Standalone?).
- Action: Define the exact structure, storage location, and lifecycle of these value objects. Clarify how they interact with the `CostingSheet` price-matrix generation and whether they are inherited from `TourTemplate` → `TourOffering`.
[Medium Priority] Dev-Process AI Agent Documentation:
- Context: The "Agentic Development Process" section in `ai.md` covers SDLC integration at a high level (automated PR reviews, autonomous refactoring, shift-left security) but lacks operational detail. By contrast, product-facing AI (Copilot, Magic Upload, agent pipelines) is thoroughly documented across multiple files.
- Action: Expand the documentation to cover: CI/CD integration of AI code reviews (which tools, how they are triggered), security-audit workflow specifics (scan triggers, alert routing), tooling configuration (MCP setup, AgentShield policies), and concrete examples of the dev-agent feedback loop.
[High Priority] Testing Strategy Specifics (E2E & Offline):
- Context: `docs/architecture/tests.md` exists but lacks detail on how E2E testing (Playwright) will handle the offline-first capabilities of the PWA.
- Action: Document the approach for simulating offline mode and verifying sync mechanisms during E2E tests for the driver app.
[High Priority] Migration Playbook:
- Context: The project scope mentions "Zero-Downtime Migration" as a core value pillar, but technical documentation is missing.
- Action: Detail the data migration pipeline, including extraction strategies from legacy systems (Kuschick/Turista) and transformation mapping to the Busflow PostgreSQL schema.
[High Priority] Production Container Resource Limits:
- Context: `docker-compose.production.yml` declares no `deploy.resources.limits` on any service. A single runaway container can exhaust host memory and trigger the kernel OOM killer, which may kill Postgres.
- Action: Profile services under load, then add `deploy.resources.limits.memory` and `deploy.resources.reservations.cpus` to every service. See `infrastructure.md` §5.
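To keep the limits from regressing once they are added, a CI lint could fail whenever a service lacks them. A minimal sketch, assuming the compose file is first parsed into a dict (e.g. with PyYAML's `yaml.safe_load`); the function names are hypothetical, and the two required keys simply mirror the Action above.

```python
def _section(mapping, key: str) -> dict:
    # A key that is present but empty in YAML parses as None; treat it as missing.
    value = mapping.get(key) if isinstance(mapping, dict) else None
    return value if isinstance(value, dict) else {}


def services_missing_limits(compose: dict) -> list[str]:
    """Report services lacking a memory limit or a CPU reservation."""
    findings = []
    for name, service in sorted(_section(compose, "services").items()):
        resources = _section(_section(service, "deploy"), "resources")
        if "memory" not in _section(resources, "limits"):
            findings.append(f"{name}: missing deploy.resources.limits.memory")
        if "cpus" not in _section(resources, "reservations"):
            findings.append(f"{name}: missing deploy.resources.reservations.cpus")
    return findings
```

A CI step would print the findings and exit non-zero when the list is not empty.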
[Medium Priority] Terraform Drift Detection Workflow:
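- Context: `terraform.yml` only runs `terraform plan` on PRs touching `terraform/**`. Manual changes to servers drift silently. Note that `terraform plan` only detects drift in attributes of Terraform-managed resources (e.g., changes made through the Hetzner Cloud Console or API); changes made inside a server over SSH remain invisible to it.
- Action: Implement `.github/workflows/terraform-drift.yml`: a weekly cron running `terraform plan -detailed-exitcode` that alerts on exit code 2 (changes present). See `infrastructure.md` §5.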
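The exit-code contract of `terraform plan -detailed-exitcode` is what the drift workflow would branch on: 0 means success with an empty diff, 1 means an error, and 2 means success with a non-empty diff. A minimal sketch of a step script, assuming the configuration lives in `terraform/` and `terraform init` has already run; the function names are hypothetical. It deliberately fails on errors as well, so that a broken plan is never reported as "no drift".

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    # `terraform plan -detailed-exitcode`:
    #   0 = succeeded, empty diff; 1 = error; 2 = succeeded, non-empty diff (drift).
    if code == 0:
        return "clean"
    if code == 2:
        return "drift"
    return "error"


def run_drift_check(workdir: str = "terraform") -> int:
    """Run the plan and return a job exit code: 0 only when there is no drift and no error."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
    )
    status = classify_plan_exit(result.returncode)
    print(f"terraform drift check: {status}")
    return 0 if status == "clean" else 1
```

In the workflow, the step would call `sys.exit(run_drift_check())` and trigger the Slack alert on a non-zero exit, with the printed status distinguishing drift from a failed plan.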
[Medium Priority] Hasura Metadata Drift Check:
- Context: `deploy.yml` applies metadata forward-only via `hasura metadata apply`, silently overwriting console changes.
- Action: Add a post-deploy `hasura metadata diff` step to `deploy.yml`, plus a weekly cron variant. See `ci-cd.md` §2.
[Low Priority] Disaster Recovery Drill:
- Context: Backup verification tests restorability but not the full system rebuild path (env vars, Docker registry auth, DNS, Terraform state).
- Action: Run the first drill after the Ubicloud cutover; the cadence escalates with the SLA tier (annual → quarterly). See `infrastructure.md` §6.
[Medium Priority] API Rate Limiting & Security Policies:
- Context: Infrastructure docs list Traefik and Hasura, but specific security configurations are missing.
- Action: Specify rate limiting strategies, Web Application Firewall (WAF) rules, and exact CORS policies to protect public-facing endpoints.
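When picking values for Traefik's `rateLimit` middleware (configured via `average`, `period`, and `burst`), it helps to simulate them. Traefik's documentation describes the middleware as a token bucket: `average` per `period` is the refill rate and `burst` is the bucket size. The sketch below is a standalone simulation for reasoning about those numbers, not Traefik's implementation, and the example values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Refills `average` tokens every `period` seconds and holds at most `burst` tokens."""
    average: float
    period: float
    burst: int
    tokens: float = field(init=False)
    last_seen: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # a fresh client may burst immediately

    def allow(self, now: float) -> bool:
        """Consume one token if available; `now` is a monotonic timestamp in seconds."""
        refill_rate = self.average / self.period
        elapsed = max(0.0, now - self.last_seen)
        self.tokens = min(float(self.burst), self.tokens + elapsed * refill_rate)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For example, `TokenBucket(average=10, period=1.0, burst=5)` admits five back-to-back requests, rejects the sixth, and has refilled completely half a second later. Replaying a recorded traffic pattern through a few candidate settings is a cheap way to see which legitimate clients would be throttled.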
[Low Priority] Configure `getbusflow.com` Traefik Routing:
- Context: The Production Service Map in `ci-cd.md` lists `getbusflow.com` as a Landing endpoint alongside `busflow.de`, but no Traefik router rule exists for that domain in `docker-compose.studio.yml`. Only `busflow.de` is currently routed.
- Action: Add a Traefik router for `getbusflow.com` (and optionally `www.getbusflow.com`) pointing to the landing service, or redirect to `busflow.de`. Ensure the DNS records exist. Update `ci-cd.md` once wired.
[Low Priority] Remove `--passWithNoTests` from the `@busflow/api` Test Script:
- Context: The `apps/api/package.json` test script uses `jest --passWithNoTests` as a temporary workaround to prevent CI failures while no `.spec.ts` files exist.
- Action: Once the first test files land in `apps/api/src/`, revert the script to `"test": "jest"` so that accidental test-file deletions surface as CI failures.
3. Documentation Housekeeping
[Medium Priority] Investigate Documentation Reorganization Potential:
- Context: As the documentation has grown organically, there may be overlapping content, unclear categorization, or opportunities to consolidate related topics.
- Action: Audit the current `docs/` structure for duplicated content, inconsistent categorization, and navigation pain points. Propose a revised information architecture if warranted.
[High Priority] Reduce Core Docs Folder Size:
- Context: The core `docs/` folder may contain content that could be moved closer to the code it describes (e.g., into `apps/*/docs/` or `packages/*/docs/`), archived, or consolidated.
- Action: Inventory all files in the root `docs/` directory. Identify candidates for (1) co-location with their respective app/package, (2) merging into other documents, and (3) archiving outdated content. The goal is a drastically leaner core docs folder.
[Medium Priority] Placeholder User Journey Narratives:
- Context: Six user-journey spokes in `3-resources/user-journeys/` contain only scenario stubs and need full multi-step narratives: Concierge Onboarding, Email-as-API Ingestion, Omnichannel Inbox Triage, Collaborative Trip Planning, Zero-Downtime Legacy Migration, and Trigger-Based Lifecycle Messaging. Each placeholder links to its source capability definition.
- Action: Flesh these out as the corresponding features reach design maturity. See the `PRODUCT_user-journeys.md` registry (Journeys 9–14). Origin: formerly SB-21, relocated from `STRATEGIC_BLIND_SPOTS.md` (2026-05-11).
4. AI Agent & Tooling Setup
[Medium Priority] Autonomous Documentation Issue Agent:
- Context: The project documentation requires continuous maintenance and review to prevent conflicting information and outdated content.
- Action: Add an autonomous AI agent configured to run daily to scan documentation for inconsistencies (e.g., conflicting information) and automatically create PRs to address identified issues.
[Medium Priority] Investigate & Decide on Story / Task Management Tool:
- Context: The monorepo currently lacks a formal in-repo mechanism for tracking stories, tasks, and sprint-level work items beyond this TODO file.
- Action: Research options for lightweight, repo-native task management (e.g., GitHub Projects, Linear integration, markdown-based tools like `todo.md` conventions, or issue-linked task files). Evaluate them against these criteria: developer friction, visibility, integration with the CI/PR workflow, and suitability for a small founding team. Recommend and document the chosen approach.
[High Priority] Docs-Hub — Deploy Documentation Portal:
- Context: The `docs-hub` (`studio/docs-hub`) VitePress portal and the `context-engine` (`packages/context-engine`) RAG server are implemented. The predecessor `docs-assistant` has been removed.
- Action: Deploy docs-hub (static VitePress) and context-engine to Hetzner Docker Swarm. Verify end-to-end AI chat, search, and role-based content visibility. See `docs-hub.md` for the full specification.
[Medium Priority] Add GitHub CODEOWNERS File:
- Context: The monorepo contains sensitive infrastructure (`terraform/`, `docker/`, `.github/workflows/`, `scripts/`) alongside product code, and the access-control and review requirements differ between these areas.
- Action: Create a `.github/CODEOWNERS` file restricting approval of infrastructure and CI/CD changes to founders/ops. This provides the security-boundary benefit of repo extraction without the multi-repo overhead. Note that CODEOWNERS approval is only enforced when branch protection has "Require review from Code Owners" enabled.
5. Deferred Design Decisions (TourDeparture Refactoring)
[Medium Priority] Vehicle & Crew Assignment to TourDeparture:
- Context: The `TourDeparture` entity is the natural place to assign vehicles and crew to a scheduled departure. Currently no linking tables exist. Vehicle/crew assignment may need to be per-leg rather than per-departure (e.g., different drivers for different `ServiceLeg`s).
- Action: Design the assignment model. Key questions: (1) Is assignment per-departure (`tour_vehicle_assignments`, `tour_crew_assignments`) or per-`ServiceLeg` (via `LegAssignment` in Operations)? (2) If both exist, which is "planned" vs. "actual"? (3) Does this overlap with Operations' existing `LegAssignment` entity? Coordinate with the Operations schema design.
[Low Priority] Allotment Linking to TourDeparture:
- Context: `Allotment` (reserved hotel rooms, ferry slots) currently links only to `Supplier`. The `CostingSheet` implicitly factors in allotments via `procurement_items[]`, but there is no explicit `TourDeparture` → `Allotment` link showing which inventory blocks are consumed by which departure.
- Action: Decide whether explicit linking is needed. Key questions: (1) Is the implicit link through `CostingSheet.procurement_items[].allotment_id` sufficient? (2) Does an explicit junction table (`tour_departure_allotments`) add value for capacity tracking (Backoffice Review, Strategic Blind Spot #4: "allotment consumption tracking")? (3) Could this be handled as a computed view instead?
[Low Priority] Departure-Specific Pickup Times + Boarding Order:
- Context: Both pickup times and boarding order are operational concerns that depend on the finalized passenger list and actual door pickup bookings — not product configuration. Defining a static boarding order at the library level implies a fixed route that ignores real-world booking patterns. The current data model has no place for either.
- Action: Design the pickup-time and boarding-order assignment flow as a `[v0.2]` drilldown. It likely requires a dispatch-side table or computation, separate from the boarding point library design.
6. Industry Best Practices & Future Gaps (Phase 2+)
These items were identified during the Level 1-3 audits as important industry-standard capabilities that are not yet part of the core "Ready to Code" implementation.
[Medium Priority] Bank Statement Import & Auto-Matching [Phase 2]:
- Context: There is no mechanism for importing MT940/CAMT files to match payments to bookings. Moved to Phase 2 per `PRODUCT_payments-and-billing.md`.
- Action: Architect `bank_imports` and `bank_transactions` tables for automated bookkeeping when Phase 2 begins.
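Once MT940/CAMT entries have been parsed (parsing itself is out of scope here), the auto-matching step could start as simply as the sketch below: find exactly one known booking reference in the remittance text and compare amounts in integer cents. The `BF-` + six-digit reference format, the status names, and the function are hypothetical, since no booking-reference format is defined in the docs.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical booking-reference format; the real format is not defined yet.
BOOKING_REF = re.compile(r"\bBF-\d{6}\b")


@dataclass(frozen=True)
class BankTransaction:
    amount_cents: int     # credited amount in cents
    remittance_info: str  # unstructured remittance text from the statement entry


def match_transaction(
    tx: BankTransaction, open_balances: dict[str, int]
) -> tuple[str, Optional[str]]:
    """Return (status, booking_ref); open_balances maps booking ref -> outstanding cents."""
    found = set(BOOKING_REF.findall(tx.remittance_info.upper()))
    known = sorted(ref for ref in found if ref in open_balances)
    if len(known) != 1:
        # No known reference, or several: leave the decision to a human.
        return ("manual_review", None)
    ref = known[0]
    if tx.amount_cents == open_balances[ref]:
        return ("matched", ref)
    return ("amount_mismatch", ref)
```

Routing anything ambiguous to manual review keeps false positives out of the ledger; fuzzier heuristics (payer name, partial payments) can be layered on later.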
[High Priority] Partial Refund Flows (Mollie Sync):
- Context: Explicit mapping between partial booking cancellations and Mollie ledger refunds is missing.
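One rule such a mapping needs is that cumulative refunds never exceed what was paid. A minimal sketch, assuming a flat per-passenger price and a hypothetical per-passenger cancellation fee, with all arithmetic in integer cents. The final conversion reflects Mollie's amount format: an object with `currency` and a string `value` carrying the currency's decimals (two for EUR). The function names and the fee model are assumptions, not decisions from the docs.

```python
def partial_refund_cents(
    price_per_pax_cents: int,
    cancelled_pax: int,
    fee_per_pax_cents: int,
    paid_cents: int,
    already_refunded_cents: int,
) -> int:
    """Refund for cancelling some passengers, capped by what is still refundable."""
    requested = cancelled_pax * max(price_per_pax_cents - fee_per_pax_cents, 0)
    still_refundable = max(paid_cents - already_refunded_cents, 0)
    return min(requested, still_refundable)


def to_mollie_amount(cents: int) -> dict:
    """Format a non-negative EUR amount as {"currency", "value"} with two decimals."""
    if cents < 0:
        raise ValueError("refund amounts must be non-negative")
    return {"currency": "EUR", "value": f"{cents // 100}.{cents % 100:02d}"}
```

For example, cancelling 2 of 3 passengers at €300 each with a €50 fee per passenger yields a €500 refund, unless earlier refunds leave less than that.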
[High Priority] Group Booking Modifications:
- Context: Securely tracking post-confirmation passenger additions and price adjustments.
[Medium Priority] Voucher & Gift Card Management:
- Context: Complex accounting for partial redemptions and refunds of value-based vouchers.
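Partial redemptions and refunds are easier to audit if the voucher balance is never stored directly and is instead derived from an append-only ledger. A minimal sketch of that idea. The `Voucher` class, the entry types, and the rule that a refund credits the voucher (rather than paying out cash) are assumptions, not decisions from the docs.

```python
from dataclasses import dataclass, field


@dataclass
class Voucher:
    code: str
    # Append-only ledger of (entry type, signed amount in cents).
    ledger: list[tuple[str, int]] = field(default_factory=list)

    @classmethod
    def issue(cls, code: str, value_cents: int) -> "Voucher":
        voucher = cls(code)
        voucher.ledger.append(("issue", value_cents))
        return voucher

    @property
    def balance_cents(self) -> int:
        return sum(amount for _, amount in self.ledger)

    def redeem(self, cents: int) -> int:
        """Apply up to `cents` to a booking; return the amount actually applied."""
        if cents <= 0:
            raise ValueError("redemption amount must be positive")
        applied = min(cents, self.balance_cents)
        if applied > 0:
            self.ledger.append(("redeem", -applied))
        return applied

    def refund_redemption(self, cents: int) -> None:
        """Credit a cancelled redemption back to the voucher."""
        if cents <= 0:
            raise ValueError("refund amount must be positive")
        redeemed = -sum(a for kind, a in self.ledger if kind == "redeem")
        refunded = sum(a for kind, a in self.ledger if kind == "refund")
        if cents > redeemed - refunded:
            raise ValueError("cannot refund more than was redeemed")
        self.ledger.append(("refund", cents))
```

A redemption larger than the balance applies only the remaining value, leaving the rest of the booking to another payment method.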
[Medium Priority] Booking Source Channel:
- Context: Dispatcher visibility into booking origin (Widget, API, Manual, Phone).
[Medium Priority] Automated Compliance Checks (VIES/Visa):
- Context: Real-time validation of EU VAT IDs and passenger visa requirements for manifests.
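VIES remains the authoritative check, but a cheap syntax pre-check catches typos before a network round trip. The sketch covers only the two EU DACH formats (German `DE` + 9 digits, Austrian `ATU` + 8 digits). Swiss UIDs cannot be checked in VIES because Switzerland is not an EU member state, and IDs from other member states are passed through unchecked. The function names are hypothetical.

```python
import re

# Syntax-only patterns for the two EU member states in DACH.
VAT_ID_PATTERNS = {
    "DE": re.compile(r"DE\d{9}"),   # Germany: DE + 9 digits
    "AT": re.compile(r"ATU\d{8}"),  # Austria: ATU + 8 digits
}


def normalize_vat_id(raw: str) -> str:
    """Strip spaces, dots, and hyphens, and upper-case the country prefix."""
    return re.sub(r"[\s.\-]", "", raw).upper()


def precheck_vat_id(raw: str) -> str:
    """Return 'syntax_ok', 'syntax_error', or 'unchecked' (left entirely to VIES)."""
    vat_id = normalize_vat_id(raw)
    pattern = VAT_ID_PATTERNS.get(vat_id[:2])
    if pattern is None:
        return "unchecked"
    return "syntax_ok" if pattern.fullmatch(vat_id) else "syntax_error"
```

Only IDs that pass (or are unchecked) would then be sent to VIES; a syntax error can be shown to the user immediately.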
[Low Priority] Dynamic Booking Questions:
- Context: Configurable data collection (passport numbers, etc.) during the checkout flow.
[Medium Priority] Bus Hardware Integration & Telematics:
- Context: The current `TelemetryPoint` entity assumes GPS/speed/fuel data from hardware on the bus, but Phase 1 has no hardware integration. Telematics hardware (OBD-II dongles, GPS trackers, dashcams) could enable live ETA tracking ("Where is my Bus?"), fuel monitoring, digital tachograph data import (EU-561 compliance), and route deviation alerts. This is a significant differentiator for Pillar 3, but it requires vendor evaluation, hardware provisioning logistics, and a dedicated real-time data ingestion pipeline (separate from the offline mutation sync protocol).
- Action: Research (1) available telematics hardware for DACH bus fleets (e.g., Geotab, Samsara, Webfleet (formerly TomTom Telematics)), (2) whether to use a third-party telematics platform or build a lightweight ingestion API, (3) the cost model (per-vehicle hardware + SaaS fee vs. a one-time OBD-II dongle), and (4) a fallback option using the driver's phone GPS via the PWA Geolocation API for Phase 1 ETA tracking. Define scope boundaries: what is achievable with phone-only GPS vs. what requires dedicated hardware.
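For the phone-GPS fallback in point (4), a first ETA could be nothing more than great-circle distance divided by an assumed average speed. This haversine sketch is deliberately naive. The average speed and the `detour_factor` that inflates straight-line distance toward road distance are placeholder assumptions, and a real ETA would need routing data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_eta_minutes(
    bus: tuple[float, float],
    stop: tuple[float, float],
    avg_speed_kmh: float = 50.0,   # placeholder assumption
    detour_factor: float = 1.3,    # placeholder road-vs-straight-line ratio
) -> float:
    road_km = haversine_km(*bus, *stop) * detour_factor
    return road_km / avg_speed_kmh * 60
```

Comparing such estimates against actual arrival times from phone GPS logs would show whether phone-only tracking is good enough for "Where is my Bus?" before investing in hardware.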
7. Funding: Research Allowance (Forschungszulage, FZulG)
- [High Priority] AP10–AP19: Codebase Validation & Plan Adjustment for FZulG Eligibility:
- Context: The work-package (AP) catalog (`funding-work-packages.md`) lists AP10–AP19 as 🟡 candidates: a domain artifact exists, but their codebase status (exists / Phase 2 / concept only) and the actual extent of implementation have not been verified. The risk assessment in `bsfz-ablehnungsrisiken.md` depends partly on implementation details (in particular AP18: CRDT vs. Last-Write-Wins).
- Action: For each candidate AP, decide whether (a) the AP is dropped for lack of substance in the codebase, (b) the AP is merged into another one (e.g., AP15 into AP3), or (c) the AP remains standalone. Preferred strategy: where the implementation plan in the docs (Domain Model, protocols, ADRs) is still flexible, adapt the plan to the FZulG-favorable variant rather than the other way around. Example AP18: if CRDT vs. Last-Write-Wins has not been decided yet, sharpen the domain model so that the CRDT variant is documented as the planned solution (provided this makes technical sense).
- Special case AP18: binary decision between CRDT (Yjs/Automerge) and Last-Write-Wins via Hasura. In the latter case, drop AP18, since it has no R&D component.
- Output: Update the status columns in `funding-work-packages.md` (✅/🟡/⚪ → final), sync the risk table in `bsfz-ablehnungsrisiken.md`, and adjust the project clustering in `STRATEGY_public-funding.md` if necessary.
- Sequencing: Complete this before drafting the actual BSFZ application (Phase 2 in `funding-application-plan.md`, weeks 5–14). Without this validation, the application risks overestimating hours, which creates a credibility risk with the reviewer.