Documentation TODOs

This document tracks identified gaps in the project documentation. Completed items have been moved to completed-todos.md.

1. Product & Domain Specifications

[High Priority] Pricing Engine & Margin Rules:
- Context: The pricing and margin rules are not yet documented; closing this gap requires further research and customer interviews.
- Action: Document the exact mathematical rules for the "Dynamic Margin Calculator", discount priorities (e.g., early-bird vs. group), and DACH-specific tax handling (e.g., VAT rules for international travel).
[High Priority] Copilot & AI Assistant Specifications:
- Context: docs/architecture/ai.md details the infrastructure, but there are no concrete product specifications or user flows for how the user actually interacts with these AI features in the UI.
- Action: Define UX/UI flows, trigger points, and failure states for specific AI features (e.g., Customer Support Copilot, Dispatch Assistant).
[High Priority] Permissions and Roles Matrix (RBAC):
- Context: docs/architecture/roles.md exists, but there is no detailed product specification outlining exact permission tiers.
- Action: Document a detailed RBAC matrix specifying read/write access for different roles (e.g., Dispatcher, Manager, Accounting) within the workspace app.
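One possible shape for such a matrix is sketched below. The role names come from this item; the resources (`bookings`, `invoices`) and every grant shown are placeholders chosen for illustration only, not decided permissions.

```python
from enum import Flag, auto


class Access(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()


# Placeholder matrix: roles are taken from the TODO item; the resources and
# the grants are hypothetical and must be replaced by the real specification.
RBAC_MATRIX: dict[str, dict[str, Access]] = {
    "Dispatcher": {"bookings": Access.READ | Access.WRITE, "invoices": Access.NONE},
    "Manager": {"bookings": Access.READ | Access.WRITE, "invoices": Access.READ},
    "Accounting": {"bookings": Access.READ, "invoices": Access.READ | Access.WRITE},
}


def is_allowed(role: str, resource: str, needed: Access) -> bool:
    """Deny by default: unknown roles or resources get no access."""
    granted = RBAC_MATRIX.get(role, {}).get(resource, Access.NONE)
    return (granted & needed) == needed
```

Keeping the matrix as data rather than scattered conditionals would let the same table drive both the product specification and any enforcement layer, but that is a design suggestion, not a documented decision.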
[Medium Priority] Post-Booking Passenger Modifications:
- Context: The B2C passenger journey covers the initial booking, but the logic for modifying an existing booking is not documented.
- Action: Document the user flows and business rules for post-booking modifications (e.g., adding luggage, changing a boarding point, cancellations) via the passenger portal.
[Medium Priority] SaaS Subscription & Pricing Tiers Logic:
- Context: The Lago two-layer billing architecture and metered dimensions are specified in PRODUCT_payments-and-billing.md §3. However, the exact tier names, features restricted per tier, and € pricing remain undefined.
- Action: Create PRODUCT_pricing-tiers.md to document the specific subscription tiers (e.g., Free, Pro, Enterprise), feature gating mappings, and Free Trial logic.

2. Technical & Architecture Details

[High Priority] Authentication & "Magic Link" Flows:
- Context: Specific auth flows for passengers (Magic Link), B2B agents, and internal Dispatchers need definition.
- Action: Create a centralized authentication concept document. This document should remain general, linking out to specific implementation areas and external docs (like Nhost Auth), rather than repeating low-level details.
[High Priority] Domain 5 / L3 Infra Wiring Follow-ups (before first production deploy):
Items below require secrets or GitHub-side configuration that cannot be checked into the repo. They must land before the two new workflows (.github/workflows/backup-verify.yml, .github/workflows/secrets-sync.yml) fire against production.
A. backup-verify.yml open items (triage reference: backup-verify-runbook.md)
- GitHub Secrets required:
  - RCLONE_CONFIG — full rclone .conf body with both remotes:
    - hetzner-object-storage: (primary — pg_dump producer target)
    - minio-mirror: (fallback — used when workflow_dispatch input backup_source=minio-fallback is selected)
  - SLACK_OPS_WEBHOOK — incoming-webhook URL for #ops. The workflow auto-detects 4xx on the webhook and warns the operator to rotate.
- Stub implementation to replace: scripts/verify_soft_fks.py currently returns 0 unconditionally. Needs a real implementation that reads config/backup-verify/soft-fk-allowlist.yaml, sweeps the restored database for orphaned rows in declared soft-FK pairs (messages.booking_id, payments.refund_passenger_id, service_leg_overrides.leg_id), and fails (non-zero exit) when any pair exceeds its max_orphans tolerance.
- Cron-until-secrets decision: the workflow is on 0 3 * * *. Options before the secrets land:
  - (a) comment out the schedule: block and rely on workflow_dispatch only until the first dump exists, or
  - (b) leave the cron enabled so the failure surfaces loudly as a Slack alert on day 1 and forces the operator to configure the secrets. Pick explicitly; do not let it drift.
- Producer contract (separate repo wiring): docker/backup/producer.env declares the assumed pg_dump flags (--format=custom, --jobs=4, extensions pgcrypto,pg_cron,vector,pgsodium, SIZE_MIN_GB=10, SIZE_MAX_FACTOR=2.0, PGSSLMODE=verify-full). The producer is not yet deployed — when it lands (on the Swarm data-tier node running pg_dump against Ubicloud), verify the flags match or bump the workflow's SIZE_MIN_GB / size guard to reality.
- Trailing-median size guard: currently advisory (see workflow step Size guard). Hydrate the upper bound from Mimir once the backup-producer job emits backup_bytes_total{job="pg_dump_producer"} — runbook F1 has the metrics query.
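A minimal sketch of the orphan sweep described for scripts/verify_soft_fks.py, under several assumptions: the parent table and column of each pair are hypothetical (the item names only the child side), the allowlist is passed in as already-parsed objects rather than read from the YAML file, and `sqlite3` stands in for the restored Postgres database so the example runs on its own.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftFkPair:
    child_table: str
    child_column: str
    parent_table: str   # hypothetical: not named in the TODO item
    parent_column: str  # hypothetical: not named in the TODO item
    max_orphans: int


def count_orphans(conn, pair: SoftFkPair) -> int:
    """Count child rows whose non-NULL reference has no matching parent row."""
    # Identifiers are assumed to come from the trusted allowlist file only.
    sql = (
        f"SELECT COUNT(*) FROM {pair.child_table} AS c "
        f"LEFT JOIN {pair.parent_table} AS p "
        f"ON c.{pair.child_column} = p.{pair.parent_column} "
        f"WHERE c.{pair.child_column} IS NOT NULL "
        f"AND p.{pair.parent_column} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]


def verify(conn, pairs) -> int:
    """Return an exit code: 0 if every pair is within tolerance, else 1."""
    failed = False
    for pair in pairs:
        orphans = count_orphans(conn, pair)
        if orphans > pair.max_orphans:
            print(f"FAIL {pair.child_table}.{pair.child_column}: "
                  f"{orphans} orphans > max_orphans={pair.max_orphans}")
            failed = True
    return 1 if failed else 0
```

A real implementation would load the pairs from config/backup-verify/soft-fk-allowlist.yaml, connect to the restored database, and pass the result of `verify` to `sys.exit`.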
B. secrets-sync.yml open items (ADR reference: ADR-029; triage reference: secrets-rotation-runbook.md)
- GitHub Environment required: create an Environment called production with a required-reviewer protection rule (Julian + one other founder). The jobs.sync.environment.name expression resolves to production when the dispatch input selects it; without the Environment protection rule, anyone with repo-write can push secrets into the live Swarm.
- GitHub Secrets required (both Environments):
  - PRODUCTION_SWARM_MANAGER_HOSTS — whitespace-separated list of manager public IPs (or DNS names). Populated from the Terraform output of the swarm module. A plain list; one per line is fine.
  - PRODUCTION_MANAGER_HOST_KEYS — concatenated ~/.ssh/known_hosts content covering every manager node, built from the ADR-025 per-manager SSH host keys Terraform output (terraform output -raw manager_host_keys). Never replace this with a StrictHostKeyChecking=no workaround — the CI lint in the repo enforces ADR-025.
  - DEPLOY_SSH_KEY — ED25519 private key authorised in hcloud_ssh_key.nodes for root@ SSH onto the managers.
  - SLACK_OPS_WEBHOOK — same as above (workflow posts the new _v<n+1> logical name only — never the value).
- First-use / bootstrap sequence: the first real run should dispatch with secret_name = pgexporter_dsn and source_github_secret = PGEXPORTER_DSN_V1 to create pgexporter_dsn_v1 on the managers. Once that exists, lift the replicas: 0 gate on postgres-exporter in docker/docker-compose.observability.yml per alert A8 — the placeholder gate exists specifically because pgexporter_dsn is not populated yet.
- Cross-repo secrets to sync next: busflow_db_writer, mollie_api_key, whatsapp_360dialog_token. Each dispatch creates a new versioned _v<n+1>; the operator flips services one at a time per the rotation runbook.
- Rotation cadence: ADR-029 commits to quarterly rotation for provider keys. Schedule the calendar reminders once the first rotation completes, so the cadence starts from a known-good baseline rather than from "workflow exists."
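The `_v<n+1>` naming and the whitespace-separated host list from the secrets-sync items above can be sketched as follows. Both function names are hypothetical, and how the workflow actually lists existing secrets on the managers is not specified here, so the existing names are simply passed in.

```python
import re

# Matches "<base>_v<number>", e.g. "pgexporter_dsn_v1".
_VERSIONED = re.compile(r"^(?P<base>.+)_v(?P<n>\d+)$")


def next_versioned_name(base: str, existing: list[str]) -> str:
    """Return base_v<n+1>, where n is the highest existing version (0 if none)."""
    highest = 0
    for name in existing:
        match = _VERSIONED.match(name)
        if match and match.group("base") == base:
            highest = max(highest, int(match.group("n")))
    return f"{base}_v{highest + 1}"


def parse_manager_hosts(raw: str) -> list[str]:
    """Split a whitespace-separated host list; spaces and newlines both work."""
    return raw.split()
```

With no existing versions, `next_versioned_name("pgexporter_dsn", [])` yields `pgexporter_dsn_v1`, which matches the bootstrap sequence described above.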
- Context: Tools currently listed (Stripe, Klarna, HanseMerkur, 360dialog/Twilio, DATEV) were brainstormed and may change (e.g., likely dropping Stripe and Twilio).
- Action: Review and update the documentation to reflect the finalized stack of third-party integrations, and define the webhooks and API contracts for the ones that remain.
[Medium Priority] PricingRule & CapacityRule Value Object Specification:
- Context: The domain-model.md mentions PricingRule and CapacityRule as "Value Objects" but they are not shown in the ER diagram, nor is their storage location specified (JSONB on TourOffering? On CostingSheet? Standalone?).
- Action: Define the exact structure, storage location, and lifecycle of these value objects. Clarify how they interact with the CostingSheet price matrix generation and whether they are inherited from TourTemplate → TourOffering.
[Medium Priority] Dev-Process AI Agent Documentation:
- Context: The "Agentic Development Process" section in ai.md covers SDLC integration at a high level (automated PR reviews, autonomous refactoring, shift-left security), but lacks operational detail. In contrast, product-facing AI (Copilot, Magic Upload, agent pipelines) is thoroughly documented across multiple files.
- Action: Expand documentation to cover: CI/CD integration of AI code reviews (which tools, how they're triggered), security audit workflow specifics (scan triggers, alert routing), tooling configuration (MCP setup, AgentShield policies), and concrete examples of the dev-agent feedback loop.
[High Priority] Testing Strategy Specifics (E2E & Offline):
- Context: docs/architecture/tests.md exists, but lacks details on how E2E testing (Playwright) will handle the offline-first capabilities of the PWA.
- Action: Document the approach for simulating offline mode and verifying sync mechanisms during E2E tests for the driver app.
[High Priority] Migration Playbook:
- Context: The project scope mentions "Zero-Downtime Migration" as a core value pillar, but technical documentation is missing.
- Action: Detail the data migration pipeline, including extraction strategies from legacy systems (Kuschick/Turista) and transformation mapping to the Busflow PostgreSQL schema.
[High Priority] Production Container Resource Limits:
- Context: docker-compose.production.yml declares no deploy.resources.limits for any service. A single runaway container can trigger the kernel OOM-killer, which can take down Postgres.
- Action: Profile services under load, then add deploy.resources.limits.memory and deploy.resources.reservations.cpus to every service. See infrastructure.md §5.
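Once limits exist, a small lint could keep them from regressing. The sketch below assumes the compose file has already been parsed into a dictionary (for example with a YAML library, which is not shown); the function name and the service names in the usage example are hypothetical.

```python
def services_missing_limits(compose: dict) -> list[str]:
    """Return the names of services lacking deploy.resources.limits.memory."""
    missing = []
    for name, service in (compose.get("services") or {}).items():
        deploy = (service or {}).get("deploy") or {}
        resources = deploy.get("resources") or {}
        limits = resources.get("limits") or {}
        if not limits.get("memory"):
            missing.append(name)
    return sorted(missing)


# Hypothetical parsed compose content, for illustration only.
example = {
    "services": {
        "api": {"deploy": {"resources": {"limits": {"memory": "512M"}}}},
        "worker": {"deploy": {"replicas": 2}},
        "landing": {},
    }
}
print(services_missing_limits(example))
```

A CI step could fail the build whenever the returned list is non-empty.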
[Medium Priority] Terraform Drift Detection Workflow:
- Context: terraform.yml only runs terraform plan on PRs touching terraform/**. Manual SSH changes to servers drift silently.
- Action: Implement .github/workflows/terraform-drift.yml — weekly cron running terraform plan -detailed-exitcode, alerting on exit code 2. See infrastructure.md §5.
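The alerting logic hinges on how `terraform plan -detailed-exitcode` reports its result: 0 means no changes, 1 means an error, and 2 means the plan succeeded with a non-empty diff. A minimal sketch of that interpretation follows; the function names and the status labels are hypothetical, and the workflow itself may well implement this in shell instead.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a status label."""
    if code == 0:
        return "in-sync"  # plan succeeded, no changes
    if code == 2:
        return "drift"    # plan succeeded, changes present
    return "error"        # 1 (or anything unexpected): the plan itself failed


def run_drift_check(workdir: str) -> str:
    """Run the plan in workdir and classify the outcome."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
    )
    return classify_plan_exit(result.returncode)
```

Treating `error` separately from `drift` keeps a broken backend or expired credential from being mistaken for a clean or drifted state.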
[Medium Priority] Hasura Metadata Drift Check:
- Context: deploy.yml applies metadata forward-only via hasura metadata apply, silently overwriting console changes.
- Action: Add post-deploy hasura metadata diff step to deploy.yml + weekly cron variant. See ci-cd.md §2.
[Low Priority] Disaster Recovery Drill:
- Context: Backup verification tests restorability but not the full system rebuild path (env vars, Docker registry auth, DNS, Terraform state).
- Action: First drill after Ubicloud cutover; cadence escalates with SLA tier (annual → quarterly). See infrastructure.md §6.
[Medium Priority] API Rate Limiting & Security Policies:
- Context: Infrastructure docs list Traefik and Hasura, but specific security configurations are missing.
- Action: Specify rate limiting strategies, Web Application Firewall (WAF) rules, and exact CORS policies to protect public-facing endpoints.
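As a reference point for the rate-limiting discussion, a token bucket is one candidate strategy; Traefik's RateLimit middleware exposes comparable average and burst settings. The sketch below is an illustration of the algorithm only, not a proposed policy: the class name, parameters, and numbers are assumptions.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock            # injectable so tests can control time
        self._tokens = float(burst)    # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; refill based on elapsed time first."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.burst, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

In practice one bucket would be kept per client key (for example per source IP), and the specification would need to state the rate, the burst size, and the response returned when a request is rejected.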
[Low Priority] Configure getbusflow.com Traefik Routing:
- Context: The Production Service Map in ci-cd.md lists getbusflow.com as a Landing endpoint alongside busflow.de, but no Traefik router rule exists for the domain in docker-compose.studio.yml. Only busflow.de is currently routed.
- Action: Add a Traefik router for getbusflow.com (and optionally www.getbusflow.com) pointing to the landing service, or redirect to busflow.de. Ensure DNS records exist. Update ci-cd.md once wired.
[Low Priority] Remove --passWithNoTests from @busflow/api Test Script:
- Context: The apps/api/package.json test script uses jest --passWithNoTests as a temporary workaround to prevent CI failures while no .spec.ts files exist.
- Action: Once the first test files land in apps/api/src/, revert the script back to "test": "jest" so that accidental test-file deletions surface as CI failures.

3. Documentation Housekeeping

[Medium Priority] Investigate Documentation Reorganization Potential:
- Context: As the documentation has grown organically, there may be overlapping content, unclear categorization, or opportunities to consolidate related topics.
- Action: Audit the current docs/ structure for duplicated content, inconsistent categorization, and navigation pain points. Propose a revised information architecture if warranted.
[High Priority] Reduce Core Docs Folder Size:
- Context: The core docs/ folder may contain content that could be moved closer to the code it describes (e.g., into apps/*/docs/ or packages/*/docs/), archived, or consolidated.
- Action: Inventory all files in the root docs/ directory. Identify candidates for: (1) co-location with their respective app/package, (2) merging into other documents, (3) archiving outdated content. The goal is a drastically leaner core docs folder.
[Medium Priority] Placeholder User Journey Narratives:
- Context: 6 user journey spokes in 3-resources/user-journeys/ contain only scenario stubs and need full multi-step narratives: Concierge Onboarding, Email-as-API Ingestion, Omnichannel Inbox Triage, Collaborative Trip Planning, Zero-Downtime Legacy Migration, Trigger-Based Lifecycle Messaging. Each placeholder links to its source capability definition.
- Action: Flesh out as the corresponding features reach design maturity. See PRODUCT_user-journeys.md registry (Journeys 9–14). Origin: formerly SB-21, relocated from STRATEGIC_BLIND_SPOTS.md (2026-05-11).

4. AI Agent & Tooling Setup

[Medium Priority] Autonomous Documentation Issue Agent:
- Context: The project documentation requires continuous maintenance and review to prevent conflicting information and outdated content.
- Action: Add an autonomous AI agent configured to run daily to scan documentation for inconsistencies (e.g., conflicting information) and automatically create PRs to address identified issues.
[Medium Priority] Investigate & Decide on Story / Task Management Tool:
- Context: The monorepo currently lacks a formal in-repo mechanism for tracking stories, tasks, and sprint-level work items beyond this TODO file.
- Action: Research options for lightweight, repo-native task management (e.g., GitHub Projects, Linear integration, markdown-based tools like todo.md conventions, or issue-linked task files). Evaluate against criteria: developer friction, visibility, integration with CI/PR workflow, and suitability for a small founding team. Recommend and document the chosen approach.
[High Priority] Docs-Hub — Deploy Documentation Portal:
- Context: The docs-hub (studio/docs-hub) VitePress portal and context-engine (packages/context-engine) RAG server are implemented. The predecessor docs-assistant has been removed.
- Action: Deploy docs-hub (static VitePress) and context-engine to Hetzner Docker Swarm. Verify end-to-end AI chat, search, and role-based content visibility. → See docs-hub.md for the full specification.
[Medium Priority] Add GitHub CODEOWNERS File:
- Context: The monorepo contains sensitive infrastructure (terraform/, docker/, .github/workflows/, scripts/) alongside product code. Access control and review requirements differ between these areas.
- Action: Create a .github/CODEOWNERS file restricting approval of infrastructure and CI/CD changes to founders/ops. This provides the security boundary benefit of repo extraction without the multi-repo overhead.

5. Deferred Design Decisions (TourDeparture Refactoring)

[Medium Priority] Vehicle & Crew Assignment to TourDeparture:
- Context: The TourDeparture entity is the natural place to assign vehicles and crew to a scheduled departure. Currently no linking tables exist. Vehicle/crew assignment may need to be per-leg rather than per-departure (e.g., different drivers for different ServiceLegs).
- Action: Design the assignment model. Key questions: (1) Is assignment per-departure (tour_vehicle_assignments, tour_crew_assignments) or per-ServiceLeg (via LegAssignment in Operations)? (2) If both exist, which is "planned" vs. "actual"? (3) Does this overlap with Operations' existing LegAssignment entity? Coordinate with the Operations schema design.
[Low Priority] Allotment Linking to TourDeparture:
- Context: Allotment (reserved hotel rooms, ferry slots) currently links only to Supplier. The CostingSheet implicitly factors allotments via procurement_items[], but there is no explicit TourDeparture → Allotment link showing which inventory blocks are consumed by which departure.
- Action: Decide whether explicit linking is needed. Key questions: (1) Is the implicit link through CostingSheet.procurement_items[].allotment_id sufficient? (2) Does an explicit junction table (tour_departure_allotments) add value for capacity tracking (Backoffice Review, Strategic Blind Spot #4: "allotment consumption tracking")? (3) Could this be handled as a computed view instead?
[Low Priority] Departure-Specific Pickup Times + Boarding Order:
- Context: Both pickup times and boarding order are operational concerns that depend on the finalized passenger list and actual door pickup bookings — not product configuration. Defining a static boarding order at the library level implies a fixed route that ignores real-world booking patterns. The current data model has no place for either.
- Action: Design the pickup time + boarding order assignment flow as a [v0.2] drilldown. Likely requires a dispatch-side table or computation. Separate from the boarding point library design.

6. Industry Best Practices & Future Gaps (Phase 2+)

These items were identified during the Level 1-3 audits as important industry-standard capabilities that are not yet part of the core "Ready to Code" implementation.

[Medium Priority] Bank Statement Import & Auto-Matching [Phase 2]:
- Context: No mechanism for importing MT940/CAMT files to match payments to bookings. Moved to Phase 2 per PRODUCT_payments-and-billing.md.
- Action: Architect bank_imports and bank_transactions tables for automated bookkeeping when Phase 2 begins.
[High Priority] Partial Refund Flows (Mollie Sync):
- Context: Explicit mapping between partial booking cancellations and Mollie ledger refunds is missing.
[High Priority] Group Booking Modifications:
- Context: Securely tracking post-confirmation passenger additions and price adjustments.
[Medium Priority] Voucher & Gift Card Management:
- Context: Complex accounting for partial redemptions and refunds of value-based vouchers.
[Medium Priority] Booking Source Channel:
- Context: Dispatcher visibility into booking origin (Widget, API, Manual, Phone).
[Medium Priority] Automated Compliance Checks (VIES/Visa):
- Context: Real-time validation of EU VAT IDs and passenger visa requirements for manifests.
[Low Priority] Dynamic Booking Questions:
- Context: Configurable data collection (passport numbers, etc.) during the checkout flow.
[Medium Priority] Bus Hardware Integration & Telematics:
- Context: The current TelemetryPoint entity assumes GPS/speed/fuel data from hardware on the bus, but Phase 1 has no hardware integration. Telematics hardware (OBD-II dongles, GPS trackers, dashcams) could enable live ETA tracking ("Where is my Bus?"), fuel monitoring, digital tachograph data import (EU-561 compliance), and route deviation alerts. This is a significant differentiator for Pillar 3 but requires vendor evaluation, hardware provisioning logistics, and a dedicated real-time data ingestion pipeline (separate from the offline mutation sync protocol).
- Action: Research: (1) available telematics hardware for DACH bus fleets (e.g., Geotab, Samsara, TomTom Telematics, Webfleet), (2) whether to use a third-party telematics platform or build a lightweight ingestion API, (3) cost model (per-vehicle hardware + SaaS fee vs. one-time OBD-II dongle), (4) fallback option using the driver's phone GPS via PWA Geolocation API for Phase 1 ETA tracking. Define scope boundaries: what's achievable with phone-only GPS vs. what requires dedicated hardware.

7. Funding: Forschungszulage (FZulG)

[High Priority] AP10–AP19 (Work Packages): Codebase Validation & Plan Adjustment for FZulG Eligibility:
- Context: The AP catalog (funding-work-packages.md) lists AP10–AP19 as 🟡 candidates: a domain artifact exists, but the codebase status (implemented / Phase 2 / concept only) and the actual extent of the implementation have not been verified. The risk assessment in bsfz-ablehnungsrisiken.md depends in part on implementation details (in particular AP18, CRDT vs. Last-Write-Wins).
- Action: For each candidate AP, decide: (a) the AP is dropped for lack of codebase substance, (b) the AP is merged into another one (e.g., AP15 into AP3), (c) the AP remains standalone. Preferred strategy: where the implementation plan in the docs (domain model, protocols, ADRs) is still flexible, adapt the plan to the FZulG-favorable variant rather than the other way around. Example AP18: if CRDT vs. Last-Write-Wins has not yet been decided, sharpen the domain model so that the CRDT variant is documented as the planned solution (provided it makes technical sense).
- Special case AP18: Binary decision between CRDT (Yjs/Automerge) and Last-Write-Wins via Hasura. If the latter is chosen, drop AP18, since it has no R&D component.
- Output: Update the status columns in funding-work-packages.md (✅/🟡/⚪ → final), sync the risk table in bsfz-ablehnungsrisiken.md, and adjust the project clustering in STRATEGY_public-funding.md if needed.
- Sequence: Before the actual drafting of the BSFZ application (Phase 2 in funding-application-plan.md, weeks 5–14). Without validation, the application risks overestimating hours, which creates a credibility risk with the reviewer.
