Busflow Docs

Internal documentation portal

Reviewed 04 May 2026


Session 4: Infrastructure & Operations

Goal: Confirm that infrastructure, deployment, observability, and operational runbooks are production-ready.
Estimated time: 60 min

Reading Order

Core Architecture

| # | File | Lines | What It Covers |
|---|------|-------|----------------|
| 1 | infrastructure.md | 179 | Full deployment architecture: Terraform, Docker Swarm, backup/DR. |
| 2 | observability.md | 300 | LGTM stack, instrumentation, PII redaction, SLIs/SLOs. |
| 3 | infrastructure/README.md | ~60 | Directory layout for docker/, terraform/, scripts/. |
| 4 | ssh-access-guide.md | ~40 | Server access instructions. |

CI/CD & Alerting

| # | File | Lines | What It Covers |
|---|------|-------|----------------|
| 5 | ci-cd.md | 213 | Full CI/CD pipeline spec, security tooling, quality gates. |
| 6 | observability-alerts.md | 143 | Alert definitions and triage procedures. |

Runbooks

| # | File | Lines | What It Covers |
|---|------|-------|----------------|
| 7 | backup-verify-runbook.md | ~50 | Backup integrity verification procedure. |
| 8 | secrets-rotation-runbook.md | ~60 | Docker Swarm secret rotation procedure. |
| 9 | manager-failover-runbook.md | ~80 | Swarm manager node failover procedure. |
| 10 | postgres-cutover.md | 129 | Ubicloud PostgreSQL migration procedure. |
| 11 | legal-hold-runbook.md | ~50 | GDPR legal hold procedure. |
| 12 | observability-terraform-migration.md | ~100 | Observability stack Terraform migration. |

πŸ” What to Validate ​

  • [ ] infrastructure.md has a duplicated line about "Edge Routing" (lines 20–21). Confirm and fix.
  • [ ] The Ubicloud cutover section is extremely detailed. Is this still the plan, or has anything changed?
  • [ ] observability.md references promtail in the preview section but the Loki Docker logging driver in the retention section. Which is actually deployed?
  • [ ] Runbooks: are they executable as written? Could you follow each one step-by-step during a 3 AM incident?
  • [ ] ci-cd.md §3 lists several security mitigations as "⚠️ TODO". Are any resolved?
  • [ ] observability.md SLO definitions: are these still the right targets?
  • [ ] Production container resource limits: infrastructure.md §5 flags this as mandatory, but TODOS.md says none are set. Is that still true?
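
The duplicated-line check in the first item can be done by eye, but a small script makes it repeatable across all twelve files. The sketch below is one possible approach, not an existing Busflow tool: the function name and the sample text are made up for illustration, and it only flags a non-blank line that exactly repeats the line directly before it.

```python
from typing import Iterable, List, Tuple


def find_adjacent_duplicates(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each non-blank line that repeats the previous line.

    Line numbers are 1-indexed and point at the second copy.
    Leading and trailing whitespace is ignored when comparing.
    """
    duplicates: List[Tuple[int, str]] = []
    previous = None
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        # Blank lines are never reported, even when several follow each other.
        if text and text == previous:
            duplicates.append((number, text))
        previous = text
    return duplicates


# Made-up sample standing in for a file such as infrastructure.md.
sample = [
    "Edge Routing: traffic enters through the edge proxy.",
    "Edge Routing: traffic enters through the edge proxy.",
    "",
    "",
    "Backups run nightly.",
]
print(find_adjacent_duplicates(sample))
```

To run it against a real file, pass the open file handle instead of the sample list, for example `find_adjacent_duplicates(open("infrastructure.md", encoding="utf-8"))`.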

πŸ—ΊοΈ Mindmap & Path Optimization ​

Grab your pen and paper:

  • [ ] Map the Incident Response Path: A Grafana alert fires for "High API Latency". Trace the documentation path.
    • Observation: Where do you go first? observability-alerts.md → observability.md → infrastructure.md?
    • Optimization: The incident response flow is highly fragmented across 6 runbooks and 2 reference docs. Can we create a single "Incident Root Diagram" that points directly to the right runbook?
  • [ ] Map the Deployment Pipeline: How does code go from a PR to a live preview environment?
    • Observation: ci-cd.md is 213 lines long.
    • Optimization: Would a visual GitHub Actions dependency diagram save newcomers hours of reading?
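
One way to prototype the single incident entry point before drawing a diagram is a plain lookup table from alert name to an ordered list of documents. The sketch below is illustrative only. The "High API Latency" route is the candidate path from the observation above, still to be confirmed. The other two alert names are hypothetical and do not come from observability-alerts.md, and falling back to observability-alerts.md for unknown alerts is an assumption.

```python
from typing import Dict, List

# Alert name (lowercased) -> documents to open, in order.
# Only "high api latency" comes from this session's notes; the other
# alert names are hypothetical placeholders.
ALERT_ROUTES: Dict[str, List[str]] = {
    "high api latency": [
        "observability-alerts.md",
        "observability.md",
        "infrastructure.md",
    ],
    "backup verification failed": ["backup-verify-runbook.md"],    # hypothetical alert
    "swarm manager unreachable": ["manager-failover-runbook.md"],  # hypothetical alert
}

# Assumed default: start from the alert definitions and triage procedures.
DEFAULT_ROUTE: List[str] = ["observability-alerts.md"]


def docs_for_alert(alert_name: str) -> List[str]:
    """Return the ordered documents to read for an alert, ignoring case and padding."""
    key = alert_name.strip().lower()
    # Return a copy so callers cannot mutate the shared table.
    return list(ALERT_ROUTES.get(key, DEFAULT_ROUTE))


print(" -> ".join(docs_for_alert("High API Latency")))
```

Filling in this table during the mapping exercise also shows which alerts have no runbook at all, which is a finding worth recording below.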

πŸ“ Findings & Actions ​

SeverityFile / TopicIssue & Optimization PotentialAction Required
