Busflow Docs

Internal documentation portal


Session 4: Infrastructure & Operations

Goal: Confirm infrastructure, deployment, observability, and operational runbooks are production-ready.

Estimated time: 60 min

Reading Order

Core Architecture

| # | File | What It Covers |
|---|------|----------------|
| 1 | infrastructure.md | Full deployment architecture: Terraform, Docker Swarm, backup/DR, Ubicloud cutover. |
| 2 | observability.md | LGTM stack, instrumentation, PII redaction, SLIs/SLOs, volume monitoring. |
| 3 | infrastructure/README.md | Directory layout for docker/, terraform/, scripts/. |
| 4 | ssh-access-guide.md | Server access instructions. |

CI/CD & Alerting

| # | File | What It Covers |
|---|------|----------------|
| 5 | ci-cd.md | Full CI/CD pipeline spec: Terraform, app deploys, security tooling, quality gates, preview envs. |
| 6 | observability-alerts.md | Alert definitions and triage procedures. |

Runbooks

| # | File | What It Covers |
|---|------|----------------|
| 7 | backup-verify-runbook.md | Backup integrity verification procedure. |
| 8 | secrets-rotation-runbook.md | Docker Swarm secret rotation procedure. |
| 9 | manager-failover-runbook.md | Swarm manager node failover procedure. |
| 10 | postgres-cutover.md | Ubicloud PostgreSQL migration procedure. |
| 11 | legal-hold-runbook.md | GDPR legal hold procedure. |
| 12 | observability-terraform-migration.md | Observability stack Terraform migration. |

What to Validate

  • [ ] infrastructure.md has a duplicated line about "Edge Routing" (lines 20–21). Confirm and fix.
  • [ ] The Ubicloud cutover section is extremely detailed. Is this still the plan, or has anything changed?
  • [ ] observability.md references promtail in the preview section but the Loki Docker logging driver in the retention section. Which is actually deployed?
  • [ ] Runbooks: are they executable as written? Could you follow each one step-by-step during an incident?
  • [ ] ci-cd.md §3 lists several security mitigations as "⚠️ TODO". Are any resolved?
  • [ ] Cross-reference the 14 GitHub Actions workflows in .github/workflows/ against what ci-cd.md describes. Any drift?
  • [ ] observability.md SLO definitions: are these still the right targets?
  • [ ] Production container resource limits: infrastructure.md §5 flags this as mandatory but TODOS.md says none are set. Still true?
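
The first checklist item (the duplicated "Edge Routing" line) can be confirmed mechanically rather than by eye. The following is a minimal sketch, not part of the Busflow tooling: the function name `find_adjacent_duplicates` and the file path are assumptions for illustration. It only catches exact repeats after trimming whitespace, so a near-duplicate with different wording would still need a manual read.

```python
from pathlib import Path


def find_adjacent_duplicates(lines):
    """Return (line_number, text) for each non-blank line that repeats the line before it.

    Line numbers are 1-indexed and point at the second copy.
    """
    duplicates = []
    previous = None
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        # Blank lines are ignored so consecutive empty lines are not reported.
        if text and text == previous:
            duplicates.append((number, text))
        previous = text
    return duplicates


if __name__ == "__main__":
    # Hypothetical path: adjust to wherever infrastructure.md lives in the repo.
    doc = Path("infrastructure.md")
    if doc.exists():
        lines = doc.read_text(encoding="utf-8").splitlines()
        for number, text in find_adjacent_duplicates(lines):
            print(f"line {number} repeats line {number - 1}: {text}")
```

If the script reports nothing for infrastructure.md, the duplicate may already be fixed or may differ slightly between the two lines.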

GitHub Workflows to Cross-Check

.github/workflows/
├── _build-images.yml
├── backup-verify.yml
├── ci.yml
├── deploy-infra.yml
├── deploy-observability.yml
├── deploy-studio.yml
├── deploy.yml
├── docs-lint.yml
├── index-docs.yml
├── preview-lifecycle.yml
├── preview.yml
├── release.yml
├── secrets-sync.yml
└── terraform.yml
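
One way to start the drift check is to list which workflow files are never mentioned by name in ci-cd.md. The sketch below is an illustration under stated assumptions: the helper names (`is_mentioned`, `undocumented_workflows`, `list_workflows`) are hypothetical, the script is assumed to run from the repository root, and the path to ci-cd.md is a guess. It checks one direction only (files missing from the doc) and says nothing about whether the described behavior matches what each workflow actually does.

```python
import re
from pathlib import Path


def is_mentioned(name, doc_text):
    """True if the file name appears in the text as a whole token.

    The lookarounds stop "deploy.yml" from matching inside a longer
    name such as "xdeploy.yml".
    """
    pattern = r"(?<![\w.-])" + re.escape(name) + r"(?![\w-])"
    return re.search(pattern, doc_text) is not None


def undocumented_workflows(workflow_names, doc_text):
    """Return, sorted, the workflow file names that the doc never mentions."""
    return sorted(n for n in workflow_names if not is_mentioned(n, doc_text))


def list_workflows(workflow_dir):
    """Collect *.yml and *.yaml file names from a workflows directory."""
    directory = Path(workflow_dir)
    return sorted(p.name for p in directory.iterdir() if p.suffix in {".yml", ".yaml"})


if __name__ == "__main__":
    # Assumed locations: repo root as working directory, hypothetical doc path.
    workflows_dir = Path(".github/workflows")
    doc = Path("ci-cd.md")
    if workflows_dir.is_dir() and doc.exists():
        missing = undocumented_workflows(
            list_workflows(workflows_dir), doc.read_text(encoding="utf-8")
        )
        for name in missing:
            print(f"not mentioned in {doc}: {name}")
```

Any file it reports is a candidate for the Findings section. A workflow that is mentioned can still have drifted, so the output narrows the manual review rather than replacing it.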

Findings
