# Session 4: Infrastructure & Operations
Goal: Confirm that the infrastructure, deployment, observability, and operational runbooks are production-ready.

Estimated time: 60 min
## Reading Order
### Core Architecture
| # | File | What It Covers |
|---|---|---|
| 1 | infrastructure.md | Full deployment architecture: Terraform, Docker Swarm, backup/DR, Ubicloud cutover. |
| 2 | observability.md | LGTM stack, instrumentation, PII redaction, SLIs/SLOs, volume monitoring. |
| 3 | infrastructure/README.md | Directory layout for docker/, terraform/, scripts/. |
| 4 | ssh-access-guide.md | Server access instructions. |
### CI/CD & Alerting
| # | File | What It Covers |
|---|---|---|
| 5 | ci-cd.md | Full CI/CD pipeline spec: Terraform, app deploys, security tooling, quality gates, preview envs. |
| 6 | observability-alerts.md | Alert definitions and triage procedures. |
### Runbooks
| # | File | What It Covers |
|---|---|---|
| 7 | backup-verify-runbook.md | Backup integrity verification procedure. |
| 8 | secrets-rotation-runbook.md | Docker Swarm secret rotation procedure. |
| 9 | manager-failover-runbook.md | Swarm manager node failover procedure. |
| 10 | postgres-cutover.md | Ubicloud PostgreSQL migration procedure. |
| 11 | legal-hold-runbook.md | GDPR legal hold procedure. |
| 12 | observability-terraform-migration.md | Observability stack Terraform migration. |
## What to Validate
- [ ] `infrastructure.md` has a duplicated line about "Edge Routing" (lines 20–21). Confirm and fix.
- [ ] The Ubicloud cutover section is extremely detailed. Is this still the plan, or has anything changed?
- [ ] `observability.md` references `promtail` in the preview section but the Loki Docker logging driver in the retention section. Which one is actually deployed?
- [ ] Runbooks: are they executable as written? Could you follow each one step by step during an incident?
- [ ] `ci-cd.md` §3 lists several security mitigations as "⚠️ TODO". Are any of them resolved?
- [ ] Cross-reference the 14 GitHub Actions workflows in `.github/workflows/` against what `ci-cd.md` describes. Is there any drift?
- [ ] `observability.md` SLO definitions: are these still the right targets?
- [ ] Production container resource limits: `infrastructure.md` §5 flags them as mandatory, but `TODOS.md` says none are set. Is that still true?
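
For the resource-limits item, you can check the live cluster instead of relying on the docs by reading the service specs that Docker itself reports. The sketch below assumes it runs on a Swarm manager with the `docker` CLI available. The function names are hypothetical. It treats a missing or zero `NanoCPUs` or `MemoryBytes` value under `Spec.TaskTemplate.Resources.Limits` as "no limit set", because that is how Docker represents an unlimited service.

```python
import json
import subprocess


def services_missing_limits(inspect_output):
    """Return names of services whose task template lacks a CPU or memory limit.

    `inspect_output` is the parsed JSON array printed by `docker service inspect`.
    """
    missing = []
    for service in inspect_output:
        spec = service.get("Spec") or {}
        resources = (spec.get("TaskTemplate") or {}).get("Resources") or {}
        limits = resources.get("Limits") or {}
        # Docker stores CPU limits as NanoCPUs and memory limits as MemoryBytes;
        # a missing or zero value means the service is unlimited for that resource.
        if not limits.get("NanoCPUs") or not limits.get("MemoryBytes"):
            missing.append(spec.get("Name") or service.get("ID", "<unknown>"))
    return sorted(missing)


def inspect_all_services():
    """Hypothetical helper: run the Docker CLI on a Swarm manager and parse its output."""
    ids = subprocess.run(
        ["docker", "service", "ls", "-q"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    if not ids:
        return []
    # `docker service inspect` prints a JSON array with one object per service.
    out = subprocess.run(
        ["docker", "service", "inspect", *ids],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

On a manager node, `print(services_missing_limits(inspect_all_services()))` lists the services that lack limits. An empty list would suggest the `TODOS.md` note is stale, though it says nothing about whether the configured values are appropriate.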
## GitHub Workflows to Cross-Check
```
.github/workflows/
├── _build-images.yml
├── backup-verify.yml
├── ci.yml
├── deploy-infra.yml
├── deploy-observability.yml
├── deploy-studio.yml
├── deploy.yml
├── docs-lint.yml
├── index-docs.yml
├── preview-lifecycle.yml
├── preview.yml
├── release.yml
├── secrets-sync.yml
└── terraform.yml
```
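
You can automate a rough first pass at the drift check by comparing the file names under `.github/workflows/` with the YAML file names that `ci-cd.md` mentions. This sketch assumes `ci-cd.md` refers to workflows by file name. The `doc_path` argument and the function names are hypothetical. Matching names only show that a workflow is mentioned, not that its triggers, jobs, or gates match the description.

```python
import re
from pathlib import Path

# Matches tokens that look like YAML file names, e.g. "deploy-infra.yml".
# The look-around keeps "deploy.yml" from matching inside "pre-deploy.yml".
YAML_NAME = re.compile(r"(?<![\w.-])[\w.-]+\.ya?ml(?![\w-])")


def workflow_drift(workflow_files, doc_text):
    """Compare workflow file names on disk with YAML file names mentioned in a doc.

    Returns (undocumented, unknown):
      undocumented: workflow files the doc never mentions
      unknown: YAML names the doc mentions that are not workflow files on disk
               (this can include non-workflow files such as compose or stack files)
    """
    on_disk = set(workflow_files)
    mentioned = set(YAML_NAME.findall(doc_text))
    return sorted(on_disk - mentioned), sorted(mentioned - on_disk)


def check_repo(repo_root, doc_path):
    """Hypothetical helper: run the comparison against a checked-out repository."""
    root = Path(repo_root)
    workflows = [
        p.name
        for p in (root / ".github" / "workflows").iterdir()
        if p.suffix in (".yml", ".yaml")
    ]
    doc_text = (root / doc_path).read_text(encoding="utf-8")
    return workflow_drift(workflows, doc_text)
```

Check both lists by hand. An "unknown" entry may be a renamed or deleted workflow, or simply a compose file mentioned in passing.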