Busflow Docs

Internal documentation portal

CI/CD

A robust CI/CD pipeline keeps a self-hosted Docker Swarm architecture from becoming an operational nightmare. Automating deployments and infrastructure changes removes error-prone manual steps such as SSHing into servers to pull images or update routing rules.

1. IaaS CI/CD Pipeline (Terraform)

Your physical and managed infrastructure should be treated as code. You will use a dedicated GitHub repository (or a heavily isolated folder in your monorepo) for your Terraform configurations.

Setup Requirements:

  • Remote State Storage: Terraform state lives in a remote S3-compatible backend (Hetzner Object Storage) so that CI/CD runs can read and update it. Requires S3_ACCESS_KEY and S3_SECRET_KEY in GitHub Secrets.
  • GitHub Secrets: Store HETZNER_API_TOKEN as a GitHub Actions secret; it must never be committed to the repository.

Planned Automations:

  • On Pull Request: GitHub Actions runs terraform fmt (formatting), terraform validate (syntax checking), and terraform plan. The action comments the execution plan directly on the PR, showing exactly what servers or firewalls will be created, modified, or destroyed.
  • On Merge to Main: GitHub Actions runs terraform apply -auto-approve, applying the changes to Hetzner and AWS without an interactive confirmation prompt.
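
The two automations above can be sketched as a small dispatcher that maps the triggering GitHub event to the Terraform commands to run. This is a minimal dry-run sketch, not the project's workflow: the function name terraform_steps, the event-to-step mapping, and the extra flags (-check, -recursive, -input=false, -out=tfplan) are assumptions added for illustration.

```shell
# Print the Terraform commands CI would run for a given GitHub event.
# Assumption: "pull_request" maps to the PR checks and "push" to a merge on main.
terraform_steps() {
  case "$1" in
    pull_request)
      echo "terraform init -input=false"
      echo "terraform fmt -check -recursive"
      echo "terraform validate"
      echo "terraform plan -input=false -out=tfplan"
      ;;
    push)
      echo "terraform init -input=false"
      echo "terraform apply -input=false -auto-approve"
      ;;
    *)
      echo "unsupported event: $1" >&2
      return 1
      ;;
  esac
}

# Dry run: show the PR steps instead of executing them.
terraform_steps pull_request
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; a real job would execute each line and post the plan output to the PR.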

Continuous Manual Tasks:

  • Reviewing the Plan: A human must always review the terraform plan output on the PR before merging. Terraform is powerful; a typo could accidentally instruct it to delete your production managed database.
  • Major Architecture Shifts: Moving from Hetzner to AWS, or adding a completely new managed service cluster, will require manual research and writing new Terraform modules.

2. Application CI/CD Pipeline (GitHub Actions + Docker Swarm)

This pipeline handles your code (Vue.js, Nest.js), your GraphQL engine configurations (Hasura), and the routing (Traefik).

Setup Requirements:

  • Container Registry: You will use GitHub Container Registry (GHCR) to store your compiled Docker images.
  • Swarm Access: You must generate a dedicated SSH key pair. The private key goes into GitHub Secrets, allowing the GitHub Actions runner to securely SSH into your Hetzner Swarm Manager node to execute deploy commands.
  • Hasura CLI: Your pipeline needs the Hasura CLI installed to automate database migrations and metadata syncing.

Planned Automations (Per Service):

| Service | CI/CD Automation Flow |
| --- | --- |
| Vue.js (Frontend) | PR: Run ESLint, run unit tests. Build test image. Merge: Build production static files, bake them into an Nginx Docker image. Push to GHCR with the Git SHA as the tag. Update the Swarm service (docker service update --image ghcr.io/org/vue-app:SHA frontend). |
| Nest.js (API & Worker) | PR: Run ESLint, Jest unit tests, and e2e tests. Merge: Build the Node.js Docker image. Push to GHCR. Update both the api and api-worker Swarm services sequentially to ensure zero downtime. |
| Hasura & Postgres | PR: Spin up a temporary Postgres/Hasura container in the GitHub runner, apply migrations, and run schema tests. Merge: Before updating the frontend or backend, the action uses the Hasura CLI to apply SQL migrations to the Hetzner Managed Postgres and applies the Hasura Metadata (roles, permissions, event triggers). |
| Nhost (Auth/Storage) | These are pre-built images. Your automation only triggers if you change their configuration files or bump the version tag in your docker-compose.yml. |
| Preview Environments | PR Opened: Action builds images, SSHs into the Swarm Manager, and runs docker stack deploy -c preview.yml pr-123. Traefik dynamically routes a URL (e.g., pr-123.preview.busflow.de). Action posts the link to the PR. PR Closed: Action runs docker stack rm pr-123 to free up Hetzner resources. |
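
As a rough illustration of the merge flow in the table, the sketch below builds a GHCR image reference from the Git SHA and prints the Swarm update commands for the api and api-worker services in order. The helper names, the org/nest-api image path, and the --with-registry-auth flag are assumptions; nothing is executed against a cluster.

```shell
# Build the GHCR image reference for a repository path and Git SHA.
image_ref() {
  printf 'ghcr.io/%s:%s\n' "$1" "$2"
}

# Print the Swarm update command for one service (dry run; nothing is executed).
service_update_cmd() {
  service=$1
  image=$2
  printf 'docker service update --with-registry-auth --image %s %s\n' "$image" "$service"
}

# GITHUB_SHA is set by GitHub Actions; the fallback is a placeholder for local runs.
sha=${GITHUB_SHA:-0123abc}
# org/nest-api is a hypothetical image path.
image=$(image_ref org/nest-api "$sha")

# Update api first, then api-worker, one service at a time.
for service in api api-worker; do
  service_update_cmd "$service" "$image"
done
```

Tagging with the Git SHA rather than a moving tag such as latest means every deployed image maps back to exactly one commit, which is also what makes re-deploying a previous tag possible.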

Continuous Manual Tasks:

  • Secret Rotation: Because Swarm secrets are immutable, if an external API key (like OpenAI or AWS SES) is compromised, you must manually create a new secret version in the Swarm, update your docker-compose.yml to point to the new version, and trigger a deployment.
  • Destructive Database Migrations: While Hasura handles standard migrations beautifully via CI/CD, complex destructive changes (e.g., dropping a widely used column or splitting a massive table) should often be run manually during a planned maintenance window to prevent table-locking issues under high load.
  • Node Maintenance: If a Hetzner server needs a hardware replacement, you will manually SSH in, run docker node update --availability drain <node-id> to safely move containers to other servers, and then replace the node via Terraform.
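
Because Swarm secrets cannot be edited in place, secret rotation usually relies on versioned names. The sketch below derives the next name in a hypothetical <name>_v<number> scheme; that naming scheme and the function name are assumptions, not something this document prescribes.

```shell
# Derive the next versioned secret name, e.g. openai_api_key_v3 -> openai_api_key_v4.
# Assumption: secrets follow a hypothetical <name>_v<number> naming scheme.
next_secret_name() {
  current=$1
  base=${current%_v*}       # strip the trailing _v<number>
  version=${current##*_v}   # keep only what follows the last _v
  case "$version" in
    ''|*[!0-9]*)
      echo "not a versioned secret name: $current" >&2
      return 1
      ;;
  esac
  # expr treats the number as decimal, so a leading zero is not read as octal.
  next=$(expr "$version" + 1)
  printf '%s_v%s\n' "$base" "$next"
}

new_name=$(next_secret_name openai_api_key_v3)
# Dry run: the real command reads the new secret value from stdin.
echo "docker secret create $new_name -"
```

After creating the new secret, docker-compose.yml still has to be pointed at the new name and a deployment triggered, as described in the Secret Rotation item.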

Hasura Metadata Drift Detection

deploy.yml applies Hasura metadata forward-only via hasura metadata apply, which silently overwrites any manual console changes. Without a drift check, a developer who modifies metadata via hasura console (SSH tunnel) and forgets to export it loses those changes on the next deploy — or worse, causes schema conflicts.

  • Post-deploy verification: after hasura metadata apply succeeds, a follow-up step runs hasura metadata diff and hasura migrate status against the repo state. Any un-exported console change surfaces as a CI warning (not a hard failure — console use during debugging is legitimate, but the diff must be acknowledged).
  • Weekly cron: a scheduled variant runs the same diff check between deploys to catch drift that accumulates while no deploys occur.
  • Implementation: tracked in TODOS.md.
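
The warning-not-failure behaviour described above could look roughly like the following. It is a sketch under assumptions: the function name drift_annotation is hypothetical, and the wiring comment assumes the hasura CLI prints the diff on stdout and prints nothing when there is no drift, which should be verified against the CLI version in use.

```shell
# Turn a metadata diff into a GitHub Actions warning annotation.
# Empty input means no drift. The function always returns 0 because drift is a
# warning to acknowledge, not a hard failure.
drift_annotation() {
  diff_text=$1
  if [ -z "$diff_text" ]; then
    return 0
  fi
  changed=$(printf '%s\n' "$diff_text" | wc -l | tr -d ' ')
  echo "::warning title=Hasura metadata drift::${changed} diff line(s) between server and repo metadata"
}

# Hypothetical wiring in a CI step:
#   drift_annotation "$(hasura metadata diff 2>/dev/null)"
drift_annotation "- role: editor
+ role: viewer"
```

The ::warning workflow command surfaces the message in the run summary without changing the job's exit status, so the deployment result itself is unaffected.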

3. Security Tooling (GitHub Actions)

Automated security scanning is integrated directly into the CI/CD pipeline and the GitHub Pull Request process.

| Tool | Purpose | Trigger |
| --- | --- | --- |
| GitHub Advanced Security (CodeQL) | Static Application Security Testing (SAST). Scans for security vulnerabilities, injection flaws, and logic errors in TypeScript/JavaScript. | On every PR and weekly cron. |
| Dependabot | Automated dependency updates. Opens PRs when new versions of npm packages are available. Flags known CVEs in transitive dependencies. | Continuous monitoring. |
| Snyk | Deep dependency vulnerability scanning. Provides fix recommendations and license compliance checks. Used alongside Dependabot for layered coverage. | On every PR and continuous monitoring. |
| license-checker | Scans all npm dependency licenses against a configured allowlist. Fails the build if a dependency uses a copyleft or unknown license (e.g., GPL, AGPL). Generates a full license report for legal review. | On every PR. |

License Allowlist Configuration: The license-checker step is configured with --onlyAllow to explicitly permit a set of known-safe licenses: MIT; ISC; BSD-2-Clause; BSD-3-Clause; Apache-2.0; 0BSD; CC0-1.0. Any dependency that falls outside this allowlist causes the pipeline to fail, requiring manual review before merge. The generated license summary is uploaded as a build artifact for auditing.
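
A minimal sketch of the allowlist follows, assuming it is kept in a single shell variable. The license_allowed helper is hypothetical and only mirrors the rule locally; the --production and --summary flags in the printed command are assumed extras, while --onlyAllow with a semicolon-separated list comes from the configuration described above.

```shell
# Allowlist from the configuration above, in the semicolon-separated form --onlyAllow expects.
ALLOWED_LICENSES="MIT;ISC;BSD-2-Clause;BSD-3-Clause;Apache-2.0;0BSD;CC0-1.0"

# Return 0 when a single license identifier is on the allowlist (exact match only).
license_allowed() {
  case ";$ALLOWED_LICENSES;" in
    *";$1;"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Dry run: print the CI command instead of executing it.
echo "npx license-checker --production --summary --onlyAllow \"$ALLOWED_LICENSES\""

for license in MIT GPL-3.0; do
  if license_allowed "$license"; then
    echo "$license: allowed"
  else
    echo "$license: needs manual review"
  fi
done
```

Wrapping the list in leading and trailing semicolons makes the match exact, so a partial name such as BSD does not pass just because BSD-2-Clause is allowed.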

CI/CD Secret Security

Threat: GitHub Actions secrets (Hetzner API token, SSH deploy keys, database passwords) are available to workflows triggered by PRs from same-repo branches. A collaborator with push access can modify a workflow file in their branch to exfiltrate secrets — for example by encoding the value (base64, hex) to bypass GitHub's automatic log masking, then sending it to an external server.

GitHub's Built-In Protections:

  • Log masking — GitHub replaces literal secret values with *** in workflow logs. This stops accidental exposure but does NOT stop intentional encoding-based exfiltration.
  • Fork PR isolation — workflows triggered by PRs from forks do NOT receive repository secrets by default. This only protects against external contributors, not same-repo collaborators.

Required Mitigations (Mandatory):

| Mitigation | Status | How |
| --- | --- | --- |
| Environment protection gates | ⚠️ TODO | Create production, studio, and preview environments in GitHub → Settings → Environments. Enable "Required reviewers" on each. Deploy jobs reference these environments, so secrets are injected only after manual approval. |
| Preview build/deploy separation | ⚠️ TODO | Split preview.yml so the build job (uses only GITHUB_TOKEN for GHCR push) runs automatically, but the deploy job (uses DEPLOY_SSH_KEY) requires environment approval. |
| Least-privilege SSH keys | ⚠️ TODO | The DEPLOY_SSH_KEY should only authorize docker stack deploy commands, not full root access. Create a dedicated deploy user on each Swarm node with a restricted shell or sudoers entry. |
| Audit collaborator access | Ongoing | Limit repository write access to trusted team members. Review the collaborator list before onboarding external contributors. |
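
One possible shape for the least-privilege SSH key is a forced-command wrapper that inspects SSH_ORIGINAL_COMMAND, which sshd sets when a key is bound to a command in authorized_keys. The sketch below is an assumption about how the restriction might be built, not the project's implementation; the function name is hypothetical, and allowing docker stack rm alongside docker stack deploy is an extra assumption for preview teardown.

```shell
# Decide whether a command requested over SSH may run for the deploy user.
is_allowed_deploy_command() {
  cmd=$1
  # Accept only a conservative character set so an allowed prefix cannot carry
  # shell metacharacters (;, &, |, $, backticks, newlines, globs).
  safe=$(printf '%s' "$cmd" | tr -cd 'A-Za-z0-9 ._/:=-')
  [ "$safe" = "$cmd" ] || return 1
  case "$cmd" in
    "docker stack deploy "*|"docker stack rm "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper body for an authorized_keys forced command:
#   is_allowed_deploy_command "$SSH_ORIGINAL_COMMAND" && exec $SSH_ORIGINAL_COMMAND

for candidate in "docker stack deploy -c preview.yml pr-123" "docker exec -it api sh"; do
  if is_allowed_deploy_command "$candidate"; then
    echo "allowed: $candidate"
  else
    echo "denied: $candidate"
  fi
done
```

This narrows what a leaked key can do but is not a complete sandbox: a stack file can still request host mounts, so the environment approval gates above remain necessary.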

If External Contributors Are Added (Future):

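  • Switch PR-triggered workflows that need secrets from pull_request to pull_request_target, which runs the workflow file from the base branch so that fork PRs cannot execute modified workflow files. Because pull_request_target runs with repository secrets, these workflows must never check out or execute code from the PR head.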
  • Require first-time contributor approval for all workflow runs.

4. Scheduled E2E Testing (Cron)

In addition to post-deployment E2E runs, Playwright tests are executed on a cron schedule via GitHub Actions against the staging environment. This catches issues that post-deployment runs cannot: environment drift, expired credentials, time-dependent bugs, and external dependency failures.

  • Nightly: Critical-path E2E suite (core booking flow, login, dispatch).
  • Weekly: Full E2E suite covering all user journeys.
  • Alerting: Failures post to a dedicated Slack channel for immediate triage.
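
If a single workflow carries both schedules, the job needs to decide which suite to run from the cron expression that fired. The sketch below shows one way to do that; the cron values, the @critical tag, and both function names are hypothetical.

```shell
NIGHTLY_CRON="0 2 * * *"   # hypothetical nightly schedule
WEEKLY_CRON="0 3 * * 0"    # hypothetical weekly schedule

# Map the cron expression that fired to a suite name.
select_suite() {
  case "$1" in
    "$NIGHTLY_CRON") echo "critical" ;;
    "$WEEKLY_CRON") echo "full" ;;
    *)
      echo "unknown schedule: $1" >&2
      return 1
      ;;
  esac
}

# Print the Playwright command for a suite (dry run).
playwright_cmd() {
  case "$1" in
    critical) echo "npx playwright test --grep @critical" ;;
    full) echo "npx playwright test" ;;
    *) return 1 ;;
  esac
}

playwright_cmd "$(select_suite "$NIGHTLY_CRON")"
```

In GitHub Actions the expression that fired is available to the job as github.event.schedule, which could be passed to select_suite.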

5. Monorepo-Aware Pipeline (Turborepo)

All CI tasks are executed through Turborepo to leverage caching and affected-package filtering. See monorepo.md for pipeline definitions and caching strategy.

Key CI Behaviors:

  • Affected-only execution: PR pipelines use turbo run build test lint --filter=...[origin/main] to skip unchanged packages, keeping feedback times under 10 minutes.
  • Remote caching: Turborepo remote cache is shared between CI runners, so repeated builds of unchanged packages are instant cache hits.
  • Path-filter matrix: GitHub Actions uses dorny/paths-filter to conditionally trigger expensive jobs (e.g., E2E tests only when app code changes, not just docs or configs).
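
The path-filter idea can be approximated in plain shell for local experiments. This sketch assumes that only changes under apps/ or packages/ warrant an E2E run; the function name is hypothetical, and dorny/paths-filter remains the mechanism the pipeline actually uses.

```shell
# Read changed paths (one per line) on stdin.
# Succeed if any path touches app or package code.
needs_e2e() {
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      apps/*|packages/*) return 0 ;;
    esac
  done
  return 1
}

# Hypothetical wiring:
#   git diff --name-only origin/main...HEAD | needs_e2e && echo "run E2E"
if printf 'docs/ci.md\napps/web/src/App.vue\n' | needs_e2e; then
  echo "run E2E"
else
  echo "skip E2E"
fi
```

A change limited to docs or root-level config falls through the loop and skips the expensive job.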

Deployment Observability:

  • Deploy markers: Each deployment annotates Grafana dashboards with the Git SHA and deploy timestamp for visual correlation of performance changes.
  • Post-deploy health checks: GitHub Actions waits for OTel metrics (error rate, p99 latency) to stabilize within SLO thresholds before marking the deployment as successful.
  • Automatic rollback: If health checks fail within 5 minutes post-deploy, the previous Docker image tag is re-deployed automatically.
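
The health check and rollback decision can be sketched as a threshold comparison followed by a re-deploy of the previous tag. The thresholds (1% error rate, 500 ms p99), the sample metric values, and the function names are hypothetical; fetching the real OTel metrics is out of scope here.

```shell
# Succeed when both metrics are within their thresholds.
# awk handles the floating-point comparison.
within_slo() {
  error_rate=$1
  p99_ms=$2
  max_error_rate=$3
  max_p99_ms=$4
  awk -v e="$error_rate" -v p="$p99_ms" -v me="$max_error_rate" -v mp="$max_p99_ms" \
    'BEGIN { exit !((e + 0 <= me + 0) && (p + 0 <= mp + 0)) }'
}

# Print the command that re-deploys the previous image tag (dry run).
rollback_cmd() {
  service=$1
  previous_image=$2
  printf 'docker service update --image %s %s\n' "$previous_image" "$service"
}

# Hypothetical sample: 0.4% errors and 420 ms p99 against 1% and 500 ms limits.
if within_slo 0.004 420 0.01 500; then
  echo "deployment healthy"
else
  rollback_cmd frontend ghcr.io/org/vue-app:PREVIOUS_SHA
fi
```

A real job would poll the metrics repeatedly during the 5-minute window and roll back on the first sustained breach rather than evaluating a single sample.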

6. Quality Gates

Formalized blocking criteria that prevent merges and deployments. GitHub branch protection enforces the merge gates as required checks, except for those marked advisory in the Required Checks Matrix below.

Merge Gates (Pull Request)

| Gate | Blocking Criteria | Enforcement |
| --- | --- | --- |
| Lint & Format | Zero errors from Biome and ESLint (Vue + boundary rules) | turbo run lint |
| Type Safety | Zero errors from vue-tsc across all packages | turbo run typecheck |
| Unit Tests | All pass; no coverage decrease on diff | turbo run test with coverage |
| Integration Tests | All pass (ephemeral Testcontainers) | Parallel CI job |
| E2E Smoke | Critical-path Playwright subset passes against preview env | Post-build CI job |
| Security Scan | Zero critical/high CVEs (CodeQL + Snyk); license allowlist pass | Continuous + PR trigger |
| Build | Production bundle builds without errors for affected packages | turbo run build --filter=...[origin/main] |
| Bundle Size | Total bundle size within configured threshold | bundlesize (planned) |

Deploy Gates (Pre-Production)

| Gate | Blocking Criteria | Enforcement |
| --- | --- | --- |
| Full E2E Suite | All Playwright specs pass against staging | Deployment pipeline |
| Health Check | OTel metrics (error rate, p99 latency) within SLO thresholds for 5 min post-deploy | Automated rollback trigger |

Coverage Targets

While chasing 100% coverage often becomes an anti-pattern, we maintain baseline expectations:

  • Business Logic & Shared Packages (packages/*): High strictness. 80%+ coverage for core utilities, business logic, and UI components.
  • Apps (apps/*): Focus on E2E critical paths and unit testing complex business logic (Stores, Composables).
  • Coverage ratchet: Coverage percentage on a PR diff must not decrease — enforced via Codecov / Coveralls PR check.
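
The ratchet rule itself is a single comparison. The sketch below only illustrates the rule with hypothetical percentages; in the pipeline the check is delegated to Codecov or Coveralls, and the function name is an assumption.

```shell
# Succeed when head coverage is not lower than base coverage (both in percent).
coverage_ok() {
  base=$1
  head=$2
  awk -v base="$base" -v head="$head" 'BEGIN { exit !(head + 0 >= base + 0) }'
}

# Hypothetical values: base branch at 81.2%, PR at 80.9%.
if coverage_ok 81.2 80.9; then
  echo "coverage ratchet: pass"
else
  echo "coverage ratchet: coverage decreased"
fi
```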

Required Checks Matrix

Maps each gate to its GitHub branch protection configuration:

| GitHub Check Name | Required | Context |
| --- | --- | --- |
| lint | Yes | Biome + ESLint |
| typecheck | Yes | vue-tsc |
| test | Yes | Vitest unit + integration |
| build | Yes | Production bundle |
| e2e-smoke | Yes | Playwright critical paths |
| security / codeql | Yes | SAST |
| security / snyk | Yes | Dependency CVEs |
| security / license-check | Yes | License allowlist |
| coverage | ⚠️ Advisory | No decrease on diff |
| bundle-size | ⚠️ Advisory (planned) | Threshold check |

7. Preview Deployments

Purpose

Provide a live, isolated environment per Pull Request for visual QA and stakeholder review before merging to main.

Prerequisites

  • Docker registry access (GitHub Container Registry)
  • Traefik dynamic routing configuration on the Swarm cluster
  • DNS wildcard entry for *.preview.busflow.de
  • Swarm node availability with defined resource limits
  • Secrets injection for preview environments (managed via GitHub Actions / Docker Secrets)
  • Database seeds for bootstrapping isolated preview databases

How It Works

  1. Trigger: A developer opens or updates a PR against main. This triggers the GitHub Actions preview workflow.
  2. Build: GitHub Actions builds container images for all affected services and pushes them to the registry.
  3. Deploy: The pipeline deploys a scoped Docker Swarm stack. Traefik dynamic routing maps pr-<number>.preview.busflow.de to the stack.
  4. Database Strategy:
    • Frontend-only changes (no Hasura migrations detected): The preview environment connects to the shared staging database to save resources.
    • Changes containing migrations: The pipeline provisions and seeds an isolated PostgreSQL instance for the preview, preventing migration conflicts with other environments.
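
The naming and database decisions in the steps above can be sketched as three small helpers. The hasura/migrations/ path used to detect migrations and the function names are assumptions; the pr-<number> stack name and the preview.busflow.de domain follow the steps above.

```shell
# Stack name and public hostname for a PR number.
preview_stack_name() {
  printf 'pr-%s\n' "$1"
}

preview_host() {
  printf 'pr-%s.preview.busflow.de\n' "$1"
}

# Read changed paths on stdin. Print "isolated" if any is a Hasura migration,
# otherwise "shared-staging".
# Assumption: migrations live under a hypothetical hasura/migrations/ directory.
preview_db_strategy() {
  strategy="shared-staging"
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      hasura/migrations/*) strategy="isolated" ;;
    esac
  done
  echo "$strategy"
}

pr=123
echo "stack: $(preview_stack_name "$pr")"
echo "host:  $(preview_host "$pr")"
echo "db:    $(printf 'apps/web/src/App.vue\n' | preview_db_strategy)"
```

Deriving everything from the PR number keeps the stack name, hostname, and teardown target in sync without extra state.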

Environment Lifecycle

  • Created on PR open or first push.
  • Updated on subsequent commits to the PR branch (rolling redeployment of the preview stack).
  • Torn down automatically on PR close or merge. A configurable TTL (e.g., 24 hours) acts as a safety net for orphaned environments.
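
The TTL safety net reduces to an age check per preview stack. The sketch below assumes the creation time of each stack is recorded somewhere as a Unix timestamp, since docker stack ls does not report it; the function name and the sample values are hypothetical.

```shell
# Succeed when a stack created at $1 (epoch seconds) is at least $3 hours old at time $2.
is_expired() {
  created=$1
  now=$2
  ttl_hours=$3
  [ $((now - created)) -ge $((ttl_hours * 3600)) ]
}

# Hypothetical orphaned stack: created 30 hours ago, TTL of 24 hours.
now=$(date +%s)
created=$((now - 30 * 3600))
if is_expired "$created" "$now" 24; then
  # Dry run: print the teardown command instead of executing it.
  echo "docker stack rm pr-123"
fi
```

Run from a scheduled job, a check like this would clean up environments whose PR close event was missed.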

Limitations & Open Questions

  • Resource limits: Define maximum concurrent preview environments and per-environment resource caps on the Hetzner cluster.
  • Secret management: Determine whether preview envs use production-equivalent secrets or a dedicated preview secret set.
  • Seed data freshness: Define a strategy for keeping database seeds up to date (e.g., nightly snapshots from staging, version-controlled seed scripts).
  • External service dependencies: Clarify how third-party integrations (payment providers, email services) behave in preview environments (sandbox mode, mocks, or disabled).
