Busflow Docs

Internal documentation portal

📦 Resource · Reviewed 06 May 2026

ADR-034: Documentation Frontmatter Schema

Date: 2026-05-06 · Status: ✅ Accepted · Deciders: Julian Brüning

Context

The Busflow monorepo contains 170+ markdown files under docs/. All of them carry YAML frontmatter, which is validated by a pre-commit hook (scripts/lint-frontmatter.sh) and by CI (scripts/lint-docs.mjs). The existing schema evolved organically across several PRs but has no formal specification or ADR. Its only documentation is inline in docs/2-areas/process/guidelines.md.

Three concrete gaps drive this ADR:

  1. No summary field. Hub pages (AREAS.md), tag index pages (docs/tags/), VitePress search results, and AI agent triage all lack concise document descriptions. Agents must read full documents to assess relevance, which costs both tokens and response latency.
  2. No code-to-docs linkage. Documentation describes implementation patterns, but no structured metadata connects a doc to the code paths it covers. Developers rely on grep or memory.
  3. No external source tracking. Authoritative references (laws, RFCs, vendor specs) appear as inline links but are invisible to machine discovery. Agents cannot systematically identify which external sources inform a document.

This ADR formalizes the existing schema and introduces three new optional fields (summary, code_refs, sources) to close these gaps.

Decision

Full Schema

| Field | Type | Required | Validation | Rationale |
|---|---|---|---|---|
| para_category | Enum: project, area, resource, archive | Yes | Must match enum | PARA organizational classification. Determines folder placement and agent triage behavior. |
| date_last_reviewed | ISO date (YYYY-MM-DD) | Yes | Must match ^\d{4}-\d{2}-\d{2}$ | Tracks human review, not last edit. Enables staleness detection (> 6 months). |
| tags | String array | Yes (≥ 1) | Every tag must exist in docs/.tags.yml | Multi-axis classification (domain, tech, audience, doc-type, lifecycle). A CI-enforced controlled vocabulary prevents sprawl. |
| summary | String (inline scalar) | Recommended (mandatory after backfill sprint) | ≤ 200 characters. Single line only; no YAML block scalars (>- or \|). | Hub page snippets, VitePress search results, agent triage. Agents read summary + tags before deciding whether to load the full document. |
| related | String array | Optional (≤ 7) | Each path must resolve to an existing file in the repo | Directed graph edges to specific documents. Repo-root-relative paths. Inline comments explain why the relationship exists. |
| code_refs | String array | Optional (≤ 10) | Must not start with / (no absolute paths) and must not contain .. (no parent escapes). No filesystem existence check. | Advisory pointers connecting documentation to implementation code. Globs permitted. No existence check, because code paths change faster than docs. |
| sources | String array | Optional (≤ 5) | Must start with https:// or http:// | Authoritative external references (laws, RFCs, vendor specs). Not for "see also" links; those go inline in the document body. |
| expires_on | ISO date | Optional (projects only) | Must match date format. Linter errors if the date has passed. | Hard deadline for time-bound projects. |
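The table's field rules fit in a small, I/O-free check. This is a sketch, not the code in scripts/lint-docs.mjs. It makes several assumptions: the frontmatter has already been parsed into an object, the caller supplies the allowed tag set and a file-existence predicate, and "> 6 months" is approximated with calendar-month arithmetic. The names validateFrontmatter, Frontmatter and Issue are hypothetical.

```typescript
// Hypothetical sketch of the ADR-034 field rules over already-parsed frontmatter.
// YAML parsing, file discovery and reporting are deliberately left out.

interface Frontmatter {
  para_category?: string;
  date_last_reviewed?: string;
  tags?: string[];
  summary?: string;
  related?: string[];
  code_refs?: string[];
  sources?: string[];
  expires_on?: string;
}

interface Issue {
  level: "error" | "warning";
  field: string;
  message: string;
}

const PARA_CATEGORIES = ["project", "area", "resource", "archive"];
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Shift a YYYY-MM-DD date by whole months (UTC; month-end days may roll over).
function shiftMonths(isoDate: string, months: number): string {
  const d = new Date(`${isoDate}T00:00:00Z`);
  d.setUTCMonth(d.getUTCMonth() + months);
  return d.toISOString().slice(0, 10);
}

function validateFrontmatter(
  fm: Frontmatter,
  allowedTags: Set<string>,
  fileExists: (repoPath: string) => boolean, // injected to keep the sketch I/O-free
  today: string, // YYYY-MM-DD
): Issue[] {
  const issues: Issue[] = [];
  const error = (field: string, message: string) => issues.push({ level: "error", field, message });
  const warning = (field: string, message: string) => issues.push({ level: "warning", field, message });

  // Required fields. ISO dates compare correctly as plain strings.
  if (!fm.para_category || !PARA_CATEGORIES.includes(fm.para_category)) {
    error("para_category", `must be one of ${PARA_CATEGORIES.join(", ")}`);
  }
  if (!fm.date_last_reviewed || !ISO_DATE.test(fm.date_last_reviewed)) {
    error("date_last_reviewed", "must be YYYY-MM-DD");
  } else if (fm.date_last_reviewed < shiftMonths(today, -6)) {
    warning("date_last_reviewed", "not reviewed in over 6 months");
  }
  if (!fm.tags || fm.tags.length === 0) error("tags", "at least one tag is required");
  for (const tag of fm.tags ?? []) {
    if (!allowedTags.has(tag)) error("tags", `"${tag}" is not in docs/.tags.yml`);
  }

  // summary: recommended (warning when missing), at most 200 characters.
  if (fm.summary === undefined) warning("summary", "missing");
  else if (fm.summary.length > 200) error("summary", "longer than 200 characters");

  // Optional arrays with upper bounds.
  const related = fm.related ?? [];
  const codeRefs = fm.code_refs ?? [];
  const sources = fm.sources ?? [];
  if (related.length > 7) error("related", "at most 7 entries");
  if (codeRefs.length > 10) error("code_refs", "at most 10 entries");
  if (sources.length > 5) error("sources", "at most 5 entries");

  for (const p of related) {
    if (!fileExists(p)) error("related", `${p} does not exist`);
  }
  // code_refs are advisory: syntax checks only, globs allowed, no existence check.
  for (const ref of codeRefs) {
    if (ref.startsWith("/")) error("code_refs", `${ref} is an absolute path`);
    if (ref.includes("..")) error("code_refs", `${ref} contains ..`);
  }
  for (const url of sources) {
    if (!url.startsWith("https://") && !url.startsWith("http://")) {
      error("sources", `${url} is not an http(s) URL`);
    }
  }

  // expires_on: format check plus a hard deadline (the projects-only rule is not checked here).
  if (fm.expires_on !== undefined) {
    if (!ISO_DATE.test(fm.expires_on)) error("expires_on", "must be YYYY-MM-DD");
    else if (fm.expires_on < today) error("expires_on", "deadline has passed");
  }
  return issues;
}
```

Passing today in explicitly keeps the staleness and deadline checks deterministic in tests.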

summary Syntax Constraint

summary uses single-line inline scalar syntax only:

```yaml
summary: How busflow handles GDPR-compliant data storage, logging and deletion.
```

Block scalar syntax (>-, |) is not supported. Rationale: the two hand-rolled YAML parsers in scripts/lint-docs.mjs and docs/tags/[tag].paths.ts handle inline scalars and arrays but do not support YAML block scalars. 200 characters fit comfortably on one line.
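A line-oriented reader only sees the summary: line itself. A block-scalar header such as >- therefore leaves it holding the indicator, not the text. The hypothetical helper readSummaryLine below is not taken from either parser. It rejects block-scalar headers, strips one pair of surrounding quotes, and enforces the 200-character limit. It is not a YAML implementation: escape sequences and trailing comments are not handled.

```typescript
// Hypothetical single-line reader for the summary field (not the repo's parser).
type SummaryResult = { ok: true; value: string } | { ok: false; reason: string };

// Block-scalar header: "|" or ">", optionally followed by a chomping
// indicator (+/-) and an indentation digit, in either order.
const BLOCK_SCALAR_HEADER = /^[|>](?:[+-][1-9]?|[1-9][+-]?)?\s*(?:#.*)?$/;

function readSummaryLine(line: string): SummaryResult | null {
  const m = /^summary:\s*(.*)$/.exec(line);
  if (!m) return null; // not the summary line
  let value = (m[1] ?? "").trim();
  if (BLOCK_SCALAR_HEADER.test(value)) {
    return { ok: false, reason: "block scalars (>-, |) are not supported; keep summary on one line" };
  }
  // Strip one pair of matching surrounding quotes (escape sequences are not handled).
  const q = value.charAt(0);
  if (value.length >= 2 && (q === '"' || q === "'") && value.endsWith(q)) {
    value = value.slice(1, -1);
  }
  if (value === "") return { ok: false, reason: "summary is empty" };
  if (value.length > 200) return { ok: false, reason: `${value.length} characters; the limit is 200` };
  return { ok: true, value };
}
```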

ADR-Specific Rules

ADRs are immutable historical records. The following rules govern which frontmatter fields ADRs may carry:

| Field | Permitted on ADRs? | Rationale |
|---|---|---|
| tags | ✅ Yes | Classification metadata; does not alter the decision record |
| summary | ✅ Yes | Metadata describing what the ADR covers; does not alter the decision |
| related | ❌ No | Curated edges count as content modification, violating immutability |
| code_refs | ❌ No | Code references evolve post-decision; adding them later modifies the ADR's scope |
| sources | Inherited | ADRs reference external sources in their body text at creation time. Adding sources: after the fact modifies the record. Pre-existing sources: from creation are fine. |
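A linter could enforce these rules by comparing an ADR's current frontmatter with the frontmatter it had at creation. This sketch assumes the caller supplies the creation-time frontmatter, for example read from the ADR's first commit. How that value is obtained is left out. checkAdrFields and AdrFrontmatter are hypothetical names.

```typescript
// Hypothetical check for the ADR-specific frontmatter rules.
interface AdrFrontmatter {
  tags?: string[];
  summary?: string;
  related?: string[];
  code_refs?: string[];
  sources?: string[];
}

function checkAdrFields(current: AdrFrontmatter, atCreation: AdrFrontmatter): string[] {
  const problems: string[] = [];
  // tags and summary are classification metadata: always permitted, not checked.
  if (current.related !== undefined) problems.push("related is not permitted on ADRs");
  if (current.code_refs !== undefined) problems.push("code_refs is not permitted on ADRs");
  // sources is inherited: allowed only if unchanged since the ADR was created.
  const before = JSON.stringify(atCreation.sources ?? null);
  const after = JSON.stringify(current.sources ?? null);
  if (before !== after) problems.push("sources may not be added or changed after creation");
  return problems;
}
```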

Tag Vocabulary

Tags come from a controlled vocabulary in docs/.tags.yml, organized in five axes:

| Axis | Purpose | Example values |
|---|---|---|
| domain | Business domain | auth, payments, gdpr, booking, fleet |
| tech | Technology stack | hasura, vue, postgres, docker, terraform |
| audience | Target reader | dev, ops, pm, support |
| doc-type | Document format/purpose | explanation, reference, runbook, decision, tutorial, specification |
| lifecycle | Document maturity | draft, active, deprecated |

Tags are flat in frontmatter (tags: [gdpr, dev, explanation]). Axis grouping exists only in .tags.yml, for human readability; the linter flattens all values into one allowed set. To tell which axis a tag belongs to, agents must read .tags.yml.
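The same vocabulary yields both the linter's flattened set and the reverse lookup an agent needs. This sketch makes two assumptions: .tags.yml parses to a mapping from axis name to a list of tags, and each tag belongs to exactly one axis. The real file layout may differ. The values are the examples from the Tag Vocabulary table. allowedTags and axisOf are hypothetical names.

```typescript
// Assumed shape of the parsed .tags.yml: { axis: [tag, ...] }.
type TagVocabulary = Record<string, string[]>;

// What the linter needs: one flat set of allowed values.
function allowedTags(vocab: TagVocabulary): Set<string> {
  const all = new Set<string>();
  for (const tags of Object.values(vocab)) {
    for (const tag of tags) all.add(tag);
  }
  return all;
}

// What an agent needs to interpret a tag: its axis (last axis wins on duplicates).
function axisOf(vocab: TagVocabulary): Map<string, string> {
  const byTag = new Map<string, string>();
  for (const [axis, tags] of Object.entries(vocab)) {
    for (const tag of tags) byTag.set(tag, axis);
  }
  return byTag;
}

// Example values from the Tag Vocabulary table.
const vocab: TagVocabulary = {
  domain: ["auth", "payments", "gdpr", "booking", "fleet"],
  tech: ["hasura", "vue", "postgres", "docker", "terraform"],
  audience: ["dev", "ops", "pm", "support"],
  "doc-type": ["explanation", "reference", "runbook", "decision", "tutorial", "specification"],
  lifecycle: ["draft", "active", "deprecated"],
};
```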

Enforcement

| Validator | Scope | Runs |
|---|---|---|
| scripts/lint-frontmatter.sh | para_category, date_last_reviewed | Pre-commit hook (staged files only) |
| scripts/lint-docs.mjs | All fields + tag vocabulary + related paths + link validation | CI and pnpm docs:lint |

summary Enforcement Rollout

  1. Phase 1 (this ADR): summary is optional. Linter emits a warning if missing.
  2. Phase 2 (after backfill sprint): Add the --require-summary flag to the docs:lint script in package.json. The linter then emits an error if summary is missing.
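The two phases differ only in severity. This minimal sketch assumes the flag arrives as a plain command-line argument and that a whitespace-only summary counts as missing. summarySeverity is a hypothetical name.

```typescript
// Hypothetical severity rule for a missing summary across the two rollout phases.
type Severity = "ok" | "warning" | "error";

function summarySeverity(summary: string | undefined, args: readonly string[]): Severity {
  // Assumption: a whitespace-only summary is treated the same as a missing one.
  if (summary !== undefined && summary.trim() !== "") return "ok";
  // Phase 1 (no flag): warning. Phase 2 (--require-summary): error.
  return args.includes("--require-summary") ? "error" : "warning";
}
```

In a Node script the arguments would typically come from process.argv.slice(2).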

What We Excluded and Why

| Excluded field | Rationale |
|---|---|
| title | The H1 heading in the document body is the single source of truth. VitePress extracts it at build time. Duplicating it in frontmatter creates a maintenance burden and drift risk. |
| author / owner | git blame is authoritative. A frontmatter field would drift the moment someone else edits the file. |
| version | Documentation is always "latest." ADRs are versioned by their creation date. |
| purpose (as own field) | The doc-type tag axis (explanation, reference, runbook, decision, etc.) already classifies document format/purpose. A separate purpose field would duplicate this axis. |
| audience (as own field) | The audience tag axis (dev, ops, pm, support) already classifies target readers. A separate field creates double maintenance: authors would write tags: [dev] AND audience: [dev]. |
| lifecycle (as own field) | The lifecycle tag axis (draft, active, deprecated) already classifies maturity. Same double-maintenance argument. |
| domain (as own field) | Overlaps with tags. Domain classification is a tag axis, not a structured field. |
| pitfall (as doc-type) | No concrete use case exists yet. Adding vocabulary speculatively leads to sprawl. Revisit when a pitfall document actually needs classification. |
| playbook (as doc-type) | No process-style playbooks (onboarding, release ceremony) exist in the repo. runbooks/ contains incident-response procedures; protocols/ contains system specifications tagged as specification. Add playbook when the first process document appears. |
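Deriving a title from the body needs nothing beyond the markdown itself. This is a simplified sketch, not VitePress's implementation. It skips a leading --- frontmatter block and backtick-fenced code, then returns the text of the first "# " heading. titleFromBody is a hypothetical name.

```typescript
// Hypothetical title extraction from the first level-1 ATX heading.
const FENCE = "`".repeat(3); // avoids a literal triple backtick inside this code block

function titleFromBody(markdown: string): string | undefined {
  const lines = markdown.split(/\r?\n/);
  let i = 0;
  // Skip a leading frontmatter block delimited by "---" lines.
  if (lines[0] === "---") {
    i = 1;
    while (i < lines.length && lines[i] !== "---") i++;
    i++; // step past the closing delimiter
  }
  let inFence = false;
  for (; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line.startsWith(FENCE)) {
      inFence = !inFence; // "#" lines inside code blocks are comments, not headings
    } else if (!inFence) {
      const m = /^#\s+(.*\S)\s*$/.exec(line);
      if (m) return m[1];
    }
  }
  return undefined;
}
```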

Consequences

Positive

  • Agent triage efficiency: agents read summary + tags before deciding whether to load the full document, reducing token usage and response latency.
  • Hub page quality: tag index pages and AREAS.md display concise summaries instead of bare titles.
  • Code-docs traceability: code_refs connects documentation to implementation without requiring exact path maintenance.
  • Machine-discoverable external sources: sources makes authoritative references parseable without reading document body text.
  • Full schema documentation: this ADR serves as the canonical reference, replacing scattered inline documentation.
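Because code_refs may contain globs, traceability also works in reverse: given a changed file, find the docs that point at it. The ADR does not specify a glob dialect. This sketch assumes * matches within one path segment, ** spans directories, and ? matches one character. A plain path with no glob characters matches only itself. globToRegExp and docsCoveringPath are hypothetical names.

```typescript
// Hypothetical reverse lookup from a changed file to the docs whose code_refs match it.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*") {
      if (glob.charAt(i + 1) === "*") {
        // "**/" matches zero or more directories; a bare "**" matches anything.
        if (glob.charAt(i + 2) === "/") {
          re += "(?:.*/)?";
          i += 2;
        } else {
          re += ".*";
          i += 1;
        }
      } else {
        re += "[^/]*"; // "*" stays within one path segment
      }
    } else if (c === "?") {
      re += "[^/]";
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

function docsCoveringPath(changedPath: string, codeRefsByDoc: Map<string, string[]>): string[] {
  const hits: string[] = [];
  codeRefsByDoc.forEach((refs, doc) => {
    if (refs.some((ref) => globToRegExp(ref).test(changedPath))) hits.push(doc);
  });
  return hits.sort();
}
```

A reviewer tool could run this over a PR's changed files to suggest which docs to re-check. Since code_refs are advisory, an empty result says nothing about coverage.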

Negative

  • Summary maintenance burden: 170+ files need summaries. A backfill sprint produces them; ongoing maintenance requires updating summaries whenever a document's scope changes.
  • Single-line constraint: summary cannot use YAML block scalars. Long summaries require careful editing to stay under 200 characters on one line. This is a deliberate trade-off against parser complexity.
  • Advisory code_refs: without a filesystem existence check, stale paths go undetected. This is intentional (code paths are renamed faster than docs are updated), but it reduces the field's reliability over time.

Internal documentation · Busflow