ADR-034: Documentation Frontmatter Schema
Date: 2026-05-06 Status: ✅ Accepted Deciders: Julian Brüning
Context
The Busflow monorepo contains 170+ markdown files under docs/, all carrying YAML frontmatter enforced by CI (scripts/lint-docs.mjs, scripts/lint-frontmatter.sh). The existing schema evolved organically across several PRs but has no formal specification or ADR. It is documented only inline in docs/2-areas/process/guidelines.md.
Three concrete gaps drive this ADR:
- No `summary` field. Hub pages (AREAS.md), tag index pages (docs/tags/), VitePress search results, and AI agent triage all lack concise document descriptions. Agents must read full documents to assess relevance, which is wasteful for both token budgets and response latency.
- No code-to-docs linkage. Documentation describes implementation patterns, but no structured metadata connects a doc to the code paths it covers. Developers rely on `grep` or memory.
- No external source tracking. Authoritative references (laws, RFCs, vendor specs) appear as inline links but are invisible to machine discovery. Agents cannot systematically identify which external sources inform a document.
This ADR formalizes the existing schema and introduces three new optional fields (`summary`, `code_refs`, `sources`) to close these gaps.
Decision
Full Schema
| Field | Type | Required | Validation | Rationale |
|---|---|---|---|---|
| `para_category` | Enum: `project`, `area`, `resource`, `archive` | Yes | Must match enum | PARA organizational classification. Determines folder placement and agent triage behavior. |
| `date_last_reviewed` | ISO date (YYYY-MM-DD) | Yes | Must match `^\d{4}-\d{2}-\d{2}$` | Tracks human review, not last edit. Enables staleness detection (>6 months). |
| `tags` | String array | Yes (≥1) | Every tag must exist in `docs/.tags.yml` | Multi-axis classification (domain, tech, audience, doc-type, lifecycle). CI-enforced controlled vocabulary prevents sprawl. |
| `summary` | String (inline scalar) | Recommended (mandatory after backfill sprint) | ≤ 200 characters. Single-line only; no YAML block scalars (`>-`, `\|`). | Hub page snippets, VitePress search results, agent triage. Agents read `summary` + `tags` before deciding whether to load the full document. |
| `related` | String array | Optional (≤ 7) | Each path must resolve to an existing file in the repo | Directed graph edges to specific documents. Repo-root-relative paths. Inline comments explain why the relationship exists. |
| `code_refs` | String array | Optional (≤ 10) | Must not start with `/` (no absolute paths), must not contain `..` (no parent escapes). No filesystem existence check. | Advisory pointers connecting documentation to implementation code. Globs permitted. No existence check, because code paths change faster than docs. |
| `sources` | String array | Optional (≤ 5) | Must start with `https://` or `http://` | Authoritative external references (laws, RFCs, vendor specs). Not for "see also" links; those go inline in the document body. |
| `expires_on` | ISO date | Optional (projects only) | Must match date format. Linter errors if date has passed. | Hard deadline for time-bound projects. |
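
The per-field constraints in the table can be expressed as a small validation function. The following is a minimal sketch under stated assumptions, not the logic of scripts/lint-docs.mjs: the `Frontmatter` type and the `validateFrontmatter` name are hypothetical, the frontmatter is assumed to be already parsed into an object, and the checks that need repository access (tag vocabulary lookup, `related` path resolution) are left out. It also assumes that an `expires_on` date equal to today has not yet passed.

```typescript
// Hypothetical shape of already-parsed frontmatter; field names follow the schema table.
type Frontmatter = {
  para_category?: string;
  date_last_reviewed?: string;
  tags?: string[];
  summary?: string;
  related?: string[];
  code_refs?: string[];
  sources?: string[];
  expires_on?: string;
};

const PARA_CATEGORIES = ["project", "area", "resource", "archive"];
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Returns human-readable errors; an empty list means all checks passed.
// `today` is passed in as an ISO date string so the expiry check is deterministic.
function validateFrontmatter(fm: Frontmatter, today: string): string[] {
  const errors: string[] = [];

  if (!fm.para_category || PARA_CATEGORIES.indexOf(fm.para_category) === -1) {
    errors.push("para_category must be one of: " + PARA_CATEGORIES.join(", "));
  }
  if (!fm.date_last_reviewed || !ISO_DATE.test(fm.date_last_reviewed)) {
    errors.push("date_last_reviewed must match YYYY-MM-DD");
  }
  if (!fm.tags || fm.tags.length < 1) {
    errors.push("tags must contain at least one entry");
  }
  if (fm.summary !== undefined && fm.summary.length > 200) {
    errors.push("summary must be at most 200 characters");
  }
  if (fm.related && fm.related.length > 7) {
    errors.push("related allows at most 7 entries");
  }

  const codeRefs = fm.code_refs ?? [];
  if (codeRefs.length > 10) errors.push("code_refs allows at most 10 entries");
  for (const ref of codeRefs) {
    // Syntactic checks only: the schema deliberately skips filesystem existence checks.
    if (/^\//.test(ref)) errors.push("code_refs entry is an absolute path: " + ref);
    if (ref.indexOf("..") !== -1) errors.push("code_refs entry contains '..': " + ref);
  }

  const sources = fm.sources ?? [];
  if (sources.length > 5) errors.push("sources allows at most 5 entries");
  for (const url of sources) {
    if (!/^https?:\/\//.test(url)) errors.push("sources entry is not an http(s) URL: " + url);
  }

  if (fm.expires_on !== undefined) {
    if (!ISO_DATE.test(fm.expires_on)) {
      errors.push("expires_on must match YYYY-MM-DD");
    } else if (fm.expires_on < today) {
      // ISO dates in YYYY-MM-DD form sort correctly as plain strings.
      errors.push("expires_on has passed: " + fm.expires_on);
    }
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first failure lets a linter report every problem in a file in one pass.
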
summary Syntax Constraint
`summary` uses single-line inline scalar syntax only:

    summary: How busflow handles GDPR-compliant data storage, logging and deletion.

Block scalar syntax (`>-`, `|`) is not supported. Rationale: the two hand-rolled YAML parsers in scripts/lint-docs.mjs and docs/tags/[tag].paths.ts handle inline scalars and arrays but do not support YAML block scalars. 200 characters fit comfortably on one line.
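
One way to enforce the constraint is to reject block scalar indicators on the `summary` line before any further parsing. The sketch below is illustrative only: `checkSummaryLine` is a hypothetical helper, not code from either parser named above, and it assumes it is given one raw frontmatter line and that the length limit applies to the raw value as written.

```typescript
// Hypothetical helper: inspects one raw frontmatter line such as
// "summary: How busflow handles ...". Returns an error message, or null if the line is fine.
function checkSummaryLine(line: string): string | null {
  const match = /^summary:\s*(.*)$/.exec(line);
  if (match === null) return null; // not a summary line, nothing to check

  const value = match[1].trim();
  // In YAML, a value that begins with ">" or "|" opens a block scalar;
  // plain (unquoted) scalars cannot start with either character.
  if (/^[>|]/.test(value)) {
    return "summary must be an inline scalar, not a block scalar";
  }
  if (value.length > 200) {
    return "summary must be at most 200 characters";
  }
  return null;
}
```

For example, `checkSummaryLine("summary: >-")` returns an error message, while the inline example above returns `null`.
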
ADR-Specific Rules
ADRs are immutable historical records. The following rules govern which frontmatter fields ADRs may carry:
| Field | Permitted on ADRs? | Rationale |
|---|---|---|
| `tags` | ✅ Yes | Classification metadata; does not alter the decision record |
| `summary` | ✅ Yes | Metadata describing what the ADR covers; does not alter the decision |
| `related` | ❌ No | Curated edges count as content modification, violating immutability |
| `code_refs` | ❌ No | Code references evolve post-decision; adding them later modifies the ADR's scope |
| `sources` | Inherited | ADRs reference external sources in their body text at creation time. Adding `sources:` after the fact modifies the record. A `sources:` field present from creation is fine. |
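
If these rules were checked mechanically, the two "No" rows reduce to a key lookup. This is a minimal sketch and an assumption, since the ADR does not state that the linter enforces them; `findForbiddenAdrFields` is a hypothetical name, and `sources` is left out because its rule depends on when the field was added, which the file contents alone do not reveal.

```typescript
// Fields the table marks as not permitted on ADRs.
const ADR_FORBIDDEN_FIELDS = ["related", "code_refs"];

// Hypothetical check: `fieldNames` are the frontmatter keys found in an ADR file.
// Returns the keys that the ADR-specific rules do not permit.
function findForbiddenAdrFields(fieldNames: string[]): string[] {
  return fieldNames.filter((name) => ADR_FORBIDDEN_FIELDS.indexOf(name) !== -1);
}
```

An ADR carrying `tags`, `summary`, and `related` would yield a single finding for `related`.
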
Tag Vocabulary
Tags come from a controlled vocabulary in docs/.tags.yml, organized in five axes:
| Axis | Purpose | Example values |
|---|---|---|
| `domain` | Business domain | auth, payments, gdpr, booking, fleet |
| `tech` | Technology stack | hasura, vue, postgres, docker, terraform |
| `audience` | Target reader | dev, ops, pm, support |
| `doc-type` | Document format/purpose | explanation, reference, runbook, decision, tutorial, specification |
| `lifecycle` | Document maturity | draft, active, deprecated |

Tags are flat in frontmatter (`tags: [gdpr, dev, explanation]`). Axis grouping exists only in .tags.yml for human readability; the linter flattens all values into one allowed set. Agents must read .tags.yml to determine which axis a tag belongs to.
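
The flattening step and the reverse lookup an agent needs can be sketched as follows. The in-memory `TagVocabulary` shape, the function names, and the assumption that .tags.yml maps each axis to a list of values are all hypothetical; the values shown are only the example values from the table, not the full vocabulary.

```typescript
// Hypothetical in-memory form of docs/.tags.yml: axis name -> allowed tag values.
type TagVocabulary = { [axis: string]: string[] };

const exampleVocabulary: TagVocabulary = {
  domain: ["auth", "payments", "gdpr", "booking", "fleet"],
  tech: ["hasura", "vue", "postgres", "docker", "terraform"],
  audience: ["dev", "ops", "pm", "support"],
  "doc-type": ["explanation", "reference", "runbook", "decision", "tutorial", "specification"],
  lifecycle: ["draft", "active", "deprecated"],
};

// Flattens all axes into one allowed list, as the linter is described to do.
function flattenVocabulary(vocab: TagVocabulary): string[] {
  const allowed: string[] = [];
  for (const axis of Object.keys(vocab)) {
    for (const tag of vocab[axis]) allowed.push(tag);
  }
  return allowed;
}

// Returns the tags that do not appear anywhere in the vocabulary.
function findUnknownTags(tags: string[], vocab: TagVocabulary): string[] {
  const allowed = flattenVocabulary(vocab);
  return tags.filter((tag) => allowed.indexOf(tag) === -1);
}

// Reverse lookup for interpreting a flat tag; returns undefined for unknown tags.
function axisOf(tag: string, vocab: TagVocabulary): string | undefined {
  for (const axis of Object.keys(vocab)) {
    if (vocab[axis].indexOf(tag) !== -1) return axis;
  }
  return undefined;
}
```

With this vocabulary, `axisOf("gdpr", exampleVocabulary)` yields `"domain"`, and `findUnknownTags(["gdpr", "pitfall"], exampleVocabulary)` reports `pitfall` as outside the vocabulary.
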
Enforcement
| Validator | Scope | Runs |
|---|---|---|
| `scripts/lint-frontmatter.sh` | `para_category`, `date_last_reviewed` | Pre-commit hook (staged files only) |
| `scripts/lint-docs.mjs` | All fields + tag vocabulary + `related` paths + link validation | CI and `pnpm docs:lint` |

summary Enforcement Rollout
- Phase 1 (this ADR): `summary` is optional. Linter emits a warning if missing.
- Phase 2 (after backfill sprint): Add `--require-summary` flag to `pnpm docs:lint` in `package.json`. Linter emits an error if missing.
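
The two phases differ only in how a missing `summary` is reported, so the switch can be a single function of the command-line arguments. A minimal sketch, assuming the linter receives its arguments as a string array; `Severity` and `missingSummarySeverity` are hypothetical names, and only the `--require-summary` flag comes from this ADR.

```typescript
type Severity = "warning" | "error";

// Hypothetical helper: `argv` is the list of CLI arguments passed to the linter.
// Phase 1: flag absent, a missing summary is a warning.
// Phase 2: flag present, a missing summary is an error.
function missingSummarySeverity(argv: string[]): Severity {
  return argv.indexOf("--require-summary") !== -1 ? "error" : "warning";
}
```

Keeping the phase switch in a flag rather than in code means moving to Phase 2 is a one-line change to the script entry in `package.json`.
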
What We Excluded and Why
| Excluded Field | Rationale |
|---|---|
| `title` | The H1 heading in the document body is the single source of truth. VitePress extracts it at build time. Duplicating it in frontmatter creates a maintenance burden and drift risk. |
| `author` / `owner` | `git blame` is authoritative. A frontmatter field would drift the moment someone else edits the file. |
| `version` | Documentation is always "latest." ADRs are versioned by their creation date. |
| `purpose` (as own field) | The doc-type tag axis (explanation, reference, runbook, decision, etc.) already classifies document format/purpose. A separate `purpose` field would duplicate this axis. |
| `audience` (as own field) | The audience tag axis (dev, ops, pm, support) already classifies target readers. A separate field creates double maintenance: authors write `tags: [dev]` AND `audience: [dev]`. |
| `lifecycle` (as own field) | The lifecycle tag axis (draft, active, deprecated) already classifies maturity. Same double-maintenance argument. |
| `domain` (as own field) | Overlaps with tags. Domain classification is a tag axis, not a structured field. |
| `pitfall` (as doc-type) | No concrete use case exists yet. Adding vocabulary speculatively leads to sprawl. Revisit when a pitfall document actually needs classification. |
| `playbook` (as doc-type) | No process-style playbooks (onboarding, release ceremony) exist in the repo. runbooks/ contains incident-response procedures; protocols/ contains system specifications tagged as specification. Add `playbook` when the first process document appears. |
Consequences
Positive
- Agent triage efficiency: agents read `summary` + `tags` before deciding whether to load the full document, reducing token usage and response latency.
- Hub page quality: tag index pages and AREAS.md display concise summaries instead of bare titles.
- Code-docs traceability: `code_refs` connects documentation to implementation without requiring exact path maintenance.
- Machine-discoverable external sources: `sources` makes authoritative references parseable without reading document body text.
- Full schema documentation: this ADR serves as the canonical reference, replacing scattered inline documentation.
Negative
- Summary maintenance burden: 170+ files need summaries. The backfill sprint produces them; ongoing maintenance requires updating summaries when document scope changes.
- Single-line constraint: `summary` cannot use YAML block scalars. Long summaries require careful editing to stay within 200 characters on one line. This is a deliberate trade-off against parser complexity.
- Advisory `code_refs`: no filesystem existence check means stale paths go undetected. This is intentional (code paths are renamed faster than docs are updated), but it reduces the field's reliability over time.