Busflow Docs

Internal documentation portal


Agentic-Led Company: Governance & Control Spec Sheet

Working title: Agentic-Led Company · Author: Julian Brüning · Date: 2026-04-17 · Status: ACCEPTED, implementation via Paperclip + MCP (see §10)


1. Purpose

Define the governance model for a solo-founder SaaS company (Busflow) where AI agents perform work across all departments. The model must guarantee:

  1. AI self-governance: agents catch their own mistakes before output reaches the founder
  2. Founder sovereignty: every decision and output is visible, scannable, and overridable
  3. Context efficiency: agents operate with minimal, scoped context to reduce cost and hallucination risk
  4. Proactive task surfacing: agents don't just respond; they identify and propose work

2. Core Principles

| # | Principle | Description |
| --- | --- | --- |
| P1 | No blind delegation | Every agent output must pass through at least one review layer before it becomes a decision |
| P2 | Scannable by default | All outputs follow a standardized hierarchy: Summary → Decisions → Details |
| P3 | Traceability | Every output traces back to the input that triggered it and the reasoning chain used |
| P4 | Hallucination visibility | Supervisor corrections are always surfaced, never silently merged |
| P5 | Founder is final authority | No high-impact action without explicit founder approval (see §7 Blast Radius) |
| P6 | Least-privilege context | Each agent receives only the context its bounded context permits |
| P7 | Token economy | Every review layer must justify its cost; prefer lightweight checks over full re-analysis |

3. Event-Driven Agent Architecture

3.1 Design Overview

The architecture organizes agents as bounded contexts (mirroring DDD) that communicate exclusively through a central Event Bus. No agent directly calls another agent. The Orchestrator routes events and the Supervisor reviews outputs; these may be the same or separate roles (see §3.5).

Agent teams map directly to Busflow's four DDD bounded contexts (pillars), ensuring each team operates with deep, scoped knowledge of its domain rather than shallow, broad knowledge of the entire codebase.

                    ┌────────────────────────┐
                    │     FOUNDER (You)      │
                    │  Approve · Override    │
                    │  Task Board (§4)       │
                    └──────────┬─────────────┘
                               │ Management Reports
                               │ Task proposals
                    ┌──────────▼─────────────┐
                    │   CEO / CTO AGENTS     │
                    │   Routes events        │
                    │   Reviews outputs      │
                    │   Enforces standards   │
                    └──────────┬─────────────┘
                               │ Domain Events
         ┌─────────────┬───────┴───────┬─────────────┐
         │             │               │             │
   ┌─────▼──────┐ ┌────▼───────┐ ┌─────▼──────┐ ┌────▼─────────┐
   │ COMMERCE   │ │ BACKOFFICE │ │ OPERATIONS │ │ COMMS        │
   │ TEAM       │ │ TEAM       │ │ TEAM       │ │ TEAM         │
   │            │ │            │ │            │ │              │
   │ PM · Eng   │ │ PM · Eng   │ │ PM · Eng   │ │ PM · Eng     │
   │ · QA       │ │ · QA       │ │ · QA       │ │ · QA         │
   │            │ │            │ │            │ │              │
   │ booking-   │ │ workspace  │ │ driver app │ │ Real-time    │
   │ widget,    │ │ app,       │ │ packages/  │ │ Inbox,       │
   │ passenger  │ │ packages/  │ │ operations │ │ packages/    │
   │ app,       │ │ backoffice │ │            │ │ comms        │
   │ packages/  │ │            │ │            │ │              │
   │ commerce   │ │            │ │            │ │              │
   └────────────┘ └────────────┘ └────────────┘ └──────────────┘

   ┌── Cross-Cutting Roles ──────────────────────────────────┐
   │ Product Manager · Domain Expert · Knowledge Synthesis   │
   │ Co-Founder (Strategy)                                   │
   └─────────────────────────────────────────────────────────┘

NOTE

Top-Down Context Inheritance. Paperclip natively enforces hierarchical context: Company Mission → Project Goal → Agent Task. When a Commerce Engineer picks up a ticket, it automatically receives this ancestry: it sees not just the code task but that the task belongs to the "Platform Payments" Project Goal, which is aligned with the "Monetize the MVP" Company Mission. This reduces prompt engineering overhead and ensures strategic alignment without manual context injection.
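
To make the inheritance chain concrete, here is a minimal sketch of how a task's ancestry could be flattened into a prompt preamble. `ContextNode` and `renderAncestry` are hypothetical names chosen for illustration, not Paperclip APIs, and the ticket title is invented.

```typescript
// Hypothetical model of hierarchical context; not Paperclip's actual data model.
type ContextLevel = "Company Mission" | "Project Goal" | "Agent Task";

interface ContextNode {
  level: ContextLevel;
  title: string;
  parentNode?: ContextNode;
}

// Walk from the task up to the mission, then render the chain top-down.
function renderAncestry(node: ContextNode): string {
  const chain: ContextNode[] = [];
  for (let cur: ContextNode | undefined = node; cur; cur = cur.parentNode) {
    chain.unshift(cur);
  }
  return chain.map((n) => `${n.level}: ${n.title}`).join(" → ");
}

const mission: ContextNode = { level: "Company Mission", title: "Monetize the MVP" };
const goal: ContextNode = { level: "Project Goal", title: "Platform Payments", parentNode: mission };
// The ticket title below is a made-up example.
const ticket: ContextNode = { level: "Agent Task", title: "Add checkout webhook", parentNode: goal };

console.log(renderAncestry(ticket));
```

The point of the sketch is that the agent never needs the whole company context, only the short chain of titles above its own task.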

3.2 Bounded Contexts: Access Control Matrix

Each agent domain has a strict read scope. Anything outside its scope is invisible. Domain teams map to Busflow's four DDD bounded contexts; cross-cutting roles span all domains but receive only summaries, not raw access.

Domain Teams (scoped to monorepo directories)

| Domain Team | Monorepo Scope | Can Read | Cannot Read |
| --- | --- | --- | --- |
| Commerce | apps/booking-widget, apps/passenger, packages/commerce/* | Domain source code, schema-commerce.md, domain tests, CI/CD logs | Other domains' code, marketing campaigns, customer PII, financials |
| Backoffice | apps/workspace, packages/backoffice/* | Domain source code, schema-backoffice.md, domain tests, CI/CD logs | Other domains' code, marketing campaigns, customer PII, financials |
| Operations | apps/driver, packages/operations/* | Domain source code, schema-operations.md, domain tests, CI/CD logs | Other domains' code, marketing campaigns, customer PII, financials |
| Communications | packages/comms/*, Real-time Inbox modules | Domain source code, schema-communications.md, messaging schemas, domain tests | Other domains' code, marketing campaigns, financials |

Cross-Cutting Roles

| Role | Can Read | Cannot Read |
| --- | --- | --- |
| Product Manager | Usage analytics, feature flags, churn metrics, all domain summaries (not raw data) | Source code details, marketing campaign internals |
| Domain Expert | Regulations, industry publications, domain knowledge base | Source code, marketing, customer PII |
| Knowledge Synthesis | All public sources, industry data, competitor intel | Internal code, customer data, financials |
| Co-Founder (Strategy) | Domain summaries from all agents, decision log, KPIs | Raw code, raw customer data (aggregated views only) |

IMPORTANT

Access control serves two purposes: (1) security and privacy, and (2) token economy. Agents that load less context are cheaper to run and less prone to hallucination.
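
A directory scope like the ones in the Domain Teams table can be enforced with a simple path check before any file read or write. The sketch below is an assumption about how such a guard might look, not the MCP bridge's actual implementation. `TEAM_SCOPES` and `isInScope` are hypothetical names, the example file paths are invented, and the Communications team is left out because the path of its Real-time Inbox modules is not given here.

```typescript
// Monorepo scopes copied from the Domain Teams table; the checking logic is an assumption.
type DomainTeam = "commerce" | "backoffice" | "operations";

const TEAM_SCOPES: Record<DomainTeam, string[]> = {
  commerce: ["apps/booking-widget", "apps/passenger", "packages/commerce"],
  backoffice: ["apps/workspace", "packages/backoffice"],
  operations: ["apps/driver", "packages/operations"],
};

// True when `filePath` lies inside one of the team's directories.
function isInScope(team: DomainTeam, filePath: string): boolean {
  // Normalize separators, drop empty and "." segments, and refuse ".." traversal outright.
  const parts = filePath
    .replace(/\\/g, "/")
    .split("/")
    .filter((p) => p !== "" && p !== ".");
  if (parts.indexOf("..") !== -1) return false;
  const normalized = parts.join("/");
  // Require an exact match or a "/" boundary so "packages/commerce-extras" does not match.
  return TEAM_SCOPES[team].some(
    (root) => normalized === root || normalized.startsWith(root + "/"),
  );
}

console.log(isInScope("commerce", "packages/commerce/src/cart.ts"));     // true
console.log(isInScope("commerce", "packages/operations/src/route.ts"));  // false
console.log(isInScope("commerce", "packages/commerce-extras/index.ts")); // false
```

Matching on a directory boundary rather than a raw string prefix is the detail that matters: a plain `startsWith` on the root would let a look-alike sibling directory slip through.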

3.2a Domain Team Composition

Each of the four domain teams follows a standardized three-role structure:

| Role | Responsibilities | Scoping |
| --- | --- | --- |
| Domain PM/Lead | Breaks epics from the CEO agent into domain-scoped tickets. Prioritizes backlog. Ensures alignment with project goals. | Reads domain summaries + cross-domain event contracts. No code access. |
| Domain Engineer | Coding agent (e.g., Claude, Codex). Implements features, fixes bugs, writes tests. | Strictly restricted to changes within its domain's directories. Cannot modify files outside its bounded context. |
| Domain QA/Reviewer | Runs domain-specific Vitest/Playwright suites. Verifies adherence to domain schema (docs/architecture/schema-<pillar>.md). Enforces architectural constraints. | Reads domain code + test results. Writes learnings to .knowledge.md (see §3.4). |

Why three roles instead of one Engineer?

  • Blast radius containment. A Commerce Engineer cannot accidentally break Operations code.
  • Distributed review. The three-stage review pipeline (§5) runs within each domain, not through a single global Supervisor bottleneck.
  • Deep context over broad context. Each Engineer loads only its domain's schemas, README, and .knowledge.md, which means smaller context windows, lower cost, and fewer hallucinations (Principles P6, P7).

3.3 Event-Driven Communication

Agents communicate only through typed domain events on the Event Bus. Examples:

| Event | Producer | Consumers |
| --- | --- | --- |
| feature.shipped | Commerce / Backoffice / Operations / Comms team | PM, Co-Founder |
| churn.risk.detected | PM | Commerce team, Co-Founder |
| competitor.change.detected | Knowledge Synthesis | PM, Co-Founder |
| compliance.rule.changed | Domain Expert | Relevant domain team(s), Co-Founder |
| booking.schema.changed | Commerce team | Operations team (soft FK), Comms team |
| task.proposed | Any agent | Orchestrator → Founder Task Board |

Rules:

  • Events are the only cross-domain data exchange mechanism
  • Events carry minimal payload; consumers request details through the Orchestrator if needed
  • All events are logged immutably for audit
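
The three rules above can be illustrated with a small in-memory bus. This is a sketch of the target architecture only, not of Paperclip's ticket model. The event names come from the table, while the payload fields, the `EventBus` class, and its method names are assumptions made for the example.

```typescript
// Illustrative only: event names come from the table above; payload fields are assumed.
type DomainEvent =
  | { type: "feature.shipped"; producer: string; featureId: string }
  | { type: "booking.schema.changed"; producer: string; schemaVersion: string }
  | { type: "task.proposed"; producer: string; taskId: string };

type EventType = DomainEvent["type"];
type Handler = (e: DomainEvent) => void;

class EventBus {
  private readonly handlers = new Map<EventType, Handler[]>();
  private readonly log: DomainEvent[] = [];

  subscribe(type: EventType, handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  publish(e: DomainEvent): void {
    // Freeze a copy so neither consumers nor the producer can alter the audit entry.
    const frozen = Object.freeze({ ...e });
    this.log.push(frozen);
    for (const h of this.handlers.get(e.type) ?? []) h(frozen);
  }

  // Return a copy so callers cannot reorder or truncate the log itself.
  auditTrail(): readonly DomainEvent[] {
    return [...this.log];
  }
}

const bus = new EventBus();
const received: string[] = [];
bus.subscribe("booking.schema.changed", (e) => received.push(`operations saw ${e.type}`));
bus.publish({ type: "booking.schema.changed", producer: "commerce", schemaVersion: "v2" });
bus.publish({ type: "feature.shipped", producer: "comms", featureId: "F-1" });
console.log(received.length, bus.auditTrail().length);
```

Producers and consumers never reference each other here; the only shared surface is the event type, which is what keeps the bounded contexts decoupled.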

NOTE

Implementation note (§10): Paperclip uses a ticket-based model instead of a typed event bus. Agents communicate through Paperclip's ticket system, a different mechanism that achieves the same goal of structured, auditable cross-domain communication. The event bus model described here serves as the target architecture should the system evolve to Strategy D or C.

3.4 Cross-Session Domain Knowledge

AI models are stateless between fresh sessions. To simulate continuous domain mastery, each domain team maintains a persistent knowledge file:

File: packages/<domain>/.knowledge.md

Write discipline (QA agent):

  • When the Domain QA agent finds a bug, architectural violation, or non-obvious pattern, it instructs the Domain Engineer to fix the issue and appends the learning to .knowledge.md.
  • Entries follow a structured format: date, ticket reference, what went wrong, what the fix was, and the extracted rule.
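
As a sketch of the structured entry format, the helper below renders one learning as a Markdown fragment ready to append to .knowledge.md. The five fields mirror the list above; the heading layout, the `KnowledgeEntry` and `formatKnowledgeEntry` names, and the sample values are all assumptions for illustration.

```typescript
// Field names mirror the entry format described above; the Markdown layout is an assumption.
interface KnowledgeEntry {
  date: string;          // ISO date, e.g. "2026-04-17"
  ticket: string;        // ticket reference
  whatWentWrong: string;
  fix: string;
  rule: string;          // the extracted rule future sessions should follow
}

// Render one entry as a Markdown fragment ending in a newline.
function formatKnowledgeEntry(entry: KnowledgeEntry): string {
  return [
    `### ${entry.date} (${entry.ticket})`,
    `- What went wrong: ${entry.whatWentWrong}`,
    `- Fix: ${entry.fix}`,
    `- Rule: ${entry.rule}`,
    "",
  ].join("\n");
}

// Sample values are invented for the example.
const sampleEntry: KnowledgeEntry = {
  date: "2026-04-17",
  ticket: "COM-123",
  whatWentWrong: "Totals were rounded before tax was applied",
  fix: "Round once, after tax",
  rule: "Never round intermediate monetary values",
};

console.log(formatKnowledgeEntry(sampleEntry));
```

Keeping the extracted rule on its own line makes it easy for the Engineer agent to skim the file for rules without rereading every incident.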

Read discipline (Engineer agent):

  • The Engineer's SKILLS.md instructs it to always read .knowledge.md first upon waking for a new ticket.
  • This creates a locally-scoped, domain-specific knowledge base that grows as the team "works."

Precedent: This pattern parallels the existing .agents/skills/frontend/SKILL.md learnings section, which accumulates design system and accessibility learnings across sessions.

Session continuity:

  • Within a ticket: Paperclip maintains session state for ongoing tasks. An agent working on a multi-day ticket retains full context of previous tool calls and discussions for that specific ticket.
  • Across tickets: .knowledge.md carries forward accumulated domain wisdom. This is the only cross-session persistence mechanism.

IMPORTANT

.knowledge.md files are domain-scoped, not global. The Commerce team's knowledge base contains Commerce-specific learnings only. This preserves the bounded context boundary and keeps context windows small (Principle P6).

3.5 Orchestrator vs. Supervisor: Merged or Separate?

| Aspect | Merged (recommended for start) | Separate |
| --- | --- | --- |
| Token cost | Lower (one pass) | Higher (two passes) |
| Risk | Orchestrator can rubber-stamp its own routing | Better separation of concerns |
| Complexity | Simpler to implement | More robust at scale |
| Recommendation | ✅ Start here | Evolve to this when agent count > 5 |

Start merged: The Orchestrator routes events AND reviews outputs. Split into two roles when the system grows complex enough that review quality degrades.


4. Proactive Task Creation

Agents don't just respond to requests; they propose work by emitting task.proposed events.

4.1 Task Lifecycle

Agent proposes → Orchestrator validates → Founder Task Board → Founder decides
     │                    │                       │
  task.proposed    Enriches with context     Approve / Reject /
                   Deduplicates              Defer / Delegate
                   Assigns priority

4.2 Task Schema

Every proposed task follows this structure:

yaml
id: auto-generated
source_agent: marketing
type: opportunity | risk | maintenance | improvement
priority_suggestion: low | medium | high | critical
title: "Create comparison page: Busflow vs. Busvermietung24"
rationale: "Competitor launched new pricing page. SEO opportunity."
effort_estimate: small | medium | large
blocked_by: []  # dependencies on other tasks
decision_needed: true | false
expires: 2026-05-01  # optional, for time-sensitive tasks
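
The same schema can be expressed as a type so that the Orchestrator's validation step rejects malformed proposals before they reach the Task Board. This is a sketch under assumptions: the field names follow the YAML above, but `validateTask`, its specific checks, and the `task-001` id are hypothetical.

```typescript
type TaskType = "opportunity" | "risk" | "maintenance" | "improvement";
type TaskPriority = "low" | "medium" | "high" | "critical";
type TaskEffort = "small" | "medium" | "large";

interface ProposedTask {
  id: string;
  source_agent: string;
  type: TaskType;
  priority_suggestion: TaskPriority;
  title: string;
  rationale: string;
  effort_estimate: TaskEffort;
  blocked_by: string[];
  decision_needed: boolean;
  expires?: string; // optional, YYYY-MM-DD
}

// Returns a list of problems; an empty list means the task is well-formed.
function validateTask(task: ProposedTask, today: string): string[] {
  const problems: string[] = [];
  if (task.title.trim() === "") problems.push("title is empty");
  if (task.rationale.trim() === "") problems.push("rationale is empty");
  if (task.blocked_by.indexOf(task.id) !== -1) problems.push("task blocks itself");
  if (task.expires !== undefined) {
    if (!/^\d{4}-\d{2}-\d{2}$/.test(task.expires)) problems.push("expires is not YYYY-MM-DD");
    // YYYY-MM-DD strings sort chronologically, so plain string comparison is enough.
    else if (task.expires < today) problems.push("task already expired");
  }
  return problems;
}

const comparisonTask: ProposedTask = {
  id: "task-001", // hypothetical id; the spec says ids are auto-generated
  source_agent: "marketing",
  type: "opportunity",
  priority_suggestion: "medium",
  title: "Create comparison page: Busflow vs. Busvermietung24",
  rationale: "Competitor launched new pricing page. SEO opportunity.",
  effort_estimate: "small",
  blocked_by: [],
  decision_needed: true,
  expires: "2026-05-01",
};

console.log(validateTask(comparisonTask, "2026-04-17")); // []
console.log(validateTask(comparisonTask, "2026-06-01")); // ["task already expired"]
```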

4.3 Founder Task Board Requirements

  • Single view of all proposed tasks across all agent domains
  • Filterable by: source agent, type, priority, decision needed
  • Sortable by: priority, date proposed, effort
  • Batch actions: approve/reject multiple tasks at once
  • Snooze: defer a task to a specific date
  • Link to context: every task links to the report/event that spawned it
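
The filter, sort, and snooze requirements compose naturally into a single view function. The sketch below assumes a simplified task shape (`BoardTask`) and a default sort of priority first, then oldest proposal first; neither is specified by this document, and the sample tasks are invented.

```typescript
type BoardPriority = "low" | "medium" | "high" | "critical";

interface BoardTask {
  id: string;
  sourceAgent: string;
  priority: BoardPriority;
  decisionNeeded: boolean;
  proposedAt: string;    // YYYY-MM-DD
  snoozedUntil?: string; // YYYY-MM-DD; hidden from the board until this date
}

interface BoardFilter {
  sourceAgent?: string;
  priority?: BoardPriority;
  decisionNeeded?: boolean;
}

const PRIORITY_RANK: Record<BoardPriority, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// Compare two YYYY-MM-DD strings chronologically.
function compareIso(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hide snoozed tasks, apply the optional filters, then sort by priority and age.
function boardView(tasks: BoardTask[], filter: BoardFilter, today: string): BoardTask[] {
  return tasks
    .filter((t) => t.snoozedUntil === undefined || t.snoozedUntil <= today)
    .filter((t) => filter.sourceAgent === undefined || t.sourceAgent === filter.sourceAgent)
    .filter((t) => filter.priority === undefined || t.priority === filter.priority)
    .filter((t) => filter.decisionNeeded === undefined || t.decisionNeeded === filter.decisionNeeded)
    .sort(
      (a, b) =>
        PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
        compareIso(a.proposedAt, b.proposedAt),
    );
}

const sampleTasks: BoardTask[] = [
  { id: "t1", sourceAgent: "commerce", priority: "low", decisionNeeded: false, proposedAt: "2026-04-10" },
  { id: "t2", sourceAgent: "pm", priority: "critical", decisionNeeded: true, proposedAt: "2026-04-12" },
  { id: "t3", sourceAgent: "commerce", priority: "critical", decisionNeeded: false, proposedAt: "2026-04-11" },
  { id: "t4", sourceAgent: "pm", priority: "high", decisionNeeded: true, proposedAt: "2026-04-09", snoozedUntil: "2026-05-01" },
];

console.log(boardView(sampleTasks, {}, "2026-04-17").map((t) => t.id)); // ["t3", "t2", "t1"]
```

Treating snooze as a visibility filter rather than a separate status means a deferred task reappears on its own once the date passes, with no extra bookkeeping.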

5. Review Layers: Critical Self-Analysis

5.1 Three-Stage Review Pipeline

| Stage | Who | Purpose | Token Cost |
| --- | --- | --- | --- |
| Self-Review | Work Agent | Critical self-analysis: "What could be wrong? What did I assume?" | Included in generation |
| Supervisor Review | Orchestrator/Supervisor | Cross-check against knowledge base, flag hallucinations, verify scope | ~20-30% of generation cost |
| Founder Review | You | Final authority, strategic judgment, override | Your time |

TIP

Token economy: The self-review needs no separate pass; it is produced as part of generation (chain-of-thought), so its cost is already included. The Supervisor review should use a checklist approach (cheaper) instead of full re-generation. Only escalate to deep analysis when the checklist flags issues.

5.2 Work Agent Self-Review Requirements

Each agent output must include a Critical Self-Analysis section:

## Self-Analysis
- Confidence: medium
- Key assumptions: [list]
- What could be wrong: [list]
- Sources used: [list] / "no source (inference)"
- Scope compliance: ✅ stayed within bounded context

5.3 Supervisor Review Checklist

Lightweight pass (not a full re-analysis):

  • [ ] Claims traceable to sources?
  • [ ] Agent stayed within its bounded context?
  • [ ] Output consistent with existing knowledge base?
  • [ ] No obvious hallucinations or fabricated data?
  • [ ] Self-analysis seems honest (not rubber-stamped)?

Correction log format:

🔧 CORRECTED: [original → fixed] - reason
⚠ UNVERIFIED: [claim] - no source found, kept with flag
✅ VALIDATED: [n] items passed checklist
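
A small formatter keeps the correction log consistent across agents and gives the "escalate only when flagged" rule a concrete trigger. This is a sketch under assumptions: `Finding`, `formatFinding`, and `needsDeepAnalysis` are hypothetical names, the square brackets are kept as literal delimiters, a plain hyphen is used as the separator before the reason, and the sample values are invented.

```typescript
// One variant per line type in the correction log format.
type Finding =
  | { kind: "corrected"; original: string; fixed: string; reason: string }
  | { kind: "unverified"; claim: string }
  | { kind: "validated"; count: number };

function formatFinding(f: Finding): string {
  switch (f.kind) {
    case "corrected":
      return `🔧 CORRECTED: [${f.original} → ${f.fixed}] - ${f.reason}`;
    case "unverified":
      return `⚠ UNVERIFIED: [${f.claim}] - no source found, kept with flag`;
    case "validated":
      return `✅ VALIDATED: [${f.count}] items passed checklist`;
  }
}

// Deep analysis is only warranted when something was corrected or left unverified.
function needsDeepAnalysis(findings: Finding[]): boolean {
  return findings.some((f) => f.kind !== "validated");
}

// Sample findings with made-up content.
const sampleFindings: Finding[] = [
  { kind: "corrected", original: "3 competitors", fixed: "2 competitors", reason: "one listing was a duplicate" },
  { kind: "validated", count: 4 },
];

for (const f of sampleFindings) console.log(formatFinding(f));
console.log(needsDeepAnalysis(sampleFindings)); // true
```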

6. Output Format: Management Report Standard

Every report surfaced to the founder:

┌─ MANAGEMENT SUMMARY ──────────────────────────┐
│  1-3 sentences. Traffic light: 🟢 🟡 🔴       │
└────────────────────┬──────────────────────────┘
                     ▼
┌─ DECISION POINTS ─────────────────────────────┐
│  • Context · Options · AI recommendation      │
└────────────────────┬──────────────────────────┘
                     ▼
┌─ SUPERVISOR FINDINGS ─────────────────────────┐
│  🔧 Corrections · ⚠ Unverified · ✅ Validated │
└────────────────────┬──────────────────────────┘
                     ▼
┌─ PROPOSED TASKS ──────────────────────────────┐
│  New tasks this report generated (if any)     │
└────────────────────┬──────────────────────────┘
                     ▼
┌─ DETAILED OUTPUT (drill-down) ────────────────┐
│  Full work product · Agent self-analysis      │
│  Source citations · Collapsible sections      │
└───────────────────────────────────────────────┘

Scannability rules:

  • 10-second rule: Status clear within 10 seconds
  • Traffic lights: 🟢 FYI only · 🟡 decisions needed · 🔴 blocker
  • Progressive disclosure: Each layer is optional to read
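
The traffic-light rule can be derived mechanically from a report's contents instead of being chosen by the agent. The sketch below assumes a report exposes counts of blockers and open decisions, and that a blocker outranks a pending decision; `ReportStatus` and `trafficLight` are hypothetical names.

```typescript
type TrafficLight = "🟢" | "🟡" | "🔴";

interface ReportStatus {
  blockers: number;      // items the founder must unblock
  openDecisions: number; // decision points awaiting the founder
}

// Assumed precedence: blocker beats decision, decision beats FYI.
function trafficLight(s: ReportStatus): TrafficLight {
  if (s.blockers > 0) return "🔴";
  if (s.openDecisions > 0) return "🟡";
  return "🟢";
}

console.log(trafficLight({ blockers: 0, openDecisions: 0 })); // 🟢
console.log(trafficLight({ blockers: 0, openDecisions: 2 })); // 🟡
console.log(trafficLight({ blockers: 1, openDecisions: 2 })); // 🔴
```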

7. Blast Radius Classification

NOTE

Replaces the binary "irreversible" language. With Git, code changes are always technically reversible, but their consequences may not be.

| Class | Examples | Required Approval |
| --- | --- | --- |
| Sandbox | Draft content, analysis, code in branch | Agent + Supervisor |
| Soft-reversible | Merge to main, publish blog draft, update docs | Founder approval |
| Hard-reversible | Deploy to production, change pricing page | Founder approval + cooldown (1h) |
| Irreversible consequences | Send customer email, financial transaction, legal filing | Founder approval + explicit confirmation |
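
The classification table translates directly into a policy lookup plus a gate function. This is a sketch, not the Paperclip approval-gate API. It assumes the 1h cooldown means an action may run only once an hour has passed since founder approval, and that Agent + Supervisor sign-off for sandbox work happens upstream. All names are hypothetical.

```typescript
type BlastClass = "sandbox" | "soft-reversible" | "hard-reversible" | "irreversible";

interface ApprovalPolicy {
  founderApproval: boolean;
  cooldownMinutes: number;
  explicitConfirmation: boolean;
}

// One policy per row of the classification table.
const POLICIES: Record<BlastClass, ApprovalPolicy> = {
  sandbox: { founderApproval: false, cooldownMinutes: 0, explicitConfirmation: false },
  "soft-reversible": { founderApproval: true, cooldownMinutes: 0, explicitConfirmation: false },
  "hard-reversible": { founderApproval: true, cooldownMinutes: 60, explicitConfirmation: false },
  irreversible: { founderApproval: true, cooldownMinutes: 0, explicitConfirmation: true },
};

interface ApprovalState {
  approvedAtMs?: number; // timestamp of founder approval, if given
  confirmed: boolean;    // explicit second confirmation
}

function mayExecute(cls: BlastClass, state: ApprovalState, nowMs: number): boolean {
  const policy = POLICIES[cls];
  if (!policy.founderApproval) return true; // sandbox: no founder gate
  if (state.approvedAtMs === undefined) return false;
  if (policy.explicitConfirmation && !state.confirmed) return false;
  // Assumed cooldown semantics: wait the full period after approval.
  return nowMs - state.approvedAtMs >= policy.cooldownMinutes * 60000;
}

const approvedAtZero: ApprovalState = { approvedAtMs: 0, confirmed: false };
console.log(mayExecute("hard-reversible", approvedAtZero, 30 * 60000)); // false
console.log(mayExecute("hard-reversible", approvedAtZero, 60 * 60000)); // true
```

The cooldown gives the founder a window to retract an approval before a production deploy or pricing change actually goes out.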

8. Escalation Tiers

| Tier | Trigger | Handler | Founder sees |
| --- | --- | --- | --- |
| T0 | Agent self-corrects | Work Agent | Logged in self-analysis |
| T1 | Supervisor catches error | Supervisor | In Supervisor Findings |
| T2 | Ambiguous / high-risk | Founder | As Decision Point |
| T3 | Founder disagrees | Founder | Override in decision log |

9. Acceptance Criteria

  • [ ] All agents operate within defined bounded contexts (§3.2)
  • [ ] Cross-domain communication happens only through typed events (§3.3)
  • [ ] Every agent output includes Critical Self-Analysis (§5.2)
  • [ ] Supervisor review follows the checklist approach (§5.3)
  • [ ] Supervisor corrections are always visible, never silently merged
  • [ ] Reports follow the Management Report Standard (§6)
  • [ ] Founder can drill from any summary to full detail
  • [ ] Proactive tasks appear on a single, filterable Task Board (§4.3)
  • [ ] Blast Radius classification governs approval requirements (§7)
  • [ ] The system logs all events and decisions in an immutable audit trail

10. Implementation Decision

Decision: Strategy A (pure Paperclip + MCP bridge) · ADR: ADR-020 · Date: 2026-04-17

Chosen Platform

Paperclip is an open-source, MIT-licensed, TypeScript-based agent orchestration platform. It is deployed as a standalone service (own Postgres, own Docker container) and communicates with Busflow via MCP (Model Context Protocol).

Spec-to-Implementation Mapping

| Spec Section | Implementation |
| --- | --- |
| §2 Core Principles | Embedded in all agent system prompts |
| §3.2 Bounded Contexts | MCP tool assignment per agent (see MCP Agent Bridge Protocol) |
| §3.3 Event-Driven Comms | Paperclip ticket system (different model, same intent) |
| §3.5 Orchestrator/Supervisor | Paperclip merged model (heartbeats + approval gates) |
| §4 Proactive Tasks | Agents propose via Paperclip's ticket system |
| §5.2 Self-Review | Enforced via system prompts (confidence, assumptions, sources) |
| §5.3 Supervisor Review | Paperclip approval gates; evolve to LLM checklist if needed |
| §6 Management Reports | Agent output format enforced via prompts; evolve based on real usage |
| §7 Blast Radius | Binary approval gates (sufficient for solo founder); add tiers if needed |
| §8 Escalation | Paperclip's audit trail + approval workflow |
| §9 Acceptance Criteria | Quality bar for evaluating when Strategy A is "enough" vs. when to evolve |

Progressive Evolution Path

Strategy A (now)  ──►  Strategy D (if needed)  ──►  Strategy C (if needed)
Pure Paperclip         Fork + select additions       Full custom build
+ MCP bridge           (supervisor, reports,          (native Hasura/NestJS
                        blast radius plugins)          integration)

NOTE

The spec remains the canonical governance reference regardless of implementation strategy. Paperclip is the runtime; this document defines the principles.
