Agentic-Led Company: Governance & Control Spec Sheet
Working title: Agentic-Led Company
Author: Julian Brüning · Date: 2026-04-17
Status: ACCEPTED (implementation via Paperclip + MCP, see §10)
1. Purpose
Define the governance model for a solo-founder SaaS company (Busflow) where AI agents perform work across all departments. The model must guarantee:
- AI self-governance: agents catch their own mistakes before output reaches the founder
- Founder sovereignty: every decision and output is visible, scannable, and overridable
- Context efficiency: agents operate with minimal, scoped context to reduce cost and hallucination risk
- Proactive task surfacing: agents don't just respond; they identify and propose work
2. Core Principles
| # | Principle | Description |
|---|---|---|
| P1 | No blind delegation | Every agent output must pass through at least one review layer before it becomes a decision |
| P2 | Scannable by default | All outputs follow a standardized hierarchy: Summary → Decisions → Details |
| P3 | Traceability | Every output traces back to the input that triggered it and the reasoning chain used |
| P4 | Hallucination visibility | Supervisor corrections are always surfaced, never silently merged |
| P5 | Founder is final authority | No high-impact action without explicit founder approval (see §7 Blast Radius) |
| P6 | Least-privilege context | Each agent receives only the context its bounded context permits |
| P7 | Token economy | Every review layer must justify its cost; prefer lightweight checks over full re-analysis |
3. Event-Driven Agent Architecture
3.1 Design Overview
The architecture organizes agents as bounded contexts (mirroring DDD) that communicate exclusively through a central Event Bus. No agent directly calls another agent. The Orchestrator routes events and the Supervisor reviews outputs; these may be the same or separate roles (see §3.5).
Agent teams map directly to BusFlow's four DDD bounded contexts (pillars), ensuring each team operates with deep, scoped knowledge of its domain rather than shallow, broad knowledge of the entire codebase.
                 ┌────────────────────────┐
                 │     FOUNDER (You)      │
                 │   Approve · Override   │
                 │    Task Board (§4)     │
                 └──────────┬─────────────┘
                            │ Management Reports
                            │ Task proposals
                 ┌──────────▼─────────────┐
                 │    CEO / CTO AGENTS    │
                 │     Routes events      │
                 │    Reviews outputs     │
                 │   Enforces standards   │
                 └──────────┬─────────────┘
                            │ Domain Events
      ┌─────────────┬───────┴───────┬─────────────┐
      │             │               │             │
┌─────▼──────┐ ┌────▼───────┐ ┌─────▼──────┐ ┌────▼─────────┐
│ COMMERCE   │ │ BACKOFFICE │ │ OPERATIONS │ │ COMMS        │
│ TEAM       │ │ TEAM       │ │ TEAM       │ │ TEAM         │
│            │ │            │ │            │ │              │
│ PM · Eng   │ │ PM · Eng   │ │ PM · Eng   │ │ PM · Eng     │
│ · QA       │ │ · QA       │ │ · QA       │ │ · QA         │
│            │ │            │ │            │ │              │
│ booking-   │ │ workspace  │ │ driver app │ │ Real-time    │
│ widget,    │ │ app,       │ │ packages/  │ │ Inbox,       │
│ passenger  │ │ packages/  │ │ operations │ │ packages/    │
│ app,       │ │ backoffice │ │            │ │ comms        │
│ packages/  │ │            │ │            │ │              │
│ commerce   │ │            │ │            │ │              │
└────────────┘ └────────────┘ └────────────┘ └──────────────┘
┌── Cross-Cutting Roles ────────────────────────────────────┐
│ Product Manager · Domain Expert · Knowledge Synthesis     │
│ Co-Founder (Strategy)                                     │
└───────────────────────────────────────────────────────────┘
NOTE
Top-Down Context Inheritance. Paperclip natively enforces hierarchical context: Company Mission → Project Goal → Agent Task. When a Commerce Engineer picks up a ticket, it automatically receives this ancestry: it sees not just the code task but also that the task belongs to the "Platform Payments" Project Goal, which is aligned with the "Monetize the MVP" Company Mission. This reduces prompt engineering overhead and ensures strategic alignment without manual context injection.
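The inheritance chain can be pictured as a walk up a parent-linked tree. The sketch below is illustrative only: `ContextNode`, `resolveAncestry`, and the sample ticket title are hypothetical names chosen for this example and do not describe Paperclip's actual API.

```typescript
// Hypothetical model of hierarchical context; not Paperclip's real data model.
type ContextLevel = "mission" | "goal" | "task";

interface ContextNode {
  level: ContextLevel;
  title: string;
  parent?: ContextNode;
}

// Walk from a task up to the company mission and return the chain top-down.
function resolveAncestry(node: ContextNode): string[] {
  const chain: string[] = [];
  for (let current: ContextNode | undefined = node; current; current = current.parent) {
    chain.unshift(`${current.level}: ${current.title}`);
  }
  return chain;
}

// Mission and goal titles come from the note above; the ticket title is made up.
const mission: ContextNode = { level: "mission", title: "Monetize the MVP" };
const goal: ContextNode = { level: "goal", title: "Platform Payments", parent: mission };
const task: ContextNode = { level: "task", title: "Example checkout ticket", parent: goal };

console.log(resolveAncestry(task).join(" -> "));
```

An agent prompt could prepend this chain so the task always arrives with its strategic context attached.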
3.2 Bounded Contexts: Access Control Matrix
Each agent domain has a strict read scope. Anything outside its scope is invisible. Domain teams map to BusFlow's four DDD bounded contexts; cross-cutting roles span all domains but receive only summaries, not raw access.
Domain Teams (scoped to monorepo directories)
| Domain Team | Monorepo Scope | Can Read | Cannot Read |
|---|---|---|---|
| Commerce | apps/booking-widget, apps/passenger, packages/commerce/* | Domain source code, schema-commerce.md, domain tests, CI/CD logs | Other domains' code, marketing campaigns, customer PII, financials |
| Backoffice | apps/workspace, packages/backoffice/* | Domain source code, schema-backoffice.md, domain tests, CI/CD logs | Other domains' code, marketing campaigns, customer PII, financials |
| Operations | apps/driver, packages/operations/* | Domain source code, schema-operations.md, domain tests, CI/CD logs | Other domains' code, marketing campaigns, customer PII, financials |
| Communications | packages/comms/*, Real-time Inbox modules | Domain source code, schema-communications.md, messaging schemas, domain tests | Other domains' code, marketing campaigns, financials |
Cross-Cutting Roles
| Role | Can Read | Cannot Read |
|---|---|---|
| Product Manager | Usage analytics, feature flags, churn metrics, all domain summaries (not raw data) | Source code details, marketing campaign internals |
| Domain Expert | Regulations, industry publications, domain knowledge base | Source code, marketing, customer PII |
| Knowledge Synthesis | All public sources, industry data, competitor intel | Internal code, customer data, financials |
| Co-Founder (Strategy) | Domain summaries from all agents, decision log, KPIs | Raw code, raw customer data (only aggregated views) |
IMPORTANT
Access control serves dual purposes: (1) security/privacy and (2) token economy, since agents with smaller context windows are cheaper and less prone to hallucination.
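A read-scope check for the domain teams could look roughly like the sketch below. The directory prefixes are taken from the Domain Teams table; the function names and the path-normalization logic are assumptions made for illustration. The "Real-time Inbox modules" entry is left out because the table gives no path for it.

```typescript
// Read scopes copied from the Domain Teams table; the matching logic is an assumption.
type DomainTeam = "commerce" | "backoffice" | "operations" | "communications";

const READ_SCOPES: Record<DomainTeam, string[]> = {
  commerce: ["apps/booking-widget/", "apps/passenger/", "packages/commerce/"],
  backoffice: ["apps/workspace/", "packages/backoffice/"],
  operations: ["apps/driver/", "packages/operations/"],
  communications: ["packages/comms/"],
};

// Resolve "." and ".." segments so "packages/commerce/../operations" cannot escape the scope.
function normalizePath(path: string): string {
  const out: string[] = [];
  for (const segment of path.replace(/\\/g, "/").split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      out.pop();
      continue;
    }
    out.push(segment);
  }
  return out.join("/");
}

// A path is readable only if it sits inside one of the team's directories.
function canRead(team: DomainTeam, path: string): boolean {
  const normalized = normalizePath(path) + "/";
  return READ_SCOPES[team].some((prefix) => normalized.indexOf(prefix) === 0);
}

console.log(canRead("commerce", "packages/commerce/src/cart.ts"));
console.log(canRead("commerce", "packages/operations/src/trips.ts"));
```

Matching on a trailing slash keeps lookalike directories such as `apps/workspace-evil` from passing a plain prefix check.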
3.2a Domain Team Composition
Each of the four domain teams follows a standardized three-role structure:
| Role | Responsibilities | Scoping |
|---|---|---|
| Domain PM/Lead | Breaks epics from the CEO agent into domain-scoped tickets. Prioritizes backlog. Ensures alignment with project goals. | Reads domain summaries + cross-domain event contracts. No code access. |
| Domain Engineer | Coding agent (e.g., Claude, Codex). Implements features, fixes bugs, writes tests. | Strictly restricted to changes within its domain's directories. Cannot modify files outside its bounded context. |
| Domain QA/Reviewer | Runs domain-specific Vitest/Playwright suites. Verifies adherence to domain schema (docs/architecture/schema-<pillar>.md). Enforces architectural constraints. | Reads domain code + test results. Writes learnings to .knowledge.md (see §3.4). |
Why three roles instead of one Engineer?
- Blast radius containment. A Commerce Engineer cannot accidentally break Operations code.
- Distributed review. The three-stage review pipeline (Β§5) runs within each domain, not through a single global Supervisor bottleneck.
- Deep context over broad context. Each Engineer loads only its domain's schemas, README, and .knowledge.md, which means smaller context windows, lower cost, and fewer hallucinations (Principles P6, P7).
3.3 Event-Driven Communication
Agents communicate only through typed domain events on the Event Bus. Examples:
| Event | Producer | Consumers |
|---|---|---|
| feature.shipped | Commerce / Backoffice / Operations / Comms team | PM, Co-Founder |
| churn.risk.detected | PM | Commerce team, Co-Founder |
| competitor.change.detected | Knowledge Synthesis | PM, Co-Founder |
| compliance.rule.changed | Domain Expert | Relevant domain team(s), Co-Founder |
| booking.schema.changed | Commerce team | Operations team (soft FK), Comms team |
| task.proposed | Any agent | Orchestrator → Founder Task Board |
Rules:
- Events are the only cross-domain data exchange mechanism
- Events carry minimal payload; consumers request details through the Orchestrator if needed
- All events are logged immutably for audit
NOTE
Implementation note (§10): Paperclip uses a ticket-based model instead of a typed event bus. Agents communicate through Paperclip's ticket system, a different mechanism that achieves the same goal of structured, auditable cross-domain communication. The event bus model described here serves as the target architecture should the system evolve to Strategy D or C.
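For the target architecture, typed events and an append-only log could be sketched as follows. The event names come from the table above; the payload fields, the `EventBus` class, and its method names are assumptions for illustration and are unrelated to Paperclip's ticket system.

```typescript
// Event names are from the table above; payload shapes are illustrative assumptions.
type DomainEvent =
  | { type: "feature.shipped"; producer: string; featureId: string }
  | { type: "churn.risk.detected"; producer: string; accountRef: string }
  | { type: "task.proposed"; producer: string; taskId: string };

type Handler = (event: DomainEvent) => void;

class EventBus {
  private readonly handlers: { [type: string]: Handler[] } = {};
  private readonly log: DomainEvent[] = [];

  subscribe(type: DomainEvent["type"], handler: Handler): void {
    const list = this.handlers[type] || [];
    list.push(handler);
    this.handlers[type] = list;
  }

  publish(event: DomainEvent): void {
    // Freeze a shallow copy so the audit record cannot be changed after publication.
    const record: DomainEvent = Object.freeze({ ...event });
    this.log.push(record);
    for (const handler of this.handlers[event.type] || []) {
      handler(record);
    }
  }

  // Return a copy so callers cannot alter the log itself.
  auditTrail(): DomainEvent[] {
    return this.log.slice();
  }
}

const bus = new EventBus();
bus.subscribe("task.proposed", (event) => console.log("board received", event.type));
bus.publish({ type: "task.proposed", producer: "commerce", taskId: "t-1" });
console.log(bus.auditTrail().length);
```

Because producers only call `publish` and consumers only call `subscribe`, no agent holds a reference to another agent, which matches the "events only" rule above.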
3.4 Cross-Session Domain Knowledge
AI models are stateless between fresh sessions. To simulate continuous domain mastery, each domain team maintains a persistent knowledge file:
File: packages/<domain>/.knowledge.md
Write discipline (QA agent):
- When the Domain QA agent finds a bug, architectural violation, or non-obvious pattern, it instructs the Domain Engineer to fix the issue and appends the learning to .knowledge.md.
- Entries follow a structured format: date, ticket reference, what went wrong, what the fix was, and the extracted rule.
Read discipline (Engineer agent):
- The Engineer's SKILLS.md instructs it to always read .knowledge.md first upon waking for a new ticket.
- This creates a locally-scoped, domain-specific knowledge base that grows as the team "works."
Precedent: This pattern parallels the existing .agents/skills/frontend/SKILL.md learnings section, which accumulates design system and accessibility learnings across sessions.
Session continuity:
- Within a ticket: Paperclip maintains session state for ongoing tasks. An agent working on a multi-day ticket retains full context of previous tool calls and discussions for that specific ticket.
- Across tickets: .knowledge.md carries forward accumulated domain wisdom. This is the only cross-session persistence mechanism.
IMPORTANT
.knowledge.md files are domain-scoped, not global. The Commerce team's knowledge base contains Commerce-specific learnings only. This preserves the bounded context boundary and keeps context windows small (Principle P6).
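The write discipline could be supported by a small helper like the one sketched below. The five fields follow the entry format named above; the Markdown layout, the function names, and the sample values in the usage lines are assumptions, since the spec does not fix them.

```typescript
// Fields follow the structured entry format described above; the layout is an assumption.
interface KnowledgeEntry {
  date: string; // ISO date, for example "2026-04-17"
  ticket: string; // ticket reference
  problem: string; // what went wrong
  fix: string; // what the fix was
  rule: string; // the extracted rule
}

function formatEntry(entry: KnowledgeEntry): string {
  return [
    `### ${entry.date} (${entry.ticket})`,
    `- What went wrong: ${entry.problem}`,
    `- Fix: ${entry.fix}`,
    `- Rule: ${entry.rule}`,
  ].join("\n");
}

// Append-only: existing content is kept as-is and the new entry goes at the end.
function appendEntry(existing: string, entry: KnowledgeEntry): string {
  const base = existing.replace(/\s+$/, "");
  return (base ? base + "\n\n" : "") + formatEntry(entry) + "\n";
}

// Hypothetical sample values, for illustration only.
const sample: KnowledgeEntry = {
  date: "2026-04-17",
  ticket: "TICKET-1",
  problem: "Test relied on a hard-coded date",
  fix: "Injected a clock into the test",
  rule: "Never read the system clock directly in domain code",
};
console.log(appendEntry("# Commerce knowledge\n", sample));
```

Reading and writing the file itself is left out so the sketch stays independent of any runtime.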
3.5 Orchestrator vs. Supervisor: Merged or Separate?
| Aspect | Merged (recommended for start) | Separate |
|---|---|---|
| Token cost | Lower (one pass) | Higher (two passes) |
| Risk | Orchestrator can rubber-stamp its own routing | Better separation of concerns |
| Complexity | Simpler to implement | More robust at scale |
| Recommendation | Start here | Evolve to this when agent count > 5 |
Start merged: The Orchestrator routes events AND reviews outputs. Split into two roles when the system grows complex enough that review quality degrades.
4. Proactive Task Creation
Agents don't just respond to requests; they propose work by emitting task.proposed events.
4.1 Task Lifecycle
Agent proposes → Orchestrator validates → Founder Task Board → Founder decides
      │                    │                                          │
task.proposed    Enriches with context                       Approve / Reject /
                 Deduplicates                                Defer / Delegate
                 Assigns priority
4.2 Task Schema
Every proposed task follows this structure:
id: auto-generated
source_agent: marketing
type: opportunity | risk | maintenance | improvement
priority_suggestion: low | medium | high | critical
title: "Create comparison page: Busflow vs. Busvermietung24"
rationale: "Competitor launched new pricing page. SEO opportunity."
effort_estimate: small | medium | large
blocked_by: [] # dependencies on other tasks
decision_needed: true | false
expires: 2026-05-01 # optional, for time-sensitive tasks
4.3 Founder Task Board Requirements
- Single view of all proposed tasks across all agent domains
- Filterable by: source agent, type, priority, decision needed
- Sortable by: priority, date proposed, effort
- Batch actions: approve/reject multiple tasks at once
- Snooze: defer a task to a specific date
- Link to context: every task links to the report/event that spawned it
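The filter and sort requirements could be met with plain functions over the task schema from §4.2, as sketched below. The `ProposedTask` fields mirror that schema; `TaskFilter`, `filterTasks`, `sortByPriority`, and the priority ranking are hypothetical names and choices made for this example.

```typescript
// Field names mirror the task schema in §4.2; helper names are hypothetical.
type TaskType = "opportunity" | "risk" | "maintenance" | "improvement";
type Priority = "low" | "medium" | "high" | "critical";
type Effort = "small" | "medium" | "large";

interface ProposedTask {
  id: string;
  source_agent: string;
  type: TaskType;
  priority_suggestion: Priority;
  title: string;
  rationale: string;
  effort_estimate: Effort;
  blocked_by: string[];
  decision_needed: boolean;
  expires?: string; // ISO date, optional
}

interface TaskFilter {
  source_agent?: string;
  type?: TaskType;
  priority?: Priority;
  decision_needed?: boolean;
}

// Lower rank sorts first, so critical tasks appear at the top of the board.
const PRIORITY_RANK: Record<Priority, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// An unset filter field means "do not filter on this field".
function filterTasks(tasks: ProposedTask[], f: TaskFilter): ProposedTask[] {
  return tasks.filter(
    (t) =>
      (f.source_agent === undefined || t.source_agent === f.source_agent) &&
      (f.type === undefined || t.type === f.type) &&
      (f.priority === undefined || t.priority_suggestion === f.priority) &&
      (f.decision_needed === undefined || t.decision_needed === f.decision_needed)
  );
}

// Sort a copy so the caller's array keeps its original order.
function sortByPriority(tasks: ProposedTask[]): ProposedTask[] {
  return tasks
    .slice()
    .sort((a, b) => PRIORITY_RANK[a.priority_suggestion] - PRIORITY_RANK[b.priority_suggestion]);
}
```

A "decisions first" view would then be `sortByPriority(filterTasks(allTasks, { decision_needed: true }))`, where `allTasks` is whatever list the board holds.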
5. Review Layers: Critical Self-Analysis
5.1 Three-Stage Review Pipeline
| Stage | Who | Purpose | Token Cost |
|---|---|---|---|
| Self-Review | Work Agent | Critical self-analysis: "What could be wrong? What did I assume?" | Included in generation |
| Supervisor Review | Orchestrator/Supervisor | Cross-check against knowledge base, flag hallucinations, verify scope | ~20-30% of generation cost |
| Founder Review | You | Final authority, strategic judgment, override | Your time |
TIP
Token economy: The self-review needs no separate pass (it is part of the agent's chain-of-thought), so its cost is included in generation. The Supervisor review should use a checklist approach (cheaper) instead of full re-generation. Only escalate to deep analysis when the checklist flags issues.
5.2 Work Agent Self-Review Requirements
Each agent output must include a Critical Self-Analysis section:
## Self-Analysis
- Confidence: medium
- Key assumptions: [list]
- What could be wrong: [list]
- Sources used: [list] / "no source β inference"
- Scope compliance: ✅ stayed within bounded context
5.3 Supervisor Review Checklist
Lightweight pass (not a full re-analysis):
- [ ] Claims traceable to sources?
- [ ] Agent stayed within its bounded context?
- [ ] Output consistent with existing knowledge base?
- [ ] No obvious hallucinations or fabricated data?
- [ ] Self-analysis seems honest (not rubber-stamped)?
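The checklist pass and the "escalate only when flagged" rule could be expressed as in the sketch below. The check identifiers paraphrase the five items above; the result shape and the function name are assumptions, and how each boolean is produced (rules or a model call) is deliberately left open.

```typescript
// Check identifiers paraphrase the checklist above; the result shape is an assumption.
type CheckId =
  | "sources_traceable"
  | "within_bounded_context"
  | "consistent_with_kb"
  | "no_fabricated_data"
  | "honest_self_analysis";

type ChecklistResult = Record<CheckId, boolean>;

interface ReviewOutcome {
  passed: number;
  failed: CheckId[];
  escalateToDeepAnalysis: boolean;
}

// Deep analysis is requested only when at least one check fails.
function evaluateChecklist(result: ChecklistResult): ReviewOutcome {
  const ids = Object.keys(result) as CheckId[];
  const failed = ids.filter((id) => !result[id]);
  return {
    passed: ids.length - failed.length,
    failed,
    escalateToDeepAnalysis: failed.length > 0,
  };
}

const outcome = evaluateChecklist({
  sources_traceable: true,
  within_bounded_context: true,
  consistent_with_kb: false,
  no_fabricated_data: true,
  honest_self_analysis: true,
});
console.log(outcome.failed, outcome.escalateToDeepAnalysis);
```

Keeping the outcome as data makes it straightforward to render the failed items into the correction log.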
Correction log format:
🔧 CORRECTED: [original → fixed] - reason
⚠ UNVERIFIED: [claim] - no source found, kept with flag
✅ VALIDATED: [n] items passed checklist
6. Output Format: Management Report Standard
Every report surfaced to the founder:
┌─ MANAGEMENT SUMMARY ──────────────────────────┐
│ 1-3 sentences. Traffic light: 🟢 🟡 🔴        │
└────────────────────┬──────────────────────────┘
                     ▼
┌─ DECISION POINTS ─────────────────────────────┐
│ • Context · Options · AI recommendation       │
└────────────────────┬──────────────────────────┘
                     ▼
┌─ SUPERVISOR FINDINGS ─────────────────────────┐
│ 🔧 Corrections · ⚠ Unverified · ✅ Validated  │
└────────────────────┬──────────────────────────┘
                     ▼
┌─ PROPOSED TASKS ──────────────────────────────┐
│ New tasks this report generated (if any)      │
└────────────────────┬──────────────────────────┘
                     ▼
┌─ DETAILED OUTPUT (drill-down) ────────────────┐
│ Full work product · Agent self-analysis       │
│ Source citations · Collapsible sections       │
└───────────────────────────────────────────────┘
Scannability rules:
- 10-second rule: Status clear within 10 seconds
- Traffic lights: 🟢 FYI only · 🟡 decisions needed · 🔴 blocker
- Progressive disclosure: Each layer is optional to read
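The traffic-light rule could be derived mechanically from a report's contents, as in this minimal sketch. The `ReportStatus` fields and the precedence (a blocker outranks open decisions) are assumptions; the spec only defines what each color means.

```typescript
type TrafficLight = "green" | "yellow" | "red";

// Hypothetical summary of a report; the field names are assumptions.
interface ReportStatus {
  hasBlocker: boolean;
  openDecisionPoints: number;
}

// red = blocker, yellow = decisions needed, green = FYI only.
function trafficLight(status: ReportStatus): TrafficLight {
  if (status.hasBlocker) return "red";
  if (status.openDecisionPoints > 0) return "yellow";
  return "green";
}

console.log(trafficLight({ hasBlocker: false, openDecisionPoints: 2 }));
```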
7. Blast Radius Classification
NOTE
Replaces the binary "irreversible" language. With git, code changes are always technically reversible, but their consequences may not be.
| Class | Examples | Required Approval |
|---|---|---|
| Sandbox | Draft content, analysis, code in branch | Agent + Supervisor |
| Soft-reversible | Merge to main, publish blog draft, update docs | Founder approval |
| Hard-reversible | Deploy to production, change pricing page | Founder approval + cooldown (1h) |
| Irreversible consequences | Send customer email, financial transaction, legal filing | Founder approval + explicit confirmation |
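The approval table could be encoded as a lookup plus a gate function, sketched below. The four classes and their requirements come from the table; the type names, the `mayExecute` function, and the choice to measure the one-hour cooldown from the moment of founder approval are assumptions.

```typescript
type BlastRadius = "sandbox" | "soft-reversible" | "hard-reversible" | "irreversible";

interface ApprovalRule {
  founderApproval: boolean;
  cooldownMs: number;
  explicitConfirmation: boolean;
}

const HOUR_MS = 60 * 60 * 1000;

// One rule per row of the Blast Radius table.
const APPROVAL_RULES: Record<BlastRadius, ApprovalRule> = {
  sandbox: { founderApproval: false, cooldownMs: 0, explicitConfirmation: false },
  "soft-reversible": { founderApproval: true, cooldownMs: 0, explicitConfirmation: false },
  "hard-reversible": { founderApproval: true, cooldownMs: HOUR_MS, explicitConfirmation: false },
  irreversible: { founderApproval: true, cooldownMs: 0, explicitConfirmation: true },
};

interface ApprovalState {
  supervisorApproved: boolean;
  founderApprovedAt?: number; // epoch milliseconds
  confirmed?: boolean;
}

// Assumption: the cooldown starts when the founder approves.
function mayExecute(cls: BlastRadius, state: ApprovalState, now: number): boolean {
  const rule = APPROVAL_RULES[cls];
  if (!rule.founderApproval) return state.supervisorApproved;
  if (state.founderApprovedAt === undefined) return false;
  if (now - state.founderApprovedAt < rule.cooldownMs) return false;
  return !rule.explicitConfirmation || state.confirmed === true;
}

const approvedAt = 0;
console.log(mayExecute("hard-reversible", { supervisorApproved: true, founderApprovedAt: approvedAt }, 30 * 60 * 1000));
console.log(mayExecute("hard-reversible", { supervisorApproved: true, founderApprovedAt: approvedAt }, HOUR_MS));
```

Passing `now` as a parameter keeps the gate deterministic and easy to test.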
8. Escalation Tiers
| Tier | Trigger | Handler | Founder sees |
|---|---|---|---|
| T0 | Agent self-corrects | Work Agent | Logged in self-analysis |
| T1 | Supervisor catches error | Supervisor | In Supervisor Findings |
| T2 | Ambiguous / high-risk | Founder | As Decision Point |
| T3 | Founder disagrees | Founder | Override in decision log |
9. Acceptance Criteria
- [ ] All agents operate within defined bounded contexts (§3.2)
- [ ] Cross-domain communication happens only through typed events (§3.3)
- [ ] Every agent output includes Critical Self-Analysis (§5.2)
- [ ] Supervisor review follows the checklist approach (§5.3)
- [ ] Supervisor corrections are always visible, never silently merged
- [ ] Reports follow the Management Report Standard (§6)
- [ ] Founder can drill from any summary to full detail
- [ ] Proactive tasks appear on a single, filterable Task Board (§4.3)
- [ ] Blast Radius classification governs approval requirements (§7)
- [ ] The system logs all events and decisions in an immutable audit trail
10. Implementation Decision
Decision: Strategy A (Pure Paperclip + MCP bridge) · ADR: ADR-020 · Date: 2026-04-17
Chosen Platform
Paperclip: an open-source, MIT-licensed, TypeScript-based agent orchestration platform. It is deployed as a standalone service (own Postgres, own Docker container) and communicates with Busflow via MCP (Model Context Protocol).
Spec-to-Implementation Mapping
| Spec Section | Implementation |
|---|---|
| §2 Core Principles | Embedded in all agent system prompts |
| §3.2 Bounded Contexts | MCP tool assignment per agent (see MCP Agent Bridge Protocol) |
| §3.3 Event-Driven Comms | Paperclip ticket system (different model, same intent) |
| §3.5 Orchestrator/Supervisor | Paperclip merged model (heartbeats + approval gates) |
| §4 Proactive Tasks | Agents propose via Paperclip's ticket system |
| §5.2 Self-Review | Enforced via system prompts (confidence, assumptions, sources) |
| §5.3 Supervisor Review | Paperclip approval gates; evolve to LLM checklist if needed |
| §6 Management Reports | Agent output format enforced via prompts; evolve based on real usage |
| §7 Blast Radius | Binary approval gates (sufficient for solo founder); add tiers if needed |
| §8 Escalation | Paperclip's audit trail + approval workflow |
| §9 Acceptance Criteria | Quality bar for evaluating when Strategy A is "enough" vs. when to evolve |
Progressive Evolution Path
Strategy A (now)  ──►  Strategy D (if needed)  ──►  Strategy C (if needed)
Pure Paperclip         Fork + select additions      Full custom build
+ MCP bridge           (supervisor, reports,        (native Hasura/NestJS
                       blast radius plugins)        integration)
NOTE
The spec remains the canonical governance reference regardless of implementation strategy. Paperclip is the runtime; this document defines the principles.