ORG.md Specification
ORG.md — Organization as Code
Define your agent organization in a single markdown file. Deploy it. Watch it work. Tune it over time.
Why Markdown?
The agent ecosystem already speaks markdown. CLAUDE.md defines one agent’s behavior. AGENTS.md defines workspace rules. ORG.md defines an entire organization.
Markdown has a unique advantage over YAML/JSON for this: you can mix intent with structure. An org definition isn’t just data — it’s philosophy. Why is the team structured this way? What communication norms matter? That context is critical when humans review changes, when agents onboard, and when the system proposes optimizations.
The markdown IS the documentation. No separate wiki explaining what the config means.
CLAUDE.md → defines one agent's behavior
AGENTS.md → defines workspace rules
ORG.md → defines an entire organization

1. Anatomy of an ORG.md
An ORG.md file has six sections, each defined by a second-level heading under the top-level organization name. All sections are optional; the system uses sensible defaults for anything omitted.
# Organization Name
## Identity
## Culture
## SDLC
## Structure
## Policies
## Playbooks

1.1 Identity
Who is this organization? The name, mission, and context that every agent in the org inherits.
# Acme Engineering
## Identity
We build developer tools that make infrastructure invisible.
Every agent in this org serves that mission.

- **Industry:** Developer tools / SaaS
- **Stage:** Series A, 18 months old
- **Values:** Ship fast, measure everything, customers first

Why this matters: Agents use Identity as ambient context. When a marketing agent writes copy, it knows the company builds dev tools. When an engineering agent prioritizes work, “customers first” influences the decision. Identity is the system prompt for the entire org.
1.2 Culture
How the organization communicates and operates. Maps directly to ACP tunable parameters.
## Culture
We're a startup. Move fast, communicate openly, escalate immediately.
Nobody should be blocked for more than one cycle.

- **Communication:** async-first
- **Escalation:** immediate (we're too small to batch problems)
- **Progress updates:** on phase change (not every tick, but don't go silent)
- **Ack required:** yes (if you get a task, confirm it)
- **Hierarchy depth:** shallow (3 levels max)

Preset cultures are shorthand for common patterns:
## Culture
preset: startup

Available presets:
| Preset | Escalation | Progress | Hierarchy | Vibe |
|---|---|---|---|---|
| `startup` | Immediate | Frequent | 2-3 levels | Fast, scrappy, everyone does everything |
| `enterprise` | Batched (hourly) | On phase change | 5-8 levels | Process-driven, governance, separation of concerns |
| `agency` | Immediate | Every tick | 3-4 levels | Client-facing, deadline-driven, high visibility |
| `research` | Delayed | On request | 2-3 levels | Exploratory, high autonomy, long-running tasks |
| `military` | Immediate | Every tick | Strict chain | Zero ambiguity, mandatory acks, full situational awareness |
| `remote-async` | Delayed | On request | Flat | High trust, timezone-distributed, async-first |
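One way to read the table: each preset is a bundle of default parameters, and any value written inline in the Culture section replaces the matching default. The sketch below illustrates that reading only. The dictionary keys, the `resolve_culture` function, and the subset of presets shown are assumptions for illustration, not part of the spec.

```python
from typing import Dict, Optional

# Values copied from three rows of the preset table; the key names
# ("escalation", "progress", "hierarchy") are hypothetical.
PRESETS: Dict[str, Dict[str, str]] = {
    "startup": {"escalation": "immediate", "progress": "frequent", "hierarchy": "2-3 levels"},
    "enterprise": {"escalation": "batched (hourly)", "progress": "on phase change", "hierarchy": "5-8 levels"},
    "research": {"escalation": "delayed", "progress": "on request", "hierarchy": "2-3 levels"},
}


def resolve_culture(preset: str, overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Start from the preset defaults, then let inline values win."""
    if preset not in PRESETS:
        raise ValueError(f"unknown culture preset: {preset}")
    resolved = dict(PRESETS[preset])  # copy so the preset table is never mutated
    resolved.update(overrides or {})
    return resolved


# A startup org that relaxes escalation but keeps every other default.
culture = resolve_culture("startup", {"escalation": "delayed"})
print(culture)
```

Copying the preset before applying overrides keeps the shared defaults intact when several orgs resolve against the same table.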
Presets are starting points. Override any parameter inline:
## Culture
preset: startup
- **Escalation:** delayed (we trust our leads to figure it out)

1.3 SDLC
How the organization develops, ships, and maintains software. Like Culture, this supports presets for common patterns, so every org inherits sane defaults without writing rules from scratch.
## SDLC
preset: standard

Available presets:
| Preset | Branch strategy | PRs required | Deploy verification | Orphan branches |
|---|---|---|---|---|
| `standard` | Trunk-based, branch off main | Yes, targeting main | Smoke test required | Forbidden |
| `strict` | Same as standard + mandatory review, max 500 LOC per PR | Yes, with approval | Full E2E suite | Forbidden |
| `solo` | Trunk-based, direct push allowed | Optional | Manual spot-check | Forbidden |
| `research` | Feature branches, long-lived OK | Yes | Optional | Forbidden |
All presets share one invariant: orphan branches are always forbidden. Agents will take the path of least resistance — git init “works” locally but creates parallel histories that can’t merge. The spec prevents this by default.
Full example with overrides:
## SDLC
preset: standard
### Source Control
- **Branch strategy:** trunk-based; always branch off `main`
- **Branch naming:** `<role>/<feature>` (e.g., `web-eng/auth-flow`)
- **Orphan branches:** forbidden; never `git init`, never create disconnected history
- **Direct push to main:** never
- **Pre-work ritual:** `git fetch origin && git checkout -b <branch> origin/main`

### Pull Requests
- **Required:** yes (every change, even typo fixes)
- **Target:** `main`
- **Max size:** 500 lines (soft), 1000 lines (hard)
- **Naming:** conventional commits (`feat:`, `fix:`, `docs:`, `chore:`)
- **Scope:** one feature per PR; don't accumulate large batches

### Quality Gates
- **Pre-merge:** typecheck passes (`tsc --noEmit`), lint clean, tests pass
- **Post-deploy:** smoke test required (Playwright or equivalent)
- **Dependency additions:** must be checked for framework compatibility

### Deploy
- **Pipeline:** PR merge → build → deploy → verify → announce
- **Verification:** HTTP 200 on primary routes + no client-side JS errors
- **Communication:** post to #alerts (or equivalent channel) on every deploy

### Incident Response
- **Flag immediately** in the team channel; don't wait
- **Document:** what happened, root cause, what changed
- **Post-mortem:** update SDLC rules if a process gap caused the incident

Why this matters: Without explicit SDLC rules, agents default to whatever works locally. A sub-agent that doesn’t know the branching strategy will `git init` a fresh repo, accumulate 66 commits on an orphan branch, and create a production incident when someone tries to merge it. SDLC defaults make the wrong thing hard and the right thing obvious.
Relationship to CONTRIBUTING.md: If the repo has a CONTRIBUTING.md, agents should read it. The SDLC section in ORG.md is the organizational policy; CONTRIBUTING.md is the repo-level implementation. They should align — if they conflict, CONTRIBUTING.md wins for that repo.
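The Pull Requests rules in the full example above (soft and hard size limits, conventional-commit naming) lend themselves to an automated check. The following is a minimal sketch of such a gate, assuming the PR title and changed-line count are already known. The `check_pull_request` function and its finding strings are hypothetical, and the optional `(scope)` in the title pattern is an assumption borrowed from common conventional-commit usage.

```python
import re
from typing import List

# Limits and prefixes taken from the example SDLC section above.
SOFT_LIMIT = 500
HARD_LIMIT = 1000
TITLE_PATTERN = re.compile(r"^(feat|fix|docs|chore)(\([\w-]+\))?: .+")


def check_pull_request(title: str, changed_lines: int) -> List[str]:
    """Return findings for one PR; an empty list means these checks pass."""
    findings = []
    if not TITLE_PATTERN.match(title):
        findings.append("error: title is not a conventional commit")
    if changed_lines > HARD_LIMIT:
        findings.append(f"error: {changed_lines} lines exceeds hard limit of {HARD_LIMIT}")
    elif changed_lines > SOFT_LIMIT:
        findings.append(f"warning: {changed_lines} lines exceeds soft limit of {SOFT_LIMIT}")
    return findings


print(check_pull_request("feat: add auth flow", 120))
print(check_pull_request("fix(auth): token refresh", 700))
print(check_pull_request("update stuff", 1200))
```

Returning findings instead of raising lets a caller treat soft-limit warnings and hard-limit errors differently.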
Workspace Strategy
Agents that touch code need isolated working directories. The spec supports a `### Repository` subsection that defines how agents share a codebase:
### Repository
clone: /opt/org/openspawn
strategy: worktree-per-agent
#### Worktrees
- designer → /opt/org/openspawn-designer
- web-eng → /opt/org/openspawn-web-eng
- docs-writer → /opt/org/openspawn-docs

worktree-per-agent (recommended default):
- One shared `.git` directory: single object store, single fetch updates all refs
- Each agent gets a dedicated worktree on a unique branch
- Orphan histories are ruled out by construction: by default, `git worktree add` branches from the existing history
- The org boot sequence creates worktrees; agents never run `git init` or `git clone`
Rules enforced by the strategy:
- One branch per worktree. Two worktrees cannot check out the same branch.
- Serialize fetches. One fetch before spawning a batch of agents, not one per agent. This prevents lock-file contention in the shared `.git` directory.
- No `git stash`. Stash is shared across worktrees. Agents commit or discard instead.
- Worktree naming: `<repo>-<agent-role>` (e.g., `openspawn-designer`)
- Cleanup on deactivation. When an agent is removed or a sub-agent finishes, `git worktree remove` deletes its directory and `git worktree prune` clears any stale metadata.
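The rules above can be sketched as a small planner that a boot sequence might use: one fetch for the whole batch, then one `git worktree add` per agent role, following the `<repo>-<agent-role>` naming rule. This is an illustration under stated assumptions, not the actual boot sequence. The `worktree_commands` function and the `<role>/workspace` branch name are hypothetical, and the commands are only built as strings, never executed.

```python
from pathlib import PurePosixPath
from typing import List


def worktree_commands(clone: str, roles: List[str], base: str = "origin/main") -> List[str]:
    """Build one fetch plus one `git worktree add` per role."""
    clone_path = PurePosixPath(clone)
    repo = clone_path.name
    # Single fetch up front so agents do not contend on the shared .git directory.
    commands = [f"git -C {clone_path} fetch origin"]
    for role in roles:
        target = clone_path.parent / f"{repo}-{role}"  # <repo>-<agent-role>
        branch = f"{role}/workspace"  # hypothetical name; must be unique per worktree
        commands.append(f"git -C {clone_path} worktree add -b {branch} {target} {base}")
    return commands


for command in worktree_commands("/opt/org/openspawn", ["designer", "web-eng"]):
    print(command)
```

Because every worktree is created from `origin/main` of the shared clone, each agent branch starts from the real history.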
Alternative strategies:
- `clone-per-agent`: full clone per agent. Higher disk usage but zero contention. Use for large teams or repos with submodules.
- `shared`: all agents use the same working directory. Only viable for single-writer orgs (one agent writes, others read).
Why this is in the spec: The Feb 27 incident happened because a sub-agent was given a repo path and told to “work on the codebase.” It did what made sense locally: git init. By making workspace setup an org-level concern — defined in ORG.md, executed by the boot sequence — agents never have to figure out repository access on their own.
1.4 Structure
The org chart. Departments, roles, and hierarchy, defined as nested markdown.
## Structure
### COO
The operational backbone. Receives orders from the human principal,
breaks them into departmental work, ensures nothing falls through cracks.

- **Model:** claude-sonnet
- **Domain:** operations
- **Reports to:** Human Principal

### Engineering
Our largest team. Owns code, infrastructure, testing, and deployment.

#### Engineering Lead
Triages technical work. Delegates to specialists. Reviews output.
- **Model:** claude-sonnet
- **Domain:** engineering

#### Backend Senior
Owns API, database, and server infrastructure.
- **Model:** claude-haiku
- **Domain:** backend
- **Count:** 2

#### Frontend Workers
Build and maintain the dashboard and marketing site.
- **Model:** claude-haiku
- **Domain:** frontend
- **Count:** 3

#### QA Worker
Writes and runs tests. Reviews PRs for quality.
- **Model:** claude-haiku
- **Domain:** testing

### Security
Small but critical. Every deploy needs their sign-off.

#### Security Lead
- **Model:** claude-sonnet
- **Domain:** appsec

#### Security Worker
- **Model:** claude-haiku
- **Domain:** infrastructure-security

### Marketing
Owns content, campaigns, and public presence.

#### Marketing Lead
- **Model:** claude-sonnet
- **Domain:** content

#### Content Workers
- **Model:** claude-haiku
- **Domain:** copywriting
- **Count:** 2

How hierarchy is inferred:
- H2 (`##`) = top-level section (Structure itself)
- H3 (`###`) = department or C-level role (L9-10)
- H4 (`####`) = department member roles
- Nesting under a department heading = reports to that department’s lead
- The first role under a department heading with no explicit `Reports to` = the department lead
Role keywords and levels:
| Keyword in role name | Inferred level | Can delegate? | Can spawn? |
|---|---|---|---|
| COO, CTO, CEO | L10 | ✅ | ✅ |
| VP, Director, Talent | L9 | ✅ | ✅ |
| Lead, Manager | L7 | ✅ | ✅ |
| Senior, Principal | L6 | ✅ | ❌ |
| Worker, Engineer, Agent | L4 | ❌ | ❌ |
| Junior, Intern, Assistant | L1-2 | ❌ | ❌ |
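The keyword table can be sketched as an ordered lookup. The rows below are copied from the table, but several details are assumptions: checking rows top to bottom so the most senior keyword wins, matching whole words with an optional plural `s`, and returning `None` when no keyword matches (the spec does not say what happens in that case). The `RoleLevel` type and `infer_level` function are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class RoleLevel(NamedTuple):
    level: str
    can_delegate: bool
    can_spawn: bool


# Rows copied from the table above, ordered from most to least senior.
KEYWORD_LEVELS = [
    (("COO", "CTO", "CEO"), RoleLevel("L10", True, True)),
    (("VP", "Director", "Talent"), RoleLevel("L9", True, True)),
    (("Lead", "Manager"), RoleLevel("L7", True, True)),
    (("Senior", "Principal"), RoleLevel("L6", True, False)),
    (("Worker", "Engineer", "Agent"), RoleLevel("L4", False, False)),
    (("Junior", "Intern", "Assistant"), RoleLevel("L1-2", False, False)),
]


def infer_level(role_name: str) -> Optional[RoleLevel]:
    """Return the first row whose keyword appears as a whole word in the role name."""
    for keywords, level in KEYWORD_LEVELS:
        pattern = r"\b(" + "|".join(keywords) + r")s?\b"
        if re.search(pattern, role_name, flags=re.IGNORECASE):
            return level
    return None  # no keyword matched; behavior here is unspecified


for name in ["COO", "Engineering Lead", "Backend Senior", "Frontend Workers", "Designers"]:
    print(name, "->", infer_level(name))
```

Whole-word matching keeps "Engineering Lead" from being read as an "Engineer", since "Engineering" is a different word.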
The Count field: Creates multiple agents with the same role. They get auto-numbered names: “Frontend Worker 1”, “Frontend Worker 2”, etc. Each is an independent agent with its own task queue and trust score.
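A minimal sketch of the auto-numbering described above. It assumes the caller passes the singular base name, and that a role with a count of 1 keeps its name without a number; the spec does not state either point. `expand_role` is a hypothetical helper.

```python
from typing import List


def expand_role(base_name: str, count: int = 1) -> List[str]:
    """Turn one role definition into the names of the agents it creates."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [base_name]  # assumption: a single agent is not numbered
    return [f"{base_name} {index}" for index in range(1, count + 1)]


print(expand_role("Frontend Worker", 3))
```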
Prose matters: The text description above each role becomes part of that agent’s system prompt context. “Triages technical work. Delegates to specialists.” tells the LLM how to behave. Write the description like you’re explaining the role to a new hire.
1.5 Policies
Rules that govern how the organization operates. Budget, routing, permissions, and constraints.
## Policies
### Budget
- **Per-agent limit:** 1000 credits/period
- **Alert threshold:** 80%
- **Overage behavior:** pause and escalate; don't hard-stop
- **Period:** weekly

### Task Routing
Tasks are auto-routed to the right department by matching:
1. Domain keywords in the task title/description
2. Agent domain expertise
3. Current workload (prefer idle agents)
4. Trust score (higher trust gets harder tasks)

If no match is found, task goes to the COO for manual delegation.

### Permissions
- **L7+ can create tasks** (leads and above can break work into subtasks)
- **L7+ can spawn agents** (leads can grow their team, up to department cap)
- **L6+ can review** (seniors and above can approve/reject work)
- **All agents can escalate** (nobody should be silently stuck)

### Department Caps
- Engineering: max 10 agents
- Security: max 4 agents
- Marketing: max 6 agents
- No department can exceed 15 agents without human approval

### Working Hours
- **Active hours:** 08:00-22:00 (org timezone)
- **Off-hours behavior:** queue tasks, don't process
- **Exceptions:** critical priority tasks process 24/7

Policies as guardrails: These aren’t suggestions; the system enforces them. An agent that tries to spawn when at department cap gets denied. An agent that exceeds budget gets paused. This is how you maintain control over autonomous agents.
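The two guardrails named above (spawn denied at the department cap, agent paused over budget) can be sketched as plain checks. The numbers come from the example Policies section; the function names, the `Decision` type, and the returned action strings are assumptions for illustration, and the real enforcement mechanism is not described in this document.

```python
from dataclasses import dataclass

# Values from the example Policies section above.
DEPARTMENT_CAPS = {"Engineering": 10, "Security": 4, "Marketing": 6}
GLOBAL_CAP = 15  # no department exceeds this without human approval
MIN_SPAWN_LEVEL = 7  # "L7+ can spawn agents"


@dataclass
class Decision:
    allowed: bool
    reason: str


def can_spawn(requester_level: int, department: str, current_size: int) -> Decision:
    """Check the permission level first, then the department cap."""
    if requester_level < MIN_SPAWN_LEVEL:
        return Decision(False, "only L7+ can spawn agents")
    cap = DEPARTMENT_CAPS.get(department, GLOBAL_CAP)
    if current_size >= cap:
        return Decision(False, f"{department} is at its cap of {cap}")
    return Decision(True, "ok")


def budget_action(spent: float, limit: float = 1000, alert_threshold: float = 0.80) -> str:
    """Map spend in the current period to an action name (names are hypothetical)."""
    if spent > limit:
        return "pause_and_escalate"
    if spent >= limit * alert_threshold:
        return "alert"
    return "ok"


print(can_spawn(7, "Security", 4))
print(budget_action(850))
```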
1.6 Playbooks
Reusable procedures for common situations. Like runbooks in ops, but for your agent org.
## Playbooks
### New Task Arrives
1. COO receives task from Human Principal
2. COO categorizes by domain and priority
3. COO delegates to appropriate department lead
4. Lead acks (auto) and breaks into subtasks if needed
5. Lead assigns to available workers by trust score
6. Workers ack and begin; progress logged to task activity

### Escalation: BLOCKED
1. Agent creates escalation message with blocker details
2. Escalation goes to direct manager (never skip levels)
3. Manager has 2 cycles to respond:
   - Provide the missing resource/context
   - Reassign to a different agent
   - Escalate further up
4. If unresolved after 2 levels, alert Human Principal

### Escalation: OUT_OF_DOMAIN
1. Agent flags task as wrong domain
2. Manager receives escalation
3. Manager re-delegates to correct department lead
4. Original agent is freed for other work
5. No penalty to original agent's trust score

### New Agent Onboarding
1. New agent spawned by a lead
2. First 3 tasks are LOW priority (warm-up period)
3. Trust score starts at 30 (PROBATION)
4. Mentor assigned: closest senior in same domain
5. After 5 successful tasks, promoted to TRUSTED
6. After 20 successful tasks, eligible for VETERAN

### Weekly Review (automated)
1. System compiles: tasks completed, escalation rate, budget burn
2. Generates org health score
3. Flags anomalies: sudden escalation spikes, idle agents, budget overruns
4. Sends digest to Human Principal
5. Proposes optimizations: "Engineering is bottlenecked, consider +1 senior"

Why playbooks in the org file: They’re not just documentation; they’re instructions. When an agent encounters “BLOCKED”, it can look up the playbook and follow the procedure. When the system onboards a new agent, it follows the onboarding playbook. The org file is simultaneously human documentation and machine instructions.
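As an example of a playbook read as machine instructions, the onboarding thresholds above (TRUSTED after 5 successful tasks, VETERAN eligibility after 20) reduce to a small function. This is a sketch only: `onboarding_status` and the shape of its return value are hypothetical, and it treats "after N successful tasks" as "N or more".

```python
from typing import Dict, Union


def onboarding_status(successful_tasks: int) -> Dict[str, Union[str, bool]]:
    """Derive the trust tier and VETERAN eligibility from the success count."""
    if successful_tasks < 0:
        raise ValueError("successful_tasks must be non-negative")
    return {
        "tier": "TRUSTED" if successful_tasks >= 5 else "PROBATION",
        "veteran_eligible": successful_tasks >= 20,
    }


for done in (0, 5, 20):
    print(done, onboarding_status(done))
```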
2. Parsing Rules
ORG.md is designed to be readable by humans and parseable by machines. The parsing rules are intentionally lenient:
2.1 Metadata Extraction
Structured data is extracted from markdown bullet lists:
- **Key:** Value

The pattern `- **Key:** Value` extracts `{ key: "value" }`. Keys are case-insensitive and normalized (spaces → underscores).
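A minimal sketch of this extraction rule using a regular expression. The `extract_metadata` function and the exact pattern are assumptions, not the reference parser. Keys are lowercased with spaces turned into underscores as described; values are kept as written, which is also an assumption.

```python
import re
from typing import Dict

# Matches lines such as "- **Model:** claude-haiku".
BULLET = re.compile(r"^\s*-\s+\*\*(?P<key>[^*:]+):\*\*\s*(?P<value>.*)$")


def extract_metadata(markdown: str) -> Dict[str, str]:
    """Collect `- **Key:** Value` bullets; every other line is ignored here."""
    metadata = {}
    for line in markdown.splitlines():
        match = BULLET.match(line)
        if match:
            key = match.group("key").strip().lower().replace(" ", "_")
            metadata[key] = match.group("value").strip()
    return metadata


sample = """\
Owns API, database, and server infrastructure.

- **Model:** claude-haiku
- **Domain:** backend
- **Count:** 2
- **Per-agent limit:** 1000 credits/period
"""
print(extract_metadata(sample))
```

Lines that do not match, such as the role description in the sample, are the free text that section 2.2 treats as context.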
2.2 Free Text = Context
Any text that isn’t structured metadata becomes context:
- Department descriptions → department-level system prompt context
- Role descriptions → agent-level system prompt context
- Policy explanations → system enforcement rules
- Playbook steps → procedural instructions
2.3 Numbers and Counts
- `**Count:** 3` → spawn 3 agents with this role
- `**Per-agent limit:** 1000` → numeric extraction
- `**Max depth:** 3` → numeric extraction
2.4 Model References
Models can be specified as:
- Full provider/model: `anthropic/claude-sonnet-4-5`
- Alias: `claude-sonnet`, `claude-haiku`, `gpt-4o`
- Relative: `same-as-lead`, `fastest`, `cheapest`
- Omitted: defaults to org-level default or system default
2.5 Hierarchy from Headings
## Structure → section marker
### Department Name → L9-10 department / C-level
#### Role Name → L4-7 team member (inherits department)
##### Sub-role → L1-3 junior / intern

3. Lifecycle
3.1 Deployment
# Deploy an org from a file
bikinibottom deploy ORG.md

# Deploy with a specific culture override
bikinibottom deploy ORG.md --culture=enterprise

# Dry run: show what would be created
bikinibottom deploy ORG.md --dry-run

On deploy:
- Parse ORG.md
- Create agents according to Structure
- Apply Culture parameters to ACP config
- Enforce Policies as system constraints
- Load Playbooks as procedural knowledge
- Start the simulation / connect to live system
3.2 Live Editing
ORG.md can be modified while the org is running:
# Apply changes from updated file
bikinibottom apply ORG.md

The system diffs the current state against the new file:
- New roles → spawn agents
- Removed roles → gracefully wind down (finish current tasks, then deactivate)
- Changed policies → apply immediately
- Changed culture → update ACP parameters live
- Changed descriptions → update system prompts on next tick
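The role part of that diff can be sketched as a comparison of two role-to-count maps. This is an assumption about one possible data shape, not the tool's internals; `diff_roles` is a hypothetical helper, and policy, culture, and description changes are left out.

```python
from typing import Dict


def diff_roles(current: Dict[str, int], desired: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Compare role -> agent count maps and report what to spawn or wind down."""
    spawn: Dict[str, int] = {}
    wind_down: Dict[str, int] = {}
    for role in sorted(set(current) | set(desired)):
        delta = desired.get(role, 0) - current.get(role, 0)
        if delta > 0:
            spawn[role] = delta  # new role or increased Count
        elif delta < 0:
            wind_down[role] = -delta  # removed role or reduced Count
    return {"spawn": spawn, "wind_down": wind_down}


running = {"Backend Senior": 2, "QA Worker": 1}
updated = {"Backend Senior": 3, "Data Lead": 1}
print(diff_roles(running, updated))
```

Roles in `wind_down` would finish their current tasks before deactivation, as the list above describes.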
3.3 Versioning
ORG.md lives in git. Standard version control applies:
git diff ORG.md   # See what changed in the org
git log ORG.md    # History of org changes
git blame ORG.md  # Who changed the escalation policy?

PR reviews for org changes:
PR #42: Add data team (2 agents)
+ ### Data & Analytics
+ Owns data pipelines, reporting, and business intelligence.
+
+ #### Data Lead
+ - **Model:** claude-sonnet
+ - **Domain:** data-engineering
+
+ #### Data Worker
+ - **Model:** claude-haiku
+ - **Domain:** analytics

Reviewers can discuss questions such as “Do we need a full team or just one analyst?”, the same way you’d review infrastructure-as-code changes.
3.4 Export
Running orgs can export their current state back to ORG.md:
# Export current org state (including dynamically spawned agents)
bikinibottom export > ORG.md

This captures the actual org, including agents that were spawned dynamically by leads. The exported file becomes the new source of truth.
4. Org Health & Intelligence
4.1 Health Score
A single composite score (0-100) computed from ACP metrics:
| Component | Weight | Healthy | Unhealthy |
|---|---|---|---|
| Ack latency | 15% | < 1 cycle | > 3 cycles |
| Escalation rate | 20% | < 10% of tasks | > 30% of tasks |
| Completion rate | 25% | > 90% | < 70% |
| Budget utilization | 15% | 40-80% | < 20% or > 95% |
| Agent idle rate | 10% | < 30% | > 60% |
| Time-to-completion | 15% | Trending down | Trending up |
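The weights in the table sum to 100%, so the composite can be read as a weighted average. The sketch below assumes each component has already been normalized to a score between 0 (unhealthy) and 1 (healthy); how raw metrics map onto that range is not specified here, and `health_score` and the component key names are hypothetical.

```python
from typing import Dict

# Weights copied from the table above.
WEIGHTS: Dict[str, float] = {
    "ack_latency": 0.15,
    "escalation_rate": 0.20,
    "completion_rate": 0.25,
    "budget_utilization": 0.15,
    "agent_idle_rate": 0.10,
    "time_to_completion": 0.15,
}


def health_score(component_scores: Dict[str, float]) -> float:
    """Weighted average of normalized component scores, scaled to 0-100."""
    missing = WEIGHTS.keys() - component_scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return 100 * sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


scores = {name: 1.0 for name in WEIGHTS}
scores["escalation_rate"] = 0.0  # e.g. escalations well above the unhealthy mark
print(round(health_score(scores), 1))
```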
Score interpretation:
- 90-100: Elite org — highly efficient, minimal waste
- 70-89: Healthy — normal operations, minor inefficiencies
- 50-69: Needs attention — bottlenecks or misrouting
- < 50: Restructure recommended — systemic issues
4.2 Self-Healing Recommendations
The system observes patterns and proposes changes:
## Recommendations (auto-generated)
### 🔴 Critical
- Engineering escalation rate is 35% (threshold: 10%)
  → Recommendation: Add 1 senior backend agent
  → Impact: Estimated 20% reduction in escalation rate

### 🟡 Warning
- Marketing has 2 idle agents while Security is overloaded
  → Recommendation: Cross-train 1 marketing worker for security tasks
  → Impact: Reduce security task queue by ~30%

### 🟢 Optimization
- Agent "Backend Senior 2" has 98% success rate over 50 tasks
  → Recommendation: Promote to Lead, create Backend sub-team
  → Impact: Free up Engineering Lead for higher-level planning

Recommendations are suggestions, not actions. A human reviews and approves via the dashboard or by modifying ORG.md.
4.3 A/B Testing
Run two org structures simultaneously and compare:
bikinibottom ab-test ORG-v1.md ORG-v2.md --tasks=100

Both orgs process the same task set. The system reports:
- Completion rate, time-to-completion, escalation rate, cost
- Statistical significance of differences
- Recommendation: which org structure performed better
This makes organizational design data-driven.
5. Examples
5.1 Solo Developer + Agents
# My Dev Team
## Culture
preset: startup
## Structure
### Me (Human Principal)
I make the decisions. Agents do the work.
### Code Agent
Writes code, runs tests, submits PRs.
- **Model:** claude-sonnet
- **Domain:** fullstack

### Review Agent
Reviews PRs, checks for bugs and style issues.
- **Model:** claude-haiku
- **Domain:** code-review

### Docs Agent
Keeps documentation in sync with code changes.
- **Model:** claude-haiku
- **Domain:** documentation

5.2 Agency with Client Teams
# Creative Agency
## Culture
preset: agency
- **Progress updates:** every tick — clients expect visibility
## Structure
### Account Director
Manages client relationships. Routes work to the right team.
- **Model:** claude-sonnet
- **Domain:** account-management

### Design Team
#### Design Lead
- **Model:** claude-sonnet
- **Domain:** visual-design

#### Designers
- **Model:** claude-haiku
- **Domain:** ui-ux
- **Count:** 3

### Content Team
#### Content Lead
- **Model:** claude-sonnet
- **Domain:** content-strategy

#### Writers
- **Model:** claude-haiku
- **Domain:** copywriting
- **Count:** 4

## Policies
### Client SLA
- Critical tasks: response within 1 cycle
- Normal tasks: completion within 10 cycles
- All tasks: progress update every 2 cycles

5.3 Research Lab
# AI Research Lab
## Culture
preset: research
- **Escalation:** delayed — let researchers explore before flagging blockers
## Structure
### Principal Investigator
Sets research direction. Reviews findings. Publishes papers.
- **Model:** claude-opus
- **Domain:** ml-research

### Senior Researchers
- **Model:** claude-sonnet
- **Domain:** experimentation
- **Count:** 2

### Research Assistants
Run experiments, collect data, write up results.
- **Model:** claude-haiku
- **Domain:** data-collection
- **Count:** 3

## Policies
### Exploration Budget
- **Per-agent limit:** 5000 credits/period (research needs room to explore)
- **No hard stops** (flag at 90%, but don't interrupt an experiment)

6. Relationship to Existing Standards
| Standard | Scope | Relationship |
|---|---|---|
| CLAUDE.md | One agent’s behavior | ORG.md wraps multiple agents, each with their own implicit “CLAUDE.md” (their role description) |
| AGENTS.md | Workspace rules | ORG.md is the superset: workspace rules + org structure + policies |
| ACP | Communication protocol | ORG.md’s Culture section configures ACP parameters |
| A2A | Inter-org communication | ORG.md defines one org; A2A connects multiple orgs |
| Terraform/Pulumi | Infrastructure as code | ORG.md is the same pattern applied to agent organizations |
7. Design Principles
- Readable first. If a human can’t understand the org from reading the file, the file has failed. Structure and intent should be obvious without documentation.
- Prose is configuration. Role descriptions aren’t comments; they become system prompt context. Write them like you’re onboarding a real employee.
- Defaults over verbosity. Omit what you don’t care about. The system picks sensible defaults. A 10-line ORG.md should produce a functional org.
- Git-native. ORG.md is a text file in version control. Diff, blame, review, rollback: all the tools you already have.
- Living document. The file evolves with the org. Dynamic changes (spawned agents, promotions) can be exported back. The file is always the source of truth.
- Human in the loop. The system recommends. Humans decide. ORG.md changes require a human commit (or explicit auto-approve for specific recommendations).
ORG.md turns organizational design from tribal knowledge into version-controlled, reviewable, deployable code. It’s the missing layer between “I have agents” and “I have an organization.”