
ORG.md Specification

Define your agent organization in a single markdown file. Deploy it. Watch it work. Tune it over time.

The agent ecosystem already speaks markdown. CLAUDE.md defines one agent’s behavior. AGENTS.md defines workspace rules. ORG.md defines an entire organization.

Markdown has a unique advantage over YAML/JSON for this: you can mix intent with structure. An org definition isn’t just data — it’s philosophy. Why is the team structured this way? What communication norms matter? That context is critical when humans review changes, when agents onboard, and when the system proposes optimizations.

The markdown IS the documentation. No separate wiki explaining what the config means.

```md
CLAUDE.md → defines one agent's behavior
AGENTS.md → defines workspace rules
ORG.md    → defines an entire organization
```

An ORG.md file has six sections, each defined by a top-level heading. All sections are optional — the system uses sensible defaults for anything omitted.

```md
# Organization Name

## Identity
## Culture
## SDLC
## Structure
## Policies
## Playbooks
```

Who is this organization? The name, mission, and context that every agent in the org inherits.

```md
# Acme Engineering

## Identity
We build developer tools that make infrastructure invisible.
Every agent in this org serves that mission.

- **Industry:** Developer tools / SaaS
- **Stage:** Series A, 18 months old
- **Values:** Ship fast, measure everything, customers first
```

Why this matters: Agents use Identity as ambient context. When a marketing agent writes copy, it knows the company builds dev tools. When an engineering agent prioritizes work, “customers first” influences the decision. Identity is the system prompt for the entire org.


How the organization communicates and operates. Maps directly to ACP tunable parameters.

```md
## Culture
We're a startup. Move fast, communicate openly, escalate immediately.
Nobody should be blocked for more than one cycle.

- **Communication:** async-first
- **Escalation:** immediate — we're too small to batch problems
- **Progress updates:** on phase change — not every tick, but don't go silent
- **Ack required:** yes — if you get a task, confirm it
- **Hierarchy depth:** shallow (3 levels max)
```

Preset cultures — shorthand for common patterns:

```md
## Culture
preset: startup
```

Available presets:

| Preset | Escalation | Progress | Hierarchy | Vibe |
|---|---|---|---|---|
| `startup` | Immediate | Frequent | 2-3 levels | Fast, scrappy, everyone does everything |
| `enterprise` | Batched (hourly) | On phase change | 5-8 levels | Process-driven, governance, separation of concerns |
| `agency` | Immediate | Every tick | 3-4 levels | Client-facing, deadline-driven, high visibility |
| `research` | Delayed | On request | 2-3 levels | Exploratory, high autonomy, long-running tasks |
| `military` | Immediate | Every tick | Strict chain | Zero ambiguity, mandatory acks, full situational awareness |
| `remote-async` | Delayed | On request | Flat | High trust, timezone-distributed, async-first |

Presets are starting points. Override any parameter inline:

```md
## Culture
preset: startup

- **Escalation:** delayed — we trust our leads to figure it out
```
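One plausible way to implement this preset-plus-override resolution is a dictionary merge where inline keys win. The preset contents below are illustrative parameter names, not spec-mandated values:

```python
# Hypothetical preset resolution: inline keys override the preset's defaults.
CULTURE_PRESETS = {
    "startup":  {"escalation": "immediate", "progress": "frequent",  "hierarchy_depth": 3},
    "research": {"escalation": "delayed",   "progress": "on-request", "hierarchy_depth": 3},
}

def resolve_culture(preset, overrides):
    base = dict(CULTURE_PRESETS.get(preset, {}))  # copy so presets stay immutable
    base.update(overrides)                        # inline keys win
    return base

culture = resolve_culture("startup", {"escalation": "delayed"})
```

The override example above would resolve to a startup culture with delayed escalation but all other startup defaults intact.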

How the organization develops, ships, and maintains software. Like Culture, this supports presets for common patterns — so every org inherits sane defaults without writing rules from scratch.

```md
## SDLC
preset: standard
```

Available presets:

| Preset | Branch strategy | PRs required | Deploy verification | Orphan branches |
|---|---|---|---|---|
| `standard` | Trunk-based, branch off main | Yes, targeting `main` | Smoke test required | Forbidden |
| `strict` | Same as standard + mandatory review, max 500 LOC per PR | Yes, with approval | Full E2E suite | Forbidden |
| `solo` | Trunk-based, direct push allowed | Optional | Manual spot-check | Forbidden |
| `research` | Feature branches, long-lived OK | Yes | Optional | Forbidden |

All presets share one invariant: orphan branches are always forbidden. Agents will take the path of least resistance — `git init` “works” locally but creates parallel histories that can’t merge. The spec prevents this by default.

Full example with overrides:

```md
## SDLC
preset: standard

### Source Control
- **Branch strategy:** trunk-based — always branch off `main`
- **Branch naming:** `<role>/<feature>` (e.g., `web-eng/auth-flow`)
- **Orphan branches:** forbidden — never `git init`, never create disconnected history
- **Direct push to main:** never
- **Pre-work ritual:** `git fetch origin && git checkout -b <branch> origin/main`

### Pull Requests
- **Required:** yes — every change, even typo fixes
- **Target:** `main`
- **Max size:** 500 lines (soft), 1000 lines (hard)
- **Naming:** conventional commits (`feat:`, `fix:`, `docs:`, `chore:`)
- **Scope:** one feature per PR — don't accumulate large batches

### Quality Gates
- **Pre-merge:** typecheck passes (`tsc --noEmit`), lint clean, tests pass
- **Post-deploy:** smoke test required (Playwright or equivalent)
- **Dependency additions:** must be checked for framework compatibility

### Deploy
- **Pipeline:** PR merge → build → deploy → verify → announce
- **Verification:** HTTP 200 on primary routes + no client-side JS errors
- **Communication:** post to #alerts (or equivalent channel) on every deploy

### Incident Response
- **Flag immediately** in the team channel — don't wait
- **Document:** what happened, root cause, what changed
- **Post-mortem:** update SDLC rules if a process gap caused the incident
```

Why this matters: Without explicit SDLC rules, agents default to whatever works locally. A sub-agent that doesn’t know the branching strategy will `git init` a fresh repo, accumulate 66 commits on an orphan branch, and create a production incident when someone tries to merge it. SDLC defaults make the wrong thing hard and the right thing obvious.

Relationship to CONTRIBUTING.md: If the repo has a CONTRIBUTING.md, agents should read it. The SDLC section in ORG.md is the organizational policy; CONTRIBUTING.md is the repo-level implementation. They should align — if they conflict, CONTRIBUTING.md wins for that repo.

Agents that touch code need isolated working directories. The spec supports a ### Repository subsection that defines how agents share a codebase:

```md
### Repository
clone: /opt/org/openspawn
strategy: worktree-per-agent

#### Worktrees
- designer → /opt/org/openspawn-designer
- web-eng → /opt/org/openspawn-web-eng
- docs-writer → /opt/org/openspawn-docs
```

`worktree-per-agent` (recommended default):

- One shared `.git` directory — single object store, single fetch updates all refs
- Each agent gets a dedicated worktree on a unique branch
- Orphan histories are structurally impossible: `git worktree add` always branches from the real tree
- The org boot sequence creates worktrees; agents never run `git init` or `git clone`

Rules enforced by the strategy:

- **One branch per worktree.** Two worktrees cannot check out the same branch.
- **Serialize fetches.** One fetch before spawning a batch of agents — not per-agent. Prevents `.git/index.lock` contention.
- **No `git stash`.** Stash state is shared across worktrees. Agents commit or discard instead.
- **Worktree naming:** `<repo>-<agent-role>` (e.g., `openspawn-designer`)
- **Cleanup on deactivation.** When an agent is removed or a sub-agent finishes, `git worktree prune` reclaims its directory.
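Under these rules, the boot sequence might look like the following runnable sketch. It fabricates a throwaway stand-in "origin" repo in a temp directory so the flow can be exercised end-to-end; in a real org the clone path (e.g. `/opt/org/openspawn`) would come from the Repository subsection, and branch names here are illustrative:

```python
# Runnable toy of the worktree-per-agent boot sequence.
import os
import subprocess
import tempfile

def run(*args, cwd=None):
    subprocess.run(args, cwd=cwd, check=True, capture_output=True)

base = tempfile.mkdtemp()
origin = os.path.join(base, "origin")

# Stand-in remote with one commit on main (real orgs point at a real remote).
run("git", "init", "-q", origin)
run("git", "-C", origin, "-c", "user.email=ci@example.com", "-c", "user.name=ci",
    "commit", "-q", "--allow-empty", "-m", "init")
run("git", "-C", origin, "branch", "-q", "-M", "main")

# The org's shared clone: one .git object store for every agent.
clone = os.path.join(base, "openspawn")
run("git", "clone", "-q", origin, clone)
run("git", "fetch", "-q", "origin", cwd=clone)  # one fetch before the whole batch

for role in ["designer", "web-eng", "docs-writer"]:
    # Each agent gets a dedicated worktree on a unique branch off origin/main;
    # orphan histories are impossible because the branch starts from a real ref.
    run("git", "worktree", "add", "-q", "-b", f"{role}/work",
        os.path.join(base, f"openspawn-{role}"), "origin/main", cwd=clone)
```

Note that each worktree's `.git` is a small file pointing back into the shared clone — the object store is never duplicated.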

Alternative strategies:

- `clone-per-agent` — full clone per agent. Higher disk usage but zero contention. Use for large teams or repos with submodules.
- `shared` — all agents use the same working directory. Only viable for single-writer orgs (one agent writes, others read).

Why this is in the spec: The Feb 27 incident happened because a sub-agent was given a repo path and told to “work on the codebase.” It did what made sense locally: `git init`. By making workspace setup an org-level concern — defined in ORG.md, executed by the boot sequence — agents never have to figure out repository access on their own.


The org chart. Departments, roles, and hierarchy — defined as nested markdown.

```md
## Structure

### COO
The operational backbone. Receives orders from the human principal,
breaks them into departmental work, ensures nothing falls through cracks.
- **Model:** claude-sonnet
- **Domain:** operations
- **Reports to:** Human Principal

### Engineering
Our largest team. Owns code, infrastructure, testing, and deployment.

#### Engineering Lead
Triages technical work. Delegates to specialists. Reviews output.
- **Model:** claude-sonnet
- **Domain:** engineering

#### Backend Senior
Owns API, database, and server infrastructure.
- **Model:** claude-haiku
- **Domain:** backend
- **Count:** 2

#### Frontend Workers
Build and maintain the dashboard and marketing site.
- **Model:** claude-haiku
- **Domain:** frontend
- **Count:** 3

#### QA Worker
Writes and runs tests. Reviews PRs for quality.
- **Model:** claude-haiku
- **Domain:** testing

### Security
Small but critical. Every deploy needs their sign-off.

#### Security Lead
- **Model:** claude-sonnet
- **Domain:** appsec

#### Security Worker
- **Model:** claude-haiku
- **Domain:** infrastructure-security

### Marketing
Owns content, campaigns, and public presence.

#### Marketing Lead
- **Model:** claude-sonnet
- **Domain:** content

#### Content Workers
- **Model:** claude-haiku
- **Domain:** copywriting
- **Count:** 2
```

How hierarchy is inferred:

- H2 (`##`) = top-level section (Structure itself)
- H3 (`###`) = department or C-level role (L9-10)
- H4 (`####`) = department member roles
- Nesting under a department heading = reports to that department’s lead
- The first role under a department heading with no explicit **Reports to** = the department lead
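The inference rules above can be sketched as a minimal parser. This handles only heading lines (field extraction and H5 sub-roles are omitted), and it simplifies the reporting chain so department leads report to their department:

```python
import re

def parse_structure(md):
    """Infer roles and reporting lines from heading depth: H3 = department,
    H4 = member; the first H4 under a department becomes its lead."""
    roles, department, lead = [], None, None
    for line in md.splitlines():
        m = re.match(r"^(#{3,4})\s+(.*)", line)
        if not m:
            continue
        depth, name = len(m.group(1)), m.group(2).strip()
        if depth == 3:
            department, lead = name, None   # new department; lead not yet seen
        else:
            reports_to = lead if lead else department
            if lead is None:
                lead = name                 # first role under the heading = lead
            roles.append({"role": name, "department": department,
                          "reports_to": reports_to})
    return roles
```

Feeding it the Engineering example above would make Engineering Lead the department lead, with the other Engineering roles reporting to it.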

Role keywords and levels:

| Keyword in role name | Inferred level | Can delegate? | Can spawn? |
|---|---|---|---|
| COO, CTO, CEO | L10 | yes | yes |
| VP, Director, Talent | L9 | yes | yes |
| Lead, Manager | L7 | yes | yes |
| Senior, Principal | L6 | no | no |
| Worker, Engineer, Agent | L4 | no | no |
| Junior, Intern, Assistant | L1-2 | no | no |

(Delegation and spawning follow from level — see the default Permissions policy: L7+ can create tasks and spawn agents.)
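A sketch of keyword-to-level inference from the table: substring match against the role name, first table row wins. The fallback level for unmatched names is an assumption, not part of the spec:

```python
# Keyword-to-level pairs from the role table; for the L1-2 band the floor is used.
LEVELS = [
    (("COO", "CTO", "CEO"), 10),
    (("VP", "Director", "Talent"), 9),
    (("Lead", "Manager"), 7),
    (("Senior", "Principal"), 6),
    (("Worker", "Engineer", "Agent"), 4),
    (("Junior", "Intern", "Assistant"), 1),
]

def infer_level(role_name, default=4):
    name = role_name.lower()
    for keywords, level in LEVELS:
        if any(k.lower() in name for k in keywords):
            return level
    return default  # assumption: unknown role names fall back to worker level
```

Substring matching keeps plurals working ("Frontend Workers" still matches "Worker"), at the cost of first-match-wins ambiguity for names that contain multiple keywords.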

The Count field: Creates multiple agents with the same role. They get auto-numbered names: “Frontend Worker 1”, “Frontend Worker 2”, etc. Each is an independent agent with its own task queue and trust score.

Prose matters: The text description above each role becomes part of that agent’s system prompt context. “Triages technical work. Delegates to specialists.” tells the LLM how to behave. Write the description like you’re explaining the role to a new hire.


Rules that govern how the organization operates. Budget, routing, permissions, and constraints.

```md
## Policies

### Budget
- **Per-agent limit:** 1000 credits/period
- **Alert threshold:** 80%
- **Overage behavior:** pause and escalate — don't hard-stop
- **Period:** weekly

### Task Routing
Tasks are auto-routed to the right department by matching:
1. Domain keywords in the task title/description
2. Agent domain expertise
3. Current workload (prefer idle agents)
4. Trust score (higher trust gets harder tasks)
If no match is found, task goes to the COO for manual delegation.

### Permissions
- **L7+ can create tasks** — leads and above can break work into subtasks
- **L7+ can spawn agents** — leads can grow their team (up to department cap)
- **L6+ can review** — seniors and above can approve/reject work
- **All agents can escalate** — nobody should be silently stuck

### Department Caps
- Engineering: max 10 agents
- Security: max 4 agents
- Marketing: max 6 agents
- No department can exceed 15 agents without human approval

### Working Hours
- **Active hours:** 08:00-22:00 (org timezone)
- **Off-hours behavior:** queue tasks, don't process
- **Exceptions:** critical priority tasks process 24/7
```
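The Task Routing steps can be sketched as a scoring pass over candidate agents. Field names and the tie-breaking details are illustrative, and trust is used only as a tie-breaker here rather than for difficulty matching:

```python
def route_task(task, agents):
    """Pick an agent by the routing criteria: domain keyword match first,
    then prefer idle agents, then higher trust. None -> fall back to the COO."""
    words = set((task["title"] + " " + task.get("description", "")).lower().split())

    def score(agent):
        if agent["domain"].lower() not in words:   # 1-2: domain must match
            return None
        return (-agent["queue_len"],               # 3: prefer idle agents
                agent["trust"])                    # 4: then higher trust

    matches = [(s, a) for s, a in ((score(a), a) for a in agents) if s is not None]
    return max(matches, key=lambda sa: sa[0])[1] if matches else None
```

A task titled "Fix backend API bug" would go to the idle backend agent; a task matching no agent domain returns `None`, which the caller routes to the COO for manual delegation.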

Policies as guardrails: These aren’t suggestions — the system enforces them. An agent that tries to spawn when at department cap gets denied. An agent that exceeds budget gets paused. This is how you maintain control over autonomous agents.
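Guardrail enforcement can be sketched as pre-action checks. Function names are illustrative; the caps, the 1000-credit limit, the 80% alert threshold, and the pause-and-escalate behavior come from the example policies above:

```python
DEPT_CAPS = {"Engineering": 10, "Security": 4, "Marketing": 6}

def can_spawn(department, current_size):
    # Deny spawn at or above the department cap (15 = org-wide default cap).
    return current_size < DEPT_CAPS.get(department, 15)

def budget_action(spent, limit=1000):
    # "Pause and escalate" rather than a hard stop; alert at 80%.
    if spent >= limit:
        return "pause_and_escalate"
    if spent >= 0.8 * limit:
        return "alert"
    return "ok"
```

The point is that the check runs before the action: a spawn request from a lead in a full department is denied before any agent is created.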


Reusable procedures for common situations. Like runbooks in ops, but for your agent org.

```md
## Playbooks

### New Task Arrives
1. COO receives task from Human Principal
2. COO categorizes by domain and priority
3. COO delegates to appropriate department lead
4. Lead acks (auto) and breaks into subtasks if needed
5. Lead assigns to available workers by trust score
6. Workers ack and begin — progress logged to task activity

### Escalation: BLOCKED
1. Agent creates escalation message with blocker details
2. Escalation goes to direct manager (never skip levels)
3. Manager has 2 cycles to respond:
   - Provide the missing resource/context
   - Reassign to a different agent
   - Escalate further up
4. If unresolved after 2 levels, alert Human Principal

### Escalation: OUT_OF_DOMAIN
1. Agent flags task as wrong domain
2. Manager receives escalation
3. Manager re-delegates to correct department lead
4. Original agent is freed for other work
5. No penalty to original agent's trust score

### New Agent Onboarding
1. New agent spawned by a lead
2. First 3 tasks are LOW priority (warm-up period)
3. Trust score starts at 30 (PROBATION)
4. Mentor assigned: closest senior in same domain
5. After 5 successful tasks, promoted to TRUSTED
6. After 20 successful tasks, eligible for VETERAN

### Weekly Review (automated)
1. System compiles: tasks completed, escalation rate, budget burn
2. Generates org health score
3. Flags anomalies: sudden escalation spikes, idle agents, budget overruns
4. Sends digest to Human Principal
5. Proposes optimizations: "Engineering is bottlenecked, consider +1 senior"
```

Why playbooks in the org file: They’re not just documentation — they’re instructions. When an agent encounters “BLOCKED”, it can look up the playbook and follow the procedure. When the system onboards a new agent, it follows the onboarding playbook. The org file is simultaneously human documentation and machine instructions.


ORG.md is designed to be readable by humans and parseable by machines. The parsing rules are intentionally lenient:

Structured data is extracted from markdown bullet lists:

```md
- **Key:** Value
```

The pattern `- **Key:** Value` extracts `{ key: "value" }`. Keys are case-insensitive and normalized (spaces → underscores).
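A minimal extractor for this pattern might look like the following. The regex is one plausible implementation, and coercing bare integers is an assumption that mirrors the numeric-extraction examples:

```python
import re

# Matches "- **Key:** Value" bullet lines.
FIELD = re.compile(r"^\s*-\s+\*\*(?P<key>[^:*]+):\*\*\s*(?P<value>.+)$")

def extract_fields(md):
    fields = {}
    for line in md.splitlines():
        m = FIELD.match(line)
        if not m:
            continue
        # Keys are case-insensitive and normalized: spaces -> underscores.
        key = m.group("key").strip().lower().replace(" ", "_")
        value = m.group("value").strip()
        # Assumption: bare integers are coerced, e.g. "**Count:** 3" -> 3.
        fields[key] = int(value) if value.isdigit() else value
    return fields

fields = extract_fields("- **Model:** claude-haiku\n- **Count:** 3")
```

Values with units or prose (e.g. `1000 credits/period`) stay as strings; downstream consumers decide what to do with them.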

Any text that isn’t structured metadata becomes context:

- Department descriptions → department-level system prompt context
- Role descriptions → agent-level system prompt context
- Policy explanations → system enforcement rules
- Playbook steps → procedural instructions

Numeric values are extracted where a number is expected:

- `**Count:** 3` → spawn 3 agents with this role
- `**Per-agent limit:** 1000` → numeric extraction
- `**Max depth:** 3` → numeric extraction

Models can be specified as:

- Full provider/model: `anthropic/claude-sonnet-4-5`
- Alias: `claude-sonnet`, `claude-haiku`, `gpt-4o`
- Relative: `same-as-lead`, `fastest`, `cheapest`
- Omitted: defaults to the org-level default or system default
Heading depth maps to org level:

```md
## Structure        → section marker
### Department Name → L9-10 department / C-level
#### Role Name      → L4-7 team member (inherits department)
##### Sub-role      → L1-3 junior / intern
```

```shell
# Deploy an org from a file
bikinibottom deploy ORG.md

# Deploy with a specific culture override
bikinibottom deploy ORG.md --culture=enterprise

# Dry run — show what would be created
bikinibottom deploy ORG.md --dry-run
```

On deploy:

1. Parse ORG.md
2. Create agents according to Structure
3. Apply Culture parameters to ACP config
4. Enforce Policies as system constraints
5. Load Playbooks as procedural knowledge
6. Start the simulation / connect to live system

ORG.md can be modified while the org is running:

```shell
# Apply changes from updated file
bikinibottom apply ORG.md
```

The system diffs the current state against the new file:

- New roles → spawn agents
- Removed roles → gracefully wind down (finish current tasks, then deactivate)
- Changed policies → apply immediately
- Changed culture → update ACP parameters live
- Changed descriptions → update system prompts on next tick
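The diff step can be sketched over maps of role name to role definition. The state shapes are illustrative; a real implementation would also diff policies and culture:

```python
def diff_roles(current, desired):
    """Compare role-name -> definition maps and emit (action, role) pairs."""
    actions = []
    for name in desired.keys() - current.keys():
        actions.append(("spawn", name))        # new roles -> spawn agents
    for name in current.keys() - desired.keys():
        actions.append(("wind_down", name))    # finish current tasks, deactivate
    for name in desired.keys() & current.keys():
        if desired[name] != current[name]:
            actions.append(("update", name))   # refresh prompts/config on next tick
    return sorted(actions)

actions = diff_roles(
    {"QA": {"model": "claude-haiku"}, "Docs": {}},
    {"QA": {"model": "claude-sonnet"}, "Data": {}},
)
```

In this toy diff, Data is spawned, Docs winds down, and QA is updated in place.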

ORG.md lives in git. Standard version control applies:

```shell
git diff ORG.md    # See what changed in the org
git log ORG.md     # History of org changes
git blame ORG.md   # Who changed the escalation policy?
```

PR reviews for org changes:

PR #42: Add data team (2 agents)

```diff
+ ### Data & Analytics
+ Owns data pipelines, reporting, and business intelligence.
+
+ #### Data Lead
+ - **Model:** claude-sonnet
+ - **Domain:** data-engineering
+
+ #### Data Worker
+ - **Model:** claude-haiku
+ - **Domain:** analytics
```

Reviewers can discuss: “Do we need a full team or just one analyst?” — the same way you’d review infrastructure-as-code changes.

Running orgs can export their current state back to ORG.md:

```shell
# Export current org state (including dynamically spawned agents)
bikinibottom export > ORG.md
```

This captures the actual org — including agents that were spawned dynamically by leads. The exported file becomes the new source of truth.


A single composite score (0-100) computed from ACP metrics:

| Component | Weight | Healthy | Unhealthy |
|---|---|---|---|
| Ack latency | 15% | < 1 cycle | > 3 cycles |
| Escalation rate | 20% | < 10% of tasks | > 30% of tasks |
| Completion rate | 25% | > 90% | < 70% |
| Budget utilization | 15% | 40-80% | < 20% or > 95% |
| Agent idle rate | 10% | < 30% | > 60% |
| Time-to-completion | 15% | Trending down | Trending up |

Score interpretation:

- **90-100:** Elite org — highly efficient, minimal waste
- **70-89:** Healthy — normal operations, minor inefficiencies
- **50-69:** Needs attention — bottlenecks or misrouting
- **< 50:** Restructure recommended — systemic issues
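The composite can be sketched as a weighted sum, assuming each component has already been normalized to a 0-100 score (100 = the healthy end of the table; the normalization functions themselves are not specified here):

```python
# Weights from the component table; they sum to 1.0.
WEIGHTS = {
    "ack_latency": 0.15, "escalation_rate": 0.20, "completion_rate": 0.25,
    "budget_utilization": 0.15, "agent_idle_rate": 0.10, "time_to_completion": 0.15,
}

def org_health(component_scores):
    """component_scores: name -> score in [0, 100]."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[k] * component_scores[k] for k in WEIGHTS)

def interpret(score):
    if score >= 90: return "elite"
    if score >= 70: return "healthy"
    if score >= 50: return "needs attention"
    return "restructure recommended"
```

A perfect org scores 100; the interpretation bands match the list above.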

The system observes patterns and proposes changes:

```md
## Recommendations (auto-generated)

### 🔴 Critical
- Engineering escalation rate is 35% (threshold: 10%)
  → Recommendation: Add 1 senior backend agent
  → Impact: Estimated 20% reduction in escalation rate

### 🟡 Warning
- Marketing has 2 idle agents while Security is overloaded
  → Recommendation: Cross-train 1 marketing worker for security tasks
  → Impact: Reduce security task queue by ~30%

### 🟢 Optimization
- Agent "Backend Senior 2" has 98% success rate over 50 tasks
  → Recommendation: Promote to Lead, create Backend sub-team
  → Impact: Free up Engineering Lead for higher-level planning
```

Recommendations are suggestions, not actions. A human reviews and approves via the dashboard or by modifying ORG.md.

Run two org structures simultaneously and compare:

```shell
bikinibottom ab-test ORG-v1.md ORG-v2.md --tasks=100
```

Both orgs process the same task set. The system reports:

- Completion rate, time-to-completion, escalation rate, cost
- Statistical significance of differences
- Recommendation: which org structure performed better

This is how you data-drive organizational design.
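For rate metrics like completion rate, significance can be sketched with a standard two-proportion z-test. The spec doesn't mandate a particular test; this is one conventional choice, and the task counts below are illustrative:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for H0: both orgs have the same underlying completion rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)       # pooled rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Org v1 completed 92/100 of the shared task set, org v2 completed 78/100.
z = two_proportion_z(92, 100, 78, 100)
significant = abs(z) > 1.96   # ~95% confidence, two-sided
```

With these numbers the difference clears the 95% bar; with smaller task sets the same gap often would not, which is why `--tasks` matters.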


```md
# My Dev Team

## Culture
preset: startup

## Structure

### Me (Human Principal)
I make the decisions. Agents do the work.

### Code Agent
Writes code, runs tests, submits PRs.
- **Model:** claude-sonnet
- **Domain:** fullstack

### Review Agent
Reviews PRs, checks for bugs and style issues.
- **Model:** claude-haiku
- **Domain:** code-review

### Docs Agent
Keeps documentation in sync with code changes.
- **Model:** claude-haiku
- **Domain:** documentation
```
```md
# Creative Agency

## Culture
preset: agency

- **Progress updates:** every tick — clients expect visibility

## Structure

### Account Director
Manages client relationships. Routes work to the right team.
- **Model:** claude-sonnet
- **Domain:** account-management

### Design Team

#### Design Lead
- **Model:** claude-sonnet
- **Domain:** visual-design

#### Designers
- **Model:** claude-haiku
- **Domain:** ui-ux
- **Count:** 3

### Content Team

#### Content Lead
- **Model:** claude-sonnet
- **Domain:** content-strategy

#### Writers
- **Model:** claude-haiku
- **Domain:** copywriting
- **Count:** 4

## Policies

### Client SLA
- Critical tasks: response within 1 cycle
- Normal tasks: completion within 10 cycles
- All tasks: progress update every 2 cycles
```
```md
# AI Research Lab

## Culture
preset: research

- **Escalation:** delayed — let researchers explore before flagging blockers

## Structure

### Principal Investigator
Sets research direction. Reviews findings. Publishes papers.
- **Model:** claude-opus
- **Domain:** ml-research

### Senior Researchers
- **Model:** claude-sonnet
- **Domain:** experimentation
- **Count:** 2

### Research Assistants
Run experiments, collect data, write up results.
- **Model:** claude-haiku
- **Domain:** data-collection
- **Count:** 3

## Policies

### Exploration Budget
- **Per-agent limit:** 5000 credits/period — research needs room to explore
- **No hard stops** — flag at 90%, but don't interrupt an experiment
```

| Standard | Scope | Relationship |
|---|---|---|
| CLAUDE.md | One agent’s behavior | ORG.md wraps multiple agents, each with their own implicit “CLAUDE.md” (their role description) |
| AGENTS.md | Workspace rules | ORG.md is the superset — workspace rules + org structure + policies |
| ACP | Communication protocol | ORG.md’s Culture section configures ACP parameters |
| A2A | Inter-org communication | ORG.md defines one org; A2A connects multiple orgs |
| Terraform/Pulumi | Infrastructure as code | ORG.md is the same pattern applied to agent organizations |

1. **Readable first.** If a human can’t understand the org from reading the file, the file has failed. Structure and intent should be obvious without documentation.

2. **Prose is configuration.** Role descriptions aren’t comments — they become system prompt context. Write them like you’re onboarding a real employee.

3. **Defaults over verbosity.** Omit what you don’t care about. The system picks sensible defaults. A 10-line ORG.md should produce a functional org.

4. **Git-native.** ORG.md is a text file in version control. Diff, blame, review, rollback — all the tools you already have.

5. **Living document.** The file evolves with the org. Dynamic changes (spawned agents, promotions) can be exported back. The file is always the source of truth.

6. **Human in the loop.** The system recommends. Humans decide. ORG.md changes require a human commit (or explicit auto-approve for specific recommendations).


ORG.md turns organizational design from tribal knowledge into version-controlled, reviewable, deployable code. It’s the missing layer between “I have agents” and “I have an organization.”