Troubleshooting
Quick diagnostic: When something breaks, start here:
openspawn validate           # catches most config issues
lsof -i :3456                # check for port conflicts (MCP server)
lsof -i :3333                # check for port conflicts (sandbox/dashboard)
cat .openspawn/tasks.json    # inspect task store state

Jump to a section:
- Installation
- ORG.md parsing
- CLI problems
- MCP connection
- API auth errors
- Dashboard issues
- Task state machine
- Budget and credits
- Docker
1. Installation Issues
Symptom: npx openspawn fails with engine error or Unsupported engine
Cause: OpenSpawn requires Node.js 18 or later. Older Node versions are not supported.
Fix:
node --version    # must be v18.0.0 or higher
# If outdated, upgrade via nvm:
nvm install 20
nvm use 20
npx openspawn@latest init my-org

Symptom: EACCES: permission denied during install
Cause: A global npm install was attempted without sufficient permissions, or the npm cache directory has the wrong ownership.
Fix — use npx instead of a global install:
# Don't do this:
npm install -g openspawn
# Do this instead (npx handles the download each time):
npx openspawn init my-org

If you must install globally, fix npm’s prefix instead of using sudo:
mkdir -p ~/.npm-global
npm config set prefix '~/.npm-global'
export PATH=~/.npm-global/bin:$PATH    # add to ~/.bashrc
npm install -g openspawn

Symptom: Cannot find module or ERR_MODULE_NOT_FOUND at runtime
Cause: The package ships as ESM ("type": "module"). It requires Node 18+ and cannot be loaded with require() from CommonJS code.
Fix for CLI use: Always invoke via npx openspawn or the openspawn binary. Never call require('openspawn') in CommonJS.
Fix for programmatic use in a CJS project:
// Use dynamic import instead. CommonJS has no top-level await, so call it inside an async function:
(async () => {
  const { parseOrgMdContent } = await import("openspawn");
})();

Symptom: pnpm install fails in monorepo with peer dependency errors
Cause: The workspace requires specific peer versions. Running plain npm install inside a sub-package, instead of pnpm install from the repo root, misses the workspace links.
Fix:
# Always install from the repo root:
cd /path/to/openspawn
pnpm install
# Then build:
pnpm nx run-many -t build

2. ORG.md Parsing Errors
The parser (org-parser.ts) uses remark to walk the markdown AST. It is forgiving and won’t throw on most malformed input, but it silently produces wrong results if the structure is off. Always run openspawn validate after editing.
Symptom: Org name shows as "Unnamed Org" in the dashboard
Cause: The parser reads the org name from the first H1 heading (# My Org). If there is no H1, or the file starts with H2, the name defaults to "Unnamed Org".
Fix: Make sure the first heading in your ORG.md is an H1:
# My SaaS Org ← must be the first heading
## Structure
...

Symptom: Agents not showing up after editing ORG.md
Cause A: Agents must be defined under ## Structure, inside ### DeptName (H3) or #### RoleName (H4) sections. If you use the wrong heading depth, they are ignored.
Cause B: The bold-key metadata format is strict. - **Key:** Value must wrap the key and its colon in exactly two asterisks on each side, followed by a space and the value.
# Correct ✅
### Engineering
#### Alice — Senior Dev
- **Level:** 6
- **Domain:** Engineering
- **Reports to:** Bob
# Wrong ❌ — uses plain list, not bold-key
### Engineering
#### Alice — Senior Dev
- Level: 6 ← not bold, ignored
- Domain: Engineering

Symptom: Agent level parsed as wrong value (e.g., falls back to level 4)
Cause: The Level key in the metadata list is missing or malformed. The parser falls back to role inference from the heading text (ceo/coo/cto → L10, lead/manager → L7, senior/principal → L6, junior/intern → L1, everything else → L4).
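The fallback described above can be pictured with a small sketch. This is not the code in org-parser.ts: the function name inferLevel, the whole-word matching, and the order in which keyword groups are tried are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the documented fallback, not the actual parser code.
// Each entry maps heading keywords to the default level listed above.
const ROLE_LEVELS = [
  { keywords: ["ceo", "coo", "cto"], level: 10 },
  { keywords: ["lead", "manager"], level: 7 },
  { keywords: ["senior", "principal"], level: 6 },
  { keywords: ["junior", "intern"], level: 1 },
];

function inferLevel(heading) {
  // Split on non-letters so "intern" does not match inside "internal".
  const words = heading.toLowerCase().split(/[^a-z]+/);
  for (const { keywords, level } of ROLE_LEVELS) {
    if (keywords.some((k) => words.includes(k))) return level;
  }
  return 4; // everything else
}

console.log(inferLevel("Jordan — Backend Developer"));
```

A heading with no recognized keyword, such as the one logged here, ends up at level 4, which is why an explicit Level item is the safer choice.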
Fix: Always specify level explicitly:
#### Jordan — Backend Developer
- **Level:** 5 ← explicit; do not omit
- **Domain:** Engineering

Symptom: Culture preset not applied — org uses default communication settings
Cause: The preset must be set either as a bare line (preset: startup) or as a bold-key item (- **Preset:** startup). Both formats work, but only inside the H2 ## Culture section.
## Culture
preset: agency ← works
# OR:
## Culture
- **Preset:** agency ← also works
- **Escalation:** immediate

Symptom: Budget / per-agent limit not parsed from Policies section
Cause: The per-agent limit must use the exact key Per-agent limit (case-insensitive) as a bold-key item. The value can include units like 500 credits/period; the parser strips non-numeric characters.
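As a rough illustration of that rule, the sketch below matches a bold-key item and strips the units from its value. The helper names parseBoldKey and parsePerAgentLimit are hypothetical, and keeping the decimal point while stripping is an assumption; the real parser works on the remark AST rather than on raw lines.

```javascript
// Hypothetical helpers, not the real parser: read "- **Key:** Value" items
// and pull a numeric per-agent limit out of the value.
function parseBoldKey(line) {
  const match = /^-\s+\*\*(.+?):\*\*\s*(.*)$/.exec(line.trim());
  return match ? { key: match[1].trim(), value: match[2].trim() } : null;
}

function parsePerAgentLimit(lines) {
  for (const line of lines) {
    const item = parseBoldKey(line);
    if (item && item.key.toLowerCase() === "per-agent limit") {
      // Drop units such as "credits/period"; keep digits and the decimal point.
      const digits = item.value.replace(/[^0-9.]/g, "");
      return digits === "" ? null : Number(digits);
    }
  }
  return null; // no matching bold-key item
}

console.log(parsePerAgentLimit(["- **Per-agent limit:** 500 credits/period"]));
```

A plain list item without the bold key never matches the pattern, so the limit silently stays unset.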
## Policies
### Budget
- **Per-agent limit:** 500 credits/period ← correct
- **Alert threshold:** 75% ← correct
# Wrong:
- Budget limit: 500 ← not bold, not a bold-key item

Symptom: Department caps not parsed
Cause: Department caps must be in an H3 subsection whose heading contains the word “cap” (e.g., ### Department Caps), with list items in the format Department: max N agents.
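To make the expected item format concrete, here is a minimal sketch of a matcher for it. The function name parseDepartmentCap and the exact regular expression are assumptions for illustration, not the pattern OpenSpawn uses.

```javascript
// Hypothetical matcher for "Department: max N agents" list items.
function parseDepartmentCap(line) {
  const match = /^-\s*(.+?):\s*max\s+(\d+)\s+agents?$/i.exec(line.trim());
  return match ? { department: match[1].trim(), max: Number(match[2]) } : null;
}

console.log(parseDepartmentCap("- Engineering: max 8 agents"));
```

Items that leave out the word max or the agent count do not match and would be skipped under this sketch.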
### Department Caps
- Engineering: max 8 agents
- Operations: max 3 agents

Symptom: org_read MCP tool returns { "error": "ORG.md not found at /path/ORG.md" }
Cause: The MCP server looks for ORG.md relative to the --dir option (defaults to the current working directory). The file must exist at that path.
Fix:
# Make sure you're in the right directory:
ls ORG.md          # should exist
openspawn start    # starts with cwd as dir
# Or specify the dir explicitly:
openspawn start --dir /path/to/my-org

3. CLI Problems
Symptom: openspawn init says "ORG.md already exists, skipping."
Cause: init is non-destructive. If ORG.md already exists in the target directory, it logs the message and continues (it still creates .openspawn/tasks.json if missing).
Fix: This is expected behavior. If you want to reset:
rm ORG.md         # remove existing
openspawn init    # scaffold fresh from default template
# or from a template:
openspawn init --template=incident-response

Symptom: openspawn start — port already in use
Cause: The MCP server defaults to port 3456. Something else is listening on that port.
Symptom in logs: Error: listen EADDRINUSE: address already in use :::3456
Fix:
# Find what's using the port:
lsof -i :3456
# Kill it:
kill -9 <PID>
# Or start on a different port:
openspawn start --port 3457
# Then update your MCP client config to point to the new port.

Symptom: openspawn start — sandbox dashboard port conflict
Cause: The sandbox/dashboard server defaults to port 3333 (SANDBOX_PORT env var).
Fix:
lsof -i :3333
kill -9 <PID>
# or:
SANDBOX_PORT=3334 openspawn start

Symptom: openspawn commands fail with "Cannot find .openspawn directory"
Cause: The .openspawn/tasks.json store is missing. This happens if you cloned or copied the ORG.md without running openspawn init, or if .openspawn/ was deleted.
Fix:
# Re-initialize the state directory (non-destructive: an existing ORG.md is left untouched):
openspawn init
# Or create it manually:
mkdir -p .openspawn
echo '{"version":1,"tasks":[],"budgets":{}}' > .openspawn/tasks.json

Symptom: openspawn validate reports "Agent reports to unknown agent"
Cause: The Reports to value must exactly match the name portion of another agent’s heading (the part before —). Typos, different casing, or a mismatch with an agent defined in a different section will cause this.
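The matching rule can be sketched as follows. The agent object shape ({ heading, reportsTo }), the function names, and the wording after the colon in the message are assumptions; only the exact, case-sensitive comparison and the Human Principal literal come from this page.

```javascript
// Hypothetical sketch of the reports-to check; the real validator may differ.
const ROOT = "Human Principal";

// "Alice — CEO" -> "Alice" (the part before the em dash)
function agentName(heading) {
  return heading.split("—")[0].trim();
}

function findUnknownManagers(agents) {
  const names = new Set(agents.map((a) => agentName(a.heading)));
  const errors = [];
  for (const a of agents) {
    // Exact, case-sensitive match against known names or the root literal.
    if (a.reportsTo !== ROOT && !names.has(a.reportsTo)) {
      errors.push(`Agent reports to unknown agent: ${a.reportsTo}`);
    }
  }
  return errors;
}

console.log(findUnknownManagers([
  { heading: "Alice — CEO", reportsTo: "Human Principal" },
  { heading: "Bob — Engineer", reportsTo: "alice" }, // wrong casing
]));
```

Because the comparison is exact, "alice" and "Alice — CEO" both fail where "Alice" succeeds.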
Fix:
### Alice — CEO
- **Reports to:** Human Principal ← top of hierarchy uses this literal string
#### Bob — Engineer
- **Reports to:** Alice ← must match exactly: "Alice" (not "alice" or "Alice — CEO")

Symptom: openspawn validate reports "No top-level agent"
Cause: Every org needs exactly one agent whose Reports to value is Human Principal. Without this, the hierarchy has no root.
Fix:
### CEO — Chief Executive
- **Level:** 10
- **Reports to:** Human Principal ← exactly this string

4. MCP Connection Issues
stdio vs HTTP — which transport to use?
OpenSpawn’s MCP server supports two transports:
| Transport | Flag | Best for |
|---|---|---|
| HTTP (Streamable) | (default) | Claude.ai projects, Cursor, direct API calls |
| stdio | --stdio | Claude Code CLI, embedded tool use |
Symptom: Claude Code can’t find the OpenSpawn tools
Fix — add to Claude Code’s MCP config (~/.claude/mcp_config.json):
{
  "mcpServers": {
    "openspawn": {
      "command": "npx",
      "args": ["openspawn", "start", "--stdio", "--dir", "/path/to/my-org"],
      "env": {}
    }
  }
}

Verify the server starts:
npx openspawn start --stdio --dir /path/to/my-org
# Should silently wait for stdin (no output in stdio mode is correct)

Symptom: Cursor can’t connect to OpenSpawn MCP
Fix — Cursor uses HTTP transport. Start the server first, then configure Cursor:
- Start the server:
  openspawn start --port 3456
- In Cursor Settings → MCP → Add Server:
  {
    "name": "openspawn",
    "transport": "http",
    "url": "http://localhost:3456/mcp"
  }
- Verify the server is reachable:
  curl http://localhost:3456/health
  # Expected: {"status":"ok","name":"openspawn"}
Symptom: MCP tools return { "error": "ORG.md not found at ..." }
Cause: The --dir argument to openspawn start doesn’t point to a directory containing ORG.md.
Fix:
# Confirm ORG.md exists in the target directory:
ls /path/to/my-org/ORG.md
# Start with explicit dir:
openspawn start --dir /path/to/my-org --stdio

Symptom: task_claim returns { "error": "No open task found" }
Cause: Either (a) there are no tasks with status open, or (b) the specific taskId you requested doesn’t exist or is already claimed.
Fix:
# Check the task store directly:
cat .openspawn/tasks.json | jq '.tasks[] | {id, status, assignee}'
# Create an open task first:
# (via MCP tool call)
# tool: task_create { "description": "Your task", "assignee": null }

Symptom: delegate returns { "error": "Cannot delegate to agent of equal or higher level" }
Cause: Delegation only flows downward. You cannot delegate to an agent with level >= yourLevel.
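The rule amounts to a single comparison. A minimal sketch, assuming agents are plain objects with a numeric level field (the function name checkDelegation and the success shape are hypothetical):

```javascript
// Hypothetical guard mirroring the documented rule; names are illustrative.
function checkDelegation(from, to) {
  if (to.level >= from.level) {
    return { error: "Cannot delegate to agent of equal or higher level" };
  }
  return { ok: true };
}

console.log(checkDelegation({ level: 7 }, { level: 7 })); // equal levels are rejected
```

Note that peers at the same level are rejected too, not only superiors.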
Fix: Check the org hierarchy and ensure you’re delegating to a lower-level agent. Use org_read to see agent levels:
tool: org_read
# Look at agents[].level in the response

5. API Auth Errors
The REST API (api.openspawn.ai or your self-hosted instance) supports three authentication methods. Each has distinct error messages.
HMAC Signature Authentication (agent-to-API)
Agents authenticate using four request headers:
- x-agent-id: the agent’s ID
- x-timestamp: Unix timestamp in seconds
- x-nonce: random unique string per request
- x-signature: HMAC-SHA256 of METHOD + PATH + TIMESTAMP + NONCE + BODY
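The x-timestamp and x-nonce headers let the server reject stale or replayed requests. The sketch below shows one way such checks could work, assuming the ±5 minute timestamp window and 10-minute nonce memory listed in the error table in this section. createReplayGuard and its in-memory Map are hypothetical and are not OpenSpawn's implementation.

```javascript
// Hypothetical server-side checks; only the two window sizes come from this page.
const WINDOW_SECONDS = 5 * 60;      // allowed clock skew in either direction
const NONCE_TTL_SECONDS = 10 * 60;  // how long a nonce is remembered

function createReplayGuard() {
  const seen = new Map(); // nonce -> time first seen (seconds)
  return function check(timestamp, nonce, now) {
    if (Math.abs(now - Number(timestamp)) > WINDOW_SECONDS) {
      return "Request timestamp outside valid window";
    }
    for (const [n, t] of seen) {
      if (now - t > NONCE_TTL_SECONDS) seen.delete(n); // forget expired nonces
    }
    if (seen.has(nonce)) return "Nonce already used";
    seen.set(nonce, now);
    return null; // both checks passed
  };
}

const check = createReplayGuard();
console.log(check("1000", "abc", 1000)); // first use of this nonce
console.log(check("1000", "abc", 1001)); // same nonce again
```

The second call reuses the nonce inside the memory window, so it is rejected even though its timestamp is still valid.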
Signature message format (exact):
{METHOD}{PATH}{TIMESTAMP}{NONCE}{BODY}
# Example:
POSTlists/tasks17096835001a2b3c{"title":"test"}

| Error message | Cause | Fix |
|---|---|---|
| "Missing authentication headers: x-agent-id, x-timestamp, x-nonce, and x-signature are all required" | One or more HMAC headers are missing | Include all four headers in every request |
| "Request timestamp outside valid window" | Clock skew > ±5 minutes | Sync your system clock: ntpdate -u pool.ntp.org |
| "Invalid credentials" | Agent ID not found, or signature mismatch | Verify AGENT_ID matches the registered agent; recompute signature |
| "Agent is not active" | The agent’s status is not ACTIVE in the database | Check agent status via dashboard or API |
| "Nonce already used" | The nonce was replayed within a 10-minute window | Generate a fresh random nonce for every request |
Example: correct HMAC headers (Node.js)
import crypto from 'crypto';

const agentId = process.env.AGENT_ID;
const secret = process.env.AGENT_SECRET;
const method = 'POST';
const path = '/tasks';
const timestamp = String(Math.floor(Date.now() / 1000));
const nonce = crypto.randomUUID();
const body = JSON.stringify({ title: 'My task' });

// Concatenate with no separators, in exactly this order:
const message = `${method}${path}${timestamp}${nonce}${body}`;
const signature = crypto
  .createHmac('sha256', secret)
  .update(message)
  .digest('hex');

// Include in the request:
const headers = {
  'x-agent-id': agentId,
  'x-timestamp': timestamp,
  'x-nonce': nonce,
  'x-signature': signature,
  'content-type': 'application/json',
};

API Key Authentication (Bearer osp_...)
| Error message | Cause | Fix |
|---|---|---|
| "Missing authorization header" | No Authorization header sent | Add Authorization: Bearer osp_your_key |
| "Invalid authorization format" | Header present but malformed (wrong scheme/no space) | Use exactly Bearer osp_... or ApiKey osp_... |
| "Invalid API key format" | Token doesn’t start with osp_ | Use a key generated via the dashboard API Keys page |
| "Invalid or expired API key" | Key revoked, expired, or wrong environment | Generate a new key in the dashboard |
| "API key missing required scope: {scope}" | Key exists but lacks the permission for this endpoint | Regenerate key with the required scope checked |
JWT Authentication (dashboard users)
| Error message | Cause | Fix |
|---|---|---|
| "Invalid or expired JWT token" | Access token has expired (default 15-minute lifetime) | Re-authenticate; the client should auto-refresh using the refresh token |
| "User not found" | Account was deleted after the token was issued | Log out and log back in |
| "Authentication required. Provide either a valid JWT Bearer token, API key (Bearer osp_...), or agent HMAC signature headers." | No auth header at all | Add the appropriate Authorization header |
HTTP Status Quick Reference
| Status | Typical meaning in OpenSpawn |
|---|---|
| 400 | Malformed request body (Zod validation failed): check required fields |
| 401 | Auth failure: see error message for specific reason |
| 403 | Authenticated but not authorized: wrong org, wrong role, or pre-hook block |
| 404 | Resource not found: wrong ID, or resource belongs to a different org |
| 409 | Conflict, e.g., task already claimed by another agent |
| 422 | Invalid state transition (see Task State Machine) |
6. Dashboard Issues
Symptom: Dashboard shows a blank page / white screen
Cause A: The sandbox server isn’t running or the dashboard can’t reach it.
Fix:
# Check if the sandbox server is up:
curl http://localhost:3333/health
# Expected: {"status":"ok",...}
# If not running, start it:
openspawn start

Cause B: JavaScript error on load. Open the browser console (F12 → Console) to see the actual error.
Symptom: SSE connection drops repeatedly / dashboard events stop updating
Cause: SSE connections time out after periods of inactivity. Some reverse proxies (nginx, Caddy) have a default timeout that kills idle SSE streams.
Fix for nginx:
location /events {
    proxy_pass http://localhost:3333;
    proxy_http_version 1.1;
    proxy_set_header Connection '';
    proxy_buffering off;
    proxy_cache off;
    proxy_read_timeout 86400s;    # 24 hours
    chunked_transfer_encoding on;
}

Fix for Caddy (in your Caddyfile):
reverse_proxy /events localhost:3333 {
    flush_interval -1
}

The dashboard client reconnects automatically, so dropped connections are cosmetic in most cases. If events are being lost, check the proxy timeout settings.
Symptom: Dashboard auth token expiry — gets logged out every 15 minutes
Cause: The JWT access token lifetime is 15 minutes by default. The dashboard should automatically use the refresh token to obtain a new access token, but if the refresh fails, you’ll be logged out.
Fix:
- Check if REFRESH_TOKEN_TTL is configured in your API environment (default: 7 days)
- Ensure cookies are not being blocked; the refresh token is stored as an HttpOnly cookie
- If self-hosting, make sure the API and dashboard share the same domain so the cookie is sent with requests
Symptom: Agent cards show stale data / task counts don’t update
Cause: The dashboard uses SSE for live updates. If the SSE connection is broken, data won’t refresh until reconnect.
Fix: Hard-refresh the page (Ctrl+Shift+R / Cmd+Shift+R). If this is recurrent, check the SSE proxy config above.
7. Task State Machine
OpenSpawn’s tasks follow a strict state machine. Attempting an invalid transition returns HTTP 422.
Valid transitions
BACKLOG ──→ TODO ──→ IN_PROGRESS ──→ REVIEW ──→ DONE (terminal)
   │          │           │             │
   └──→ CANCELLED ←───────┴─────────────┘
            ↑
        BLOCKED ──→ TODO
            └──→ IN_PROGRESS

Full transition table:
| From | Allowed to values |
|---|---|
| BACKLOG | TODO, CANCELLED |
| TODO | IN_PROGRESS, BLOCKED, CANCELLED |
| IN_PROGRESS | REVIEW, BLOCKED, CANCELLED |
| REVIEW | DONE, IN_PROGRESS, CANCELLED |
| BLOCKED | TODO, IN_PROGRESS, CANCELLED |
| DONE | (none, terminal) |
| CANCELLED | (none, terminal) |
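The table translates directly into a lookup, which is handy for checking a planned status change before calling the API. The sketch below encodes the table as given; the function names canTransition and shortestPath are hypothetical helpers, not part of OpenSpawn.

```javascript
// The transition table above as a lookup; helper names are hypothetical.
const TRANSITIONS = {
  BACKLOG: ["TODO", "CANCELLED"],
  TODO: ["IN_PROGRESS", "BLOCKED", "CANCELLED"],
  IN_PROGRESS: ["REVIEW", "BLOCKED", "CANCELLED"],
  REVIEW: ["DONE", "IN_PROGRESS", "CANCELLED"],
  BLOCKED: ["TODO", "IN_PROGRESS", "CANCELLED"],
  DONE: [],
  CANCELLED: [],
};

function canTransition(from, to) {
  return (TRANSITIONS[from] ?? []).includes(to);
}

// Breadth-first search for the shortest legal path between two states.
function shortestPath(from, to) {
  const queue = [[from]];
  const visited = new Set([from]);
  while (queue.length > 0) {
    const path = queue.shift();
    const last = path[path.length - 1];
    if (last === to) return path;
    for (const next of TRANSITIONS[last] ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push([...path, next]);
      }
    }
  }
  return null; // unreachable, e.g. from a terminal state
}

console.log(shortestPath("BACKLOG", "DONE").join(" → "));
// prints "BACKLOG → TODO → IN_PROGRESS → REVIEW → DONE"
```

When a direct transition is rejected with 422, a path like the one printed here shows which intermediate updates are needed.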
Symptom: 422 Unprocessable Entity with "Invalid transition: {from} → {to}"
Cause: The requested status change is not in the table above.
Common invalid transitions and fixes:
| You tried | Error | Fix |
|---|---|---|
| BACKLOG → IN_PROGRESS | Invalid | Move to TODO first, then IN_PROGRESS |
| DONE → IN_PROGRESS | Invalid, DONE is terminal | Create a new task instead |
| CANCELLED → TODO | Invalid, CANCELLED is terminal | Create a new task instead |
| IN_PROGRESS → DONE | Invalid | Move to REVIEW first, then DONE |
| TODO → DONE | Invalid | Must pass through IN_PROGRESS and REVIEW |
Symptom: Task is stuck in BLOCKED — nothing moves it forward
Cause: A blocked task requires an agent to explicitly transition it back to TODO or IN_PROGRESS after resolving the blocker.
Fix:
- Check the task’s escalation — the escalating agent’s manager should have received it
- Resolve the blocker:
tool: escalation_resolve { escalationId: "esc-123", resolution: "Unblocked — DB creds provided" }
- Transition the task back:
tool: task_update { taskId: "task-007", status: "in-progress" }
Symptom: Task stuck in REVIEW — nobody is reviewing it
Cause: No agent with level >= 6 has claimed the review, or the review assignee is busy/inactive.
Fix: Assign a reviewer explicitly, and escalate if it is urgent:
tool: task_update { taskId: "task-007", assignee: "senior-reviewer-id" }
# If urgent, escalate to the lead:
tool: escalate { taskId: "task-007", reason: "Review overdue", agentId: "my-agent-id" }

8. Budget / Credits
Section titled “8. Budget / Credits”Symptom: budget_check returns { "error": "No budget set", "agentId": "..." }
Cause: No budget limit has been configured for this agent. The budget system is opt-in: agents without a configured budget can spend freely.
Fix: set a per-agent limit via the API, the dashboard, or ORG.md:
# budget_check keeps returning this error until a limit is set (setBudgetLimit).
# Use the API or dashboard to configure a per-agent limit,
# or add it to ORG.md:
## Policies
### Budget
- **Per-agent limit:** 50

Symptom: budget_spend returns { "ok": false, "remaining": 2.50, "entry": {...} }
Cause: The requested spend exceeds the agent’s remaining budget. Spending is blocked when amount > (limit - spent) and the limit is greater than 0.
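The blocking condition can be written out as a short sketch. The budget object shape ({ limit, spent }) and the function name trySpend are assumptions; only the comparison itself comes from the rule above.

```javascript
// Hypothetical model of the documented rule; field names are assumptions.
function trySpend(budget, amount) {
  const remaining = budget.limit - budget.spent;
  // A limit of 0 is treated as "no limit configured", so nothing is blocked.
  if (budget.limit > 0 && amount > remaining) {
    return { ok: false, remaining };
  }
  budget.spent += amount;
  return { ok: true, remaining: budget.limit - budget.spent };
}

console.log(trySpend({ limit: 50, spent: 47.5 }, 5)); // { ok: false, remaining: 2.5 }
```

With 2.50 credits left, a request for 5 is refused and the budget is left unchanged, matching the shape of the response in the symptom.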
Fix:
- Check remaining balance:
  tool: budget_check { "agentId": "my-agent" }
- If you need more budget, update the limit via the dashboard or API
- In ORG.md, increase Per-agent limit:
  - **Per-agent limit:** 200
- If an agent legitimately needs more than its limit, it should escalate with reason "budget exceeded" rather than silently stopping
Symptom: Department budget cap exceeded — agents can’t be hired
Cause: The departmentCaps policy in ORG.md limits how many agents can exist in a department.
Fix:
## Policies
### Department Caps
- Engineering: max 12 agents ← increase this number
- Operations: max 5 agents

Or fire an inactive agent first:
tool: fire { "name": "Inactive Agent Name" }

9. Docker
Symptom: Container exits immediately — docker compose up shows exit code 1
Diagnosis:
docker compose logs sandbox

Common causes:
| Log message | Fix |
|---|---|
| Cannot find module ... | Image build is stale. Rebuild: docker compose build --no-cache |
| ENCRYPTION_KEY not configured | Add ENCRYPTION_KEY to your .env file |
| Error: listen EADDRINUSE :::3333 | Host port 3333 is already in use. Change the port mapping in docker-compose.yml |
Symptom: API container can’t connect to the database
Cause: The API is starting before the database is ready, or the DATABASE_URL is wrong.
Symptom in logs: Connection refused or ECONNREFUSED to the database port.
Fix — check your docker-compose.yml has a proper depends_on with healthcheck:
services:
  api:
    depends_on:
      db:
        condition: service_healthy    # wait for DB health check, not just start
  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

Check DATABASE_URL format:
# Correct for Docker Compose (use service name as host):
DATABASE_URL=postgresql://postgres:password@db:5432/openspawn
# Wrong (localhost won't resolve to the DB container):
DATABASE_URL=postgresql://postgres:password@localhost:5432/openspawn

Symptom: Database migrations fail on first start
Fix:
# Run migrations manually:
docker compose exec api pnpm nx run api:migrate
# Or reset the database and re-migrate:
docker compose down -v    # removes volumes — DATA LOSS
docker compose up -d

Symptom: Volume permission errors — EACCES writing to mounted directory
Cause: The container runs as a non-root user but the host-mounted volume is owned by root.
Fix:
# Option 1: Fix ownership on the host:
sudo chown -R 1000:1000 ./data
# Option 2: Set the user in docker-compose.yml:
services:
  sandbox:
    user: "1000:1000"

Symptom: docker compose up works but changes to ORG.md don’t take effect
Cause: When using SANDBOX_READONLY=1, the server serves a fixed snapshot and ignores file changes. Without SANDBOX_READONLY, the file is re-read on each request, but only if it is mounted correctly.
Fix — verify the volume mount:
services:
  sandbox:
    volumes:
      - ./my-org:/app/org:ro    # mounts your local my-org/ directory (read-only mount)
    environment:
      - SANDBOX_READONLY=0      # allow live reloading

Still stuck?
- Run openspawn validate — catches hierarchy and format errors
- Check the FAQ for common Q&A
- Search GitHub Issues
- Open a new issue with:
  - Your Node.js version (node --version)
  - The exact error message
  - Relevant section of your ORG.md (redact sensitive info)
  - Output of openspawn validate