Troubleshooting

Quick diagnostic: When something breaks, start here:

openspawn validate # catches most config issues
lsof -i :3456 # check for port conflicts (MCP server)
lsof -i :3333 # check for port conflicts (sandbox/dashboard)
cat .openspawn/tasks.json # inspect task store state

Symptom: npx openspawn fails with engine error or Unsupported engine

Cause: OpenSpawn requires Node.js 18 or later. Older Node versions are not supported.

Fix:

node --version # must be v18.0.0 or higher
# If outdated, upgrade via nvm:
nvm install 20
nvm use 20
npx openspawn@latest init my-org
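
If you wrap OpenSpawn in your own scripts, the same version requirement can be checked up front. The following is a minimal sketch; `isSupportedNode` and `MIN_NODE_MAJOR` are hypothetical names for illustration and are not part of OpenSpawn.

```javascript
// Hypothetical pre-flight check; not part of the OpenSpawn CLI.
const MIN_NODE_MAJOR = 18; // minimum version stated in this guide

function isSupportedNode(version = process.version) {
  // process.version looks like "v20.11.1"; keep only the major component.
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= MIN_NODE_MAJOR;
}

if (!isSupportedNode()) {
  console.error(`Node ${process.version} is too old; OpenSpawn needs v${MIN_NODE_MAJOR} or later`);
}
```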

Symptom: EACCES: permission denied during install

Cause: Global npm install attempted without sufficient permissions, or the npm cache directory has wrong ownership.

Fix — use npx instead of global install:

# Don't do this:
npm install -g openspawn
# Do this instead — npx handles the download each time:
npx openspawn init my-org

If you must install globally, fix npm’s prefix instead of using sudo:

mkdir -p ~/.npm-global
npm config set prefix '~/.npm-global'
export PATH=~/.npm-global/bin:$PATH # add to ~/.bashrc
npm install -g openspawn

Symptom: Cannot find module or ERR_MODULE_NOT_FOUND at runtime

Cause: The package ships as ESM ("type": "module"). This requires Node 18+ and cannot be require()-d from CommonJS code directly.

Fix for CLI use: Always invoke via npx openspawn or the openspawn binary — never require('openspawn') in CommonJS.

Fix for programmatic use in a CJS project:

// Top-level await is unavailable in CommonJS, so run the dynamic import inside an async function:
(async () => {
  const { parseOrgMdContent } = await import("openspawn");
})();

Symptom: pnpm install fails in monorepo with peer dependency errors

Cause: The workspace requires specific peer versions. Running plain npm install inside a sub-package instead of from the root will miss workspace links.

Fix:

# Always install from the repo root:
cd /path/to/openspawn
pnpm install
# Then build:
pnpm nx run-many -t build

The parser (org-parser.ts) uses remark to walk the markdown AST. It is forgiving — it won’t throw on most malformed input — but silently produces wrong results if the structure is off. Always run openspawn validate after editing.

Symptom: Org name shows as "Unnamed Org" in the dashboard

Cause: The parser reads the org name from the first H1 heading (# My Org). If there is no H1, or the file starts with H2, the name defaults to "Unnamed Org".

Fix: Make sure the first heading in your ORG.md is an H1:

# My SaaS Org ← must be the first heading
## Structure
...

Symptom: Agents not showing up after editing ORG.md

Cause A: Agents must be defined under ## Structure, inside ### DeptName (H3) or #### RoleName (H4) sections. If you use the wrong heading depth, they are ignored.

Cause B: The bold-key metadata format is strict. Each item must be written as - **Key:** Value, with two asterisks on each side of the key, the colon inside the bold markers, and a space before the value.

# Correct ✅
### Engineering
#### Alice — Senior Dev
- **Level:** 6
- **Domain:** Engineering
- **Reports to:** Bob
# Wrong ❌ — uses plain list, not bold-key
### Engineering
#### Alice — Senior Dev
- Level: 6 ← not bold, ignored
- Domain: Engineering
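
To make the bold-key rule concrete, here is a minimal sketch of a matcher for a single list item. The real parser walks a remark AST, so this regex is only an approximation; `parseBoldKeyItem` is a hypothetical name, and lower-casing the key is an assumption.

```javascript
// Hypothetical helper: matches one ORG.md list item shaped like "- **Key:** Value".
// The actual OpenSpawn parser works on a markdown AST; this only illustrates the shape.
const BOLD_KEY_ITEM = /^-\s+\*\*([^*:]+):\*\*\s+(.+)$/;

function parseBoldKeyItem(line) {
  const match = BOLD_KEY_ITEM.exec(line.trim());
  if (!match) return null; // plain "- Key: Value" items end up here and are ignored
  // Lower-casing the key is an assumption made for this sketch.
  return { key: match[1].trim().toLowerCase(), value: match[2].trim() };
}

console.log(parseBoldKeyItem("- **Level:** 6")); // matched
console.log(parseBoldKeyItem("- Level: 6"));     // not matched
```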

Symptom: Agent level parsed as wrong value (e.g., falls back to level 4)

Cause: The Level key in the metadata list is missing or malformed. The parser falls back to role inference from the heading text (ceo/coo/cto → L10, lead/manager → L7, senior/principal → L6, junior/intern → L1, everything else → L4).

Fix: Always specify level explicitly:

#### Jordan — Backend Developer
- **Level:** 5 ← explicit; do not omit
- **Domain:** Engineering
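
The fallback described above can be sketched as a small lookup. This is an illustration, not the parser's code: the function name is hypothetical, and both the whole-word matching and the order in which keywords are tested are assumptions.

```javascript
// Sketch of the documented role-inference fallback (hypothetical function name).
// Whole-word matching and the precedence order are assumptions.
function inferLevelFromHeading(heading) {
  const text = heading.toLowerCase();
  if (/\b(ceo|coo|cto)\b/.test(text)) return 10;
  if (/\b(lead|manager)\b/.test(text)) return 7;
  if (/\b(senior|principal)\b/.test(text)) return 6;
  if (/\b(junior|intern)\b/.test(text)) return 1;
  return 4; // everything else
}

console.log(inferLevelFromHeading("Jordan — Backend Developer")); // 4
console.log(inferLevelFromHeading("Alice — Senior Dev"));         // 6
```

A heading such as "Jordan — Backend Developer" contains none of the keywords, which is why it lands on level 4 unless a Level item is given.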

Symptom: Culture preset not applied — org uses default communication settings

Cause: The preset must be set either as a bare line preset: startup or as a bold-key item - **Preset:** startup inside the ## Culture section. Both formats work but they must be inside the H2 Culture section.

## Culture
preset: agency ← works
# OR:
## Culture
- **Preset:** agency ← also works
- **Escalation:** immediate

Symptom: Budget / per-agent limit not parsed from Policies section

Cause: The per-agent limit must use the exact key Per-agent limit (case-insensitive) as a bold-key item. The value can include units like 500 credits/period — the parser strips non-numeric characters.

## Policies
### Budget
- **Per-agent limit:** 500 credits/period ← correct
- **Alert threshold:** 75% ← correct
# Wrong:
- Budget limit: 500 ← not bold, not a bold-key item
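
As an illustration of how a value with units can reduce to a number, the sketch below strips everything except digits and the decimal point. `parseNumericValue` is a hypothetical name, and returning null for values without digits is an assumption rather than documented behavior.

```javascript
// Hypothetical helper showing the "strip non-numeric characters" idea.
function parseNumericValue(raw) {
  const digits = raw.replace(/[^0-9.]/g, ""); // "500 credits/period" -> "500"
  const value = Number.parseFloat(digits);
  return Number.isNaN(value) ? null : value; // null when nothing numeric remains (assumption)
}

console.log(parseNumericValue("500 credits/period")); // 500
console.log(parseNumericValue("75%"));                // 75
```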

Cause: Department caps must be in an H3 subsection whose heading contains the word “cap” (e.g., ### Department Caps), with list items in the format Department: max N agents.

### Department Caps
- Engineering: max 8 agents
- Operations: max 3 agents
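
The list-item format above can be checked with a short pattern. This is a sketch under assumptions: `parseDepartmentCaps` is a hypothetical name, and the case-insensitive match and optional plural are guesses about how lenient the real parser is.

```javascript
// Hypothetical matcher for items shaped like "- Department: max N agents".
const CAP_ITEM = /^-\s+(.+?):\s*max\s+(\d+)\s+agents?$/i;

function parseDepartmentCaps(lines) {
  const caps = {};
  for (const line of lines) {
    const match = CAP_ITEM.exec(line.trim());
    if (match) caps[match[1].trim()] = Number(match[2]); // lines that do not match are skipped
  }
  return caps;
}

console.log(parseDepartmentCaps(["- Engineering: max 8 agents", "- Operations: max 3 agents"]));
```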

Symptom: org_read MCP tool returns { "error": "ORG.md not found at /path/ORG.md" }

Cause: The MCP server is looking for ORG.md relative to the --dir option (defaults to the current working directory). The file must exist at that path.

Fix:

# Make sure you're in the right directory:
ls ORG.md # should exist
openspawn start # starts with cwd as dir
# Or specify the dir explicitly:
openspawn start --dir /path/to/my-org

Symptom: openspawn init says "ORG.md already exists, skipping."

Cause: init is non-destructive. If ORG.md already exists in the target directory, it logs the message and continues (it still creates .openspawn/tasks.json if missing).

Fix: This is expected behavior. If you want to reset:

rm ORG.md # remove existing
openspawn init # scaffold fresh from default template
# or from a template:
openspawn init --template=incident-response

Symptom: openspawn start — port already in use

Cause: The MCP server defaults to port 3456. Something else is listening on that port.

Symptom in logs: Error: listen EADDRINUSE: address already in use :::3456

Fix:

# Find what's using the port:
lsof -i :3456
# Kill it:
kill -9 <PID>
# Or start on a different port:
openspawn start --port 3457
# Then update your MCP client config to point to the new port.

Symptom: openspawn start — sandbox dashboard port conflict

Cause: The sandbox/dashboard server defaults to port 3333 (SANDBOX_PORT env var).

Fix:

lsof -i :3333
kill -9 <PID>
# or:
SANDBOX_PORT=3334 openspawn start

Symptom: openspawn commands fail with "Cannot find .openspawn directory"

Cause: The .openspawn/tasks.json store is missing. This happens if you cloned/copied the ORG.md without running openspawn init, or if .openspawn/ was deleted.

Fix:

# Re-initialize the state directory (non-destructive — skips if ORG.md exists):
openspawn init
# Or create it manually:
mkdir -p .openspawn
echo '{"version":1,"tasks":[],"budgets":{}}' > .openspawn/tasks.json

Symptom: openspawn validate reports "Agent reports to unknown agent"

Cause: The Reports to value must exactly match the name portion of another agent’s heading (the part before the — separator). Typos, different casing, or a name that does not match any agent defined elsewhere in the file will cause this.

Fix:

### Alice — CEO
- **Reports to:** Human Principal ← top of hierarchy uses this literal string
#### Bob — Engineer
- **Reports to:** Alice ← must match exactly: "Alice" (not "alice" or "Alice — CEO")

Symptom: openspawn validate reports "No top-level agent"

Cause: Every org needs exactly one top-level agent, that is, one agent whose Reports to value is the literal string Human Principal. Without it, the hierarchy has no root.

Fix:

### CEO — Chief Executive
- **Level:** 10
- **Reports to:** Human Principal ← exactly this string
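
The two hierarchy checks described above can be sketched as follows. This is not the implementation behind openspawn validate: `validateHierarchy` and the agent object shape are hypothetical, and the text appended after the documented error messages is illustrative only.

```javascript
// Hypothetical re-creation of the two validate checks described in this guide.
const ROOT = "Human Principal";

function validateHierarchy(agents) {
  const names = new Set(agents.map((agent) => agent.name));
  const errors = [];
  for (const agent of agents) {
    // Exact, case-sensitive match against the name portion of another agent's heading.
    if (agent.reportsTo !== ROOT && !names.has(agent.reportsTo)) {
      errors.push(`Agent reports to unknown agent: ${agent.name} -> ${agent.reportsTo}`);
    }
  }
  // Only the documented "no root" case is reported; multiple roots are not handled here.
  if (!agents.some((agent) => agent.reportsTo === ROOT)) {
    errors.push("No top-level agent");
  }
  return errors;
}

console.log(validateHierarchy([
  { name: "Alice", reportsTo: "Human Principal" },
  { name: "Bob", reportsTo: "alice" }, // wrong casing, so this is flagged
]));
```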

OpenSpawn’s MCP server supports two transports:

| Transport | Flag | Best for |
| --- | --- | --- |
| HTTP (Streamable) | (default) | Claude.ai projects, Cursor, direct API calls |
| stdio | --stdio | Claude Code CLI, embedded tool use |

Symptom: Claude Code can’t find the OpenSpawn tools

Fix — add to Claude Code’s MCP config (~/.claude/mcp_config.json):

{
  "mcpServers": {
    "openspawn": {
      "command": "npx",
      "args": ["openspawn", "start", "--stdio", "--dir", "/path/to/my-org"],
      "env": {}
    }
  }
}

Verify the server starts:

npx openspawn start --stdio --dir /path/to/my-org
# Should silently wait for stdin (no output in stdio mode is correct)

Symptom: Cursor can’t connect to OpenSpawn MCP

Fix — Cursor uses HTTP transport. Start the server first, then configure Cursor:

  1. Start the server:

    openspawn start --port 3456 # MCP endpoint: http://localhost:3456/mcp
  2. In Cursor Settings → MCP → Add Server:

    {
      "name": "openspawn",
      "transport": "http",
      "url": "http://localhost:3456/mcp"
    }
  3. Verify the server is reachable:

    curl http://localhost:3456/health
    # Expected: {"status":"ok","name":"openspawn"}

Symptom: MCP tools return { "error": "ORG.md not found at ..." }

Cause: The --dir argument to openspawn start doesn’t point to a directory containing ORG.md.

Fix:

# Confirm ORG.md exists in the target directory:
ls /path/to/my-org/ORG.md
# Start with explicit dir:
openspawn start --dir /path/to/my-org --stdio

Symptom: task_claim returns { "error": "No open task found" }

Cause: Either (a) there are no tasks with status open, or (b) the specific taskId you requested doesn’t exist or is already claimed.

Fix:

# Check the task store directly:
cat .openspawn/tasks.json | jq '.tasks[] | {id, status, assignee}'
# Create an open task first:
# (via MCP tool call)
# tool: task_create { "description": "Your task", "assignee": null }

Symptom: delegate returns { "error": "Cannot delegate to agent of equal or higher level" }

Cause: Delegation only flows downward. You cannot delegate to an agent with level >= yourLevel.

Fix: Check the org hierarchy and ensure you’re delegating to a lower-level agent. Use org_read to see agent levels:

tool: org_read
# Look at agents[].level in the response
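
The downward-only rule amounts to a single comparison. In the sketch below, `checkDelegation` and the `{ ok: true }` success shape are assumptions for illustration; only the error string comes from this guide.

```javascript
// Sketch of the delegation rule: the target must have a strictly lower level.
function checkDelegation(fromAgent, toAgent) {
  if (toAgent.level >= fromAgent.level) {
    return { error: "Cannot delegate to agent of equal or higher level" };
  }
  return { ok: true }; // success shape is an assumption
}

console.log(checkDelegation({ name: "Alice", level: 6 }, { name: "Bob", level: 4 }));
console.log(checkDelegation({ name: "Alice", level: 6 }, { name: "Carol", level: 6 }));
```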

The REST API (api.openspawn.ai or your self-hosted instance) supports three authentication methods. Each has distinct error messages.


HMAC Signature Authentication (agent-to-API)

Agents authenticate using four request headers:

  • x-agent-id — the agent’s ID
  • x-timestamp — Unix timestamp in seconds
  • x-nonce — random unique string per request
  • x-signature — HMAC-SHA256 of METHOD + PATH + TIMESTAMP + NONCE + BODY

Signature message format (exact):

{METHOD}{PATH}{TIMESTAMP}{NONCE}{BODY}
# Example (POST to /tasks, timestamp 1709683500, nonce 1a2b3c):
POST/tasks17096835001a2b3c{"title":"test"}

| Error message | Cause | Fix |
| --- | --- | --- |
| "Missing authentication headers: x-agent-id, x-timestamp, x-nonce, and x-signature are all required" | One or more HMAC headers are missing | Include all four headers in every request |
| "Request timestamp outside valid window" | Clock skew > ±5 minutes | Sync your system clock: ntpdate -u pool.ntp.org |
| "Invalid credentials" | Agent ID not found, or signature mismatch | Verify AGENT_ID matches the registered agent; recompute signature |
| "Agent is not active" | The agent’s status is not ACTIVE in the database | Check agent status via dashboard or API |
| "Nonce already used" | The nonce was replayed within a 10-minute window | Generate a fresh random nonce for every request |

Example: correct HMAC headers (Node.js)

import crypto from 'node:crypto';

const agentId = process.env.AGENT_ID;
const secret = process.env.AGENT_SECRET;
const method = 'POST';
const path = '/tasks';
const timestamp = String(Math.floor(Date.now() / 1000)); // Unix seconds
const nonce = crypto.randomUUID(); // fresh value for every request
const body = JSON.stringify({ title: 'My task' });

// Signature input: METHOD + PATH + TIMESTAMP + NONCE + BODY, concatenated with no separators
const message = `${method}${path}${timestamp}${nonce}${body}`;
const signature = crypto
  .createHmac('sha256', secret)
  .update(message)
  .digest('hex');

// Send these headers, and send exactly the `body` string that was signed:
const headers = {
  'x-agent-id': agentId,
  'x-timestamp': timestamp,
  'x-nonce': nonce,
  'x-signature': signature,
  'content-type': 'application/json',
};
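
To see why each error in the table occurs, it can help to look at the checks from the receiving side. The following is a minimal sketch of such a verifier, not the API's actual code: the function names, the in-memory nonce store, the order of the checks, and the shortened "missing headers" message are all assumptions. It is written as CommonJS so it runs as a standalone script.

```javascript
const crypto = require("node:crypto");

const MAX_SKEW_SECONDS = 5 * 60;   // the documented ±5 minute window
const NONCE_TTL_SECONDS = 10 * 60; // the documented 10-minute replay window
const seenNonces = new Map();      // nonce -> expiry time; in-memory for illustration only

function sign(secret, { method, path, timestamp, nonce, body }) {
  return crypto
    .createHmac("sha256", secret)
    .update(`${method}${path}${timestamp}${nonce}${body}`)
    .digest("hex");
}

// Returns an error string, or null when every check passes.
function verifyRequest(secret, req, nowSeconds = Math.floor(Date.now() / 1000)) {
  const { timestamp, nonce, signature } = req;
  if (!timestamp || !nonce || !signature) {
    return "Missing authentication headers"; // shortened form of the documented message
  }
  if (Math.abs(nowSeconds - Number(timestamp)) > MAX_SKEW_SECONDS) {
    return "Request timestamp outside valid window";
  }
  const expected = Buffer.from(sign(secret, req), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual requires equal lengths, so compare lengths first.
  if (expected.length !== received.length || !crypto.timingSafeEqual(expected, received)) {
    return "Invalid credentials";
  }
  const expiry = seenNonces.get(nonce);
  if (expiry !== undefined && expiry > nowSeconds) {
    return "Nonce already used";
  }
  seenNonces.set(nonce, nowSeconds + NONCE_TTL_SECONDS);
  return null;
}
```

Passing `nowSeconds` explicitly makes the clock-skew case easy to reproduce: a request signed at time T fails the window check when verified at T + 301.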

| Error message | Cause | Fix |
| --- | --- | --- |
| "Missing authorization header" | No Authorization header sent | Add Authorization: Bearer osp_your_key |
| "Invalid authorization format" | Header present but malformed (wrong scheme/no space) | Use exactly Bearer osp_... or ApiKey osp_... |
| "Invalid API key format" | Token doesn’t start with osp_ | Use a key generated via the dashboard API Keys page |
| "Invalid or expired API key" | Key revoked, expired, or wrong environment | Generate a new key in the dashboard |
| "API key missing required scope: {scope}" | Key exists but lacks the permission for this endpoint | Regenerate key with the required scope checked |
| Error message | Cause | Fix |
| --- | --- | --- |
| "Invalid or expired JWT token" | Access token has expired (default 15-minute lifetime) | Re-authenticate; the client should auto-refresh using the refresh token |
| "User not found" | Account was deleted after the token was issued | Log out and log back in |
| "Authentication required. Provide either a valid JWT Bearer token, API key (Bearer osp_...), or agent HMAC signature headers." | No auth header at all | Add the appropriate Authorization header |

| Status | Typical meaning in OpenSpawn |
| --- | --- |
| 400 | Malformed request body (Zod validation failed) — check required fields |
| 401 | Auth failure — see error message for specific reason |
| 403 | Authenticated but not authorized — wrong org, wrong role, or pre-hook block |
| 404 | Resource not found — wrong ID, or resource belongs to a different org |
| 409 | Conflict — e.g., task already claimed by another agent |
| 422 | Invalid state transition (see Task State Machine) |

Symptom: Dashboard shows a blank page / white screen

Cause A: The sandbox server isn’t running or the dashboard can’t reach it.

Fix:

# Check if the sandbox server is up:
curl http://localhost:3333/health
# Expected: {"status":"ok",...}
# If not running, start it:
openspawn start

Cause B: JavaScript error on load. Open the browser console (F12 → Console) to see the actual error.


Symptom: SSE connection drops repeatedly / dashboard events stop updating

Cause: SSE connections time out after periods of inactivity. Some reverse proxies (nginx, Caddy) have a default timeout that kills idle SSE streams.

Fix for nginx:

location /events {
    proxy_pass http://localhost:3333;
    proxy_http_version 1.1;
    proxy_set_header Connection '';
    proxy_buffering off;
    proxy_cache off;
    proxy_read_timeout 86400s; # 24 hours
    chunked_transfer_encoding on;
}

Fix for Caddy (in your Caddyfile):

reverse_proxy /events localhost:3333 {
    flush_interval -1
}

The client-side dashboard reconnects automatically — this is cosmetic in most cases, but if events are being lost, check proxy timeout settings.


Symptom: Dashboard auth token expiry — gets logged out every 15 minutes

Cause: The JWT access token lifetime is 15 minutes by default. The dashboard should automatically use the refresh token to obtain a new access token, but if the refresh fails, you’ll be logged out.

Fix:

  • Check if REFRESH_TOKEN_TTL is configured in your API environment (default: 7 days)
  • Ensure cookies are not being blocked — the refresh token is stored as an HttpOnly cookie
  • If self-hosting, make sure the API and dashboard share the same domain so the cookie is sent with requests

Symptom: Agent cards show stale data / task counts don’t update

Cause: The dashboard uses SSE for live updates. If the SSE connection is broken, data won’t refresh until reconnect.

Fix: Hard-refresh the page (Ctrl+Shift+R / Cmd+Shift+R). If this is recurrent, check the SSE proxy config above.


OpenSpawn’s tasks follow a strict state machine. Attempting an invalid transition returns HTTP 422.

BACKLOG ──→ TODO ──→ IN_PROGRESS ──→ REVIEW ──→ DONE (terminal)
   │         │            │             │
   └──→ CANCELLED ←───────┴─────────────┘
BLOCKED ──→ TODO
   └──→ IN_PROGRESS

Full transition table:

| From | Allowed to values |
| --- | --- |
| BACKLOG | TODO, CANCELLED |
| TODO | IN_PROGRESS, BLOCKED, CANCELLED |
| IN_PROGRESS | REVIEW, BLOCKED, CANCELLED |
| REVIEW | DONE, IN_PROGRESS, CANCELLED |
| BLOCKED | TODO, IN_PROGRESS, CANCELLED |
| DONE | (none — terminal) |
| CANCELLED | (none — terminal) |
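
The transition table maps directly onto a lookup structure, which is handy for checking a planned status change before sending it. The sketch below mirrors the table; `TRANSITIONS` and `checkTransition` are hypothetical names, and returning null for a valid move is a choice made for this example.

```javascript
// Lookup built from the transition table above.
const TRANSITIONS = {
  BACKLOG: ["TODO", "CANCELLED"],
  TODO: ["IN_PROGRESS", "BLOCKED", "CANCELLED"],
  IN_PROGRESS: ["REVIEW", "BLOCKED", "CANCELLED"],
  REVIEW: ["DONE", "IN_PROGRESS", "CANCELLED"],
  BLOCKED: ["TODO", "IN_PROGRESS", "CANCELLED"],
  DONE: [],      // terminal
  CANCELLED: [], // terminal
};

// Returns null when the move is allowed, otherwise the documented error text.
function checkTransition(from, to) {
  const allowed = Object.hasOwn(TRANSITIONS, from) ? TRANSITIONS[from] : [];
  return allowed.includes(to) ? null : `Invalid transition: ${from} → ${to}`;
}

console.log(checkTransition("BACKLOG", "TODO"));        // null
console.log(checkTransition("BACKLOG", "IN_PROGRESS")); // Invalid transition: BACKLOG → IN_PROGRESS
```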

Symptom: 422 Unprocessable Entity with "Invalid transition: {from} → {to}"

Cause: The requested status change is not in the table above.

Common invalid transitions and fixes:

| You tried | Error | Fix |
| --- | --- | --- |
| BACKLOG → IN_PROGRESS | Invalid | Move to TODO first, then IN_PROGRESS |
| DONE → IN_PROGRESS | Invalid — DONE is terminal | Create a new task instead |
| CANCELLED → TODO | Invalid — CANCELLED is terminal | Create a new task instead |
| IN_PROGRESS → DONE | Invalid | Move to REVIEW first (or BLOCKED) |
| TODO → DONE | Invalid | Must pass through IN_PROGRESS and REVIEW |

Symptom: Task is stuck in BLOCKED — nothing moves it forward

Cause: A blocked task requires an agent to explicitly transition it back to TODO or IN_PROGRESS after resolving the blocker.

Fix:

  1. Check the task’s escalation — the escalating agent’s manager should have received it
  2. Resolve the blocker:
    tool: escalation_resolve { escalationId: "esc-123", resolution: "Unblocked — DB creds provided" }
  3. Transition the task back:
    tool: task_update { taskId: "task-007", status: "in-progress" }

Symptom: Task stuck in REVIEW — nobody is reviewing it

Cause: No agent with level >= 6 has claimed the review, or the review assignee is busy/inactive.

Fix: Explicitly assign and escalate:

tool: task_update { taskId: "task-007", assignee: "senior-reviewer-id" }
# If urgent, escalate to the lead:
tool: escalate { taskId: "task-007", reason: "Review overdue", agentId: "my-agent-id" }

Symptom: budget_check returns { "error": "No budget set", "agentId": "..." }

Cause: No budget limit has been configured for this agent. The budget system is opt-in — agents without a configured budget can spend freely.

Fix — set a budget limit via the API, the dashboard, or ORG.md:

# budget_check reports "No budget set" until a limit exists (setBudgetLimit).
# Use the API or dashboard to configure a per-agent limit,
# or add it to ORG.md:
## Policies
### Budget
- **Per-agent limit:** 50

Symptom: budget_spend returns { "ok": false, "remaining": 2.50, "entry": {...} }

Cause: The requested spend exceeds the agent’s remaining budget. Spending is blocked when amount > (limit - spent) and the limit is greater than 0.

Fix:

  1. Check remaining balance: tool: budget_check { "agentId": "my-agent" }
  2. If you need more budget, update the limit via the dashboard or API
  3. In ORG.md, increase Per-agent limit:
    - **Per-agent limit:** 200
  4. If an agent legitimately needs more than its limit, it should escalate with reason "budget exceeded" rather than silently stopping
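
The blocking rule stated in the cause (amount > limit - spent, and only when the limit is greater than 0) can be sketched as below. `trySpend` and the budget object shape are hypothetical; the real tool also returns an entry field that is omitted here, and reporting remaining as null for agents without a limit is an assumption.

```javascript
// Sketch of the documented spend rule; not the budget_spend implementation.
function trySpend(budget, amount) {
  const remaining = budget.limit - budget.spent;
  if (budget.limit > 0 && amount > remaining) {
    return { ok: false, remaining }; // blocked: nothing is recorded
  }
  budget.spent += amount;
  // A limit of 0 means no budget is configured, so there is no meaningful remainder (assumption).
  return { ok: true, remaining: budget.limit > 0 ? budget.limit - budget.spent : null };
}

const budget = { limit: 50, spent: 47.5 };
console.log(trySpend(budget, 5));   // { ok: false, remaining: 2.5 }
console.log(trySpend(budget, 2.5)); // { ok: true, remaining: 0 }
```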

Symptom: Department cap exceeded — agents can’t be hired

Cause: The departmentCaps policy in ORG.md limits how many agents can exist in a department.

Fix:

## Policies
### Department Caps
- Engineering: max 12 agents ← increase this number
- Operations: max 5 agents

Or fire an inactive agent first:

tool: fire { "name": "Inactive Agent Name" }

Symptom: Container exits immediately — docker compose up shows exit code 1

Diagnosis:

docker compose logs sandbox

Common causes:

| Log message | Fix |
| --- | --- |
| Cannot find module ... | Image build is stale — rebuild: docker compose build --no-cache |
| ENCRYPTION_KEY not configured | Add ENCRYPTION_KEY to your .env file |
| Error: listen EADDRINUSE :::3333 | Host port 3333 is already in use — change the port mapping in docker-compose.yml |

Symptom: API container can’t connect to the database

Cause: The API is starting before the database is ready, or the DATABASE_URL is wrong.

Symptom in logs: Connection refused or ECONNREFUSED to the database port.

Fix — check your docker-compose.yml has a proper depends_on with healthcheck:

services:
  api:
    depends_on:
      db:
        condition: service_healthy # wait for DB health check, not just start
  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

Check DATABASE_URL format:

# Correct for Docker Compose (use service name as host):
DATABASE_URL=postgresql://postgres:password@db:5432/openspawn
# Wrong (localhost won't resolve to the DB container):
DATABASE_URL=postgresql://postgres:password@localhost:5432/openspawn
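
A quick way to catch the localhost mistake before starting the stack is to parse the URL and inspect its host. This is a sketch only; `checkDatabaseHost` is a hypothetical helper, and the warning text is illustrative.

```javascript
// Hypothetical pre-start check for DATABASE_URL when running under Docker Compose.
function checkDatabaseHost(databaseUrl) {
  const { hostname } = new URL(databaseUrl); // the WHATWG URL parser handles postgresql:// URLs
  const loopbackHosts = ["localhost", "127.0.0.1", "::1", "[::1]"];
  return loopbackHosts.includes(hostname)
    ? `Host "${hostname}" refers to the API container itself; use the Compose service name (e.g. db)`
    : null; // null means the host is not a loopback address
}

console.log(checkDatabaseHost("postgresql://postgres:password@db:5432/openspawn"));
console.log(checkDatabaseHost("postgresql://postgres:password@localhost:5432/openspawn"));
```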

Symptom: Database migrations fail on first start

Fix:

# Run migrations manually:
docker compose exec api pnpm nx run api:migrate
# Or reset the database and re-migrate:
docker compose down -v # removes volumes — DATA LOSS
docker compose up -d

Symptom: Volume permission errors — EACCES writing to mounted directory

Cause: The container runs as a non-root user but the host-mounted volume is owned by root.

Fix:

# Option 1: Fix ownership on the host:
sudo chown -R 1000:1000 ./data
# Option 2: Set the user in docker-compose.yml:
services:
  sandbox:
    user: "1000:1000"

Symptom: docker compose up works but changes to ORG.md don’t take effect

Cause: When using SANDBOX_READONLY=1, the server serves a fixed snapshot and ignores file changes. Without READONLY, the file is read on each request but only if it’s mounted correctly.

Fix — verify the volume mount:

services:
  sandbox:
    volumes:
      - ./my-org:/app/org:ro # mounts your local my-org/ directory read-only
    environment:
      - SANDBOX_READONLY=0 # allow live reloading

  1. Run openspawn validate — catches hierarchy and format errors
  2. Check the FAQ for common Q&A
  3. Search GitHub Issues
  4. Open a new issue with:
    • Your Node.js version (node --version)
    • The exact error message
    • Relevant section of your ORG.md (redact sensitive info)
    • Output of openspawn validate