Choir - Agent Orchestration System
Version: 1.0
choir is an LLM agent orchestration system targeting personalized agents
while providing infrastructure-grade architecture. It follows a philosophy
of deterministic outer control plane + stochastic inner cognition: the
LLM proposes, the control plane decides.
See repo here: choir.
Table of Contents
- 1. Deployment Targets
- 2. Scope and Non-Goals
- 3. Core Design Principles
- 4. Design Invariants
- 5. Architecture Overview
- 6. System Components
- 7. Agent Execution Model
- 8. Tool System
- 9. Resource Lock Model
- 10. Skill System
- 11. Memory Architecture
- 12. Communication Protocol
- 13. Crash Recovery and Replication
- 14. Secrets and Credentials
- 15. Container Architecture
- 16. Dynamic Configuration and Secret Apply
- 17. User Commands
- 18. Networking and Web Access
- 19. Gateway (Telegram)
- 20. Self-Evolution Workflow
- 21. Inference Provider
- 22. Observability and Audit
- 23. Choird Data Model (Postgres)
- 24. Implementation Phases
1. Deployment Targets
- Single Linux host. No distributed consensus, no multi-node coordination.
- Orchestrator-agent communication via Unix domain sockets (UDS) for local deployment, with TCP/HTTP as a built-in second transport for future EKS-style deployments.
- OpenAI-compatible chat completions API for text generation (via OpenRouter or any compatible endpoint); ElevenLabs API for text-to-speech (configurable voice profiles). LLM models must support structured tool calling.
- Postgres + pgvector for long-term memory and archival storage.
2. Scope and Non-Goals
2.1 In Scope (v1)
- Single Linux host deployment.
- One control plane daemon on host (choird).
- One user control client (choirctl).
- One unified runtime daemon in container (choir-agent).
- Two execution lanes inside the same logical agent:
- Edge lane (fast, user-facing).
- Core lane (deeper async reasoning).
- Skill-based multi-phase orchestration with per-state tool allowlists.
- Tool-mediated workspace and side-effect access.
- Tool locking with FREE | S | X states and atomic lockset semantics.
- Host-owned durable state and crash recovery.
- Postgres + pgvector memory subsystem integrated into choird.
- Manual approval path for capability-changing actions.
- Dual transport support: UDS (primary) and TCP/HTTP (scaffolding for future multi-node / EKS-style deployments).
- Gateway: Telegram via multiple bot instances. Each agent is bound to a named DM. Admin DMs have full choirctl-equivalent command access; regular DMs can only affect their bound agent.
- Git-managed source repos: tools, skills, identity files (USER.md, SOUL.md, SOUL-CORE.md), and agent Dockerfiles are versioned in git repositories (global + per-agent). Each agent receives a git identity.
2.2 Out of Scope (v1)
- Multi-host orchestration (but transport layer scaffolds for it).
- Kubernetes/EKS-native deployment (but TCP/HTTP transport is implemented to ease future migration).
- Multi-tenant isolation model.
- Hot mutation of structural runtime identity (/choir contents, tool binaries, skill definitions).
- Unlimited multi-agent graph orchestration (fixed to edge/core dual-lane model).
- Multi-channel gateway (Slack, Discord, web UI, etc.). Telegram only in v1.
3. Core Design Principles
- LLMs are proposal engines, not authorities.
- All world mutations happen through tools.
- choird is the only host authority.
- Container runtime is disposable; host state is authoritative.
- Structural mutations require approval + rebuild + redeploy.
- Dynamic reload only applies to data that is safe to reload.
4. Design Invariants
- Core lifetime is a strict subset of Edge lifetime – mechanically enforced, not advisory. Each core job’s lifetime is a subset of the edge session.
- Mutual exclusion on each resource – multiple core jobs may hold exclusive locks on different resources concurrently. The same resource cannot be exclusively locked by more than one owner.
- Injection is append-only – never mutates history, never resets budgets.
- True log is immutable and authoritative – the arbiter’s in-memory log is the runtime authority; Postgres session_events is the durable replica. Compacted memory is a cache/view.
- No hidden recursion – explicit states, explicit transitions, no graph DSL.
- Core never sends raw chain-of-thought – only workflow summaries.
- Agent identity is unified – Edge and Core are the same agent to the user.
- Container is disposable – restart is cheap, workspace is non-authoritative, secrets are ephemeral.
- LLM proposes, control plane decides – all side effects serialized through the arbiter.
- Keep it boring – single host, single DB, minimal components, strong outer boundary.
- At most one active instance per agent – choird rejects agent start if the agent is already running. No concurrent sessions for the same agent configuration.
5. Architecture Overview
User(s)
| (multiple Telegram bots, multiple DMs)
v
choird (host)
|-- gateway (multi-bot Telegram, DM routing, admin/regular permissions)
|-- control plane (lifecycle, policy, approvals)
|-- memory module -> Postgres(+pgvector, per-agent schema)
|-- embedding client -> OpenRouter embedding API
|-- browser worker (Playwright, host-side, per-agent contexts)
|-- search client (Brave API)
|-- log manager -> .choir.d/logs/
|
|-- [transport: UDS ~/.choir.d/socks/choird.sock OR HTTP :9400]
|
v
Docker container
|-- choir-agent (single Go binary, single process)
|-- Edge lane (goroutine, fast model, user-facing)
|-- Core lane (goroutine, flagship model, deep reasoning)
|-- Arbiter (serializes all committed side effects)
|-- Lock manager
|-- Secret store (in-memory only)
|-- Tool executor
|-- choird RPC client (UDS or HTTP, selected at startup)
|
|-- /choir (read-only, image-baked identity + tools)
|-- /workspace (writable, bind-mounted persistent directory)
| +-- .choirtmp/send/ (agent -> choird file staging)
| +-- .choirtmp/recv/ (choird -> agent file staging)
5.1 Trust Boundaries
Three real trust boundaries:
- Host <-> Container (strong): namespaced root, no --privileged, no Docker socket, no host PID namespace.
- Choird <-> Agent RPC (policy): lease-scoped authentication, rate limits, payload size caps.
- Tool lock system (serialization): resource-level mutual exclusion.
Core security invariant: even a malicious, root-level container cannot cause host-level side effects without explicit human approval.
6. System Components
Four components:
| Component | Location | Role |
|---|---|---|
| choird | Host daemon | Control plane: Telegram gateway (multi-bot, multi-DM routing), container lifecycle, tool/skill registry authority, policy enforcement, session leasing, approval pipeline, memory/embeddings module, Postgres connection pooling. |
| choirctl | Host CLI | Stateless admin interface to choird. Approve proposals, manage sessions, inspect logs, add tools/skills, control policies. |
| choir-agent | Docker container (single Go binary) | Unified cognition + syscall runtime. Manages state machines, edge/core lanes, skills, LLM requests, tool execution, lock manager, secret store, choird RPC client. |
| .choir.d/ | Host filesystem | Choird state directory. Contains config.json (static resource config), secrets.json (authoritative secrets), local clone cache (repos/), logs, and user-local sockets (socks/). |
6.1 Why a Single Binary in the Container
The agent loop's lifecycle is fully contained within the container's lifecycle: it starts after init and ends before shutdown. The container boundary is the sandbox; internal process-level separation adds complexity without meaningful security gain. Root + bash are allowed inside the container. This is a single-tenant disposable cognitive appliance.
Internal Go structure (logical separation preserved):
type Agent struct {
arbiter *Arbiter // serializes all committed side effects
llm *LLMEngine
skills *SkillEngine
tools *ToolExecutor
locks *LockManager
secrets *SecretStore
rpc *ChoirdClient
}
6.2 choird Internal Module Structure
choird/
control_plane/
agent_lifecycle/
memory/
embeddings/
approvals/
gateway/
Memory is folded into choird (not a separate service) because on a single host with a single operator, an extra service adds complexity without adding meaningful isolation. Memory is an extension of control-plane state – it interacts with agent lifecycle, heartbeat, secrets, workspace identity, and approval flow.
choird startup sequence (verbose logging at each step):
- Read and validate .choir.d/config.json.
- Read and validate .choir.d/secrets.json.
- Connect to Postgres using credentials resolved from the configured secret reference.
- Initialize per-agent schemas and roles if they do not exist.
- Initialize connection pool.
- Detect orphaned Docker containers from a previous run (by label choir.managed=true). Remove them.
- Clean up stale leases in Postgres (sessions marked active but whose containers no longer exist). Mark them as crashed and release their resource leases.
- Start all gateway bot instances (connect to Telegram Bot API).
- Begin accepting choirctl and gateway commands.
- Log “choird ready” with config version and a summary of loaded resources.
First-time setup (choirctl init):
- Creates the .choir.d/ directory structure.
- Generates a skeleton config.json with placeholder values.
- Generates a skeleton secrets.json (authoritative secret store, mode 0600).
- Creates user-local runtime directories (including .choir.d/socks/).
- Prints instructions: configure Postgres, create the database, set secrets, configure at least one gateway bot, one DM, and one agent.
- Does NOT create Postgres schemas (the user must ensure the database exists; choird creates per-agent schemas on first startup).
Database migrations are deferred in v1. The user manages schema changes
manually. choirctl should support migration commands in a future
version.
6.3 .choir.d/ Configuration Directory
.choir.d/
config.json # static resource configuration (see below)
secrets.json # authoritative secret values (strict JSON, mode 0600)
repos/ # choird-managed local clone cache
global/ # clone of global repo
agents/
<agent-id>/ # clone of per-agent repo
socks/ # user-local unix sockets
choird.sock # default choird UDS socket
logs/ # structured log files
choird.log # active choird log
<agent-id>.log # active per-agent log
archive/ # compressed archived logs (preserving structure)
Tools, skills, identity files, and Dockerfiles are not stored as loose
files in .choir.d/. They live in git repositories, whose local clones
are cached under .choir.d/repos/ (see section 6.5). This keeps all
choir host-side configuration and cached state in one directory.
config.json configures all choird-managed resources:
- Global repo: URL and ref for the shared tools/skills/identity repo.
- Agent definitions: agent ID, per-agent repo URL/ref, resource defaults (references to named workspaces, models, git identities, etc.).
- Workspaces: named workspace definitions with explicit host paths.
- Models: named LLM model/provider pairs with endpoints, request templates, default parameters (temperature, reasoning_effort), and secret references. LLM models must support OpenAI-compatible tool calling (see section 21).
- TTS providers: named text-to-speech provider configurations with endpoint, model ID, and secret reference.
- Voice profiles: named voice configurations with voice ID, output format, and voice settings (stability, similarity, style, speed).
- Git identities: named identities with name, email, and secret reference.
- Notion integrations: named, with secret reference.
- Email accounts: named, with SMTP/IMAP host and port, sharing mode, and secret reference.
- Search providers: named, with secret reference.
- Embedding model: provider, model name, dimensions, secret reference.
- Gateways: named Telegram bot instances with secret references.
- DMs: named DM bindings (gateway + user ID, admin flag). The set of configured DMs for a bot forms that bot’s allowlist.
- Postgres: connection host, port, admin credentials secret reference.
- Logging: archive thresholds, log retention settings.
- Tunable defaults: budgets, timeouts, heartbeat interval, feature flags.
Secret values are never stored in config.json. Each resource that needs
a secret contains a "secret" field referencing a named secret managed
via choirctl secret set / choirctl secret delete, then synchronized to
running choird/agents via choirctl secret apply.
config.json is the source of truth for static resource configuration.
secrets.json is the source of truth for secret values (v1).
choird reads both at startup. Config changes are applied via the two-phase
choirctl config load / choirctl config apply workflow (see section
17.6); secret changes are applied via choirctl secret apply.
Runtime state (sessions, events, memory) lives in Postgres.
Example config.json:
{
"global_repo": {
"url": "git@github.com:org/choir-global.git",
"ref": "main"
},
"workspaces": {
"main-ws": { "path": "/var/lib/choir/workspaces/main" },
"scratch": { "path": "/var/lib/choir/workspaces/scratch" }
},
"models": {
"sonnet": {
"provider": "openrouter",
"model": "anthropic/claude-sonnet-4-20250514",
"endpoint": "https://openrouter.ai/api/v1",
"temperature": 0.7,
"reasoning_effort": null,
"secret": "openrouter-key"
},
"gpt4o": {
"provider": "openrouter",
"model": "openai/gpt-4o",
"endpoint": "https://openrouter.ai/api/v1",
"temperature": 0.5,
"reasoning_effort": null,
"secret": "openrouter-key"
},
"o3": {
"provider": "openrouter",
"model": "openai/o3",
"endpoint": "https://openrouter.ai/api/v1",
"temperature": null,
"reasoning_effort": "medium",
"secret": "openrouter-key"
}
},
"tts": {
"eleven": {
"provider": "elevenlabs",
"model_id": "eleven_multilingual_v2",
"endpoint": "https://api.elevenlabs.io/v1",
"secret": "elevenlabs-key"
}
},
"voice_profiles": {
"default-en": {
"tts": "eleven",
"voice_id": "...",
"output_format": "opus_48000_128",
"voice_settings": {
"stability": 0.5,
"similarity_boost": 0.75,
"style": 0.0,
"use_speaker_boost": true,
"speed": 1.0
}
},
"narrator": {
"tts": "eleven",
"voice_id": "...",
"output_format": "mp3_44100_128",
"voice_settings": {
"stability": 0.8,
"similarity_boost": 0.9,
"style": 0.3,
"use_speaker_boost": false,
"speed": 0.9
}
}
},
"git_identities": {
"dev-identity": {
"name": "Dev Agent",
"email": "dev@choir.local",
"secret": "git-dev-token"
},
"ops-identity": {
"name": "Ops Agent",
"email": "ops@choir.local",
"secret": "git-ops-token"
}
},
"notion_integrations": {
"personal-wiki": {
"secret": "notion-personal-key"
},
"work-wiki": {
"secret": "notion-work-key"
}
},
"email_accounts": {
"primary-email": {
"smtp_host": "smtp.example.com",
"smtp_port": 587,
"imap_host": "imap.example.com",
"imap_port": 993,
"sharing": "exclusive",
"secret": "email-primary-creds"
},
"notifications": {
"smtp_host": "smtp.example.com",
"smtp_port": 587,
"imap_host": "imap.example.com",
"imap_port": 993,
"sharing": "shared",
"secret": "email-notify-creds"
}
},
"search": {
"brave": {
"secret": "brave-api-key"
}
},
"embedding": {
"provider": "openrouter",
"model": "text-embedding-3-small",
"dimensions": 1536,
"endpoint": "https://openrouter.ai/api/v1",
"secret": "openrouter-key"
},
"gateways": {
"bot-main": {
"type": "telegram",
"secret": "tg-bot-main-token"
},
"bot-work": {
"type": "telegram",
"secret": "tg-bot-work-token"
}
},
"dms": {
"admin-dm": { "gateway": "bot-main", "user_id": "123456789", "admin": true },
"user-dm": { "gateway": "bot-main", "user_id": "987654321", "admin": false },
"work-dm": { "gateway": "bot-work", "user_id": "123456789", "admin": true }
},
"postgres": {
"host": "localhost",
"port": 5432,
"database": "choir",
"secret": "postgres-admin-creds"
},
"agents": {
"agent-1": {
"repo": {
"url": "git@github.com:org/choir-agent-1.git",
"ref": "main"
},
"defaults": {
"workspace": "main-ws",
"llm": "sonnet",
"voice_profile": "default-en",
"git_identity": "dev-identity",
"notion": "personal-wiki",
"email": "primary-email",
"dm": "admin-dm"
}
}
},
"heartbeat_interval_ms": 5000,
"crash_detection_threshold_ms": 10000,
"rate_limit_retry_ms": 1000,
"log_archive_threshold_lines": 100000
}
All resources are named. Names are stable identifiers used throughout the
system (commands, gateway, config); underlying paths, endpoints, credentials,
and providers can change without affecting references. Every named secret
follows the same pattern: a "secret" field references a named secret
managed via choirctl secret set / choirctl secret delete, with
runtime refresh triggered by choirctl secret apply.
6.4 Resource Allocation Model
Resources are classified by access mode:
Shared resources (any number of agents may use concurrently):
- Models (LLM): API tokens are referenced by name in model definitions. Multiple agents can use the same model concurrently.
- TTS providers: API tokens are referenced by name. Multiple agents can use the same provider concurrently.
- Voice profiles: Named voice configurations. Multiple agents can use the same voice profile concurrently.
- Search (Brave): Shared API key, stateless.
- Email accounts (when "sharing": "shared"): Multiple agents may send from the same account.
Exclusive resources (leased to one agent at a time):
- Workspaces: Host directories; see section 15.4.
- Git identities: Name, email, and auth credentials. Leased so commits are unambiguously attributable.
- Notion integrations: Per-agent scoped. One agent per integration at a time.
- Email accounts (when "sharing": "exclusive"): One agent at a time.
- DMs: Each agent gets exclusive access to its bound DM. One agent per DM at a time. See section 19.
- Browser contexts: Each running agent gets an isolated Playwright browser context with exclusive read/write access to its tabs. Managed by the host-side browser worker; not configured in config.json (automatically created per agent).
Leasing rules (applies to all exclusive resources):
- choird grants leases at agent start. If any requested exclusive resource is already leased to another running agent, the start is rejected.
- Leases are released when the agent’s session ends (stop, crash, or terminate).
- Every exclusive resource has a default defined in the agent’s defaults block. All defaults are overridable at start time via choirctl agent start flags or /start arguments.
Singleton constraint: At most one instance of a given agent
configuration may be active at any time (Design Invariant 11). choird
enforces this at agent start.
6.5 Git-Managed Source Repos
All tools, skills, identity files (USER.md, SOUL.md, SOUL-CORE.md),
and agent Dockerfiles are versioned in git repositories. This provides
change tracking, reproducible builds, and a clean self-evolution workflow.
Global repo (one per choir installation):
choir-global/
tools/
<tool-name>.json # tool manifest
<tool-name>/ # tool source directory (if compiled)
skills/
<skill-name>.json # skill spec
identity/
USER.md # shared user identity
SOUL.md # default edge lane personality
SOUL-CORE.md # default core lane personality
Dockerfile.base # base image: OS packages, choir-agent binary
Per-agent repo (one per agent):
choir-agent-<id>/
tools/ # agent-specific tools (additive or override)
<tool-name>.json
<tool-name>/
skills/ # agent-specific skills (additive or override)
<skill-name>.json
identity/
SOUL.md # agent-specific edge personality (overrides global)
SOUL-CORE.md # agent-specific core personality (overrides global)
Dockerfile # FROM choir-base; agent-specific system deps + tool builds
Identity merge rule: USER.md comes from the global repo only (shared
user identity across all agents). SOUL.md and SOUL-CORE.md each come
from the per-agent repo if present, otherwise fall back to global.
Tool/skill merge rule: Agent-specific tools and skills are merged onto global ones. If an agent tool has the same name as a global tool, the agent version overrides. This is resolved at build time, not runtime.
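The build-time merge can be sketched as a name-keyed map overlay, with the agent map applied last so same-name entries override (illustrative; function name and value type are assumptions):

```go
package main

import "fmt"

// mergeRegistries overlays agent-specific entries onto global ones by name.
// An agent entry with the same name as a global entry overrides it.
// This is resolved at build time, not runtime.
func mergeRegistries(global, agent map[string]string) map[string]string {
	merged := make(map[string]string, len(global)+len(agent))
	for name, src := range global {
		merged[name] = src
	}
	for name, src := range agent {
		merged[name] = src // override or add
	}
	return merged
}

func main() {
	global := map[string]string{"choir.fs.read": "global", "choir.exec": "global"}
	agent := map[string]string{"choir.exec": "agent", "choir.custom": "agent"}
	fmt.Println(mergeRegistries(global, agent))
}
```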
Dockerfile roles:
- Dockerfile.base (global repo): Defines the base image – OS packages (git, openssh-client, python3-minimal, etc.), the choir-agent binary, and shared infrastructure. Rebuilt only when the global repo changes.
- Dockerfile (per-agent repo): FROM choir-base:<version>. Installs agent-specific system dependencies and compiles tool binaries from source directories. Does not copy tools/skills/identity (the build pipeline handles that separately before the Docker build).
Per-agent git identity: Each agent leases a named git identity from
the git_identities pool in config.json (see section 6.4). The identity
provides user.name, user.email, and auth credentials. Set in the
container’s git config at startup. See section 14.4 for auth details.
Local clone cache: choird maintains clones of all repos under
.choir.d/repos/. These are fetched during config load and used during
agent build. The operator never edits these clones directly – they are
managed by choird.
6.6 Optional Host Workers
choird may manage host-side workers for heavyweight integrations (e.g.
Playwright browser worker), while keeping policy and authority in choird.
7. Agent Execution Model
7.1 Two Cognitive Lanes
The system is NOT two separate agents. It is a single logical agent with two concurrent cognitive lanes sharing identity, tools, skills, and long-term memory:
- Edge Lane: Small, cheap, fast model. Acts as “mouth and ears” – handles user interaction, routing decisions, real-time responsiveness. Edge receives USER.md + SOUL.md in its system prompt. Edge offloads complex tasks to core when it judges the task is complex or when the user explicitly requests it. When spawning core, edge curates a clean task briefing: strips user chattiness, extracts the actionable request, and bundles only the context core needs. Edge can spawn multiple concurrent core jobs, each with a user-friendly name. While cores run, edge remains conversational but is restricted to read-only tools and can still spawn additional cores.
- Core Jobs: Flagship reasoning model. Each core job acts as “deep brain controlling limbs” – handles multi-step planning, precision tool orchestration, deep reasoning. Core receives SOUL-CORE.md (not USER.md or SOUL.md). It never sees raw user messages; only the curated CoreJobStart from edge. Each core job has a name for easy reference (e.g., “refactor-auth”, “write-tests”). Multiple core jobs can run concurrently on different files without conflict; they are serialized only when accessing the same resource (e.g., both running choir.exec).
The async model should feel like “being able to chat with or interrupt a worker while it’s working.”
User message queue: Incoming user messages are queued by the arbiter.
Edge drains the queue only when it returns to IDLE state. Messages that
arrive while edge is in REASONING, WAITING_TOOL, WAITING_CORE, or
FINALIZING wait in the queue. They are not batched – each queued
message triggers a separate IDLE -> CONTEXT_READY -> … cycle. The
/inject <message> gateway command provides an alternative: it injects a
message into the edge lane’s context at the next safe point, similar to
how edge injects instructions into core (see 12.8).
Core completion report: When core returns to idle (CORE_COMPLETED or CORE_TERMINATED), edge must present a summary report of the core job’s outcome to the user. This is a mandatory edge behavior, not optional.
7.2 Edge Lane State Machine
States:
EDGE_IDLE
EDGE_CONTEXT_READY
EDGE_REASONING
EDGE_WAITING_TOOL
(EDGE_WAITING_CORE removed -- edge stays responsive while cores run)
EDGE_FINALIZING
EDGE_TERMINATED
Transitions:
IDLE -> CONTEXT_READY on UserMessage
CONTEXT_READY -> REASONING (inject memory context, call LLM)
REASONING -> FINALIZING on Finish (emit response, persist memory)
REASONING -> WAITING_TOOL on ToolProposal
REASONING -> REASONING on choir.core.spawn (core starts in background)
WAITING_TOOL -> REASONING on ToolResult
FINALIZING -> IDLE after result delivered
REASONING -> FINALIZING on LLM error (emit error message to user, then IDLE)
Any -> TERMINATED on Cancel, BudgetExceeded, fatal error
While any core job is running, edge remains conversational but is
restricted to read-only tools (choir.fs.read, choir.fs.search,
choir.web.search, choir.memory.query) and choir.core.spawn (to
start additional cores). Edge can inject instructions to a specific
core by name, cancel a specific core by name, and list active cores.
Edge is forbidden from side-effect tools (choir.fs.write,
choir.exec) while any core is running.
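The transition table above can be encoded as a lookup keyed by (state, event), with every pair outside the table rejected. A minimal sketch (state and event names follow the table; the helper names are assumptions):

```go
package main

import "fmt"

type State string
type Event string

const (
	Idle         State = "EDGE_IDLE"
	ContextReady State = "EDGE_CONTEXT_READY"
	Reasoning    State = "EDGE_REASONING"
	WaitingTool  State = "EDGE_WAITING_TOOL"
	Finalizing   State = "EDGE_FINALIZING"
)

// transitions encodes the edge lane table; unlisted (state, event) pairs are invalid.
var transitions = map[State]map[Event]State{
	Idle:         {"UserMessage": ContextReady},
	ContextReady: {"LLMCall": Reasoning},
	Reasoning: {
		"Finish":       Finalizing,
		"ToolProposal": WaitingTool,
		"CoreSpawn":    Reasoning, // core starts in background; edge stays responsive
		"LLMError":     Finalizing,
	},
	WaitingTool: {"ToolResult": Reasoning},
	Finalizing:  {"Delivered": Idle},
}

// step applies one event at a safe point, rejecting anything not in the table.
func step(s State, e Event) (State, error) {
	if next, ok := transitions[s][e]; ok {
		return next, nil
	}
	return s, fmt.Errorf("invalid transition: %s on %s", s, e)
}

func main() {
	s := Idle
	for _, e := range []Event{"UserMessage", "LLMCall", "ToolProposal", "ToolResult", "Finish", "Delivered"} {
		next, err := step(s, e)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s --%s--> %s\n", s, e, next)
		s = next
	}
}
```

An explicit table like this is what "no hidden recursion" buys: every legal move is enumerable, and queued user messages simply wait until the machine is back at EDGE_IDLE.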
7.3 Core Job State Machine (per-job)
States:
CORE_IDLE
CORE_CREATED
CORE_INITIALIZING
CORE_REASONING
CORE_WAITING_TOOL
CORE_COMPLETED
CORE_TERMINATED
Transitions:
IDLE -> CREATED on CoreJobStart from edge
CREATED -> INITIALIZING (load task, context, set budgets)
INITIALIZING -> REASONING (LLM call)
REASONING -> COMPLETED on FinalResult
REASONING -> WAITING_TOOL on ToolProposal
REASONING -> REASONING on InstructionInjected
REASONING -> TERMINATED on LLM error (error surfaced to edge via CoreEvent)
WAITING_TOOL -> REASONING on ToolResult
Any -> TERMINATED on Cancel, Error, BudgetExceeded
Each core job is a separate goroutine with its own state machine. The
job is identified by a user-friendly name (assigned by edge or the user).
Lock ownership uses core:<job-name> so multiple cores can hold locks
on different resources concurrently. When a core job completes or
terminates, edge presents a summary to the user (mandatory).
7.4 Arbiter (Event-Sourced Commit Log)
The arbiter is a dedicated goroutine within the choir-agent process. It
owns the single authoritative commit log. Edge and core lanes submit
proposals to the arbiter via Go channels; the arbiter validates, acquires
locks, executes tools (or delegates to choird via EXECUTE_HOST_TOOL),
commits results, and releases locks. No lane can commit a side effect
without going through the arbiter.
Authoritative event types:
UserMsg
ModelOutput(edge/core)
ToolCallRequested
ToolCallCommitted
ToolResultCommitted
MemoryWriteCommitted
CoreStarted / CoreStopped
InjectedInstruction
Cancelled
Only the arbiter appends “Committed” events. Agents only emit proposals.
Commit model: the arbiter commits events to a local in-memory append log
(the primary authority during runtime). Heartbeat replication asynchronously
sends committed events to choird, which persists them to session_events
in Postgres. On crash, unreplicated events are lost; recovery resumes from
the last host-acknowledged revision. This trades durability for latency –
no database round-trip per tool execution.
Two-phase tool execution:
- Propose (agent emits structured tool call)
- Commit (arbiter validates, acquires locks)
- Execute (tool runs)
- Commit result (arbiter records result to local log, releases locks)
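The propose/commit/execute/commit-result cycle can be sketched as a single goroutine draining a proposal channel, so committed side effects serialize naturally. This is an illustration of the serialization point only (the real arbiter also validates against skills, budgets, and the lock manager; type names here are assumptions):

```go
package main

import "fmt"

// Proposal is a tool call submitted by a lane; Reply carries the committed result back.
type Proposal struct {
	Tool  string
	Reply chan string
}

// runArbiter is the only goroutine allowed to append "Committed" events.
// It drains proposals in order: commit call -> execute -> commit result.
func runArbiter(proposals <-chan Proposal, log *[]string) {
	for p := range proposals {
		*log = append(*log, "ToolCallCommitted:"+p.Tool)
		result := "ok" // tool execution (or EXECUTE_HOST_TOOL delegation) happens here
		*log = append(*log, "ToolResultCommitted:"+p.Tool)
		p.Reply <- result
	}
}

func main() {
	proposals := make(chan Proposal)
	var log []string
	go runArbiter(proposals, &log)

	// Edge and core lanes both submit through the same channel;
	// the arbiter's loop is the single serialization point.
	for _, tool := range []string{"choir.fs.read", "choir.fs.write"} {
		reply := make(chan string)
		proposals <- Proposal{Tool: tool, Reply: reply}
		fmt.Println(tool, "->", <-reply)
	}
	close(proposals)
	fmt.Println(log)
}
```

The fully serial v1 tradeoff falls out of this shape directly: a slow host-delegated tool holds the loop, and every other lane's proposal waits in the channel.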
v1 tradeoff: the arbiter is fully serial. If a host-delegated tool
(EXECUTE_HOST_TOOL) takes seconds (e.g. Playwright browsing), the other
lane’s tool proposals queue behind it. This is accepted for v1 simplicity;
async pipelining of host tools is a future optimization.
7.5 Lifetime Containment
Core lifetime is a strict subset of Edge lifetime – mechanically enforced:
- Process tree: Core is spawned as child of Edge. If Edge dies, Core dies.
- Lease TTL: choird only accepts requests with valid edge session lease.
- Job token binding: Core job tokens are derived from edge lease and expire sooner.
7.6 Budget Enforcement
Edge enforces (session-level):
- max_core_jobs
- total_session_tokens
- max_tool_calls_per_session
Core enforces (job-level):
- per_job_max_steps
- per_job_max_tool_calls
- per_job_wall_time
Budgets are never reset by injection.
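The per-job budgets can be sketched as monotonic counters that only ever increase; an InstructionInjected event appends to the log but never touches them (illustrative; field names are assumptions):

```go
package main

import "fmt"

// CoreBudget tracks per-job limits. Counters only increase;
// injection never resets them (Design Invariant: injection is append-only).
type CoreBudget struct {
	Steps, MaxSteps         int
	ToolCalls, MaxToolCalls int
}

// Step consumes one reasoning step; exceeding the cap terminates the job.
func (b *CoreBudget) Step() error {
	b.Steps++
	if b.Steps > b.MaxSteps {
		return fmt.Errorf("BudgetExceeded: steps %d > %d", b.Steps, b.MaxSteps)
	}
	return nil
}

// ToolCall consumes one tool call against the per-job cap.
func (b *CoreBudget) ToolCall() error {
	b.ToolCalls++
	if b.ToolCalls > b.MaxToolCalls {
		return fmt.Errorf("BudgetExceeded: tool calls %d > %d", b.ToolCalls, b.MaxToolCalls)
	}
	return nil
}

func main() {
	b := &CoreBudget{MaxSteps: 2, MaxToolCalls: 1}
	fmt.Println(b.Step())     // within budget
	fmt.Println(b.ToolCall()) // within budget
	fmt.Println(b.Step())     // within budget
	fmt.Println(b.Step())     // BudgetExceeded -> Any -> TERMINATED
}
```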
7.7 LLM Call Error Handling
When an LLM API call fails (rate limit, 5xx, timeout, malformed response, content filter rejection):
- The error is surfaced as a message to the user via the bound DM.
- The session continues. Edge handles the error by transitioning through FINALIZING (emit error message) back to IDLE, ready for the next user message. Core terminates on LLM error (REASONING -> TERMINATED), surfacing the error to edge via CoreEvent; edge then presents the error to the user.
- No automatic retry of the failed LLM call. The user can re-trigger by sending a new message (edge) or by spawning a new core job.
Rate limit handling (provider 429 responses):
- choir-agent parses the Retry-After header from the provider response.
- On the first rate-limited call, choir-agent immediately sends an informational message to the user’s DM (via choird) and logs the event.
- choir-agent waits for the Retry-After duration (or the configured rate_limit_retry_ms default if no header; default: 1000 ms), then retries the call.
- If the retry also fails, the error is surfaced to the user per the standard error handling above.
7.8 Safe Points
Committed transitions happen only at safe points:
- Completed LLM call parse/validate.
- Completed tool result commit.
- Skill state transition commit.
8. Tool System
8.1 Design Principle
The only way a model can affect or observe the workspace is through tools.
No implicit context, no hidden mounts, no backdoors.
OS analogy:
| OS Concept | Choir Concept |
|---|---|
| User process | LLM lane |
| Kernel | choir-agent runtime |
| Syscalls | Tools |
| Filesystem | Workspace |
| Scheduler | Arbiter |
| Process memory | Compacted working memory |
8.2 Tool Invocation Contract
- Model must use structured tool calling (out-of-band tool channel).
- No regex/text parsing for tool calls.
- Tool names are canonical and namespaced (e.g. choir.fs.read).
- All tool inputs are validated against their schema before execution.
8.3 Tool Definition (Dual View)
Every tool has two representations:
LLM view (what the model sees):
{
"name": "choir.fs.write",
"description": "Write content to a file in the workspace.",
"parameters": {
"type": "object",
"properties": {
"path": { "type": "string" },
"content": { "type": "string" },
"mode": { "type": "string", "enum": ["overwrite", "append"] }
},
"required": ["path", "content"]
}
}
Runtime view (internal metadata):
{
"llm": { "name": "...", "description": "...", "input_schema": "..." },
"runtime": {
"exec_path": "/choir/tools/global/git_commit",
"timeout_ms": 15000,
"locks": [
{ "resource": "workspace", "mode": "X" }
],
"network": false,
"secret_resources": ["git_identity"],
"side_effect": "write",
"idempotent": false,
"version": "1.0"
}
}
The LLM never sees lock semantics, security policies, resource keys, or
internal implementation. Only the llm section is sent to model APIs.
8.4 Tool Taxonomy
Read-only: choir.fs.read, choir.memory.search, choir.repo.status
– safe to parallelize, require S locks.
Write/Mutating: choir.fs.write, choir.repo.commit,
choir.memory.upsert – require X locks.
External Side Effects: choir.http.post, choir.email.send,
choir.web.browse – require X locks + audit.
8.5 Tool Output Handling
Return structured JSON, not raw logs:
{
"status": "success",
"summary": "...",
"artifact_ref": "hash123"
}
Per-tool output tagging controls visibility:
- exposure: edge | core | both | none
- prompt_mask: true/false
8.6 Tool Surface Minimization
Group logically rather than exposing many granular tools. Example: instead
of choir.memory.get, choir.memory.get_by_hash, choir.memory.scan, expose:
{
"name": "choir.memory.query",
"parameters": {
"mode": ["semantic", "hash", "range"]
}
}
8.7 Built-In Tools (v1)
| # | Name | ID | Lock | Host? | Notes |
|---|---|---|---|---|---|
| 1 | Shell | choir.exec | workspace:X | no | Arbitrary commands; secret_resources: [] |
| 2 | Read File | choir.fs.read | file:<path>:S | no | Supports chunking via head/tail |
| 3 | Edit File | choir.fs.write | file:<path>:X | no | Patch-based (replace range, append, insert at line) |
| 4 | Ripgrep | choir.fs.search | file:<path>:S | no | rg binary shipped in image; locks target file/dir |
| 5 | TTS | choir.tts.speak | choirtmp:X | no | TTS via agent’s voice profile; writes audio to .choirtmp/send/ |
| 6 | Brave Search | choir.web.search | none | no | Brave Search API (structured results) |
| 7 | Browse | choir.web.browse | browser_tab:X | yes | Playwright via host EXECUTE_HOST_TOOL |
| 8 | Notion | choir.notion.query | none | no | Notion API integration |
| 9 | Email Send | choir.email.send | none | no | SMTP email send |
| 10 | Email Receive | choir.email.receive | none | no | IMAP fetch; returns message list/content |
| 11 | Email Check | choir.email.check | none | no | IMAP check for new messages; returns count/summary |
| 12 | Memory Query | choir.memory.query | none | yes | Search working/session/knowledge via host EXECUTE_HOST_TOOL (see 11.7) |
| 13 | Memory Write | choir.memory.upsert | none | yes | Write to knowledge store only via host EXECUTE_HOST_TOOL (see 11.7) |
| 14 | Memory Compact | choir.memory.compact | none | no | Force reference summary update for calling lane (see 11.3) |
8.8 Tool Registration
Registry sources:
- Built-in tools (runtime-backed Go implementations).
- External tools defined by manifest + executable in `/choir/tools/...`.
On-disk structure for external tools:
/choir/tools/
global/ # baked into image, shared
foo.json # tool manifest
foo # executable
agent/ # agent-specific tools
bar.json
bar
No dynamic runtime installs. Tool and skill registries are immutable at runtime.
Tool loading startup sequence:
- Load built-in tools (Go functions).
- Scan `/choir/tools/global`.
- Scan `/choir/tools/agent`.
- Validate JSON schemas.
- Verify executables exist.
- Build registry.
- Register LLM-facing schemas (stripped of runtime metadata).
- If any JSON is invalid -> fail startup (fail fast).
Skill loading startup sequence:
- Scan `/choir/skills/`.
- Parse each `.json` file against the SkillSpec schema.
- Validate state machines: all transitions reference valid states, terminal states exist, no unreachable states.
- Validate `allowed_tools` references against the tool registry.
- Build skill registry.
- If any JSON is invalid or a state machine is ill-formed -> fail startup.
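The state-machine checks in this sequence can be sketched in Go. This is an illustrative sketch, not the actual choird implementation; names like `validateSkill`, `state`, and `transition` are assumptions.

```go
package main

import "fmt"

type transition struct{ On, To string }

type state struct {
	Terminal    bool
	Transitions []transition
}

// validateSkill checks the SkillSpec invariants from the startup sequence:
// all transitions reference valid states, at least one terminal state
// exists, and every state is reachable from the initial state.
func validateSkill(initial string, states map[string]state) error {
	if _, ok := states[initial]; !ok {
		return fmt.Errorf("initial state %q not defined", initial)
	}
	hasTerminal := false
	for name, s := range states {
		if s.Terminal {
			hasTerminal = true
		}
		for _, t := range s.Transitions {
			if _, ok := states[t.To]; !ok {
				return fmt.Errorf("state %q transitions to undefined state %q", name, t.To)
			}
		}
	}
	if !hasTerminal {
		return fmt.Errorf("no terminal state")
	}
	// BFS reachability from the initial state.
	seen := map[string]bool{initial: true}
	queue := []string{initial}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, t := range states[cur].Transitions {
			if !seen[t.To] {
				seen[t.To] = true
				queue = append(queue, t.To)
			}
		}
	}
	for name := range states {
		if !seen[name] {
			return fmt.Errorf("state %q is unreachable", name)
		}
	}
	return nil
}

func main() {
	states := map[string]state{
		"understand": {Transitions: []transition{{"complete", "done"}}},
		"done":       {Terminal: true},
	}
	fmt.Println(validateSkill("understand", states)) // <nil>
}
```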
Built-in and external tools share a unified Go interface:
type Tool interface {
Name() string
Schema() JSONSchema
Execute(ctx context.Context, input json.RawMessage) (json.RawMessage, error)
}
8.9 Tool Execution Pipeline
- Send request (system prompt + messages + tool schemas) to LLM.
- Model responds with text OR structured tool call(s).
- Arbiter receives tool calls; for each:
- Validate schema.
- Enforce path constraints, lease, resource locks.
- Acquire locks.
- Execute tool.
- Log event.
- Release locks.
- Send tool result back to model.
- Call model again.
8.10 Meta-Tools and Approvals
Capability-changing requests (tool addition/change, skill change, image
update) are proposal-only and require manual approval via choirctl or
gateway command path.
Two classes of tools:
- Runtime tools: execute immediately (governed by locks).
- Control plane tools (meta): `choir.propose.tool`, `choir.propose.skill`, `choir.propose.config_change` – never execute immediately, never hold locks, always require manual approval.
Proposal pipeline:
LLM -> propose_tool_change
-> choir-agent logs proposal
-> choird stores pending proposal
-> human approval (via choirctl or gateway)
-> choird mutates registry
Two-phase tool registration:
- LLM proposes metadata (name, description, schema, intended behavior).
- Human writes the implementation, reviews the schema, and registers it via `choirctl`.
The project ships with tool-builder and skill-builder skills (see
section 10.3) that guide the agent through the proposal process as a
structured multi-phase workflow.
Hard prohibitions: LLM never injects executable code. No auto-apply.
9. Resource Lock Model
9.1 Lock States
Per resource key:
- `FREE`
- `S_LOCKED(count)` – shared/read, multiple holders
- `X_LOCKED(owner)` – exclusive/write, single holder
Locks apply to tools only, not skills.
Transition table:
| From | To | Condition |
|---|---|---|
| FREE | S_LOCKED | acquire S |
| FREE | X_LOCKED | acquire X |
| S_LOCKED | S_LOCKED | acquire S (additional reader) |
| S_LOCKED | FREE | last S released |
| X_LOCKED | FREE | X released |
No S-to-X upgrades in v1.
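The transition table reduces to a per-resource state check, sketched below with illustrative names (the real lock manager also handles waiting and writer preference; see 9.4):

```go
package main

import "fmt"

// lockState mirrors the per-resource transition table:
// FREE, S_LOCKED(count), X_LOCKED(owner). No S-to-X upgrades.
type lockState struct {
	sCount int    // number of shared holders
	xOwner string // "" when not exclusively held
}

// canAcquire reports whether the requested mode ("S" or "X") is grantable.
func (l *lockState) canAcquire(mode string) bool {
	switch mode {
	case "S":
		return l.xOwner == "" // FREE or S_LOCKED: additional readers OK
	case "X":
		return l.xOwner == "" && l.sCount == 0 // only from FREE
	}
	return false
}

func (l *lockState) acquire(mode, owner string) bool {
	if !l.canAcquire(mode) {
		return false
	}
	if mode == "S" {
		l.sCount++
	} else {
		l.xOwner = owner
	}
	return true
}

func (l *lockState) release(mode string) {
	if mode == "S" {
		l.sCount--
	} else {
		l.xOwner = ""
	}
}

func main() {
	var ws lockState
	fmt.Println(ws.acquire("S", "edge"))       // true: FREE -> S_LOCKED(1)
	fmt.Println(ws.acquire("S", "core:tests")) // true: additional reader
	fmt.Println(ws.acquire("X", "core:build")) // false: readers present, no upgrade
	ws.release("S")
	ws.release("S")
	fmt.Println(ws.acquire("X", "core:build")) // true: FREE -> X_LOCKED
}
```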
9.2 Resource Key Namespaces
workspace -- global per leased workspace (coarse-grained)
file:<path> -- per-file lock (fine-grained, under workspace)
mem:<workspace> -- memory namespace (reserved, no v1 users)
repo:<workspace> -- repo operations (reserved, no v1 users)
browser_tab -- browser context
choirtmp -- .choirtmp/ staging area (independent of workspace lock)
net:<service> -- network services (reserved, no v1 users)
ext:<provider> -- external side effects (reserved, no v1 users)
Lock hierarchy: workspace is a coarse-grained lock that dominates
all file:<path> locks. When workspace:X is held (e.g. by
choir.exec), no file:<path> lock can be acquired – reads and writes
to individual files are blocked. When only file:<path> locks are held,
workspace:X must wait for all file locks to be released.
.choirtmp/ is exempt from the workspace lock hierarchy. It has its
own independent choirtmp lock. This allows gateway file transfers
(inbound uploads, outbound multimedia) to proceed without conflicting
with workspace-level tool execution. Tools that write to .choirtmp/send/
acquire choirtmp:X; choird reads from .choirtmp/send/ and writes to
.choirtmp/recv/ outside the agent’s lock manager (choird operates on
the bind mount directly).
workspace, file:<path>, browser_tab, and choirtmp are actively
used by v1 tools. The remaining namespaces are retained for future tool
additions.
9.3 Atomic Lockset Requirement
Tool invocation must acquire all required locks atomically (all-or-nothing). Release is atomic for the full lockset.
9.4 Concurrency Model
The lock manager supports edge + N concurrent core jobs:
- One mutex.
- One condition variable.
- Canonicalized lockset checks under the same critical section.
Lock ownership uses the lane identifier: edge or core:<job-name>.
Multiple core jobs can hold exclusive locks on different resources
simultaneously (e.g., core:refactor-auth holds file:src/auth.go:X
while core:write-tests holds file:src/auth_test.go:X).
No partial lock holding while waiting.
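A minimal sketch of the all-or-nothing acquisition described above, using one mutex and one condition variable as in this section's design (exclusive locks only for brevity; S/X modes and the dirty-read exception are omitted):

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// lockManager sketches the 9.4 design: one mutex, one condition variable,
// canonicalized lockset checks under the same critical section.
type lockManager struct {
	mu     sync.Mutex
	cond   *sync.Cond
	owners map[string]string // resource key -> lane; absence means FREE
}

func newLockManager() *lockManager {
	lm := &lockManager{owners: map[string]string{}}
	lm.cond = sync.NewCond(&lm.mu)
	return lm
}

// acquireAll blocks until every key in the lockset is free, then takes
// all of them atomically -- no partial holding while waiting.
func (lm *lockManager) acquireAll(lane string, keys []string) {
	sort.Strings(keys) // canonical order for deterministic checks
	lm.mu.Lock()
	defer lm.mu.Unlock()
	for !lm.allFree(keys) {
		lm.cond.Wait()
	}
	for _, k := range keys {
		lm.owners[k] = lane
	}
}

func (lm *lockManager) releaseAll(keys []string) {
	lm.mu.Lock()
	defer lm.mu.Unlock()
	for _, k := range keys {
		delete(lm.owners, k)
	}
	lm.cond.Broadcast() // wake all waiters to re-check their locksets
}

func (lm *lockManager) allFree(keys []string) bool {
	for _, k := range keys {
		if _, held := lm.owners[k]; held {
			return false
		}
	}
	return true
}

func main() {
	lm := newLockManager()
	lm.acquireAll("core:refactor-auth", []string{"file:src/auth.go"})
	lm.acquireAll("core:write-tests", []string{"file:src/auth_test.go"})
	fmt.Println(len(lm.owners)) // 2: two core jobs hold locks on different files
	lm.releaseAll([]string{"file:src/auth.go", "file:src/auth_test.go"})
	fmt.Println(len(lm.owners)) // 0
}
```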
Properties:
- Deadlock-free: atomic acquisition means no partial holding.
- Dirty-read exception: edge may acquire `file:<path>:S` while any core job holds `workspace:X`. This allows edge to read individual files (and answer user questions) during core execution. Edge may observe partially-modified state.
- Writer-preference: if any waiter requests X, block new S grants (except the dirty-read exception above).
- Release on all exits: normal completion, error, timeout, cancellation, panic (via `defer`).
- Crash safe: if choir-agent crashes, locks vanish with the process.
9.5 Default Policy
- `choir.exec` always acquires `workspace:X`. No command parsing, no read-only override. This blocks all concurrent file-level operations.
- `choir.fs.read` and `choir.fs.search` acquire `file:<path>:S` (per-file shared lock). Multiple concurrent reads to different files are allowed.
- `choir.fs.write` acquires `file:<path>:X` (per-file exclusive lock). Blocks other reads/writes to the same file but not to different files.
- Tools with no lock requirement (e.g. `choir.web.search`, `choir.email.send`) acquire no locks and never block.
10. Skill System
10.1 Skill Definition
Skills are deterministic orchestration state machines. They do NOT own locks, commit tools, or mutate workspace directly – they only propose.
Formal definition: Skill = (States, Transitions, Guards, Policies)
SkillSpec schema:
{
"name": "build_feature",
"description": "Implement a feature end-to-end.",
"initial_state": "understand",
"input_schema": "<JSONSchema>",
"output_schema": "<JSONSchema>",
"states": {
"understand": {
"objective": "Clarify and restate the requirement.",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "plan" }
]
},
"plan": {
"objective": "Produce an implementation plan.",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "modify" },
{ "on": "revise", "to": "understand" }
]
},
"modify": {
"objective": "Apply changes to files.",
"allowed_tools": ["choir.fs.read", "choir.fs.write"],
"transitions": [
{ "on": "complete", "to": "validate" }
]
},
"validate": {
"objective": "Verify correctness.",
"allowed_tools": ["choir.fs.read"],
"transitions": [
{ "on": "complete", "to": "done" },
{ "on": "fail", "to": "modify" }
]
},
"done": { "terminal": true }
},
"max_steps": 20,
"interruptible": true
}
10.2 Skill Execution
Each LLM call receives: global context + compacted memory + current skill step objective + step-specific context. Not the whole skill, not all steps – just one phase.
Model proposes: tool call, transition event, or finish. Arbiter validates allowed tools, validates transition, applies state change, logs event.
Key constraints:
- One active skill per lane at a time.
- Skills must not spawn other skills recursively.
- Step context must be structured (not free text accumulation).
- LLM cannot jump to arbitrary states, call forbidden tools, or skip transitions.
Constraint violation handling: when the LLM proposes a tool not in
allowed_tools or an invalid transition, the arbiter rejects the proposal
and returns a structured error to the LLM with the list of allowed tools
and valid transitions for the current state. The LLM retries with corrected
output. A per-step retry budget (default: 2 retries) prevents infinite
correction loops; exceeding it triggers a skill-level error transition.
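A sketch of this rejection-with-retry-budget loop (names like `stepState` and `validateTool` are illustrative, not the actual arbiter API):

```go
package main

import "fmt"

// stepState sketches 10.2 constraint handling: a disallowed tool is
// rejected with a structured error listing what IS allowed, and a
// per-step retry budget (default 2) bounds correction loops.
type stepState struct {
	allowedTools []string
	transitions  []string
	retriesLeft  int
}

// validateTool returns ("", true) when the proposal is allowed, or a
// structured error message and false otherwise.
func (s *stepState) validateTool(name string) (string, bool) {
	for _, t := range s.allowedTools {
		if t == name {
			return "", true
		}
	}
	s.retriesLeft--
	if s.retriesLeft < 0 {
		return "retry budget exhausted: skill-level error transition", false
	}
	return fmt.Sprintf("tool %q not allowed; allowed tools: %v; valid transitions: %v",
		name, s.allowedTools, s.transitions), false
}

func main() {
	s := &stepState{allowedTools: []string{"choir.memory.query"}, transitions: []string{"complete"}, retriesLeft: 2}
	msg, ok := s.validateTool("choir.fs.write")
	fmt.Println(ok, msg != "") // false true: structured rejection returned to the LLM
	_, ok = s.validateTool("choir.memory.query")
	fmt.Println(ok) // true: corrected proposal accepted
}
```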
10.3 Built-In Skills
The project ships with two built-in skills for self-evolution:
tool-builder: Guides the agent through proposing a new tool.
{
"name": "tool-builder",
"description": "Design and propose a new tool for the agent.",
"initial_state": "understand",
"states": {
"understand": {
"objective": "Clarify what the tool should do, its inputs/outputs, and side effects.",
"allowed_tools": ["choir.memory.query", "choir.fs.read"],
"transitions": [{ "on": "complete", "to": "design" }]
},
"design": {
"objective": "Draft the tool manifest: LLM view (name, description, input schema) and runtime view (lock requirements, secret_resources, timeout, side_effect classification).",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "validate" },
{ "on": "revise", "to": "understand" }
]
},
"validate": {
"objective": "Review the manifest for correctness, check for conflicts with existing tools, and verify the schema is well-formed.",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "propose" },
{ "on": "fail", "to": "design" }
]
},
"propose": {
"objective": "Submit the tool proposal via choir.propose.tool for human approval.",
"allowed_tools": ["choir.propose.tool"],
"transitions": [{ "on": "complete", "to": "done" }]
},
"done": { "terminal": true }
},
"max_steps": 15,
"interruptible": true
}
skill-builder: Guides the agent through proposing a new skill.
{
"name": "skill-builder",
"description": "Design and propose a new skill state machine.",
"initial_state": "understand",
"states": {
"understand": {
"objective": "Clarify the workflow the skill should orchestrate, its phases, and expected outcomes.",
"allowed_tools": ["choir.memory.query", "choir.fs.read"],
"transitions": [{ "on": "complete", "to": "design" }]
},
"design": {
"objective": "Draft the SkillSpec: states, transitions, per-state objectives, allowed tools, input/output schemas, and max_steps.",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "validate" },
{ "on": "revise", "to": "understand" }
]
},
"validate": {
"objective": "Verify the state machine is well-formed: all transitions reference valid states, terminal states exist, allowed tools are registered, and no unreachable states.",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "propose" },
{ "on": "fail", "to": "design" }
]
},
"propose": {
"objective": "Submit the skill proposal via choir.propose.skill for human approval.",
"allowed_tools": ["choir.propose.skill"],
"transitions": [{ "on": "complete", "to": "done" }]
},
"done": { "terminal": true }
},
"max_steps": 15,
"interruptible": true
}
Both skills produce proposals that go through the approval pipeline (see section 8.10). The human reviews and either approves (choird registers the new tool/skill, rebuilds the agent image) or rejects. The agent never directly installs tools or skills.
10.4 Hierarchical State Machine Composition
Session SM
+-- Lane SM (edge/core)
+-- Skill SM
+-- LLM step (stochastic proposal)
11. Memory Architecture
Each agent has four memory stores. Session-derived memory (tiers 1-3) is automated by choird. Knowledge (tier 4) is explicitly managed by the agent or operator.
11.1 Memory Overview
| Tier | Name | Location | Lifecycle | Vectorized |
|---|---|---|---|---|
| 1 | Working memory | choir-agent process (choird-snapshotted) | Current session | No (in-memory) |
| 2 | Mid-term memory | Postgres | Last N sessions (default 10) | Yes |
| 3 | Long-term memory | Postgres | Older sessions | Summaries only |
| 4 | Knowledge | Postgres | Persistent, agent-managed | Yes |
11.2 Access Control
| Operation | Scope |
|---|---|
| Write working memory | Own session only (automatic via arbiter) |
| Read working memory | Own session only |
| Write mid-term / long-term | Own agent only (choird-automated) |
| Read mid-term / long-term | Any agent’s |
| Write knowledge | Own agent only |
| Read knowledge | Any agent’s |
Enforced in choird’s EXECUTE_HOST_TOOL handler: writes check
agent_id == caller, reads allow any target_agent.
11.3 Tier 1: Working Memory
Lives in choir-agent process memory. Per-lane (edge and core each maintain their own view of the same session events).
A. Event window – the last N committed events (full payloads with hash references). Injected directly into the LLM prompt. N is configurable (default ~50 events). When the window fills, oldest events roll off.
B. Per-lane reference summary – a mutable structured document summarizing everything that has rolled off the event window. Updated via LLM-generated structured deltas. Edge and core maintain separate summaries; they see the same events but summarize from their role’s perspective.
Reference summary schema:
{
"summary": "...",
"facts": [
{ "key": "...", "value": "...", "source_event": "ev-hash-123" }
],
"referenced_sessions": ["session-abc"],
"referenced_events": ["ev-hash-001", "ev-hash-002"]
}
Memory delta schema (for updating the reference summary):
{
"memory_delta": {
"mode": "append | overwrite",
"summary_update": "...",
"add_references": ["hash"],
"remove_references": ["hash"],
"add_fact": { "..." },
"remove_fact": { "..." }
}
}
Structured deltas prevent total corruption via free-form overwrite.
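A sketch of how a structured delta might be folded into the reference summary (simplified types mirroring the JSON schemas above; the fact fields are omitted for brevity, and the apply logic is illustrative):

```go
package main

import "fmt"

// refSummary and memoryDelta are simplified versions of the schemas above.
type refSummary struct {
	Summary    string
	References []string
}

type memoryDelta struct {
	Mode          string // "append" or "overwrite"
	SummaryUpdate string
	AddRefs       []string
	RemoveRefs    []string
}

// apply folds an LLM-generated structured delta into the summary. Because
// only these fields can change, a malformed delta cannot corrupt the
// whole document the way an unconstrained free-form rewrite could.
func (r *refSummary) apply(d memoryDelta) {
	switch d.Mode {
	case "append":
		r.Summary += "\n" + d.SummaryUpdate
	case "overwrite":
		r.Summary = d.SummaryUpdate
	}
	r.References = append(r.References, d.AddRefs...)
	remove := map[string]bool{}
	for _, h := range d.RemoveRefs {
		remove[h] = true
	}
	kept := r.References[:0]
	for _, h := range r.References {
		if !remove[h] {
			kept = append(kept, h)
		}
	}
	r.References = kept
}

func main() {
	r := refSummary{Summary: "initial", References: []string{"ev-hash-001"}}
	r.apply(memoryDelta{Mode: "append", SummaryUpdate: "user prefers Go", AddRefs: []string{"ev-hash-002"}})
	fmt.Println(len(r.References)) // 2
}
```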
LLM prompt structure – differs by lane:
Edge lane:
[system prompt] -- edge instructions + skill summaries
[USER.md] -- user identity (NEVER compacted)
[SOUL.md] -- edge personality (NEVER compacted)
[lane_reference_summary] -- materialized summary of compacted events
[event window] -- last N events (compactable content only)
[current user message] -- NEVER compacted
Core lane:
[system prompt] -- core instructions + skill summaries
[SOUL-CORE.md] -- core personality (NEVER compacted)
[lane_reference_summary] -- materialized summary of compacted events
[event window] -- last N events (compactable content only)
[CoreJobStart] -- curated task briefing from edge
The system prompt for each lane contains:
- Lane-specific behavioral instructions (edge: user interaction, routing, injection; core: deep execution, tool orchestration).
- A summary list of available skills (name + description for each).
- Tool schemas are NOT included in the system prompt – they are passed via the OpenAI-compatible `tools` parameter in the API request.
Compactable content (eligible for rolling off the event window into
the reference summary): UserMsg events (except the most recent),
LLM outputs (ModelOutput), and tool outputs (ToolResultCommitted).
These are the bulk of context growth.
Never compacted (always present in full, excluded from the event window token budget):
- System prompt (lane instructions + skill summaries).
- Identity files (`USER.md` + `SOUL.md` for edge; `SOUL-CORE.md` for core).
- The most recent `UserMsg` event (the message currently being responded to). Older user messages ARE compactable.
- `CoreJobStart` (core lane only: the task briefing from edge).
For older in-session context beyond the event window, the agent uses
choir.memory.query with store: "working" to search the full in-memory
event log by keyword or hash reference.
Compaction triggers (reference summary update):
- Automatic: when the compactable content in the event window exceeds a configurable token/byte threshold (set in `.choir.d/config.json`; default: 80% of the model’s context window minus the non-compactable content). The runtime measures after each LLM response or tool result.
- Manual: via `choirctl session compact <session-id>`, `/compact` (gateway), or the `choir.memory.compact` tool (agent-invoked).
When compaction fires, the oldest compactable events in the event window are folded into the reference summary via LLM-generated structured deltas. The event window shrinks; the reference summary grows. Non-compactable content is untouched. Compaction runs asynchronously within the lane – the lane continues processing while the compaction LLM call is in flight. The updated reference summary is swapped in atomically when the compaction call completes. If the compaction call fails, the event window retains its current contents and compaction retries on the next trigger.
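The asynchronous compaction with atomic swap might be sketched like this (the `summarize` callback stands in for the compaction LLM call; the `lane` type is illustrative):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// lane keeps reading the current reference summary while a compaction
// call is in flight; the new summary is swapped in atomically on completion.
type lane struct {
	summary atomic.Pointer[string]
}

// compactAsync folds oldEvents into a new summary off the hot path.
// On failure, the real system retains the event window and retries on
// the next trigger; error handling is omitted here.
func (l *lane) compactAsync(oldEvents []string, summarize func([]string) string, done chan<- struct{}) {
	go func() {
		s := summarize(oldEvents) // in flight: lane keeps processing
		l.summary.Store(&s)       // atomic swap on completion
		close(done)
	}()
}

func main() {
	var l lane
	initial := "empty"
	l.summary.Store(&initial)
	done := make(chan struct{})
	l.compactAsync([]string{"ev1", "ev2"}, func(evs []string) string {
		return fmt.Sprintf("folded %d events", len(evs))
	}, done)
	<-done
	fmt.Println(*l.summary.Load()) // folded 2 events
}
```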
Persistence: choird snapshots both lanes’ working memory (reference
summaries + event window boundaries) via heartbeat replication. On crash
recovery, reference summaries are restored from snapshot; the event
window is rebuilt from the session_events tail.
11.4 Tier 2: Mid-Term Memory
Stored in Postgres. Contains the last N sessions (N configurable, default 10) with full events chunked and vectorized.
When a session ends (graceful stop), choird:
- Chunks the session events (already in `session_events`) into logical blocks (by tool sequence, skill phase, or fixed size).
- Embeds each chunk via the embedding pipeline.
- Stores the chunks in `memory_documents` with `tier = 'mid_term'`.
Raw session_events rows are retained indefinitely alongside the
memory_documents chunks (redundant but preserves full granularity for
replay and audit).
Searchable via choir.memory.query with store: "session", mode
semantic or text. Returns chunks from any of the agent’s last N
sessions. Cross-agent queries also hit this tier.
11.5 Tier 3: Long-Term Memory
Stored in Postgres. Contains sessions that have aged out of mid-term. Only summaries are vectorized; full event detail is retained but not indexed for vector search.
When a session ages out of mid-term (session count exceeds N):
- The agent generates a session summary as the last step of graceful shutdown (LLM call, structured output, included in the final heartbeat). This keeps choird inference-free.
- choird chunks the summary into logical partitions (per-skill, per-topic, or per-time-block).
- choird embeds the summary chunks and stores them as `tier = 'long_term_summary'` (vectorized, searchable).
- choird marks the session’s mid-term event chunks as `tier = 'long_term_detail'` – kept, but vectors dropped from the HNSW index (saves index size).
- Raw `session_events` rows remain in Postgres (unchanged).
Default semantic search (via store: "session") hits mid-term chunks +
long-term summaries. To drill into a specific long-term session’s full
events, the agent must reference a session_id explicitly via
choir.memory.query with mode session_detail, which triggers text
search against long_term_detail chunks for that session only.
11.6 Tier 4: Knowledge
Stored in Postgres. Separate from session-derived memory. Not automated
by choird. Explicitly managed by the agent via choir.memory.upsert or
by the operator via choirctl.
Stores persistent facts, user preferences, domain notes, reference material, project context – anything not tied to a specific session’s event stream.
Supports insert, update-by-key (optional dedup key), and delete.
Vectorized and searchable via choir.memory.query with
store: "knowledge".
11.7 Memory Tool Surface
| Tool | store | Modes | Notes |
|---|---|---|---|
| `choir.memory.query` | `working` | keyword, hash reference | Current session in-memory log. Own agent only. |
| `choir.memory.query` | `session` | semantic, text, session_detail | Mid-term + long-term summaries. `session_detail` requires `session_id` for long-term drill-down. `target_agent` defaults to self, can be any agent. |
| `choir.memory.query` | `knowledge` | semantic, text | Knowledge base. `target_agent` defaults to self, can be any agent. |
| `choir.memory.upsert` | `knowledge` | insert, update-by-key, delete | Own agent’s knowledge store only. |
| `choir.memory.compact` | `working` | (triggers compaction) | Forces reference summary update for the calling lane. |
11.8 Session Shutdown Memory Pipeline
During graceful stop (choirctl agent stop / agent update):
- Agent completes current safe point.
- Agent generates session summary (LLM call, structured output). Timeout: if the LLM call does not complete within 30 seconds (configurable), the agent skips the summary and proceeds with shutdown. A missing summary means the session will not have a long-term summary when it ages out of mid-term (mid-term chunks are still generated from raw events).
- Agent includes summary in final heartbeat payload (if generated).
- Agent flushes all unreplicated events to choird.
- choird persists everything to Postgres.
- choird chunks and embeds session events into `memory_documents` (mid-term).
- If the mid-term session count exceeds N, the oldest session is promoted: summary chunks become `long_term_summary` (vectorized), event chunks become `long_term_detail` (vectors dropped).
11.9 Embedding Backend
Integrated into choird (not separate service in v1):
- Postgres for durable text and metadata.
- pgvector for embedding search.
- Embedding calls via configurable provider endpoint.
Embedding configuration: The embedding model is configured in
.choir.d/config.json, not hardcoded. Default: text-embedding-3-small
(1536 dimensions) via OpenRouter. The vector column dimension is derived
from config (vector(N) where N matches the configured model’s output).
Swapping models requires a re-embedding migration but no code change.
Embedding pipeline:
- Batch: up to 32 chunks per API call.
- Queue: embeddings from `choir.memory.upsert` and session-close chunking are queued and flushed on heartbeat tick or when the queue reaches batch size.
- Retry: exponential backoff with jitter, max 3 retries (1s/2s/4s base).
- Graceful degradation: on final failure, store chunk without vector and log warning. Never block tool execution on embedding failure. Chunks without vectors are still searchable via full-text search (tsvector).
Design rule: keep memory logic modular inside choird with a clean
MemoryStore interface for future extraction if needed.
12. Communication Protocol
12.1 Dual Transport Model
Both transports are implemented in v1. UDS is the default for single-host deployment; TCP/HTTP is scaffolding for future EKS-style multi-node deployments.
| Transport | When | Auth | Notes |
|---|---|---|---|
| UDS (default) | Single-host, container on same machine | Lease token + file permissions (chmod 600) | Lowest latency, no port conflicts, impossible to expose externally |
| TCP/HTTP | Future multi-node, or when container runs on a remote host | Lease token + mTLS (or signed request headers / JWT) | Required for EKS pods talking to a control-plane service |
Both transports carry the same logical messages with the same semantics. The choice is a deployment-time configuration decision, not an application-logic decision.
UDS details: Socket path ~/.choir.d/socks/choird.sock on host,
bind-mounted as /run/choir.sock inside container. choird listens,
choir-agent connects. UDS is bidirectional: choird can also write to the
socket to push signals (cancel, injection, config change notifications)
to the agent.
TCP/HTTP details: choird exposes an HTTP endpoint (e.g.
https://choird.internal:9400). choir-agent connects with lease token in
Authorization header. For EKS, this becomes a Kubernetes Service
endpoint. TLS is required when traversing a network boundary; plaintext
is acceptable only over loopback.
HTTP wire format: JSON-over-HTTP. No gRPC or protobuf in v1.
- Request/response: `POST /rpc/<verb>` with `Content-Type: application/json`. Common envelope fields (`request_id`, `session_id`, `lease_token`) in HTTP headers; verb-specific payload in the body.
- Server-push streaming: SSE (`GET /events?session_id=...`, `text/event-stream`). Agent opens after `INIT_HELLO`. choird pushes config change notifications, injection signals from gateway, and cancel signals. Agent sends heartbeats as regular POST requests.
- No bidirectional streaming needed: agent-to-choird is request/response, choird-to-agent is SSE.
12.2 Transport Abstraction
All protocol logic is transport-agnostic. Transport selection happens once at startup; business logic never branches on transport type.
Agent-side interface:
type ControlPlane interface {
Heartbeat(ctx context.Context, req HeartbeatReq) (HeartbeatResp, error)
RequestApproval(ctx context.Context, req ApprovalReq) (ApprovalResp, error)
GetSecrets(ctx context.Context, req SecretReq) (SecretResp, error)
Terminate(ctx context.Context, req TerminateReq) error
}
Server-side interface (choird):
type ControlPlaneHandler interface {
HandleHeartbeat(ctx context.Context, req HeartbeatReq) (HeartbeatResp, error)
HandleApproval(ctx context.Context, req ApprovalReq) (ApprovalResp, error)
HandleSecrets(ctx context.Context, req SecretReq) (SecretResp, error)
HandleTerminate(ctx context.Context, req TerminateReq) error
}
Mountable on UDS listener, HTTP server, or (future) gRPC server.
Runtime toggle:
# Single-host (default)
choir-agent --control-plane=uds --socket=/run/choir.sock
choird --transport=uds
# Multi-node / EKS scaffolding
choir-agent --control-plane=http --endpoint=https://choird.internal:9400
choird --transport=http --listen=:9400 --tls-cert=... --tls-key=...
12.3 Protocol Design Rules
- Include request ID, session ID, and auth token in logical headers on every call, regardless of transport.
- Messages are stateless; never rely on connection state.
- Use a structured schema (protobuf or strict JSON schema) that is transport-independent. The same schema definition generates both UDS and HTTP serialization.
- Transport branching only in client factory / server bootstrap code, never in business logic.
12.4 Authentication Per Transport
| Transport | v1 Auth | Future Auth |
|---|---|---|
| UDS | Lease token (env var at container start) + socket file permissions | Same |
| HTTP | Lease token in Authorization header + server-only TLS (self-signed OK for single-host) | mTLS (self-signed CA or cert-manager), ServiceAccount identity, JWT |
The lease token is generated by choird at container creation, passed as an env var, and required in every RPC call. It is scoped to a single session. The token’s lifetime matches the agent’s lifecycle: it is valid while the agent is active and revoked when the session ends (graceful stop or crash). There is no time-based expiry – token validity is tied to session liveness, not a clock.
No mTLS in v1: v1 is single-host only, so the HTTP transport runs over
loopback or a local Docker network. Lease token + server-only TLS is
sufficient. mTLS (mutual TLS with per-container client certificates) is
deferred to future EKS work, where containers run on remote nodes and
stronger identity verification is needed. The ControlPlane interface
does not change; only the TLS config factory needs updating.
12.5 Minimum RPC Verbs (choir-agent -> choird)
- `INIT_HELLO`
- `GET_SECRETS`
- `HEARTBEAT`
- `REQUEST_APPROVAL`
- `REPORT_STATUS`
- `TERMINATE_SELF`
- `FETCH_DYNAMIC_CONFIG`
- `EXECUTE_HOST_TOOL`
No RPC verb can directly perform host-destructive operations. The verb set is identical across both transports.
Schema source of truth: Go struct definitions with JSON tags. No protobuf in v1. All requests carry a common envelope:
{
"request_id": "uuid",
"session_id": "...",
"lease_token": "...",
"verb": "HEARTBEAT",
"payload": { ... }
}
Per-verb request/response schemas:
| Verb | Request Payload | Response Payload |
|---|---|---|
| `INIT_HELLO` | `{ agent_id, session_id, image_version, tool_manifest_hash, skill_manifest_hash }` | `{ status, lease_token, resource_bindings, snapshot?, tail_events[]?, config_version }` |
| `GET_SECRETS` | `{ resources[] }` | `{ secrets: map[string]string }` |
| `HEARTBEAT` | `{ base_rev, new_rev, patches[], config_version, hash_prev, hash_new, timestamp }` | `{ ack_rev, config_version_latest }` |
| `REQUEST_APPROVAL` | `{ request_type, payload (jsonb) }` | `{ approval_id, status (pending\|approved\|rejected) }` |
| `REPORT_STATUS` | `{ lane, state, skill?, step?, budget_remaining }` | `{ ack }` |
| `TERMINATE_SELF` | `{ reason }` | `{ ack }` |
| `FETCH_DYNAMIC_CONFIG` | `{ current_config_version }` | `{ config_version, config (jsonb)? }` |
| `EXECUTE_HOST_TOOL` | `{ tool_name, call_id, input (jsonb) }` | `{ call_id, status (success\|error), output (jsonb) }` |
EXECUTE_HOST_TOOL is the generic dispatch verb for tools that require
host-side execution. choir-agent sends:
{
"tool_name": "choir.web.browse",
"call_id": "...",
"input": { "url": "...", "mode": "text" }
}
choird dispatches internally to the appropriate host worker (Playwright
for choir.web.browse, Postgres for choir.memory.query, etc.) and
returns the tool result. This covers:
- Browser rendering (Playwright worker)
- Long-term memory operations (Postgres + pgvector)
- Any future host-side tool that cannot run in-container
12.6 Edge <-> Core Communication
Edge and core are goroutines within the same choir-agent process. They communicate via Go channels, not IPC. The arbiter goroutine mediates all committed side effects between them.
Core -> Edge channel: CoreEvent stream (progress, plan, result, error).
Edge -> Core channel: ToolResult, Injection, Cancel.
12.7 Message Contracts
Edge to Core (CoreJobStart):
Core never sees raw user messages. Edge curates a CoreJobStart that
removes user chattiness, distills the request into a clear task, and
includes only the context core needs. This is a key responsibility of the
edge lane – it acts as a filter between the conversational user interface
and the precision execution engine.
{
"job_id": "...",
"job_name": "descriptive-name (e.g. refactor-auth, write-tests)",
"task_spec": "clear, actionable instruction (written by edge, not the user's raw message)",
"context_bundle": {
"facts": ["relevant facts extracted by edge"],
"excerpts": ["file excerpts, prior results, or memory references"],
"constraints": ["user-stated requirements or preferences"]
},
"tool_constraints": { "allowed_tools": [], "time_budget_ms": 0 },
"output_schema": "<JSONSchema>",
"verbosity": "normal"
}
Core to Edge (CoreEvent):
Event types:
progress (phase, percent, current focus)
plan (structured steps)
thought_summary (short reasoning summary, never raw CoT)
tool_proposal (tool, args, why, expected output)
need_info (what's missing)
partial_result (intermediate artifact)
final_result (structured deliverable)
error (structured failure)
12.8 Injection Protocol
Core injection (edge -> core): Edge injects instructions to a
specific core job during CORE_REASONING or CORE_WAITING_TOOL. The
user never injects directly into a core job — they tell edge what they
want, and edge decides whether and how to relay to the appropriate core.
{
"type": "core_injection",
"job_name": "refactor-auth",
"content": "New instructions..."
}
Injection is append-only – never mutates history, never resets budgets. Max injection count enforced. Injection cannot spawn new cores.
Edge injection (user -> edge): The /inject <message> gateway command
injects a message into the edge lane’s context at the next safe point.
This is the mechanism for the user to provide additional instructions
without waiting for edge to drain its message queue (which only happens
at IDLE). The injected message is appended to the edge context, not
queued. There is no user-facing command to inject directly into a core
job — all core communication flows through edge.
13. Crash Recovery and Replication
13.1 Heartbeat Replication
choir-agent sends heartbeats at a configurable interval (default:
5000ms, set via heartbeat_interval_ms in config.json).
Crash detection: choird considers an agent crashed if no heartbeat is
received within crash_detection_threshold_ms (default: 10000ms). On
crash detection, choird performs cleanup:
- Marks the session as crashed in Postgres.
- Releases all resource leases (workspace, git identity, notion, email, DM, browser context).
- Removes the orphaned container.
- Logs the crash event.
- The session is recoverable on the next agent start via the recovery handshake.
Both heartbeat_interval_ms and crash_detection_threshold_ms are
configurable in config.json. The crash threshold should be at least 2x
the heartbeat interval to avoid false positives.
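Under the defaults above (heartbeat every 5000ms, crash threshold 10000ms), the detection check and cleanup can be sketched as below; the function names and session fields are illustrative, not choird's actual implementation.

```python
# Sketch of choird's crash check and cleanup. Names are illustrative.
def is_crashed(last_heartbeat_ms: float, now_ms: float,
               crash_detection_threshold_ms: int = 10000) -> bool:
    """True if the gap since the last heartbeat exceeds the threshold."""
    return now_ms - last_heartbeat_ms > crash_detection_threshold_ms

def on_crash(session: dict) -> list:
    # The cleanup steps named in the spec, in order.
    session["status"] = "crashed"   # mark the session as crashed
    session["leases"] = []          # release all resource leases
    return ["mark_crashed", "release_leases", "remove_container", "log_crash"]
```

With the default 2x ratio, one dropped heartbeat never triggers detection; two consecutive drops do.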
Each heartbeat carries committed deltas to choird:
{
"session_id": "...",
"agent_id": "...",
"base_rev": "<last host-acknowledged revision>",
"new_rev": "<agent's latest committed revision>",
"patches": ["<ordered, contiguous>"],
"config_version": 41,
"hash_prev": "<chain hash>",
"hash_new": "<chain hash>",
"timestamp": "<for ops>"
}
Committed event types:
- LLM_CALL_COMMITTED
- TOOL_CALL_COMMITTED
- TOOL_RESULT_COMMITTED
- SKILL_TRANSITION_COMMITTED
Host acknowledges accepted revision (ack_rev). Host reconstructs state
by folding events. Periodic snapshots bound replay cost.
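State reconstruction by folding ordered, contiguous deltas onto the latest snapshot can be sketched as below. The patch application is a stand-in (a dict merge); the real patch format is not specified here.

```python
# Sketch of fold-based reconstruction: start from a snapshot, apply
# contiguous committed deltas, reject gaps. Illustrative only.
def fold_state(snapshot: dict, snapshot_rev: int, events: list) -> dict:
    state = dict(snapshot)
    rev = snapshot_rev
    for ev in events:
        if ev["rev"] != rev + 1:   # patches must be ordered and contiguous
            raise ValueError(f"non-contiguous revision {ev['rev']} after {rev}")
        state.update(ev["patch"])  # stand-in for the real patch apply
        rev = ev["rev"]
    state["rev"] = rev
    return state
```

Periodic snapshots bound the length of the `events` tail that ever needs replaying.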
13.2 Recovery Handshake
On startup:
- Agent sends hello with identity/session context.
- Host responds with canonical snapshot + tail events (or start fresh).
- Agent rehydrates runtime state to host-committed revision.
Recovered state is the last host-acknowledged revision, not mid-step transient state. Events committed locally but not yet replicated via heartbeat are lost on crash. This is the tradeoff of local-first commit (see section 7.4).
13.3 Idempotency Requirements
- Resent deltas for already committed revisions are no-ops.
- Tool side effects must be deduplicable with invocation IDs.
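Both idempotency rules can be sketched directly; the function names and the shape of the delta dict are illustrative assumptions.

```python
# Sketch of the two idempotency rules: resent deltas for already
# acknowledged revisions are no-ops, and tool side effects are
# deduplicated by invocation ID. Names are illustrative.
def apply_delta(acked_rev: int, delta: dict) -> int:
    if delta["new_rev"] <= acked_rev:
        return acked_rev            # resend of a committed revision: no-op
    return delta["new_rev"]

def run_tool_once(invocation_id: str, seen: set, execute) -> bool:
    if invocation_id in seen:       # duplicate invocation: skip side effect
        return False
    execute()
    seen.add(invocation_id)
    return True
```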
13.4 Working Memory Structure (Snapshotted)
reference_summary_edge (per-lane structured summary, see 11.3)
reference_summary_core (per-lane structured summary, see 11.3)
event_window_boundary (rev of oldest event in window)
skill_state (skill name, node, local ctx)
lane_state (edge/core status + budgets)
open_jobs (core job status, tool in-flight)
14. Secrets and Credentials
14.1 Identity vs Secret Injection
- Non-secret IDs (e.g. AGENT_ID) may be env-injected at launch.
- Secrets are never provided via env vars. Root + bash means env or /proc/&lt;pid&gt;/environ would reveal them.
14.2 Secret Handshake
- choir-agent boots, connects to choird over the configured transport (UDS or HTTP).
- Sends: {"type": "INIT", "agent_id": "...", "session_id": "...", "image_version": "..."}
- Choird validates agent ID, session, policy. Resolves the agent’s resource bindings (leased workspace, git identity, notion integration, email account, models, voice profile, DM) from the session’s start-time overrides or agent defaults.
- Agent sends: {"type": "REQUEST_SECRETS", "resources": ["git-dev-token", "openrouter-key", "notion-personal-key"]} (Secret names are derived from the "secret" fields of the agent’s bound resources.)
- Choird replies: {"type": "SECRETS", "data": {"git-dev-token": "...", "openrouter-key": "...", ...}}
- Agent stores secrets in-memory only; never writes them to disk; never logs them.
14.3 Secret Handling Rules
- Secrets live only in process memory. Atomic swap on update; never mutate in place.
- Never persisted to /workspace or logs.
- Secret access is scoped per tool. A tool’s secret field in its runtime manifest references the named secret from the agent’s bound resources (e.g. choir.notion.query can access the agent’s bound Notion secret; choir.exec cannot access anything).
- Explicit secret refresh is supported without restart (choirctl secret apply).
- On crash, secrets are lost. On restart, the handshake repeats.
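The memory-only store with atomic full-snapshot swap and per-tool scoping can be sketched as below; the class and method names are assumptions, not choir-agent's actual internals.

```python
import threading
from typing import Optional

# Sketch of the in-memory secret store: atomic full-snapshot swap on
# refresh, per-tool scoping via the tool's declared secret name.
class SecretStore:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._secrets: dict = {}

    def refresh(self, new_secrets: dict) -> None:
        # Replace the whole snapshot; never mutate the old dict in place.
        with self._lock:
            self._secrets = dict(new_secrets)

    def for_tool(self, allowed_secret: Optional[str]) -> dict:
        # A tool sees only its one named secret; None means no access
        # (e.g. choir.exec).
        with self._lock:
            if allowed_secret is None:
                return {}
            value = self._secrets.get(allowed_secret)
            return {allowed_secret: value} if value is not None else {}
```

Because each tool execution reads through `for_tool`, a refresh takes effect on the next invocation without touching in-flight calls.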
14.4 Git Identity and Auth
Each agent leases a named git identity (see section 6.4):
- user.name and user.email come from the leased git_identities entry. Configured in the container’s git config at startup (not env vars).
- Git auth credentials are referenced by the identity’s secret field. Each identity gets its own credential set so commits are attributable. Identities are exclusive – at most one agent may lease a given identity at a time.
- Git auth via a custom credential helper (the choir-agent binary detects argv[0] == "git-cred-helper" and acts as the helper). The helper validates the remote host before returning a token.
- SSH private keys are injected ephemerally into ssh-agent memory via ssh-add - on stdin. Allowed hosts are enforced via SSH config.
- No persistent key files in the workspace, env vars, or .git/config.
- Arbitrary git commands allowed.
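The argv[0] dispatch can be sketched as below: the same binary behaves as a git credential helper when invoked under that name, and refuses to return a token for hosts outside an allowlist. The allowlist contents and output strings are illustrative assumptions.

```python
# Sketch of argv[0] dispatch: one binary, two roles. ALLOWED_GIT_HOSTS
# and the returned strings are assumptions, not choir's real behavior.
ALLOWED_GIT_HOSTS = {"github.com"}

def main(argv: list, stdin_lines: list) -> str:
    if argv[0].endswith("git-cred-helper"):
        # git passes key=value attribute lines on stdin.
        attrs = dict(line.split("=", 1) for line in stdin_lines if "=" in line)
        if attrs.get("host") not in ALLOWED_GIT_HOSTS:
            return ""                # refuse a token for unknown hosts
        return "username=agent\npassword=<token-from-memory>\n"
    return "running as choir-agent runtime\n"
```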
15. Container Architecture
15.1 Filesystem Layout
/choir (read-only, baked into image)
/tools/
/global/ (shared tool executables + definitions)
/agent/ (agent-specific tools)
/skills/ (skill definitions)
/bin/git-cred-helper (credential helper binary/symlink)
USER.md (user identity -- edge lane only)
SOUL.md (edge lane personality)
SOUL-CORE.md (core lane personality)
version.json (version metadata)
/workspace (writable, bind-mounted, long-lived)
.choirtmp/ (staging area for gateway file transfer)
send/ (agent -> choird: tool outputs, multimedia)
recv/ (choird -> agent: user uploads, inbound files)
Version metadata (/choir/version.json):
{
"agent_id": "...",
"image_version": "abc1234",
"global_repo_commit": "def5678",
"agent_repo_commit": "ghi9012",
"tool_manifest_hash": "...",
"skill_manifest_hash": "..."
}
choird verifies this at container startup to prevent silent drift.
15.2 Container Properties
- Lean image (Debian slim + git, openssh-client, python3-minimal), restartable, no hot self-update.
- Root and shell access inside the container are allowed by design.
- Host escalation is prevented at container boundary and RPC surface.
15.3 Container Security Profile
Container constraints:
- --cap-drop=ALL
- --security-opt no-new-privileges
- --read-only (root filesystem)
- --tmpfs /tmp (and /run)
- cgroup limits: CPU/mem/pids
- Default Docker seccomp
Prohibited:
- --privileged
- Docker socket mount
- --pid=host
- --network=host
- Device passthrough
- Host mounts (except /workspace and, when using UDS transport, the control socket)
15.4 Workspace Model
Workspaces are named resources defined in config.json with explicit host
paths. They are leased to agents – only one agent may hold a lease on a
given workspace at a time.
Leasing rules:
- When an agent starts a session, choird grants a workspace lease. The default workspace is defined in the agent’s config; an alternative can be specified via the --workspace flag on agent start or /start.
- If the requested workspace is already leased to another running agent, the start is rejected with an error. The operator must stop the other agent or choose a different workspace.
- The lease is released when the agent’s session ends (stop, crash, or terminate).
- Workspace-to-agent binding is per-session, not permanent. The same agent can use different workspaces across sessions.
Workspaces are plain directories on the host filesystem. They are not necessarily git repositories – a workspace may contain a git repo, loose files, or any mix. choird does not manage workspace contents; the agent and operator do.
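The exclusive-lease rule can be sketched as below; the function names and lease table are illustrative, not choird's actual bookkeeping.

```python
# Sketch of workspace leasing: one agent per workspace, conflicting
# grants rejected, all leases released at session end.
class LeaseError(Exception):
    pass

def grant_lease(leases: dict, workspace: str, agent_id: str) -> None:
    holder = leases.get(workspace)
    if holder is not None and holder != agent_id:
        raise LeaseError(f"workspace {workspace!r} leased to {holder!r}")
    leases[workspace] = agent_id

def release_leases(leases: dict, agent_id: str) -> None:
    # Called on stop, crash, or terminate.
    for ws in [w for w, a in leases.items() if a == agent_id]:
        del leases[ws]
```

The same shape applies to the other exclusive resources (git identities, DM bindings).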
WS-1: /workspace is non-authoritative and may be wiped at any time.
Durable truth lives in external sources (remote git origins, databases,
etc.) or host control-plane state.
First-class reset: choirctl workspace reset <workspace-name> – deletes
the backing directory contents. If an agent holds a lease on the workspace,
the agent is stopped first.
15.5 Restart vs. Hot Reload
Requires restart (changes what the agent can do):
- Tool registry / executables
- Skill definitions
- Lock policies
- Binary runtime logic
- /choir content (USER.md, SOUL.md, SOUL-CORE.md)
- Dockerfiles (base or per-agent)
Hot-reloadable (changes how the agent does it):
- Model list, provider endpoints, inference parameters (temperature, reasoning_effort), request templates, TTS provider settings, voice profiles
- Feature flags
- Secrets (revocable/refreshable without restart)
16. Dynamic Configuration and Secret Apply
16.1 Dynamically Reloadable
- Secrets.
- Model/provider list (LLM and TTS).
- Request templates and voice profiles.
- Tunable defaults and flags.
16.2 Not Dynamically Reloadable (Restart Required)
- Tool registry and executables.
- Skill definitions.
- Runtime binary.
- /choir identity content (USER.md, SOUL.md, SOUL-CORE.md).
- Dockerfiles (base or per-agent).
16.3 Sync Mechanism
choird reads .choir.d/config.json and .choir.d/secrets.json at startup.
Subsequent updates are explicit:
- choirctl config load – read and validate .choir.d/, stage in choird.
- choirctl config apply – bump config_version, publish hot-reloadable changes to agents.
- choirctl secret apply – reload .choir.d/secrets.json in choird and publish a secret refresh to running agents.
Agents sync via heartbeat:
- Agent reports its current config_version.
- Host replies with the latest version.
- Agent fetches the full new config via FETCH_DYNAMIC_CONFIG if the versions mismatch.
- Agent atomically swaps its in-memory config and secret snapshots.
Restart-required changes (tools, skills, identity, Dockerfiles – all
sourced from git repos) are applied via choirctl agent update or
choirctl agent update-all (see sections 17.4, 17.5).
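The heartbeat-driven sync loop can be sketched as below. The RPC name follows the spec; the agent dict and callback shape are illustrative assumptions.

```python
# Sketch of heartbeat config sync: compare versions, fetch the full
# blob on mismatch, swap the reference atomically. Never a partial
# patch. The agent-dict fields are illustrative.
def sync_config(agent: dict, host_version: int, fetch_dynamic_config) -> bool:
    if agent["config_version"] == host_version:
        return False                     # already up to date
    new_config = fetch_dynamic_config()  # FETCH_DYNAMIC_CONFIG: full blob
    agent["config"] = new_config         # atomic reference swap
    agent["config_version"] = host_version
    return True
```

Because the swap is a single reference replacement, an in-flight tool execution or LLM call that snapshotted the old config is unaffected.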
Reload discipline:
- Reload does NOT retroactively affect in-flight tasks.
- Snapshot config at start of tool execution.
- Snapshot model settings at start of LLM call.
- Never partially patch config – version-based full replacement.
17. User Commands
Gateway (Telegram) commands are a subset of choirctl commands. Config
updates are two-phase: load stages into choird, apply publishes
hot-reloadable changes to agents. Restart-required changes use the agent
lifecycle commands (build, update).
17.1 choirctl Command Reference
System setup:
choirctl init # first-time setup: create .choir.d/, skeleton config.json, print instructions (see 5.2)
Agent lifecycle:
choirctl agent init <agent-id> # scaffold new agent with placeholder config (see 17.8)
choirctl agent list # list all agents and status
choirctl agent start <agent-id> [flags] # start agent container (see below)
choirctl agent stop <agent-id> # graceful termination (see 17.2)
choirctl agent restart <agent-id> # stop + start (same image, same resource bindings)
choirctl agent status <agent-id> # detailed status (lane states, budgets, skill, resource bindings, uptime)
choirctl agent build <agent-id> # build new/updated agent image (see 17.3)
choirctl agent update <agent-id> # build (if needed) + stop + start with new image (see 17.4)
choirctl agent update-all # build + redeploy all agents (see 17.5)
agent start flags – every agent default (section 6.4) is overridable:
--workspace=<name> # override default workspace
--llm=<name> # override default LLM model
--voice-profile=<name> # override default voice profile
--git-identity=<name> # override default git identity
--notion=<name> # override default Notion integration
--email=<name> # override default email account
--dm=<name> # override default DM binding (required for choirctl agent start)
Omitted flags use the agent’s defaults from config.json. When starting
via choirctl, --dm is required (choirctl has no implicit DM context).
When starting via gateway /start, the DM defaults to the one that sent
the command (if in the bot’s allowlist). choird validates that all named
resources exist and that exclusive resources are not already leased before
starting the agent.
Session:
choirctl session list [agent-id] # list active/recent sessions
choirctl session events <session-id> # stream session event log
choirctl session cores <session-id> # list active core jobs (name, state, step)
choirctl session cancel <session-id> # cancel active core job (by name)
choirctl session compact <session-id> # trigger working memory compaction
Model switching (per-agent, hot-reloadable):
choirctl model list # list named LLM models and TTS providers from config
choirctl model get <agent-id> # show current LLM model and voice profile for agent
choirctl model set <agent-id> --llm=<name> # switch text generation model for agent
choirctl voice list # list named voice profiles
choirctl voice get <agent-id> # show current voice profile for agent
choirctl voice set <agent-id> <name> # switch voice profile for agent
Model and voice profile changes are hot-reloadable – the agent picks up
the new settings on its next LLM call or TTS invocation via the heartbeat
config sync (no restart required). <name> refers to a named model or
voice profile defined in config.json. Overrides via model set or
voice set are transient: on agent restart, the agent reverts to its
defaults (or start-time --llm/--voice-profile overrides).
Configuration (two-phase):
choirctl config load # read .choir.d/ into choird, validate, stage
choirctl config diff # show diff: staged vs running (see 17.6)
choirctl config apply # push hot-reloadable changes to running agents
choirctl config show # show current running config
Approvals:
choirctl approval list # list pending approvals
choirctl approval show <id> # show approval detail (what, who, when)
choirctl approval approve <id> # approve
choirctl approval reject <id> # reject
Workspace:
choirctl workspace list # list workspaces, paths, and current lease holder
choirctl workspace reset <workspace-name> # delete contents; stops leasing agent if running
Secrets:
choirctl secret list # list secret names (never values)
choirctl secret set <name> [secret] # set/update a secret value (arg or stdin)
choirctl secret delete <name> # delete a secret
choirctl secret apply # reload secrets.json and push to running agents
Secret values are never stored in config.json. Resources in config.json
reference secrets by name via "secret" fields (e.g.
"secret": "openrouter-key"). Values live in ~/.choir.d/secrets.json
(authoritative in v1), and running agents are refreshed via
choirctl secret apply. choirctl config apply does not reload secret values.
Observability:
choirctl logs <agent-id> # snapshot current log + tail -f stream (see 22.2)
choirctl status # system overview: choird health, Postgres, agents
17.2 agent stop (Graceful Termination)
- choird sends a shutdown signal to the agent.
- Agent completes its current safe point (tool result commit, LLM call finish, or skill transition).
- Agent generates a session summary (LLM call, structured output).
- Agent flushes all unreplicated events and the session summary from its local commit log to choird via a final heartbeat.
- choird persists the session state (events, working memory snapshots, lane states, skill state, budget counters) to Postgres.
- choird chunks and embeds session events into memory_documents (mid-term). If the mid-term session count exceeds N, the oldest session is promoted to long-term (see section 11.8).
- Agent calls TERMINATE_SELF and exits.
- Session is recoverable: a subsequent agent start can resume from the persisted session via the recovery handshake (INIT_HELLO -> snapshot + tail events).
17.3 agent build
- Reads the agent definition from .choir.d/config.json.
- Pulls/fetches the global repo into .choir.d/repos/global/ (checks out the configured ref).
- Pulls/fetches the per-agent repo into .choir.d/repos/agents/&lt;agent-id&gt;/ (checks out the configured ref).
- Builds the base image from Dockerfile.base in the global repo (cached; only rebuilt if the global repo HEAD changed since the last build).
- Merges artifacts: global tools + agent tools (agent overrides by name), global skills + agent skills (same), USER.md from global, SOUL.md from the agent repo if present else global, SOUL-CORE.md from the agent repo if present else global.
- Builds the agent image from the per-agent Dockerfile (FROM choir-base): installs agent-specific system packages, compiles tool binaries from source directories under tools/.
- Bakes merged tools into /choir/tools/global/ and /choir/tools/agent/, merged skills into /choir/skills/, identity into /choir/USER.md, /choir/SOUL.md, /choir/SOUL-CORE.md, and /choir/version.json (includes both repo commit SHAs) into the image.
- Tags the image: choir-agent-&lt;id&gt;:&lt;git-short-hash&gt;.
- Stores in the local Docker image cache only. Remote image registries are out of scope for v1 – the per-agent git repos are the source of truth; images are rebuilt locally as needed.
- Does NOT start or restart the agent. The image is ready for use by agent start or agent update.
17.4 agent update
- Builds a new image if the staged config or repo commits differ from the running image (compares version.json commit SHAs; skips the build if already up to date).
- Gracefully stops the running agent (same as agent stop – session persisted to Postgres).
- Starts the agent with the new image.
- The new container picks up both the new image contents (tools, skills, identity) AND the latest hot-reloadable config via INIT_HELLO.
- Session resumes from persisted state if applicable.
17.5 agent update-all
- Builds new images for all agents whose staged config or repo commits differ from their current image (skips if already up to date).
- For each running agent: graceful stop (session persisted), start with new image (session resumed via recovery handshake). Agents are updated sequentially, not in parallel.
- For each stopped agent: the image is built but the agent is not started. The new image is ready for the next agent start.
- Reports per-agent results: built, redeployed, skipped, or failed.
17.6 config load / config diff / config apply
config load reads and validates the entire .choir.d/ directory
atomically:
- Parses config.json.
- Fetches the latest commits from the global repo and all per-agent repos into .choir.d/repos/. Reports any fetch failures (unreachable remote, auth failure) as warnings (stale local clones are usable but flagged).
- Scans repo contents: validates tool/skill JSON schemas, checks that Dockerfiles parse, verifies source directories exist for compiled tools.
- Reports errors. If any validation fails, nothing is staged.
- On success, choird holds the staged config (including repo commit SHAs) in memory alongside the current running config.
config diff compares staged vs running and categorizes each change:
- Hot-reloadable: secrets, model/provider list, request templates, tunable defaults, feature flags.
- Restart-required: tool additions/removals/changes, skill additions/removals/changes, /choir identity content.
Example output:
~ global:tools/web_browse.json [restart-required] → use: choirctl agent update <agent-id>
+ agent-1:skills/code_review.json [restart-required] → use: choirctl agent build <agent-id>
~ agent-1:identity/SOUL.md [restart-required] → use: choirctl agent update <agent-id>
~ config.json: models.default [hot-reload] → included in: choirctl config apply
global repo: abc1234 -> def5678 [restart-required]
agent-1 repo: 111aaa -> 222bbb [restart-required]
config apply only publishes hot-reloadable changes. It bumps
config_version; agents pick it up via heartbeat. Restart-required
changes are surfaced by config diff and applied via agent build /
agent update / agent update-all at the operator’s discretion.
17.7 Atomicity Guarantees
Load atomicity: config load reads the entire .choir.d/ directory
as one snapshot. Either all files validate and the snapshot is staged, or
none of it is staged. No partial staging.
Apply atomicity (per-agent): Each agent receives the full config as a
single versioned blob via FETCH_DYNAMIC_CONFIG. The agent swaps its
entire in-memory config atomically (pointer swap behind a mutex). There is
no state where an agent runs with half-old, half-new config. The agent
snapshots config at the start of each tool execution and LLM call, so an
in-flight operation completes with the config it started with.
Apply atomicity (cross-agent): NOT atomic across multiple agents. Different agents poll at different heartbeat intervals, so they may run different config versions briefly. In v1 this is moot (single agent), but cross-agent consistency is eventual, not transactional.
Stop atomicity: agent stop guarantees that all locally committed
events are replicated to Postgres before the process exits. If the agent
crashes during shutdown before the final flush, recovery resumes from the
last acknowledged revision (unreplicated tail is lost).
Update atomicity: agent update is stop-then-start, not a rolling
swap. There is a window where the agent is down. The session is fully
persisted before the old container exits and fully restored after the new
container starts. No events are processed during the gap.
Failure during apply: If an agent fails to fetch the new config (RPC
error, timeout), it retries on the next heartbeat and continues running
with its current config. choird logs the discrepancy.
choirctl agent status shows each agent’s current config_version so the
operator can see who is behind.
17.8 agent init
- Creates a new agent entry in config.json with placeholder values:
  - repo.url: "" (must be filled in before build).
  - repo.ref: "main".
  - defaults.workspace: "" (must reference an existing workspace).
  - defaults.llm: "" (must reference an existing model).
  - defaults.voice_profile: "" (optional, references a named voice profile).
  - defaults.git_identity: "" (must reference an existing identity).
  - defaults.notion: "" (optional).
  - defaults.email: "" (optional).
  - defaults.dm: "" (must reference an existing DM).
- Initializes a new per-agent git repo (bare) if a local repo path is configured, or prints instructions for creating the remote repo.
- Scaffolds the per-agent repo with skeleton files:
  - Dockerfile (minimal FROM choir-base).
  - identity/SOUL.md (edge personality placeholder).
  - identity/SOUL-CORE.md (core personality placeholder).
  - tools/ and skills/ (empty directories with .gitkeep).
- Does NOT build an image or start the agent. The operator must fill in the placeholders, then run agent build and agent start.
18. Networking and Web Access
18.1 Network Policy
Arbitrary outbound web requests from container are allowed in v1 by design tradeoff (single-user, user-managed risk).
18.2 Browser Handling
To avoid shipping a headless browser in the container image:
- choir.web.browse is routed through a host-side Playwright worker via an EXECUTE_HOST_TOOL RPC to choird.
- Each running agent gets an isolated Playwright browser context, automatically created at agent start and destroyed at agent stop.
- The agent has exclusive read/write access to its own browser tabs. No agent can access another agent’s browser context.
- Browser contexts are not configurable in config.json – they are ephemeral, agent-scoped resources managed by the browser worker.
18.3 Direct API Integrations
API tools like Notion/TTS/Search/Email run in-container with memory-only
scoped secrets. TTS uses the agent’s bound voice profile (voice ID,
output format, voice settings) resolved from config.json at runtime.
Email uses SMTP for sending and IMAP for receiving (v1 only supports
these protocols; no API-based email providers).
19. Gateway (Telegram)
19.1 Architecture
The gateway supports multiple Telegram bot instances, each with multiple DM conversations. choird owns all bot tokens (managed as secrets, never exposed to containers). The gateway module routes messages between Telegram DMs and the appropriate agent’s edge lane.
Named resources:
- Gateways: Named bot instances in config.json, each with a secret reference for its bot token.
- DMs: Named DM bindings in config.json, each referencing a gateway (bot) and a Telegram user ID. The set of configured DMs for a bot implicitly forms that bot’s allowlist – messages from unconfigured user IDs are silently ignored.
- Admin DMs: DMs with "admin": true have full choirctl-equivalent command access across all agents and system resources.
- Regular DMs: DMs with "admin": false can only issue commands affecting their bound agent.
Each agent gets exclusive access to its bound DM. One agent per DM at a
time. DM binding is established at agent start (see section 6.4).
Channel capability requirements: Any gateway channel (current or future) must support:
- Text messages: Send and receive plain text.
- File transfer: Send and receive files (documents, archives, etc.).
- Multimedia: Send voice messages, images, and videos.
Telegram satisfies all three natively.
19.2 Message Flow
Telegram -> Bot API (long-poll) -> choird gateway module
-> identify bot instance + user ID -> look up bound DM + agent
-> route to edge lane of bound agent (queued; see 7.1)
-> edge response -> choird gateway -> Telegram reply to DM
choird translates between Telegram message format and the internal
UserMsg event type. Messages from unauthorized users (no matching DM
config) are dropped.
No streaming in v1: All outbound messages are sent as complete blocks. One user message may trigger the sending of multiple response messages (e.g. a text response followed by a file upload), but each message is complete before sending. No incremental token-by-token delivery.
19.3 Gateway Commands
Regular DM commands (available to all configured DMs, scoped to bound agent):
| Command | Equivalent choirctl | Notes |
|---|---|---|
| /status | choirctl agent status &lt;bound-agent&gt; | Shows lane states, budgets, skill, uptime |
| /stop | choirctl agent stop &lt;bound-agent&gt; | Graceful termination (see 17.2) |
| /restart | choirctl agent restart &lt;bound-agent&gt; | Stop + start, same image |
| /cores | choirctl session cores &lt;active-session&gt; | Lists active core jobs with name, state, step |
| /cancel &lt;name&gt; | choirctl session cancel &lt;active-session&gt; | Cancels a core job by name |
| /compact | choirctl session compact &lt;active-session&gt; | Triggers working memory compaction |
| /events | choirctl session events &lt;active-session&gt; | Last N events |
| /model | choirctl model get &lt;bound-agent&gt; | Show current LLM model |
| /model llm &lt;name&gt; | choirctl model set &lt;bound-agent&gt; --llm=&lt;name&gt; | Switch text generation model |
| /voice | choirctl voice get &lt;bound-agent&gt; | Show current voice profile |
| /voice &lt;name&gt; | choirctl voice set &lt;bound-agent&gt; &lt;name&gt; | Switch voice profile |
| /inject &lt;message&gt; | (no choirctl equivalent) | Inject message into edge context at next safe point |
| /approvals | choirctl approval list | Lists pending approvals for bound agent |
| /approve [id] | choirctl approval approve &lt;id&gt; | Approve; scoped to bound agent’s approvals |
| /reject [id] | choirctl approval reject &lt;id&gt; | Reject; scoped to bound agent’s approvals |
Regular DM commands always target the bound agent. No [agent-id]
argument is accepted.
Admin DM commands (in addition to all regular commands):
| Command | Equivalent choirctl | Notes |
|---|---|---|
| /start &lt;agent-id&gt; [key=value ...] | choirctl agent start &lt;agent-id&gt; [flags] | Override defaults; DM binding is the triggering DM if in allowlist |
| /stop &lt;agent-id&gt; | choirctl agent stop &lt;agent-id&gt; | Stop any agent |
| /restart &lt;agent-id&gt; | choirctl agent restart &lt;agent-id&gt; | Restart any agent |
| /update [agent-id] | choirctl agent update &lt;agent-id&gt; | Build + stop + start with new image |
| /update-all | choirctl agent update-all | Build + redeploy all agents |
| /config load | choirctl config load | Stage config changes |
| /config diff | choirctl config diff | Show staged vs running diff |
| /config apply | choirctl config apply | Apply hot-reloadable changes |
| /workspace list | choirctl workspace list | List workspaces and lease holders |
| /workspace reset &lt;name&gt; | choirctl workspace reset &lt;name&gt; | Reset workspace |
| /secret list | choirctl secret list | List secret names |
| /agent list | choirctl agent list | List all agents and status |
| /agent build &lt;agent-id&gt; | choirctl agent build &lt;agent-id&gt; | Build agent image |
| /approvals | choirctl approval list | Lists ALL pending approvals |
| /approve [id] | choirctl approval approve &lt;id&gt; | Approve any agent’s approval |
| /reject [id] | choirctl approval reject &lt;id&gt; | Reject any agent’s approval |
Admin DMs accept <agent-id> arguments to target any agent. When
/approve or /reject is sent without an ID and there is exactly one
pending approval, it targets that approval. If multiple are pending, the
bot replies with a numbered list and waits for selection.
19.4 DM Binding
DM-to-agent binding is established at agent start:
- Via gateway /start: The DM that sends the command is the binding target, provided it is in the bot’s allowlist (configured in dms). If the DM is already bound to another agent, the start is rejected.
- Via choirctl agent start: --dm=&lt;name&gt; is required. References a named DM from config.json.
- Default: Each agent has a default DM in its defaults block. Used when no explicit override is provided via choirctl.
The binding is exclusive: one agent per DM at a time. Released when the agent’s session ends (stop, crash, terminate).
19.5 Supported Message Types
Inbound (user -> agent):
- Text messages: Forwarded to the edge lane as UserMsg (queued; drained when edge returns to IDLE).
- File/image uploads: choird stores the file to /workspace/.choirtmp/recv/, then forwards a UserMsg to edge with the .choirtmp/recv/&lt;filename&gt; path as an attachment reference.
- Audio messages: Optionally transcribed (future); otherwise stored as a file attachment in .choirtmp/recv/.
Outbound (agent -> user):
- Text messages: Agent text responses sent as Telegram messages (chunked at 4096 chars). Each message is a complete block.
- Files: Agent writes the file to .choirtmp/send/; the tool result contains the path reference. choird fetches from .choirtmp/send/ and sends as a Telegram document upload.
- Voice messages: TTS tool writes audio to .choirtmp/send/. choird sends as a Telegram voice message (OGG Opus format).
- Images: Agent writes to .choirtmp/send/. choird sends as a Telegram photo message.
- Videos: Agent writes to .choirtmp/send/. choird sends as a Telegram video message.
choird cleans up files from .choirtmp/send/ after successful delivery
and from .choirtmp/recv/ after the agent acknowledges receipt.
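Outbound chunking at Telegram's 4096-character message limit can be sketched as a one-liner; each chunk is sent as a complete block, consistent with the no-streaming rule.

```python
# Sketch of outbound message chunking at the Telegram limit. A naive
# fixed-width split; a real implementation might prefer to break at
# line or word boundaries.
TELEGRAM_MAX_CHARS = 4096

def chunk_message(text: str, limit: int = TELEGRAM_MAX_CHARS) -> list:
    return [text[i:i + limit] for i in range(0, len(text), limit)] or [""]
```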
19.6 Approval UX
When REQUEST_APPROVAL arrives from the agent:
- choird sends a Telegram message to the agent’s bound DM with the proposal summary and an inline keyboard: [Approve] [Reject]. Callback data encodes the approval_id.
- On approve: choird sends “Approved: [summary]. Executing.” and notifies the agent with status: approved.
- On reject: choird sends “Rejected: [summary].” and notifies the agent with status: rejected.
- Unanswered approvals time out after 30 minutes (configurable in .choir.d/config.json). On timeout, choird sends “Approval timed out: [summary]. Rejected automatically.” and notifies the agent with status: rejected.
- Multiple pending approvals each get their own message. No batching – each is independently approvable/rejectable.
- /approve and /reject commands work as a fallback (reply to the approval message) for cases where inline buttons don’t render.
- Admin DMs can approve/reject any agent’s approvals. Regular DMs can only approve/reject their bound agent’s approvals.
19.7 Design Constraints
- One agent per DM. No multiplexing.
- choird rate-limits outbound messages to respect Telegram API limits.
- Long responses are chunked (Telegram 4096-char message limit).
- The gateway is a thin adapter; all routing logic lives in choird’s control plane, not in the gateway module.
- For /update-all (admin only), the operator gets a progress summary as each agent is processed.
20. Self-Evolution Workflow
Agent can propose changes to its own per-agent repo, but cannot apply directly. The workspace does not contain a checkout of the agent’s repo by default – repo interaction only happens when the user explicitly prompts for it.
User-initiated self-evolution flow:
- User asks the agent to modify its own tools/skills/identity.
- Agent clones its per-agent repo into /workspace (on demand).
- Agent makes edits, commits to a proposal branch (proposal/&lt;session-id&gt;/&lt;description&gt;) using its git identity.
- Agent submits the proposal to choird via REQUEST_APPROVAL with request_type: "repo_change" and the branch ref.
- On approval: choird merges the proposal branch into the per-agent repo’s configured ref (e.g. main), then triggers agent build + agent update.
- On rejection: the proposal branch is left for manual inspection (not auto-deleted).
- The workspace clone is ephemeral – cleaned up after the proposal is submitted or on workspace reset.
Approval-only proposal flow (no repo clone needed):
- Agent proposes a change via REQUEST_APPROVAL with a description and diff/spec only (e.g. via choir.propose.tool or choir.propose.skill).
- On approval, choird clones the per-agent repo, applies the change, commits with the agent’s git identity, pushes, then triggers build + update.
- The agent never directly touches the repo in this path.
No hot self-update of current running image. No Docker socket mount.
21. Inference Provider
21.1 API Compatibility
v1 targets OpenAI-compatible chat completions API only. All LLM
inference goes through endpoints that implement the OpenAI
/v1/chat/completions schema. This includes:
- Native OpenAI API.
- OpenRouter (first-class supported routing layer).
- Any self-hosted or third-party endpoint exposing the same schema (vLLM, Ollama, Together, etc.).
Tool calling requirement: Every LLM model used by choir must support
the OpenAI tool calling interface (tools parameter in the chat
completions request, tool_calls in the assistant response). Models
that do not support structured tool calling cannot be used – choir does
not fall back to text-based tool parsing (see section 8.2).
21.2 Model Configuration
Each named model in config.json supports the following inference
parameters:
- temperature (number | null): Sampling temperature. Set to null for reasoning models that do not accept temperature (e.g. o1, o3). Default: 0.7 if omitted.
- reasoning_effort (string | null): Reasoning effort level for models that support it (e.g. "low", "medium", "high" for OpenAI o-series). Set to null or omit for models that do not support reasoning effort. Passed as-is to the API.
These are per-model defaults. They are included in the hot-reloadable config – changes take effect on the next LLM call without restart.
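A models section in config.json might then look like the following sketch; the top-level "models" key, the entry names, and the model IDs are illustrative assumptions:

```json
{
  "models": {
    "edge-fast": {
      "model": "openai/gpt-4o-mini",
      "temperature": 0.7,
      "reasoning_effort": null
    },
    "core-reasoning": {
      "model": "openai/o3",
      "temperature": null,
      "reasoning_effort": "high"
    }
  }
}
```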
21.3 TTS Configuration
TTS is configured separately from LLM models, split into two layers:
TTS providers ("tts" in config.json): Named provider
configurations specifying the API endpoint, TTS model ID, and secret
reference. v1 supports ElevenLabs only.
{
"provider": "elevenlabs",
"model_id": "eleven_multilingual_v2",
"endpoint": "https://api.elevenlabs.io/v1",
"secret": "elevenlabs-key"
}
Voice profiles ("voice_profiles" in config.json): Named voice
configurations referencing a TTS provider and specifying voice-specific
settings:
| Field | Type | Description |
|---|---|---|
| tts | string | Reference to a named TTS provider |
| voice_id | string | ElevenLabs voice ID (from the Get Voices endpoint) |
| output_format | string | Audio format: mp3_44100_128 (default), opus_48000_128, pcm_16000, etc. |
| voice_settings.stability | number | Voice stability (0.0-1.0). Lower = more expressive, higher = more consistent |
| voice_settings.similarity_boost | number | Voice similarity (0.0-1.0). Higher = closer to the original voice |
| voice_settings.style | number | Style exaggeration (0.0-1.0). 0 = minimal latency |
| voice_settings.use_speaker_boost | boolean | Boost speaker similarity. Increases latency |
| voice_settings.speed | number | Playback speed (0.5-2.0). 1.0 = normal |
Each agent references a voice profile via its defaults.voice_profile.
Voice profiles are shared resources – multiple agents can use the same
profile concurrently. Profile changes are hot-reloadable (taking effect on
the next choir.tts.speak call). When choir.tts.speak executes, it
resolves the agent’s current voice profile, uses the referenced TTS
provider’s endpoint and credentials, and passes the voice settings to
the API.
For Telegram gateway, the output format should produce audio compatible
with Telegram voice messages (OGG Opus). The opus_48000_* formats are
recommended.
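Putting the table together, a Telegram-friendly voice profile might look like this sketch; the profile name, provider name, and voice_id are placeholders, not values defined by this spec:

```json
{
  "voice_profiles": {
    "narrator": {
      "tts": "elevenlabs-default",
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "output_format": "opus_48000_128",
      "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.0,
        "use_speaker_boost": true,
        "speed": 1.0
      }
    }
  }
}
```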
21.4 OpenRouter Implications
OpenRouter as a routing layer introduces specific constraints:
- No reliable system-prefix KV caching across requests (OpenRouter is a routing layer, not a model provider).
- Design for full recompute cost per call.
- Keep system prompt small, stable, immutable.
- Put all dynamic content after the system prompt.
- Enforce behavior via state machine constraints, not verbose prompt prose.
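These constraints translate into a simple payload shape: the system prompt is byte-identical across calls and everything dynamic follows it. A minimal sketch, with illustrative prompt content:

```python
SYSTEM_PROMPT = (
    "You are choir-agent. Follow the active skill's state machine."
)  # small, stable, immutable across calls

def build_request(model, dynamic_context, user_msg):
    """Assemble an OpenAI-compatible chat completions payload with the
    system prompt first and all dynamic content strictly after it."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            # memory snippets, skill phase, workspace state, etc.
            {"role": "user", "content": dynamic_context},
            {"role": "user", "content": user_msg},
        ],
    }

req = build_request("openai/gpt-4o-mini",
                    "[retrieved memory snippets]",
                    "Summarize today's tasks.")
```

Because the prefix never varies, any provider that does cache still benefits, while the design budget assumes full recompute.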
22. Observability and Audit
22.1 Telemetry
Minimum required telemetry:
- Session lifecycle events: session start/stop, lane transitions, container start/restart/kill.
- LLM call boundaries: model, token counts, latency, lane origin.
- Tool call traces: proposals, lock acquisition/release, execution results, durations.
- Approval requests and outcomes: who requested, what was requested, approved/rejected, by whom.
- Heartbeat revisions: replication progress, recovery operations, snapshot creation.
- Config/secrets version changes: metadata only (version numbers, timestamps), never secret values.
All telemetry must include session ID and lane ID for correlation.
22.2 Logging
Logs are written to files in .choir.d/logs/, organized by source:
- choird.log: Global choird logs (startup, gateway, control plane, Postgres operations, config changes).
- <agent-id>.log: Per-agent logs (agent lifecycle, LLM calls, tool execution, heartbeat, errors).
Log format: Structured JSON lines. Each line includes timestamp, level, source (choird / agent-id), session ID (if applicable), lane (if applicable), and message.
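A sketch of how one such line could be produced; the exact key names (e.g. "ts") are assumptions beyond the fields the format requires:

```python
import json
from datetime import datetime, timezone
from typing import Optional

def log_line(level: str, source: str, message: str,
             session_id: Optional[str] = None,
             lane: Optional[str] = None) -> str:
    """Serialize one structured JSON log line with the required fields:
    timestamp, level, source, session ID, lane, and message."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "source": source,          # "choird" or an agent ID
        "session_id": session_id,  # null when not applicable
        "lane": lane,              # "edge" / "core" when applicable
        "message": message,
    }
    return json.dumps(record)

line = log_line("info", "agent-7", "tool call completed",
                session_id="sess-42", lane="edge")
```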
Archiving:
- Agent logs: Archived when a session ends (graceful stop). The active log file is compressed and moved to .choir.d/logs/archive/<agent-id>/<timestamp>.log.gz.
- choird logs: Archived when the active log file exceeds a configurable line threshold (log_archive_threshold_lines in config.json, default: 100000). Compressed and moved to .choir.d/logs/archive/choird/<timestamp>.log.gz.
choirctl log access:
choirctl logs <agent-id> # snapshot current log + tail -f stream
choirctl logs choird # snapshot choird log + tail -f stream
choirctl logs prints the current log file contents (snapshot), then
continues streaming new lines as they are written (like tail -f).
Ctrl+C stops streaming. No filtering or search in v1 – use standard
tools (grep, jq) on the log files directly.
23. Choird Data Model (Postgres)
23.0 Schema Isolation
Each agent gets its own Postgres schema and role within a single
centralized database (configured in config.json under "postgres").
choird manages these schemas and roles:
- On first startup (or agent init), choird creates a schema choir_<agent_id> and a role choir_<agent_id> with access limited to that schema.
- Agents never connect to Postgres directly. All memory operations go through choird via EXECUTE_HOST_TOOL (see section 8.7). choird uses the per-agent role when executing queries on an agent’s behalf, ensuring schema-level isolation without giving the agent credentials.
- Cross-agent memory reads (section 11.2) go through choird’s EXECUTE_HOST_TOOL handler, which uses the admin connection to query across schemas. Agents never directly access another agent’s schema.
- Control plane tables (sessions, events, snapshots, approvals) live in a shared choir_control schema owned by choird.
choird maintains a connection pool to Postgres. Per-agent roles are used for agent-initiated queries (via host tool delegation); the pool’s admin connection is used for control plane operations.
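The provisioning and role-switching steps could be sketched in SQL as follows; the NOLOGIN/SET ROLE pattern and the admin role name choird_admin are assumptions beyond "access limited to that schema":

```sql
-- Hypothetical provisioning for agent "a1", run once on choird's admin connection.
CREATE ROLE choir_a1 NOLOGIN;               -- agents never connect; choird assumes the role
CREATE SCHEMA choir_a1 AUTHORIZATION choir_a1;
REVOKE ALL ON SCHEMA choir_a1 FROM PUBLIC;  -- no access for other roles
GRANT choir_a1 TO choird_admin;             -- assumed admin role name; allows SET ROLE

-- Agent-initiated query on the pooled admin connection:
SET ROLE choir_a1;
-- ... execute the agent's memory query here ...
RESET ROLE;
```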
Database migrations are deferred in v1. choirctl should support
migration commands (choirctl db migrate, choirctl db backup) in a
future version.
23.1 Control Plane Tables (schema: choir_control)
agents: Agent definitions and metadata.
agent_id, name, image_version, created_at, config (jsonb)
sessions: Active and historical sessions.
session_id, agent_id, lease_id, status, started_at, ended_at,
resource_bindings (jsonb) -- { workspace, llm, voice_profile, git_identity, notion, email, dm }
session_events: Append-only true execution log (see section 11.3). Serves as both the authoritative event history and the crash recovery source.
id, session_id, rev, event_type, lane (edge/core),
payload (jsonb, includes hash refs), created_at
session_snapshots: Periodic state snapshots for bounded replay.
session_id, rev, snapshot (jsonb), created_at
pending_approvals: Queued approval requests.
id, session_id, agent_id, request_type, payload (jsonb),
status (pending/approved/rejected), created_at, resolved_at
Note: resource configuration lives in .choir.d/config.json on the host
filesystem, not in Postgres. Postgres stores only runtime state (sessions,
events, memory, approvals).
23.2 Session-Derived Memory Tables (schema: choir_<agent_id>, Tiers 2-3)
memory_documents: Chunked session events and long-term summaries.
id uuid PRIMARY KEY
session_id text NOT NULL
tier text NOT NULL -- 'mid_term', 'long_term_summary', 'long_term_detail'
chunk_index int NOT NULL
text text NOT NULL
tsv tsvector -- GIN indexed
created_at timestamptz NOT NULL
metadata jsonb -- skill, phase, topic tags
-- agent_id is implicit from schema name (choir_<agent_id>)
memory_embeddings: Vector embeddings for session memory search.
document_id uuid REFERENCES memory_documents(id)
embedding vector(N) -- HNSW indexed; N from embedding model config (default 1536)
-- exists for mid_term and long_term_summary only (not long_term_detail)
23.3 Knowledge Tables (schema: choir_<agent_id>, Tier 4)
knowledge_documents: Agent-managed persistent knowledge.
id uuid PRIMARY KEY
key text -- optional dedup key (e.g. 'user.preference.theme')
text text NOT NULL
tsv tsvector -- GIN indexed
created_at timestamptz NOT NULL
updated_at timestamptz NOT NULL
metadata jsonb -- tags, source, category
-- agent_id is implicit from schema name (choir_<agent_id>)
knowledge_embeddings: Vector embeddings for knowledge search.
document_id uuid REFERENCES knowledge_documents(id)
embedding vector(N) -- HNSW indexed
23.4 Query Patterns
All queries are executed by choird using schema-qualified table names
(e.g. choir_<agent_id>.memory_documents). For own-agent queries,
choird uses the per-agent role; for cross-agent reads, choird uses
the admin connection.
Default semantic search (mid-term + long-term summaries):
SELECT d.*, e.embedding
FROM choir_<target_agent>.memory_documents d
JOIN choir_<target_agent>.memory_embeddings e ON d.id = e.document_id
WHERE d.tier IN ('mid_term', 'long_term_summary')
ORDER BY (1 - (e.embedding <=> $query_vec)) DESC
LIMIT 10;
Long-term drill-down (full session detail, requires session_id):
SELECT d.*
FROM choir_<target_agent>.memory_documents d
WHERE d.session_id = $session_id
AND d.tier = 'long_term_detail'
ORDER BY d.chunk_index;
Knowledge search:
SELECT d.*, e.embedding
FROM choir_<target_agent>.knowledge_documents d
JOIN choir_<target_agent>.knowledge_embeddings e ON d.id = e.document_id
ORDER BY (1 - (e.embedding <=> $query_vec)) DESC
LIMIT 10;
All queries support hybrid scoring (vector + tsvector) when both a semantic query and text query are provided.
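A hybrid-scored variant of the knowledge search might combine the two signals like the following sketch; the 0.7/0.3 weights and the use of ts_rank with plainto_tsquery are assumptions, not values fixed by this spec:

```sql
SELECT d.*,
       0.7 * (1 - (e.embedding <=> $query_vec))                        -- semantic similarity
     + 0.3 * ts_rank(d.tsv, plainto_tsquery('english', $text_query))   -- lexical relevance
       AS score
FROM choir_<target_agent>.knowledge_documents d
JOIN choir_<target_agent>.knowledge_embeddings e ON d.id = e.document_id
ORDER BY score DESC
LIMIT 10;
```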
24. Implementation Phases
- Phase 1 – Control Plane Protocol: Transport abstraction layer (ControlPlane interface + ControlPlaneHandler interface), UDS implementation, container lifecycle, init/secret/config handshake, heartbeat. Includes choirctl init and choirctl agent init.
- Phase 2 – Single-Lane Tool Loop: Structured tool calling, basic lock manager, tool execution pipeline, skill engine, built-in tools (fs, exec, search). Get the deterministic base working.
- Phase 3 – Gateway & User Interface: Telegram gateway (multi-bot, multi-DM, admin/regular permissions), DM binding, gateway commands, .choirtmp/ file transfer, TTS tool, email tools (SMTP + IMAP).
- Phase 4 – TCP/HTTP Transport: HTTP implementation of the same ControlPlane interface, TLS support, lease-token-in-header auth. Validate that all RPC verbs work identically over both transports.
- Phase 5 – Core Lane Async: Add core lane execution, injection/cancel workflow, event streaming. Test heavily.
- Phase 6 – Crash Recovery: Heartbeat replication, ack protocol, snapshot creation, recovery handshake.
- Phase 7 – Memory Integration: Postgres/pgvector schema, per-agent schema isolation, embedding pipeline, hybrid search, memory compaction.
- Phase 8 – Self-Evolution & Hardening: Approval workflows, self-evolution pipeline (tool-builder/skill-builder skills, repo proposal flow), structured logging and archiving, observability instrumentation, security hardening.