MaestroBot Design | Peisong's Lighthouse

Link to implementation repo.

MaestroBot is a local, user-scoped Linux agent service built from three subsystems:

Maestro as the programmable agent-loop and state-machine substrate
Myria as Global Persistent Memory and the durable event substrate
Go as the host runtime, scheduler, control plane, and tool host

The core design goal is to run a long-lived local agent that behaves like one durable channel-based worker rather than a stateless chat wrapper.

1. Core model
2. Main responsibilities
3. Architectural split
4. Deployment shape
5. Storage layout
6. Maestro program model
7. Channel runtime model
8. Scheduling and residency
9. Prompt and action model
10. Tool model
11. Workspace and VFS model
12. Memory integration
13. MCP integration
14. Control plane
15. Failure model
16. Current implementation notes

1. Core model

MaestroBot is channel-centric.

The primary runtime unit is the channel. Each channel owns:

one logical runtime context
one persistent workspace
one pending message queue
one scheduler priority
one paging identity
one retained subagent set

The system is meant to preserve work and context across idle periods, restarts, and paging boundaries.

An idle channel is not runnable merely because it has a persisted paging snapshot. The host only reactivates a channel when there is real work: pending inbound input, a due wake condition, or an explicit operator wake.
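The reactivation rule can be sketched in Go as follows. `ChannelRecord`, its fields, and `Runnable` are hypothetical names, not MaestroBot's actual API. The point is only that a persisted snapshot never counts as work on its own.

```go
package main

import "fmt"

// ChannelRecord is a hypothetical view of the per-channel scheduler fields.
type ChannelRecord struct {
	PendingInbound int   // queued inbound messages not yet consumed
	WakeAt         int64 // next scheduled wake in Unix seconds; 0 means none
	OperatorWake   bool  // explicit wake requested through the control plane
	HasSnapshot    bool  // a persisted paging snapshot exists
}

// Runnable reports whether the host should reactivate the channel at now.
// HasSnapshot is deliberately ignored: a snapshot alone is not work.
func Runnable(c ChannelRecord, now int64) bool {
	if c.PendingInbound > 0 || c.OperatorWake {
		return true
	}
	return c.WakeAt != 0 && now >= c.WakeAt
}

func main() {
	now := int64(1000)
	fmt.Println(Runnable(ChannelRecord{HasSnapshot: true}, now))              // false
	fmt.Println(Runnable(ChannelRecord{HasSnapshot: true, WakeAt: 900}, now)) // true: the wake is due
	fmt.Println(Runnable(ChannelRecord{PendingInbound: 1}, now))              // true
}
```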

2. Main responsibilities

MaestroBot is responsible for:

normalizing local frontend input into channel messages
scheduling channel work
retaining and paging channel state
exposing and dispatching tools
supervising Myria
exposing a local control plane over Unix IPC
managing external MCP servers

MaestroBot is not responsible for:

owning long-term semantic memory itself
storing provider state outside its own root
acting as a remote multi-user network service

3. Architectural split

There are three logical layers.

3.1 Frontend

Responsibilities:

normalize platform input to internal messages
deliver outbound runtime messages to the platform
emit canonical events to Myria through the host runtime

Built-in frontends currently include:

local CLI
Telegram Bot API

Outbound frontend design is intentionally layered:

the runtime owns canonical authored Markdown
the frontend lowers that Markdown into a platform-safe intermediate form
the frontend renders and delivers the final platform payload

For Telegram, the lowering path is:

Markdown input
Markdown AST
normalized Telegram-safe IR
Telegram renderer

The Telegram renderer prefers plain text with message entities, falls back to HTML parse mode, and finally falls back to plain text split into chunks.
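Here is a minimal sketch of that three-step fallback. It assumes the entity and HTML payloads are already rendered and wrapped as send callbacks. `Deliver`, `ChunkPlain`, and `errRejected` are hypothetical names. The 4096 figure is Telegram's documented per-message text limit, and counting runes only approximates how Telegram measures it.

```go
package main

import (
	"errors"
	"fmt"
)

// maxTelegramText is Telegram's documented limit of 4096 characters per
// message; counting runes here is an approximation of that limit.
const maxTelegramText = 4096

// errRejected stands in for an API error such as a failed entity or HTML parse.
var errRejected = errors.New("rejected by Telegram")

// ChunkPlain splits text into pieces of at most limit runes so that
// multi-byte characters are never cut in half.
func ChunkPlain(text string, limit int) []string {
	runes := []rune(text)
	var chunks []string
	for len(runes) > limit {
		chunks = append(chunks, string(runes[:limit]))
		runes = runes[limit:]
	}
	if len(runes) > 0 {
		chunks = append(chunks, string(runes))
	}
	return chunks
}

// Deliver tries the pre-rendered entity payload, then the HTML payload,
// then sends plainText in chunks. It reports which mode succeeded.
func Deliver(plainText string, sendEntities, sendHTML func() error, sendPlain func(string) error) (string, error) {
	if sendEntities() == nil {
		return "entities", nil
	}
	if sendHTML() == nil {
		return "html", nil
	}
	for _, chunk := range ChunkPlain(plainText, maxTelegramText) {
		if err := sendPlain(chunk); err != nil {
			return "", fmt.Errorf("plain-text fallback failed: %w", err)
		}
	}
	return "plain", nil
}

func main() {
	reject := func() error { return errRejected }
	mode, err := Deliver("hello *world*", reject, reject, func(chunk string) error {
		fmt.Printf("sent %d runes\n", len([]rune(chunk)))
		return nil
	})
	fmt.Println(mode, err) // plain <nil>
}
```

Ordering the modes from richest to simplest means a formatting failure degrades the presentation but still delivers the reply.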

During an active Telegram-backed channel run, the frontend also keeps sending sendChatAction(action="typing") until the run sends a reply or finalizes, because Telegram clears the typing indicator after at most five seconds.

3.2 Runtime

Responsibilities:

own per-channel queues
own channel scheduling
own paging and residency
own tool dispatch
own provider calls
own subagent lifecycle
act as the kernel for schema execution

3.3 Memory

Responsibilities:

per-channel local transcript and working context
global durable event log
cross-channel and long-horizon retrieval
snapshot/index build workflows

Each channel is a complete local agent instance. It owns its conversation transcript, structured working-memory notebook, current-run history, workspace, queues, and sleep/wake state. The host derives the recent local context from the channel transcript before prompting.

Myria is supervised as Global Persistent Memory. It receives canonical events from the host and provides auxiliary retrieval when the local channel transcript and notebook are insufficient, stale, or too compressed.

4. Deployment shape

MaestroBot is one binary.

Normal invocation modes:

maestrobot --daemon: runs the daemon directly in the foreground.
maestrobot --daemon --debug: runs the daemon with frontend-visible gateway telemetry.
maestrobot ...: runs the local control CLI.
maestrobot daemon start|stop|status|logs: manages a systemd --user service.

Myria is launched by the daemon as a subprocess.

HTTP is not the main control-plane surface. The control path is Unix socket IPC.

Gateway debug mode is operator telemetry. It may send compact tool-call, sanitized-argument, tool-result, prompt-compaction, cumulative-token, and sleep/wake messages through frontends, but those messages are not channel transcript entries and are not appended to Myria as conversation events.

5. Storage layout

Default root:

~/.maestrobot

The root contains at least:

config.yaml
runtime.yaml
state.json
SOUL.md
maestro/
myria/
workspaces/
users/
paging/
transcripts/
logs/
bin/

5.1 `config.yaml`

Static operator-authored config.

Contains:

provider definitions
concrete model presets
runtime preset references
Myria configuration references
frontend configuration
external MCP server configuration
onboarding notice and archive policy

5.2 `runtime.yaml`

Mutable desired runtime config.

Contains:

stable internal users
internal user display names

This file is editable by the user and by the control CLI.

5.3 `state.json`

Mutable host-owned state.

Contains:

channels
unknown identities
external account to internal user mappings
retained subagents
tool server state
next-id counters

Unknown accounts are not allowed to enter the agent loop. They receive a deterministic onboarding notice from the runtime/frontend gate up to a configured cap, then are ignored until attached to an internal user.
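A minimal sketch of that gate is shown below. `Gate`, `Decision`, the `telegram:<id>` account format, and the notice text are hypothetical. In MaestroBot the per-account notice counts would presumably be stored with the unknown identities in state.json, not in memory.

```go
package main

import "fmt"

// Gate is a hypothetical sketch of the unknown-account gate.
type Gate struct {
	Known       map[string]string // external account -> internal user ID
	NoticeCount map[string]int    // onboarding notices already sent per account
	NoticeCap   int               // configured cap on notices per account
	Notice      string            // deterministic onboarding text
}

// Decision tells the frontend what to do with one inbound message.
type Decision struct {
	Admit  bool   // forward the message into the agent loop
	Reply  string // onboarding notice to send, if any
	UserID string // resolved internal user when admitted
}

// Check never admits an unknown account; it only sends notices up to the cap.
func (g *Gate) Check(account string) Decision {
	if uid, ok := g.Known[account]; ok {
		return Decision{Admit: true, UserID: uid}
	}
	if g.NoticeCount[account] < g.NoticeCap {
		g.NoticeCount[account]++
		return Decision{Reply: g.Notice}
	}
	return Decision{} // ignored until an operator attaches the account
}

func main() {
	g := &Gate{Known: map[string]string{}, NoticeCount: map[string]int{}, NoticeCap: 2,
		Notice: "This bot is private. Ask the operator to attach your account."}
	for i := 0; i < 3; i++ {
		fmt.Printf("%+v\n", g.Check("telegram:12345"))
	}
}
```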

5.4 `maestro/`

User-editable Maestro agent source and compiled artifact.

This is part of the runtime root on purpose. Users are expected to customize the agent behavior here without patching the repo.

5.5 `workspaces/`

workspaces/<channel-id>/

Each channel gets one persistent host directory.

5.6 `users/`

users/<user-id>/profile.md

Holds concise durable user-profile context. Profiles are bounded by config, injected into the prompt for known channel participants, and updated through a host-validated tool. They should contain stable preferences, constraints, communication style, and durable facts only.

Prompt rendering keeps stable identity, operating rules, mode instructions, and durable user profiles ahead of volatile channel state. The OpenRouter adapter sends that stable prefix as a separate system message and places timestamp, current message, run events, and tool observations in the dynamic user message. This preserves Maestro-owned prompt text while giving provider-side prompt caches a stable prefix.
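The message split can be sketched as follows, using the OpenAI-compatible `role`/`content` chat format that OpenRouter accepts. `Stable`, `Volatile`, `BuildMessages`, and the `Time:`/`Event:`/`Observation:` labels are hypothetical. In MaestroBot the actual wording comes from the Maestro program.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Message is one entry in an OpenAI-compatible chat request.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Stable holds sections that rarely change between turns of a channel.
type Stable struct {
	Identity, OperatingRules, ModeInstructions string
	UserProfiles                               []string
}

// Volatile holds per-turn sections.
type Volatile struct {
	Timestamp, CurrentMessage   string
	RunEvents, ToolObservations []string
}

// BuildMessages puts the stable prefix in a system message and everything
// volatile in the user message, so consecutive turns share a cacheable prefix.
func BuildMessages(s Stable, v Volatile) []Message {
	parts := append([]string{s.Identity, s.OperatingRules, s.ModeInstructions}, s.UserProfiles...)
	var user strings.Builder
	fmt.Fprintf(&user, "Time: %s\nMessage: %s\n", v.Timestamp, v.CurrentMessage)
	for _, e := range v.RunEvents {
		fmt.Fprintf(&user, "Event: %s\n", e)
	}
	for _, o := range v.ToolObservations {
		fmt.Fprintf(&user, "Observation: %s\n", o)
	}
	return []Message{
		{Role: "system", Content: strings.Join(parts, "\n\n")},
		{Role: "user", Content: user.String()},
	}
}

func main() {
	s := Stable{Identity: "(SOUL.md text)", OperatingRules: "(rules)", ModeInstructions: "(plan mode)"}
	out, _ := json.MarshalIndent(BuildMessages(s, Volatile{Timestamp: "2025-01-01T09:00Z", CurrentMessage: "hi"}), "", "  ")
	fmt.Println(string(out))
}
```

Because the timestamp lives in the user message, a new turn changes only the tail of the request, and the system message stays byte-identical.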

5.7 `bin/`

Installer-managed runtime binaries.

This includes at least:

bin/maestroc
bin/myria

6. Maestro program model

MaestroBot uses Maestro as the programmable agent-loop substrate.

The runtime root contains:

maestro/channel_loop.mstr
maestro/subagent_loop.mstr
maestro/agent_loop.mstro

The repo ships default templates for the .mstr files, but the runtime does not execute those repo copies directly.

Instead:

maestrobot init seeds root/maestro/
the installer places root/bin/maestroc and root/bin/myria
the daemon compiles and validates root/maestro/*.mstr into a compiled artifact
channel and subagent execution run from that compiled root artifact

This makes the Maestro layer an operator-facing customization surface.

The Go runtime still owns:

persistence
IPC
provider access
tool execution
Myria supervision

But the full agent loop, prompt composition, and explicit state sequencing live in the Maestro program.

The Maestro program also declares the host-provided surface it wants for each state. A state requests:

valid state transitions
tool policy, such as @plan, @workspace, @myria, or an exact tool name
context sections, such as agent_memory, recent_channel_context, or last_tool_observation

The Go host treats those declarations as schema-owned policy. It expands tool groups against the currently available runtime catalog, rejects unknown required tools, skips explicitly optional tools written as ?tool.name, and injects only the requested context sections. The host still owns provider calls, validation, tool execution, persistence, and frontends.
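One plausible shape for that expansion is sketched below. `ExpandPolicy` and the tool names are hypothetical. Treating an unknown group as an error is an assumption; the document only says that unknown required tools are rejected.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ExpandPolicy resolves a state's tool policy against the live catalog.
// Entries are "@group", an exact tool name, or "?tool" for an optional tool.
func ExpandPolicy(policy []string, groups map[string][]string, catalog map[string]bool) ([]string, error) {
	selected := map[string]bool{}
	for _, entry := range policy {
		switch {
		case strings.HasPrefix(entry, "@"):
			members, ok := groups[entry]
			if !ok { // assumption: an unknown group is a schema error
				return nil, fmt.Errorf("unknown tool group %q", entry)
			}
			for _, t := range members {
				if catalog[t] { // groups expand only to tools available right now
					selected[t] = true
				}
			}
		case strings.HasPrefix(entry, "?"):
			if name := entry[1:]; catalog[name] {
				selected[name] = true
			} // an absent optional tool is skipped silently
		default:
			if !catalog[entry] {
				return nil, fmt.Errorf("required tool %q is not available", entry)
			}
			selected[entry] = true
		}
	}
	tools := make([]string, 0, len(selected))
	for t := range selected {
		tools = append(tools, t)
	}
	sort.Strings(tools) // deterministic order for the prompt
	return tools, nil
}

func main() {
	groups := map[string][]string{"@workspace": {"internal.vfs_read", "internal.vfs_write", "internal.shell"}}
	catalog := map[string]bool{"internal.vfs_read": true, "internal.vfs_write": true, "internal.finalize": true}
	tools, err := ExpandPolicy([]string{"@workspace", "internal.finalize", "?mcp.github.search"}, groups, catalog)
	fmt.Println(tools, err) // [internal.finalize internal.vfs_read internal.vfs_write] <nil>
}
```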

7. Channel runtime model

Per-channel state is one of:

current
active
idle

Semantics:

current: the foreground channel
active: runnable or resumable work exists
idle: sleeping, waiting for new work or a wake condition

Wake conditions are explicit runtime events. The host distinguishes:

user-message: a new inbound human message
automatic-wake: a scheduled wake created by a prior channel finalization
runtime-wake: an explicit host or operator wake without new human input

The active snapshot retains wake metadata, including the wake source, scheduled wake self-note, next wake time, and the consecutive automatic wake count. A scheduled wake self-note is a private note from the current channel agent to its future run. It should explain what the future agent should remember, inspect, decide, or avoid. Finalization always schedules a future wake. If the only plan is to check whether the user replied, the agent should schedule a long wake and tell its future self not to message the user if nothing changed.
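The wake bookkeeping can be sketched as follows. `WakeMeta`, `OnWake`, and `Finalize` are hypothetical. How each wake source affects the consecutive automatic-wake count is also an assumption: here a human message resets the streak and an operator wake leaves it unchanged.

```go
package main

import (
	"errors"
	"fmt"
)

// WakeSource names the three wake kinds the host distinguishes.
type WakeSource string

const (
	UserMessage   WakeSource = "user-message"
	AutomaticWake WakeSource = "automatic-wake"
	RuntimeWake   WakeSource = "runtime-wake"
)

// WakeMeta is a hypothetical shape for the wake fields in the active snapshot.
type WakeMeta struct {
	Source               WakeSource
	SelfNote             string // private note to this channel's future run
	NextWakeAt           int64  // Unix seconds
	ConsecutiveAutoWakes int
}

// OnWake records the source of a new run. Assumption: a human message resets
// the automatic-wake streak and an operator wake leaves it unchanged.
func OnWake(m WakeMeta, src WakeSource) WakeMeta {
	m.Source = src
	switch src {
	case AutomaticWake:
		m.ConsecutiveAutoWakes++
	case UserMessage:
		m.ConsecutiveAutoWakes = 0
	}
	return m
}

// Finalize enforces that every finalization schedules a future wake.
func Finalize(m WakeMeta, now, delaySeconds int64, note string) (WakeMeta, error) {
	if delaySeconds <= 0 {
		return m, errors.New("finalization must schedule a future wake")
	}
	m.NextWakeAt = now + delaySeconds
	m.SelfNote = note
	return m, nil
}

func main() {
	m := OnWake(WakeMeta{}, UserMessage)
	m, err := Finalize(m, 1_700_000_000, 24*60*60,
		"Only waiting for a reply. If nothing changed, do not message the user.")
	fmt.Printf("%+v %v\n", m, err)
}
```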

One channel run is one contiguous execution segment from the last finalized boundary to the next finalized boundary.

That same boundary is used by daemon trace streaming.

8. Scheduling and residency

The scheduler is channel-based, not message-based.

Properties:

one pending queue per channel
channel-level priority
priority aging over time
bounded worker concurrency across channels
one run at a time per channel
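A minimal sketch of how these properties can combine is shown below. It assumes linear aging, which may not match MaestroBot's policy. `Channel`, `PickToStart`, and `agingPerSecond` are hypothetical. Effective priority is the base priority plus a bonus proportional to waiting time, and only free worker slots are filled.

```go
package main

import (
	"fmt"
	"sort"
)

// Channel is a hypothetical scheduler entry for a channel that has work.
type Channel struct {
	ID           string
	Priority     float64 // channel-level base priority
	WaitingSince int64   // Unix seconds when the channel became runnable
	Running      bool    // a run is already in progress for this channel
}

// agingPerSecond is a hypothetical rate; the real aging policy may differ.
const agingPerSecond = 0.01

func effectivePriority(c Channel, now int64) float64 {
	return c.Priority + agingPerSecond*float64(now-c.WaitingSince)
}

// PickToStart fills free worker slots with the highest effective priorities.
// Channels that are already running are never started twice.
func PickToStart(chs []Channel, now int64, maxWorkers int) []string {
	running := 0
	var waiting []Channel
	for _, c := range chs {
		if c.Running {
			running++
		} else {
			waiting = append(waiting, c)
		}
	}
	free := maxWorkers - running
	if free <= 0 {
		return nil
	}
	sort.SliceStable(waiting, func(i, j int) bool {
		return effectivePriority(waiting[i], now) > effectivePriority(waiting[j], now)
	})
	if len(waiting) > free {
		waiting = waiting[:free]
	}
	ids := make([]string, 0, len(waiting))
	for _, c := range waiting {
		ids = append(ids, c.ID)
	}
	return ids
}

func main() {
	chs := []Channel{
		{ID: "fresh-high", Priority: 5, WaitingSince: 1000},
		{ID: "old-low", Priority: 1, WaitingSince: 0},
		{ID: "busy", Priority: 9, WaitingSince: 0, Running: true},
	}
	fmt.Println(PickToStart(chs, 1000, 2)) // [old-low]
}
```

Aging keeps a low-priority channel from starving: after 1000 seconds of waiting, "old-low" reaches 11 and overtakes the fresh channel at 5.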

state.json is the authoritative scheduler record. It owns pending queues, channel state, pause state, priority, wake_at, and timeout_at. Paging snapshots are execution-context records used to resume or reconstruct a run; they are not the source of truth for queue ownership or wake scheduling.

Residency is host-owned:

channel snapshots are host data, not opaque Maestro VM dumps
paging happens only at safe boundaries
LRU-style eviction is used when resident contexts exceed the limit
one Maestro run executes until it reaches idle, then the host may page or reschedule the channel later
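LRU-style eviction that respects safe boundaries can be sketched with Go's `container/list`. `Residency`, `Touch`, and the `safe`/`page` callbacks are hypothetical. The design choice shown here is an assumption: a channel that is mid-run is skipped rather than paged, so the resident set can briefly exceed the limit.

```go
package main

import (
	"container/list"
	"fmt"
)

// Residency is a hypothetical LRU set of resident channel contexts.
type Residency struct {
	limit int
	order *list.List               // front = most recently used channel ID
	items map[string]*list.Element // channel ID -> position in order
	safe  func(id string) bool     // true when the channel is at a safe boundary
	page  func(id string)          // writes the host-owned snapshot and frees memory
}

func NewResidency(limit int, safe func(string) bool, page func(string)) *Residency {
	return &Residency{limit: limit, order: list.New(), items: map[string]*list.Element{}, safe: safe, page: page}
}

// Touch marks id as most recently used, then pages out least recently used
// channels that are at a safe boundary until the resident set fits the limit.
func (r *Residency) Touch(id string) {
	if e, ok := r.items[id]; ok {
		r.order.MoveToFront(e)
	} else {
		r.items[id] = r.order.PushFront(id)
	}
	front := r.order.Front() // never evict the channel that was just touched
	for e := r.order.Back(); e != nil && e != front && r.order.Len() > r.limit; {
		prev := e.Prev() // capture before Remove clears the links
		if victim := e.Value.(string); r.safe(victim) {
			r.order.Remove(e)
			delete(r.items, victim)
			r.page(victim)
		}
		e = prev
	}
}

func main() {
	safe := func(id string) bool { return id != "busy" }
	r := NewResidency(2, safe, func(id string) { fmt.Println("paged out:", id) })
	for _, id := range []string{"busy", "a", "b"} {
		r.Touch(id) // "busy" is mid-run, so "a" is paged instead
	}
}
```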

On startup, the host reconciles these stores before starting workers. A channel still marked current by a previous process is stale. The host downgrades it to active if its snapshot has unfinished work, or to idle with a scheduled fallback wake if no work remains.
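The startup downgrade can be sketched as below. `Record`, `Reconcile`, and the six-hour `fallbackWakeDelay` are hypothetical values chosen for illustration. Keeping an existing future wake instead of overwriting it with the fallback is also an assumption.

```go
package main

import "fmt"

// State is one of the three per-channel states.
type State string

const (
	Current State = "current"
	Active  State = "active"
	Idle    State = "idle"
)

// Record is a hypothetical merge of the state.json entry and snapshot facts.
type Record struct {
	State          State
	WakeAt         int64 // Unix seconds; 0 means none
	UnfinishedWork bool  // derived from the paging snapshot
}

// fallbackWakeDelay is a hypothetical value in seconds.
const fallbackWakeDelay = 6 * 60 * 60

// Reconcile downgrades channels a previous process left in the current state.
// Assumption: an existing future wake is kept instead of the fallback.
func Reconcile(recs map[string]*Record, now int64) {
	for _, r := range recs {
		if r.State != Current {
			continue
		}
		if r.UnfinishedWork {
			r.State = Active
			continue
		}
		r.State = Idle
		if r.WakeAt <= now {
			r.WakeAt = now + fallbackWakeDelay
		}
	}
}

func main() {
	recs := map[string]*Record{
		"ch-1": {State: Current, UnfinishedWork: true},
		"ch-2": {State: Current},
	}
	Reconcile(recs, 1000)
	fmt.Println(*recs["ch-1"], *recs["ch-2"]) // {active 0 true} {idle 22600 false}
}
```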

9. Prompt and action model

The agent prompt is not platform-specific.

The runtime and Maestro state machine may ask the agent to emit a user-visible Markdown message, but they do not ask it to emit Telegram-specific entities or HTML.

That platform lowering happens only at the frontend boundary.

Prompt construction is schema-owned, but identity is not.

SOUL.md is the exclusive source for the agent’s identity, name, voice, and standing preferences. The active Maestro program decides where that identity text is injected, but the program itself should remain identity-neutral. Runtime context should provide facts and constraints, not persona.

Each prompt includes:

identity text loaded from SOUL.md
operating and current-mode prompt text embedded directly in the active Maestro state
channel metadata
participant context
current message
bounded local working memory
valid next transitions
recent current-run events
loop warnings derived from repeated action patterns
relevant queue summaries, with follow-up content hidden until consumed

The host runtime supplies those structured sections as data, but the prompt layout and wording are embedded in maestro/channel_loop.mstr and maestro/subagent_loop.mstr.

The local working-memory layer is intentionally modeled as a structured operational notebook rather than a truthful event log. It is designed to hold the agent’s current understanding in compact sections such as user profile, channel facts, active goal, plan, open loops, workspace state, and handoff notes.

The host also retains a shorter-lived current-run capability view inside the current channel snapshot. That view is inferred from the tools exposed during the current run. It lets the schema answer capability questions from the whole run rather than only from the current state’s narrowed tool mask.

Each channel is modeled as a complete local agent. The local transcript is the canonical recent conversation source, and the snapshot carries a bounded current-run event stream: user messages, assistant messages, tool calls, tool results, state transitions, wake events, and context updates. The prompt is rendered from the transcript-derived recent context plus the current-run stream, not from a single last-action slot.

Myria remains Global Persistent Memory and the truthful durable event substrate. The agent should prefer the local notebook and recent local transcript context first, and consult Myria only for older, cross-channel, uncertain, stale, or too-compressed recall.

Grounding rules sit above that memory split:

successful tool results are factual observations
failed tool results are also factual observations about what did not work
user-visible replies must not invent successful filesystem, shell, browser, web, or memory results that are not supported by successful tool execution

Each agent step is bounded to exactly one tool/action selection.

Maestro owns the channel run loop itself:

prepare: activates or resumes one channel run
plan: performs bounded work on the current channel
finalize-send: emits the user-visible reply
idle: asks the model to choose the timeout, sleep, wake self-note, and priority bookkeeping, then signals the host that the run has reached a safe waiting boundary

The host validates and dispatches the selected tool, but the loop shape and state transitions are encoded in the schema.
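The validation side of that split can be sketched in Go. The transition table below is hypothetical: in MaestroBot each state declares its valid transitions in the Maestro program. Allowing plan to go straight to idle is an assumption, based on the note that an automatic wake may end without messaging the user. `Step` and `RunSequence` are likewise hypothetical names.

```go
package main

import "fmt"

// transitions is a hypothetical table; in MaestroBot the real transitions
// are declared by each state in the Maestro program, not hard-coded in Go.
var transitions = map[string][]string{
	"prepare":       {"plan"},
	"plan":          {"plan", "finalize-send", "idle"},
	"finalize-send": {"idle"},
	"idle":          nil, // safe waiting boundary: control returns to the host
}

// Step validates one model-selected transition against the table.
func Step(from, to string) error {
	for _, allowed := range transitions[from] {
		if allowed == to {
			return nil
		}
	}
	return fmt.Errorf("invalid transition %s -> %s", from, to)
}

// RunSequence replays selected states from prepare and stops at idle,
// where the host may page or reschedule the channel.
func RunSequence(selected []string) (string, error) {
	state := "prepare"
	for _, next := range selected {
		if err := Step(state, next); err != nil {
			return state, err
		}
		state = next
		if state == "idle" {
			break
		}
	}
	return state, nil
}

func main() {
	end, err := RunSequence([]string{"plan", "plan", "finalize-send", "idle"})
	fmt.Println(end, err) // idle <nil>
	_, err = RunSequence([]string{"finalize-send"})
	fmt.Println(err) // invalid transition prepare -> finalize-send
}
```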

The schema controls which tools are visible in each step through a small policy language. Current built-in groups are:

@plan
@finalize-send
@idle
@core
@workspace
@web
@image
@browser
@subagents
@myria
@external

Exact tool names may be requested directly. Prefix a tool name with ? when the schema can use it if available but should continue if that tool is absent in the current runtime state.

10. Tool model

The tool plane is unified.

10.1 Built-in runtime tools

internal.* tools include:

queue control
channel finalization
VFS reads/writes/search
user-note writes
shell execution and interactive process sessions
browser automation
web fetch/search
image metadata and OCR
subagent control

10.2 Myria tools

myria.* is query-only from the agent’s perspective.

The agent does not append to Myria directly.

Myria tools expose global persistent memory. They are not the normal path for remembering the immediately preceding same-channel exchange.

10.3 External MCP tools

External MCP servers are registered into the same tool plane and exposed alongside built-ins.

11. Workspace and VFS model

Each channel has a host directory, but tools see a mounted VFS view.

Mount layout:

/: the writable channel workspace
/.host-path/<n>: read-only mounts of the host PATH directories
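Path resolution for this layout can be sketched as follows. `VFS`, `Resolve`, and `ErrReadOnly` are hypothetical. The sketch relies only on lexical cleaning (`path.Clean` of a rooted path drops leading `..`). It does not defend against symlinks inside the workspace, which a real implementation would also need to handle.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"path/filepath"
	"strconv"
	"strings"
)

// VFS is a hypothetical mapping from agent-visible paths to host paths.
type VFS struct {
	Workspace string   // host directory for the channel, mounted at /
	HostPath  []string // host PATH entries, mounted read-only at /.host-path/<n>
}

// ErrReadOnly is returned for writes under /.host-path.
var ErrReadOnly = errors.New("read-only mount")

// Resolve maps a VFS path to a host path. Cleaning a rooted path removes
// leading "..", so lexical traversal cannot leave the mount.
func (v VFS) Resolve(vpath string, write bool) (string, error) {
	clean := path.Clean("/" + vpath)
	if clean == "/.host-path" {
		return "", errors.New("/.host-path is a virtual directory")
	}
	const mount = "/.host-path/"
	if strings.HasPrefix(clean, mount) {
		idx, sub, _ := strings.Cut(strings.TrimPrefix(clean, mount), "/")
		n, err := strconv.Atoi(idx)
		if err != nil || n < 0 || n >= len(v.HostPath) {
			return "", fmt.Errorf("no host-path mount %q", idx)
		}
		if write {
			return "", ErrReadOnly
		}
		return filepath.Join(v.HostPath[n], filepath.FromSlash(sub)), nil
	}
	return filepath.Join(v.Workspace, filepath.FromSlash(clean)), nil
}

func main() {
	v := VFS{Workspace: "/home/me/.maestrobot/workspaces/ch-1", HostPath: []string{"/usr/bin"}}
	fmt.Println(v.Resolve("notes/../todo.md", true)) // /home/me/.maestrobot/workspaces/ch-1/todo.md <nil>
	fmt.Println(v.Resolve("/.host-path/0/git", true)) // empty path and the read-only error
}
```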

The host runtime owns lease enforcement so overlapping agent activity cannot corrupt the workspace.

Long-running process sessions and browser sessions retain their lease ownership while active.
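A minimal sketch of exclusive workspace leasing follows, assuming a single lease holder per workspace at a time. `Leases`, `Acquire`, `Release`, and the owner labels are hypothetical. The real host may distinguish read and write access, which this sketch does not.

```go
package main

import (
	"fmt"
	"sync"
)

// Leases is a hypothetical exclusive lease table: workspace -> current owner.
type Leases struct {
	mu     sync.Mutex
	holder map[string]string
}

func NewLeases() *Leases { return &Leases{holder: map[string]string{}} }

// Acquire grants the lease if the workspace is free or already held by owner,
// so a long-running process session can keep re-acquiring its own lease.
func (l *Leases) Acquire(workspace, owner string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if cur, held := l.holder[workspace]; held && cur != owner {
		return fmt.Errorf("workspace %s is leased by %s", workspace, cur)
	}
	l.holder[workspace] = owner
	return nil
}

// Release frees the lease only when called by its current owner.
func (l *Leases) Release(workspace, owner string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder[workspace] == owner {
		delete(l.holder, workspace)
	}
}

func main() {
	l := NewLeases()
	fmt.Println(l.Acquire("ch-1", "process-session:1")) // <nil>
	fmt.Println(l.Acquire("ch-1", "subagent:2"))        // workspace ch-1 is leased by process-session:1
}
```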

12. Memory integration

Myria is supervised as a subprocess.

Current default:

file-backed SQLite for convenience in local deployment and testing

Fuller backend:

PostgreSQL remains the richer Myria storage mode

The generated Myria config is derived from MaestroBot config.yaml. In the current MaestroBot version, SQLite is the generated convenience path.

13. MCP integration

MaestroBot supports MCP in three ways:

built-in Myria over stdio
external local MCP servers over stdio
external remote MCP servers over:
- Streamable HTTP
- legacy HTTP+SSE

The control plane supports:

explicit stdio registration
explicit remote registration
manifest discovery
manifest import
enable/disable
inspect
removal

Discovery currently recognizes:

mcp.json
.mcp.json
mcp-server.json
claude_desktop_config.json

Compatibility probing uses a live connection: the daemon connects to the server, performs initialize, sends notifications/initialized, and probes tools/list.
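The three probe messages can be sketched as JSON-RPC 2.0 frames. In the MCP stdio transport, each message is one newline-terminated line of JSON. The `protocolVersion` string below is one published MCP revision and is an assumption about what MaestroBot sends. A real probe must read the server's `initialize` result before sending the notification; this sketch only builds the frames. `HandshakeFrames` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpc is a JSON-RPC 2.0 message; ID is nil (and omitted) for notifications.
type rpc struct {
	JSONRPC string `json:"jsonrpc"`
	ID      *int   `json:"id,omitempty"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// HandshakeFrames returns initialize, notifications/initialized, and
// tools/list as newline-terminated frames for a stdio transport.
func HandshakeFrames(client, version string) ([]string, error) {
	one, two := 1, 2
	msgs := []rpc{
		{JSONRPC: "2.0", ID: &one, Method: "initialize", Params: map[string]any{
			"protocolVersion": "2025-03-26", // assumed revision
			"capabilities":    map[string]any{},
			"clientInfo":      map[string]string{"name": client, "version": version},
		}},
		{JSONRPC: "2.0", Method: "notifications/initialized"},
		{JSONRPC: "2.0", ID: &two, Method: "tools/list"},
	}
	frames := make([]string, 0, len(msgs))
	for _, m := range msgs {
		b, err := json.Marshal(m)
		if err != nil {
			return nil, err
		}
		frames = append(frames, string(b)+"\n")
	}
	return frames, nil
}

func main() {
	frames, err := HandshakeFrames("maestrobot", "0.1.0")
	if err != nil {
		panic(err)
	}
	for _, f := range frames {
		fmt.Print(f)
	}
}
```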

14. Control plane

The control plane runs over a Unix socket.

It covers:

root bootstrap
daemon lifecycle
channel management
chat injection
workspace inspection
identity association
runtime pause/resume
MCP management
model/provider preflight tests

maestrobot chat --verbose tails daemon-originated trace logs rather than inventing client-side logging.

15. Failure model

Important failure behaviors:

daemon start performs model/provider preflight first
runtime provider failures are logged verbosely by the daemon
resident execution contexts are snapshotted before unhealthy shutdown
tool server failures remain isolated from the rest of the runtime
paging snapshots and workspaces survive daemon restart

16. Current implementation notes

The first version is real and usable, but still intentionally local and host-centric.

Important present constraints:

Linux only
systemd --user assumed
one daemon binary and one local root
browser tooling depends on Playwright Chromium
image OCR depends on local tesseract

The core design, though, is now stable:

user-editable Maestro programs in the runtime root
durable host-managed channel runtime
supervised Myria
unified built-in and MCP tool plane
