MaestroBot is a local, user-scoped Linux agent service built from three subsystems:

  • Maestro as the programmable agent-loop and state-machine substrate
  • Myria as Global Persistent Memory and the durable event substrate
  • Go as the host runtime, scheduler, control plane, and tool host

The core design goal is to run a long-lived local agent that behaves like one durable channel-based worker rather than a stateless chat wrapper.



1. Core model

MaestroBot is channel-centric.

The primary runtime unit is the channel. Each channel owns:

  • one logical runtime context
  • one persistent workspace
  • one pending message queue
  • one scheduler priority
  • one paging identity
  • one retained subagent set

The system is meant to preserve work and context across idle periods, restarts, and paging boundaries.

An idle channel is not runnable merely because it has a persisted paging snapshot. The host only reactivates a channel when there is real work: pending inbound input, a due wake condition, or an explicit operator wake.
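
The reactivation rule above can be sketched as a small predicate. This is a minimal illustration, not the actual host code: the Channel type and its field names are hypothetical stand-ins for whatever the host keeps in its scheduler record.

```go
package main

import "time"

// Channel is a hypothetical, trimmed-down view of host-owned channel state.
type Channel struct {
	Pending      int       // number of queued inbound messages
	WakeAt       time.Time // zero value means no wake is scheduled
	OperatorWake bool      // set by an explicit operator wake
	HasSnapshot  bool      // a persisted paging snapshot exists
}

// ShouldReactivate reports whether an idle channel has real work.
// HasSnapshot is deliberately not consulted: a snapshot alone is not work.
func ShouldReactivate(c Channel, now time.Time) bool {
	if c.Pending > 0 || c.OperatorWake {
		return true
	}
	// A wake is due once its scheduled time is at or before now.
	return !c.WakeAt.IsZero() && !now.Before(c.WakeAt)
}
```

Under these assumptions, a channel that only has a snapshot on disk stays idle until input arrives, a wake comes due, or an operator wakes it.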


2. Main responsibilities

MaestroBot is responsible for:

  • normalizing local frontend input into channel messages
  • scheduling channel work
  • retaining and paging channel state
  • exposing and dispatching tools
  • supervising Myria
  • exposing a local control plane over Unix IPC
  • managing external MCP servers

MaestroBot is not responsible for:

  • owning long-term semantic memory itself
  • storing provider state outside its own root
  • acting as a remote multi-user network service

3. Architectural split

There are three logical layers.

3.1 Frontend

Responsibilities:

  • normalize platform input to internal messages
  • deliver outbound runtime messages to the platform
  • emit canonical events to Myria through the host runtime

Built-in frontends currently include:

  • local CLI
  • Telegram Bot API

Outbound frontend design is intentionally layered:

  1. the runtime owns canonical authored Markdown
  2. the frontend lowers that Markdown into a platform-safe intermediate form
  3. the frontend renders and delivers the final platform payload

For Telegram, the lowering path is:

  • Markdown input
  • Markdown AST
  • normalized Telegram-safe IR
  • Telegram renderer

The Telegram renderer prefers text + entities, falls back to HTML parse mode, and finally falls back to plain text chunking.
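
The fallback order described above can be sketched as an ordered chain of strategies where the first success wins. The types, the ErrUnsupported sentinel, and the rune-based chunker below are illustrative assumptions, not the frontend's real API; in particular, the chunk limit is a parameter here and a real renderer would need to follow Telegram's own length accounting.

```go
package main

import "errors"

// Payload is a hypothetical rendered message; Mode names the strategy that produced it.
type Payload struct {
	Mode   string
	Chunks []string
}

// renderer is one rendering strategy (for example "entities", "html", "plain").
type renderer struct {
	mode   string
	render func(ir string) ([]string, error)
}

// ErrUnsupported is returned by a strategy that cannot express the given IR.
var ErrUnsupported = errors.New("unsupported by this renderer")

// RenderWithFallback tries each strategy in order and returns the first success.
// If every strategy fails, it returns the error from the last one tried.
func RenderWithFallback(ir string, chain []renderer) (Payload, error) {
	err := errors.New("no renderer configured")
	for _, r := range chain {
		var chunks []string
		if chunks, err = r.render(ir); err == nil {
			return Payload{Mode: r.mode, Chunks: chunks}, nil
		}
	}
	return Payload{}, err
}

// ChunkPlain splits text into pieces of at most limit runes (limit must be > 0).
func ChunkPlain(text string, limit int) []string {
	runes := []rune(text)
	var out []string
	for len(runes) > limit {
		out = append(out, string(runes[:limit]))
		runes = runes[limit:]
	}
	if len(runes) > 0 {
		out = append(out, string(runes))
	}
	return out
}
```

Placing the plain-text chunker last in the chain gives a strategy that is not expected to fail, so a reply is still delivered when richer formatting cannot be used.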

During an active Telegram-backed channel run, the frontend also emits sendChatAction(action="typing") until the run sends a reply or finalizes.

3.2 Runtime

Responsibilities:

  • own per-channel queues
  • own channel scheduling
  • own paging and residency
  • own tool dispatch
  • own provider calls
  • own subagent lifecycle
  • act as the kernel for schema execution

3.3 Memory

Responsibilities:

  • per-channel local transcript and working context
  • global durable event log
  • cross-channel and long-horizon retrieval
  • snapshot/index build workflows

Each channel is a complete local agent instance. It owns its conversation transcript, structured working-memory notebook, current-run history, workspace, queues, and sleep/wake state. The host derives the recent local context from the channel transcript before prompting.

Myria is supervised as Global Persistent Memory. It receives canonical events from the host and provides auxiliary retrieval when the local channel transcript and notebook are insufficient, stale, or too compressed.


4. Deployment shape

MaestroBot is one binary.

Normal invocation modes:

  • maestrobot --daemon: runs the daemon directly in the foreground.
  • maestrobot --daemon --debug: runs the daemon with frontend-visible gateway telemetry.
  • maestrobot ...: runs the local control CLI.
  • maestrobot daemon start|stop|status|logs: manages a systemd --user service.

Myria is launched by the daemon as a subprocess.

HTTP is not the main control-plane surface. The control path is Unix socket IPC.

Gateway debug mode is operator telemetry. It may send compact tool-call, sanitized-argument, tool-result, prompt-compaction, cumulative-token, and sleep/wake messages through frontends, but those messages are not channel transcript entries and are not appended to Myria as conversation events.


5. Storage layout

Default root:

~/.maestrobot

The root contains at least:

  • config.yaml
  • runtime.yaml
  • state.json
  • SOUL.md
  • maestro/
  • myria/
  • workspaces/
  • users/
  • paging/
  • transcripts/
  • logs/

5.1 config.yaml

Static operator-authored config.

Contains:

  • provider definitions
  • concrete model presets
  • runtime preset references
  • Myria configuration references
  • frontend configuration
  • external MCP server configuration
  • onboarding notice and archive policy

5.2 runtime.yaml

Mutable desired runtime config.

Contains:

  • stable internal users
  • internal user display names

This file is editable by the user and by the control CLI.

5.3 state.json

Mutable host-owned state.

Contains:

  • channels
  • unknown identities
  • external account to internal user mappings
  • retained subagents
  • tool server state
  • next-id counters

Unknown accounts are not allowed to enter the agent loop. They receive a deterministic onboarding notice from the runtime/frontend gate up to a configured cap, then are ignored until attached to an internal user.
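
The gate for unknown accounts can be sketched as follows. This is a minimal in-memory illustration; the Gate type, its fields, and the Decision names are hypothetical, and the real host persists the equivalent data in state.json.

```go
package main

// Decision is what the gate does with an inbound message.
type Decision int

const (
	Admit      Decision = iota // sender maps to an internal user; enter the agent loop
	SendNotice                 // unknown sender, still under the notice cap
	Ignore                     // unknown sender, cap reached
)

// Gate is a hypothetical stand-in for the identity records in state.json.
type Gate struct {
	NoticeCap int               // configured maximum number of onboarding notices
	Known     map[string]string // external account -> internal user ID
	Notices   map[string]int    // external account -> notices sent so far
}

// Check decides deterministically, without involving the agent loop.
func (g *Gate) Check(account string) Decision {
	if _, ok := g.Known[account]; ok {
		return Admit
	}
	if g.Notices[account] < g.NoticeCap {
		if g.Notices == nil {
			g.Notices = map[string]int{}
		}
		g.Notices[account]++
		return SendNotice
	}
	return Ignore
}
```

With a cap of 2, an unattached account receives the notice twice and is ignored afterwards; attaching it to an internal user (adding it to Known in this sketch) makes later messages admissible.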

5.4 maestro/

User-editable Maestro agent source and compiled artifact.

This is part of the runtime root on purpose. Users are expected to customize the agent behavior here without patching the repo.

5.5 workspaces/

workspaces/<channel-id>/

Each channel gets one persistent host directory.

5.6 users/

users/<user-id>/profile.md

Holds concise durable user-profile context. Profiles are bounded by config, injected into the prompt for known channel participants, and updated through a host-validated tool. They should contain stable preferences, constraints, communication style, and durable facts only.

Prompt rendering keeps stable identity, operating rules, mode instructions, and durable user profiles ahead of volatile channel state. The OpenRouter adapter sends that stable prefix as a separate system message and places timestamp, current message, run events, and tool observations in the dynamic user message. This preserves Maestro-owned prompt text while giving provider-side prompt caches a stable prefix.
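
The stable-prefix split can be illustrated with a small sketch. The PromptParts and Message types are hypothetical, and the real prompt text is owned by the Maestro program; the point is only that the system message does not change when the timestamp or current message changes.

```go
package main

import "strings"

// Message is a hypothetical provider chat message.
type Message struct {
	Role    string
	Content string
}

// PromptParts is a hypothetical grouping of already-rendered prompt sections.
type PromptParts struct {
	// Stable sections: identical across steps of the same channel.
	Identity, Rules, Mode string
	Profiles              []string
	// Volatile sections: change on every step.
	Timestamp, Current      string
	RunEvents, Observations []string
}

// BuildMessages puts the stable sections in a system message and the
// volatile sections in a user message, so the prefix stays cacheable.
func BuildMessages(p PromptParts) []Message {
	stable := append([]string{p.Identity, p.Rules, p.Mode}, p.Profiles...)
	dynamic := append([]string{p.Timestamp, p.Current}, p.RunEvents...)
	dynamic = append(dynamic, p.Observations...)
	return []Message{
		{Role: "system", Content: strings.Join(stable, "\n\n")},
		{Role: "user", Content: strings.Join(dynamic, "\n\n")},
	}
}
```

Two calls that differ only in timestamp and current message produce the same system message in this sketch, which is the property a provider-side prefix cache relies on.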

5.7 bin/

Installer-managed runtime binaries.

This includes at least:

  • bin/maestroc
  • bin/myria

6. Maestro program model

MaestroBot uses Maestro as the programmable agent-loop substrate.

The runtime root contains:

  • maestro/channel_loop.mstr
  • maestro/subagent_loop.mstr
  • maestro/agent_loop.mstro

The repo ships default templates for the .mstr files, but the runtime does not execute those repo copies directly.

Instead:

  1. maestrobot init seeds root/maestro/
  2. the installer places root/bin/maestroc and root/bin/myria
  3. the daemon compiles root/maestro/*.mstr into a valid artifact
  4. channel and subagent execution run from that compiled root artifact

This makes the Maestro layer an operator-facing customization surface.

The Go runtime still owns:

  • persistence
  • IPC
  • provider access
  • tool execution
  • Myria supervision

But the full agent loop, prompt composition, and explicit state sequencing live in the Maestro program.

The Maestro program also declares the host-provided surface it wants for each state. A state requests:

  • valid state transitions
  • tool policy, such as @plan, @workspace, @myria, or an exact tool name
  • context sections, such as agent_memory, recent_channel_context, or last_tool_observation

The Go host treats those declarations as schema-owned policy. It expands tool groups against the currently available runtime catalog, rejects unknown required tools, skips explicitly optional tools written as ?tool.name, and injects only the requested context sections. The host still owns provider calls, validation, tool execution, persistence, and frontends.
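
The expansion rules above can be sketched as a single function. This is an assumption-laden illustration: the function name, the group table shape, and the choice to treat an unknown @group as an error are hypothetical, and the tool names used with it are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ExpandPolicy resolves a state's tool policy against the runtime catalog.
// groups maps "@name" to member tool names; available is the current catalog.
func ExpandPolicy(policy []string, groups map[string][]string, available map[string]bool) ([]string, error) {
	seen := map[string]bool{}
	addIfAvailable := func(name string) {
		if available[name] {
			seen[name] = true
		}
	}
	for _, entry := range policy {
		switch {
		case strings.HasPrefix(entry, "@"):
			members, ok := groups[entry]
			if !ok {
				// Assumption: an undeclared group is a schema error.
				return nil, fmt.Errorf("unknown tool group %q", entry)
			}
			for _, m := range members {
				addIfAvailable(m) // groups expand only to tools that exist right now
			}
		case strings.HasPrefix(entry, "?"):
			addIfAvailable(strings.TrimPrefix(entry, "?")) // optional: skip if absent
		default:
			if !available[entry] {
				return nil, fmt.Errorf("required tool %q is not available", entry)
			}
			seen[entry] = true
		}
	}
	out := make([]string, 0, len(seen))
	for name := range seen {
		out = append(out, name)
	}
	sort.Strings(out) // deterministic order for prompts and tests
	return out, nil
}
```

For example, a policy of @workspace, ?myria.search, and one exact tool name yields only the tools present in the catalog, while the same policy with a missing exact name is rejected.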


7. Channel runtime model

Per-channel state is one of:

  • current
  • active
  • idle

Semantics:

  • current: the foreground channel
  • active: runnable or resumable work exists
  • idle: sleeping, waiting for new work or a wake condition

Wake conditions are explicit runtime events. The host distinguishes:

  • user-message: a new human inbound message
  • automatic-wake: a scheduled wake created by prior channel finalization
  • runtime-wake: an explicit host/operator wake without new human input

The active snapshot retains wake metadata, including the wake source, scheduled wake self-note, next wake time, and the consecutive automatic wake count. A scheduled wake self-note is a private note from the current channel agent to its future run. It should explain what the future agent should remember, inspect, decide, or avoid. Finalization always schedules a future wake. If the only plan is to check whether the user replied, the agent should schedule a long wake and tell its future self not to message the user if nothing changed.

One channel run is one contiguous execution segment from the last finalized boundary to the next finalized boundary.

That same boundary is used by daemon trace streaming.


8. Scheduling and residency

The scheduler is channel-based, not message-based.

Properties:

  • one pending queue per channel
  • channel-level priority
  • priority aging over time
  • bounded worker concurrency across channels
  • one run at a time per channel
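
One way to combine channel-level priority, aging, and the one-run-per-channel rule is sketched below. The aging formula (one priority point per agingStep of waiting) is purely an assumption for illustration; the document does not specify how the host ages priorities, and bounded worker concurrency is left out.

```go
package main

import "time"

// Candidate is a hypothetical view of a channel that has pending work.
type Candidate struct {
	ID       string
	Priority int           // configured channel priority; higher runs first
	Waited   time.Duration // how long the channel's oldest pending work has waited
	Running  bool          // a run is already in progress for this channel
}

// PickNext returns the channel with the highest aged priority.
// Channels that are already running are skipped (one run at a time per channel).
func PickNext(cs []Candidate, agingStep time.Duration) (string, bool) {
	best, bestScore, found := "", 0, false
	for _, c := range cs {
		if c.Running {
			continue
		}
		score := c.Priority
		if agingStep > 0 {
			score += int(c.Waited / agingStep) // assumed aging rule
		}
		if !found || score > bestScore {
			best, bestScore, found = c.ID, score, true
		}
	}
	return best, found
}
```

Under this rule a low-priority channel that has waited long enough eventually outranks a high-priority channel that has only just received work, which is the starvation-avoidance effect aging is meant to provide.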

state.json is the authoritative scheduler record. It owns pending queues, channel state, pause state, priority, wake_at, and timeout_at. Paging snapshots are execution-context records used to resume or reconstruct a run; they are not the source of truth for queue ownership or wake scheduling.

Residency is host-owned:

  • channel snapshots are host data, not opaque Maestro VM dumps
  • paging happens only at safe boundaries
  • LRU-style eviction is used when resident contexts exceed the limit
  • one Maestro run executes until it reaches idle, then the host may page or reschedule the channel later
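
LRU-style eviction that respects safe boundaries can be sketched with the standard container/list package. The Residency type and the safe callback are hypothetical; the sketch only shows the ordering logic, not snapshot persistence.

```go
package main

import "container/list"

// Residency tracks which channel contexts are resident, most recent first.
type Residency struct {
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // channel ID -> list node
}

func NewResidency(limit int) *Residency {
	return &Residency{limit: limit, order: list.New(), items: map[string]*list.Element{}}
}

// Touch marks id as resident and most recently used. It returns the channel
// IDs to page out, oldest first, skipping any channel for which safe reports
// false (not at a safe boundary) and never evicting id itself.
func (r *Residency) Touch(id string, safe func(string) bool) []string {
	if el, ok := r.items[id]; ok {
		r.order.MoveToFront(el)
	} else {
		r.items[id] = r.order.PushFront(id)
	}
	var evicted []string
	for el := r.order.Back(); el != nil && r.order.Len() > r.limit; {
		prev := el.Prev() // capture before Remove clears the node's links
		victim := el.Value.(string)
		if victim != id && safe(victim) {
			r.order.Remove(el)
			delete(r.items, victim)
			evicted = append(evicted, victim)
		}
		el = prev
	}
	return evicted
}
```

If no resident channel is at a safe boundary, the set temporarily exceeds the limit in this sketch rather than paging out a channel mid-run.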

On startup, the host reconciles state.json with the paging snapshots before starting workers. Stale current channels left by a previous process are downgraded to active if their snapshot has unfinished work, or to idle with a scheduled fallback wake if no work remains.
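
The downgrade rule for stale current channels can be sketched as a pure function. The Record type, the fallback parameter, and the decision to keep an already-scheduled future wake are assumptions made for this illustration.

```go
package main

import "time"

// State is the per-channel runtime state.
type State string

const (
	Current State = "current"
	Active  State = "active"
	Idle    State = "idle"
)

// Record is a hypothetical slice of a channel's scheduler record.
type Record struct {
	State  State
	WakeAt time.Time // zero value means no wake is scheduled
}

// Reconcile downgrades a stale "current" channel found at startup.
// unfinished reports whether the channel's paging snapshot has unfinished work.
func Reconcile(rec Record, unfinished bool, now time.Time, fallback time.Duration) Record {
	if rec.State != Current {
		return rec // only stale foreground channels need repair
	}
	if unfinished {
		rec.State = Active
		return rec
	}
	rec.State = Idle
	// Assumption: keep a wake that is still in the future, else schedule a fallback.
	if rec.WakeAt.IsZero() || rec.WakeAt.Before(now) {
		rec.WakeAt = now.Add(fallback)
	}
	return rec
}
```

Keeping the function free of I/O makes the startup pass easy to test: load records, apply Reconcile to each, and write the results back before any worker starts.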


9. Prompt and action model

The agent prompt is not platform-specific.

The runtime and Maestro state machine may ask the agent to emit a user-visible Markdown message, but they do not ask it to emit Telegram-specific entities or HTML.

That platform lowering happens only at the frontend boundary.

Prompt construction is schema-owned, but identity is not.

SOUL.md is the exclusive source for the agent’s identity, name, voice, and standing preferences. The active Maestro program decides where that identity text is injected, but the program itself should remain identity-neutral. Runtime context should provide facts and constraints, not persona.

Each prompt includes:

  • identity text loaded from SOUL.md
  • operating and current-mode prompt text embedded directly in the active Maestro state
  • channel metadata
  • participant context
  • current message
  • bounded local working memory
  • valid next transitions
  • recent current-run events
  • loop warnings derived from repeated action patterns
  • relevant queue summaries, with follow-up content hidden until consumed

The host runtime supplies those structured sections as data, but the prompt layout and wording are embedded in maestro/channel_loop.mstr and maestro/subagent_loop.mstr.

The local working-memory layer is intentionally modeled as a structured operational notebook rather than a truthful event log. It is designed to hold the agent’s current understanding in compact sections such as user profile, channel facts, active goal, plan, open loops, workspace state, and handoff notes.

The host also retains a shorter-lived current-run-capability view inside the current channel snapshot. That view is inferred from the tools exposed during the current run and exists so the schema can answer capability questions from the broader current run rather than only the current state’s narrowed tool mask.

Each channel is modeled as a complete local agent. The local transcript is the canonical recent conversation source, and the snapshot carries a bounded current-run event stream: user messages, assistant messages, tool calls, tool results, state transitions, wake events, and context updates. The prompt is rendered from the transcript-derived recent context plus the current-run stream, not from a single last-action slot.

Myria remains Global Persistent Memory and the truthful durable event substrate. The agent should prefer the local notebook and recent local transcript context first, and consult Myria only for older, cross-channel, uncertain, stale, or too-compressed recall.

Grounding rules sit above that memory split:

  • successful tool results are factual observations
  • failed tool results are also factual observations about what did not work
  • user-visible replies must not invent successful filesystem, shell, browser, web, or memory results that are not supported by successful tool execution

Each agent step is bounded to exactly one tool/action selection.

Maestro owns the channel run loop itself:

  • prepare activates or resumes one channel run
  • plan performs bounded work on the current channel
  • finalize-send emits the user-visible reply
  • idle asks the model to choose timeout, sleep, wake self-note, and priority bookkeeping, then signals the host that the run has reached a safe waiting boundary

The host validates and dispatches the selected tool, but the loop shape and state transitions are encoded in the schema.

The schema controls which tools are visible in each step through a small policy language. Current built-in groups are:

  • @plan
  • @finalize-send
  • @idle
  • @core
  • @workspace
  • @web
  • @image
  • @browser
  • @subagents
  • @myria
  • @external

Exact tool names may be requested directly. Prefix a tool name with ? when the schema can use it if available but should continue if that tool is absent in the current runtime state.


10. Tool model

The tool plane is unified.

10.1 Built-in runtime tools

internal.* tools include:

  • queue control
  • channel finalization
  • VFS reads/writes/search
  • user-note writes
  • shell execution and interactive process sessions
  • browser automation
  • web fetch/search
  • image metadata and OCR
  • subagent control

10.2 Myria tools

myria.* is query-only from the agent’s perspective.

The agent does not append to Myria directly.

Myria tools expose global persistent memory. They are not the normal path for remembering the immediately preceding same-channel exchange.

10.3 External MCP tools

External MCP servers are registered into the same tool plane and exposed alongside built-ins.


11. Workspace and VFS model

Each channel has a host directory, but tools see a mounted VFS view.

Mount layout:

  • /: writable channel workspace
  • /.host-path/<n>: read-only mounted host PATH directories
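
Resolving a VFS path against this mount layout can be sketched as below. The Mounts type is hypothetical, and treating <n> as a zero-based index into the mounted PATH directories is an assumption. The check is purely lexical; a real implementation would also need to deal with symlinks inside the workspace.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strconv"
	"strings"
)

// Mounts is a hypothetical description of one channel's VFS view.
type Mounts struct {
	Workspace string   // host directory backing "/"
	HostPath  []string // host PATH directories backing "/.host-path/<n>"
}

// Resolve maps a VFS path to a host path and reports whether it is writable.
func (m Mounts) Resolve(vfsPath string) (string, bool, error) {
	// Cleaning a rooted path collapses ".." so it cannot climb above "/".
	clean := path.Clean("/" + vfsPath)
	if clean == "/.host-path" || strings.HasPrefix(clean, "/.host-path/") {
		rest := strings.TrimPrefix(strings.TrimPrefix(clean, "/.host-path"), "/")
		idx, sub, _ := strings.Cut(rest, "/")
		n, err := strconv.Atoi(idx)
		if err != nil || n < 0 || n >= len(m.HostPath) {
			return "", false, fmt.Errorf("unknown host-path mount %q", idx)
		}
		return filepath.Join(m.HostPath[n], filepath.FromSlash(sub)), false, nil
	}
	return filepath.Join(m.Workspace, filepath.FromSlash(clean)), true, nil
}
```

In this sketch a request for ../../etc/passwd resolves inside the workspace rather than escaping it, and anything under /.host-path/ is reported as read-only so a write tool can refuse it.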

The host runtime owns lease enforcement so overlapping agent activity cannot corrupt the workspace.

Long-running process sessions and browser sessions retain their lease ownership while active.


12. Memory integration

Myria is supervised as a subprocess.

Current default:

  • file-backed SQLite for convenience in local deployment and testing

Fuller backend:

  • PostgreSQL remains the richer Myria storage mode

The generated Myria config is derived from MaestroBot config.yaml. In the current MaestroBot version, SQLite is the generated convenience path.


13. MCP integration

MaestroBot supports MCP in three ways:

  • built-in Myria over stdio
  • external local MCP servers over stdio
  • external remote MCP servers over:
    • Streamable HTTP
    • legacy HTTP+SSE

The control plane supports:

  • explicit stdio registration
  • explicit remote registration
  • manifest discovery
  • manifest import
  • enable/disable
  • inspect
  • removal

Discovery currently recognizes:

  • mcp.json
  • .mcp.json
  • mcp-server.json
  • claude_desktop_config.json

Compatibility probing performs a live handshake rather than a static manifest check: the daemon connects to the server, performs initialize, sends notifications/initialized, and probes tools/list.
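
The three messages in that probe are ordinary JSON-RPC 2.0 payloads. The sketch below only builds them; the function name and the rpcMessage type are hypothetical, transport is omitted, and a real client waits for the initialize response before sending the notification.

```go
package main

import "encoding/json"

// rpcMessage is a minimal JSON-RPC 2.0 request or notification.
type rpcMessage struct {
	JSONRPC string `json:"jsonrpc"`
	ID      *int   `json:"id,omitempty"` // nil for notifications
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// ProbeSequence returns the encoded messages for an MCP compatibility probe:
// initialize, notifications/initialized, then tools/list.
func ProbeSequence(clientName, clientVersion, protocolVersion string) ([][]byte, error) {
	id := func(n int) *int { return &n }
	msgs := []rpcMessage{
		{JSONRPC: "2.0", ID: id(1), Method: "initialize", Params: map[string]any{
			"protocolVersion": protocolVersion,
			"capabilities":    map[string]any{},
			"clientInfo":      map[string]any{"name": clientName, "version": clientVersion},
		}},
		{JSONRPC: "2.0", Method: "notifications/initialized"}, // no id: a notification
		{JSONRPC: "2.0", ID: id(2), Method: "tools/list"},
	}
	out := make([][]byte, 0, len(msgs))
	for _, m := range msgs {
		b, err := json.Marshal(m)
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}
```

Over stdio each encoded message is written as one line; over the HTTP transports the same payloads are sent as request bodies. A server that answers tools/list with a well-formed result can be treated as compatible for tool registration.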


14. Control plane

The control plane runs over a Unix socket.

It covers:

  • root bootstrap
  • daemon lifecycle
  • channel management
  • chat injection
  • workspace inspection
  • identity association
  • runtime pause/resume
  • MCP management
  • model/provider preflight tests

maestrobot chat --verbose tails daemon-originated trace logs rather than inventing client-side logging.


15. Failure model

Important failure behaviors:

  • daemon start performs model/provider preflight first
  • runtime provider failures are logged verbosely by the daemon
  • resident execution contexts are snapshotted before unhealthy shutdown
  • tool server failures remain isolated from the rest of the runtime
  • paging snapshots and workspaces survive daemon restart

16. Current implementation notes

The first version is usable, but still intentionally local and host-centric.

Important present constraints:

  • Linux only
  • systemd --user assumed
  • one daemon binary and one local root
  • browser tooling depends on Playwright Chromium
  • image OCR depends on local tesseract

The core design, though, is now stable:

  • user-editable Maestro programs in the runtime root
  • durable host-managed channel runtime
  • supervised Myria
  • unified built-in and MCP tool plane