This document describes the YAML configuration schema consumed by the lighthouse CLI, including both the static-site pipeline and the Harbor platform runtime.

Config Loading

The CLI loads configuration from either:

  • a single YAML file
  • a config directory with command-specific files

Default config directory:

~/.lighthouse/config/

Use --config to point to a different file or directory.

The preferred layout is a config directory with separate files:

  • lighthouse.yml or lighthouse.yaml
  • harbor.yml or harbor.yaml

Each file uses its own namespaced root:

  • LIGHTHOUSE: in lighthouse.yml|yaml
  • HARBOR: in harbor.yml|yaml

Directory-loading behavior is command-specific:

  • static Lighthouse commands only scan lighthouse.yml|yaml
  • Harbor commands only scan harbor.yml|yaml

GLOBAL is reserved for future shared settings, but the current directory loaders do not scan global.yml.

For backward compatibility, a flat Lighthouse-only file (keys at the root instead of under LIGHTHOUSE:) still loads when you point --config directly at that file.
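To illustrate the two loading styles, the fragment below shows a namespaced lighthouse.yml and harbor.yml in the default config directory, followed by a flat legacy file. It is a sketch that assumes "flat" means the LIGHTHOUSE keys sit at the file root. The legacy file path is hypothetical, the values are placeholders, and the comments stand in for the remaining keys.

# ~/.lighthouse/config/lighthouse.yml
LIGHTHOUSE:
  LIGHTHOUSE_BRANCH: "main"
  # ...remaining LIGHTHOUSE keys

# ~/.lighthouse/config/harbor.yml
HARBOR:
  BRANCH: "main"
  # ...remaining HARBOR keys

# flat legacy file (hypothetical path), loaded only via --config /path/to/legacy.yml
LIGHTHOUSE_BRANCH: "main"
# ...remaining LIGHTHOUSE keys, with no LIGHTHOUSE: root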

Top-Level Keys

Supported root namespaces:

  • GLOBAL: reserved for shared operator settings.
  • LIGHTHOUSE: static-site build, deploy, rating, RSS, and source-materialization config.
  • HARBOR: Harbor build, runtime, auth, and app-hosting config.

GLOBAL

GLOBAL is intentionally sparse in v1. It exists so the config surface can stay unified while Lighthouse and Harbor keep separate schemas.

LIGHTHOUSE

LIGHTHOUSE uses the existing static-site schema. The following keys are required unless a section below marks them optional:

  • LIGHTHOUSE_CLONE_URL
  • LIGHTHOUSE_DIRECTORY
  • LIGHTHOUSE_BRANCH
  • BUILD_COMMAND
  • BUILD_OUTPUT_DIR
  • DEPLOY_ROOT
  • RETAIN_DEPLOYMENTS
  • EMAIL
  • CLOUDFLARE
  • REMOTE
  • RSS
  • RATINGS
  • SOURCES
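A sketch of the scalar LIGHTHOUSE keys, assuming a Jekyll site (the RSS section notes that Jekyll renders the feeds). The clone URL, directory, build command, output directory, and retention count are hypothetical placeholders rather than defaults, and whether BUILD_OUTPUT_DIR is resolved relative to the checkout is an assumption. The nested sections are elided.

LIGHTHOUSE:
  LIGHTHOUSE_CLONE_URL: "https://git.example.com/you/site.git"  # hypothetical
  LIGHTHOUSE_DIRECTORY: "replace-me"
  LIGHTHOUSE_BRANCH: "main"
  BUILD_COMMAND: "bundle exec jekyll build"  # assumption: Jekyll site
  BUILD_OUTPUT_DIR: "_site"                  # Jekyll's default output directory
  DEPLOY_ROOT: "/var/www/lighthouse"
  RETAIN_DEPLOYMENTS: 5                      # must be >= 1
  # EMAIL, CLOUDFLARE, REMOTE, RSS, RATINGS, SOURCES: see their sections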

The EMAIL, CLOUDFLARE, REMOTE, RSS, RATINGS, and SOURCES sections further down describe keys nested inside LIGHTHOUSE:.

HARBOR

Supported keys:

  • SOURCE_DIR
  • CLONE_URL
  • BRANCH
  • DEPLOY_ROOT
  • RETAIN_DEPLOYMENTS
  • RUNTIME_DIR
  • DATABASE_URL
  • BIND_HOST
  • BIND_PORT
  • PUBLIC_BASE_URL
  • OIDC
  • PERMISSIONS
  • MAINTENANCE
  • WORKSPACE
  • APPS

Key Harbor behaviors:

  • Harbor is server-local in v1
  • Harbor itself must use exactly one source mode:
    • SOURCE_DIR
    • or CLONE_URL + BRANCH
  • Harbor stores mutable platform state in Postgres via DATABASE_URL
  • each docked app must also use exactly one source mode:
    • SOURCE_DIR
    • or CLONE_URL + BRANCH
  • docked apps are built and staged into Harbor releases
  • docked apps must load runtime settings from a config file
  • Harbor injects resolved identity, roles, and permissions when it proxies requests
  • Harbor owns the built-in /myself/ route; app slug myself is reserved and may not be used by docked apps

DATABASE_URL should point Harbor at a Postgres schema it owns for:

  • users
  • direct permissions
  • user preferences
  • maintenance state
  • Harbor API keys

When Harbor itself uses CLONE_URL + BRANCH, the CLI clones or updates that checkout inside the Harbor repo cache before building the Harbor runtime.
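A minimal HARBOR sketch using the clone-backed source mode. Every value is a hypothetical placeholder: the repo URL, paths, host, port, and public URL are assumptions, and DATABASE_URL simply follows the standard postgresql:// URI form rather than anything Harbor-specific.

HARBOR:
  # Source mode: CLONE_URL + BRANCH
  # (or, instead of these two keys: SOURCE_DIR: "/srv/harbor-src")
  CLONE_URL: "https://git.example.com/you/harbor.git"
  BRANCH: "main"
  DEPLOY_ROOT: "/srv/harbor/releases"
  RETAIN_DEPLOYMENTS: 3
  RUNTIME_DIR: "/var/lib/harbor"
  DATABASE_URL: "postgresql://harbor:replace-me@localhost:5432/harbor"
  BIND_HOST: "127.0.0.1"
  BIND_PORT: 8080
  PUBLIC_BASE_URL: "https://harbor.example.com"
  # OIDC, PERMISSIONS, MAINTENANCE, WORKSPACE, APPS: see below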

OIDC supported keys:

  • ENABLED
  • MODE
  • ISSUER_PATH
  • CLIENT_ID
  • CLIENT_SECRET
  • SESSION_COOKIE_NAME
  • SESSION_SECRET
  • EMBEDDED_USERS

Only MODE: embedded is supported in v1.

Each EMBEDDED_USERS entry supports:

  • USERNAME
  • DISPLAY_USERNAME
  • PASSWORD
  • SUBJECT
  • NAME
  • EMAIL
  • ROLES

Harbor normalizes each USERNAME to a lowercase canonical form, which may contain only:

  • a-z
  • 0-9
  • _
  • -

The normalized canonical username is used for identity and route-safe slugs. DISPLAY_USERNAME is the human-facing label shown in Harbor and apps. If DISPLAY_USERNAME is omitted, Harbor displays the raw USERNAME value as configured, before normalization.

EMBEDDED_USERS are bootstrap-only seed users, not the long-term mutable user store. Run lighthouse harbor migrate-db to create the Harbor schema and import either legacy state or embedded users into Postgres. After the Harbor database is initialized, Harbor ignores EMBEDDED_USERS for steady-state auth and reads mutable users, permissions, preferences, and maintenance state from Postgres. Once that database state exists, EMBEDDED_USERS may be removed from config.
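The fragment below sketches an embedded OIDC block with one seed user, nested under HARBOR:. ISSUER_PATH, the cookie name, the role name, and all credential values are hypothetical placeholders. This document does not say whether PASSWORD expects plaintext or a hash, so treat it as an opaque placeholder. The comments show how the username rules above apply.

OIDC:
  ENABLED: true
  MODE: "embedded"                        # the only mode supported in v1
  ISSUER_PATH: "/oidc"                    # hypothetical
  CLIENT_ID: "harbor"
  CLIENT_SECRET: "replace-me"
  SESSION_COOKIE_NAME: "harbor_session"   # hypothetical
  SESSION_SECRET: "replace-me"
  EMBEDDED_USERS:                         # seed users; ignored once the Harbor database is initialized
    - USERNAME: "Alice_Smith"             # canonical username: alice_smith
      # DISPLAY_USERNAME omitted, so Harbor displays "Alice_Smith"
      PASSWORD: "replace-me"
      SUBJECT: "alice"
      NAME: "Alice Smith"
      EMAIL: "alice@example.com"
      ROLES: ["editor"]                   # hypothetical role name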

Harbor DB-backed admin commands:

  • lighthouse harbor add-user
  • lighthouse harbor set-password
  • lighthouse harbor list-permissions
  • lighthouse harbor list-all-permissions
  • lighthouse harbor add-permissions
  • lighthouse harbor delete-permissions
  • lighthouse harbor delete-user
  • lighthouse harbor migrate-db
  • lighthouse harbor maintenance on
  • lighthouse harbor maintenance off

Permission notes:

  • effective permissions are the union of role-derived permissions and direct user permissions
  • the direct admin permission acts as a wildcard and grants all app permissions
  • Harbor resolves principals in this order:
    • API key bearer token
    • Harbor browser session cookie
    • guest
  • docked apps receive the derived Harbor identity via forwarded X-Harbor-* headers, including auth method, theme, timezone, and API key ID when applicable

MAINTENANCE in config supplies bootstrap defaults only. Harbor writes the live maintenance state into Postgres during database initialization; after that, the database-backed value is changed through the Harbor CLI (lighthouse harbor maintenance on and lighthouse harbor maintenance off).

WORKSPACE supported keys:

  • ROOT_DIR
  • MAX_LEASES
  • DEFAULT_SIZE_MB
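A WORKSPACE sketch, nested under HARBOR:. The directory and numbers are hypothetical placeholders; this document does not define exactly what a lease is, so the values only show the expected shapes (a path, an integer count, and a size in megabytes).

WORKSPACE:
  ROOT_DIR: "/var/lib/harbor/workspaces"
  MAX_LEASES: 8
  DEFAULT_SIZE_MB: 512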

APPS is a list. Each app supports:

  • ID
  • NAME
  • DESCRIPTION
  • SLUG
  • SOURCE_DIR
  • CLONE_URL
  • BRANCH
  • BUILD_COMMAND
  • RUN_COMMAND
  • CONFIG_PATH
  • PUBLIC_BIND_ADDR
  • MANAGEMENT_SOCKET_PATH
  • GUEST_CAN_VIEW
  • READ_PERMISSIONS
  • WRITE_PERMISSIONS

Each app must specify exactly one source mode:

  • SOURCE_DIR
  • or CLONE_URL together with BRANCH

DESCRIPTION is optional. Harbor uses it on the landing page cards.

Whiteboard example:

  • SLUG: whiteboard
  • GUEST_CAN_VIEW: true
  • READ_PERMISSIONS: []
  • WRITE_PERMISSIONS: []

Whiteboard relies on Harbor principal resolution rather than app-specific Harbor permissions. Private server documents and MCP access still require an authenticated Harbor principal inside Whiteboard.

BUILD_COMMAND and RUN_COMMAND support these placeholders:

  • {release_dir}
  • {app_dir}
  • {config_path}
  • {runtime_dir}
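Putting the app keys together, the entry below expands the whiteboard values into a full APPS item nested under HARBOR:. The source directory, build and run commands, config path, bind address, and socket path are hypothetical, and the commands assume a Node.js app purely for illustration. Placeholders appear only in BUILD_COMMAND and RUN_COMMAND, the two keys documented to expand them.

APPS:
  - ID: "whiteboard"
    NAME: "Whiteboard"
    DESCRIPTION: "Shared sketching board"     # optional; shown on landing page cards
    SLUG: "whiteboard"                        # "myself" is reserved by Harbor
    SOURCE_DIR: "/srv/apps/whiteboard"        # one source mode only: no CLONE_URL/BRANCH here
    BUILD_COMMAND: "cd {app_dir} && npm ci && npm run build"
    RUN_COMMAND: "node {release_dir}/server.js --config {config_path}"
    CONFIG_PATH: "/etc/harbor/apps/whiteboard.yml"
    PUBLIC_BIND_ADDR: "127.0.0.1:4100"
    MANAGEMENT_SOCKET_PATH: "/var/lib/harbor/run/whiteboard.sock"
    GUEST_CAN_VIEW: true
    READ_PERMISSIONS: []
    WRITE_PERMISSIONS: []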

EMAIL

Supported keys:

  • ENABLED
  • FROM
  • USER
  • SMTP_ADDR
  • SMTP_PORT
  • PASSWD
  • DESTINATIONS
  • STARTTLS
  • IMPLICIT_TLS

If EMAIL.ENABLED is true, the SMTP fields above must be fully specified and DESTINATIONS must be non-empty.

Run-log delivery is controlled per invocation with --email-log. SMTP configuration alone does not automatically send mail on every run.
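An EMAIL sketch for a mail submission server on port 587 with STARTTLS. The host, addresses, and credentials are placeholders. Enabling STARTTLS and IMPLICIT_TLS together is presumably a misconfiguration (implicit TLS conventionally uses port 465), but this document does not say how the CLI treats that combination.

EMAIL:
  ENABLED: true
  FROM: "lighthouse@example.com"
  USER: "lighthouse@example.com"
  SMTP_ADDR: "smtp.example.com"
  SMTP_PORT: 587
  PASSWD: "replace-me"
  DESTINATIONS:
    - "ops@example.com"
  STARTTLS: true
  IMPLICIT_TLS: false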

CLOUDFLARE

Supported keys:

  • ENABLED
  • ZONE_ID
  • CLOUDFLARE_API_KEY

When CLOUDFLARE.ENABLED is true, these fields become required:

  • ZONE_ID
  • CLOUDFLARE_API_KEY

Behavior:

  • runs only during apply
  • executes after the new release is activated
  • sends a Cloudflare purge_everything request for the configured zone
  • fails the command loudly if the Cloudflare API request fails or the API reports an unsuccessful purge

Example:

CLOUDFLARE:
  ENABLED: true
  ZONE_ID: "replace-me"
  CLOUDFLARE_API_KEY: "replace-me"

REMOTE

Supported keys:

  • ENABLED
  • ROLE
  • SSH_HOST
  • SSH_PORT
  • REMOTE_RUN_DIR
  • LOCAL_CACHE_DIR
  • REMOTE_CACHE_DIR
  • LIGHTHOUSE_CLI_CLONE_URL
  • LIGHTHOUSE_CLI_REPO_PATH

Roles:

  • local
  • remote

Validation rules:

  • ROLE is required when REMOTE.ENABLED is true
  • REMOTE_RUN_DIR is required when REMOTE.ENABLED is true
  • SSH_PORT must be an integer >= 1
  • SSH_HOST is required when ROLE is local
  • LIGHTHOUSE_CLI_CLONE_URL is required when ROLE is local
  • LIGHTHOUSE_CLI_REPO_PATH is required when ROLE is local

Defaults:

  • SSH_PORT: 22
  • LOCAL_CACHE_DIR: <run-dir>/remote-cache
  • REMOTE_CACHE_DIR: <REMOTE_RUN_DIR>/remote-cache

Behavior:

  • remote build is valid only for ROLE=local
  • remote send is valid only for ROLE=local
  • remote apply is valid for both roles
  • on ROLE=local, remote apply checks the remote cache, then SSHes into the target and invokes remote apply there (on the ROLE=remote side)
  • before triggering remote deployment, the ROLE=local remote apply ensures the configured remote lighthouse-cli checkout exists (cloning it if missing, pulling otherwise), installs or upgrades the package, and syncs a derived remote-role config.yml
  • the ROLE=local remote apply also syncs portable content metadata (content-state.json) into the remote run dir
  • if local and remote content metadata differ, the CLI pauses and asks you to type local or remote
  • on ROLE=remote, remote apply deploys a tarball already staged in REMOTE_CACHE_DIR/incoming/
  • remote artifact deploys never rebuild from source

Example local-side config:

REMOTE:
  ENABLED: true
  ROLE: "local"
  SSH_HOST: "lighthouse@ubuntu-main"
  SSH_PORT: 22
  REMOTE_RUN_DIR: "/var/lib/lighthouse"
  LIGHTHOUSE_CLI_CLONE_URL: "https://git.peisongxiao.com/peisongxiao/lighthouse-cli.git"
  LIGHTHOUSE_CLI_REPO_PATH: "/srv/lighthouse-cli"

Example remote-side config:

REMOTE:
  ENABLED: true
  ROLE: "remote"
  REMOTE_RUN_DIR: "/var/lib/lighthouse"

RSS

Supported keys:

  • ENABLED
  • RSS_FEED_RATING_THRESHOLD
  • UPDATED_DIFF_RATING_THRESHOLD

Defaults:

  • ENABLED: true
  • RSS_FEED_RATING_THRESHOLD: 7.0
  • UPDATED_DIFF_RATING_THRESHOLD: -1.0

Behavior:

  • RSS_FEED_RATING_THRESHOLD gates RSS inclusion by combined post rating
  • UPDATED_DIFF_RATING_THRESHOLD <= 0 disables the future diff-rating LLM path and treats updated posts mechanically
  • the CLI materializes RSS policy into machine-owned site data so the Jekyll site can render the final XML feeds

Example:

RSS:
  ENABLED: true
  RSS_FEED_RATING_THRESHOLD: 7.0
  UPDATED_DIFF_RATING_THRESHOLD: -1.0

RATINGS

Supported keys:

  • ENABLED
  • PROVIDER
  • OPENROUTER_API_KEY
  • RATINGS_MODEL
  • DEFAULT_SCORE
  • MAX_RETRIES
  • MAX_THREADS
  • HTTP_TIMEOUT_SECONDS
  • REASONING_EFFORT
  • PROMPT

Current provider support is intentionally narrow:

  • PROVIDER must be openrouter

When RATINGS.ENABLED is true, these fields become required:

  • OPENROUTER_API_KEY
  • RATINGS_MODEL
  • PROMPT

DEFAULT_SCORE must stay within [0.0, 5.0].

MAX_RETRIES must be an integer >= 1.

MAX_THREADS must be an integer >= 1.

HTTP_TIMEOUT_SECONDS must be an integer >= 1.

REASONING_EFFORT must be one of:

  • xhigh
  • high
  • medium
  • low
  • minimal
  • none

Rating generation behavior:

  • runs during build, apply, and local
  • never runs during validate
  • retries provider failures and invalid structured outputs up to MAX_RETRIES
  • runs fresh rating jobs through a bounded central worker pool sized by MAX_THREADS
  • uses HTTP_TIMEOUT_SECONDS for the OpenRouter HTTP read timeout
  • sends REASONING_EFFORT through OpenRouter’s reasoning.effort field
  • explicitly disables streaming for rating requests
  • clamps out-of-bounds scores into [0.0, 5.0] instead of retrying
  • falls back to DEFAULT_SCORE after the retry budget is exhausted

The prompt lives in config, not in the repo. The CLI appends the raw source document below that prompt at runtime.

Example:

RATINGS:
  ENABLED: true
  PROVIDER: "openrouter"
  OPENROUTER_API_KEY: "replace-me"
  RATINGS_MODEL: "openai/gpt-5-mini"
  DEFAULT_SCORE: 2.5
  MAX_RETRIES: 3
  MAX_THREADS: 1
  HTTP_TIMEOUT_SECONDS: 120
  REASONING_EFFORT: "low"
  PROMPT: |
    --- BEGIN TASK DESCRIPTION ---
    Rate the provided document for standalone long-term
    showcase value on a personal website.
    --- END TASK DESCRIPTION ---

    --- BEGIN SCORE EXPLANATION ---
    Use a 0.0 to 5.0 scale where 2.5 is neutral.
    --- END SCORE EXPLANATION ---

    --- BEGIN OUTPUT FORMAT ---
    Return JSON with score, reason, and signals.
    --- END OUTPUT FORMAT ---

SOURCES

Each source entry supports:

  • NAME
  • CLONE_URL
  • BRANCH
  • URL_PATH
  • POST_TAG
  • RECENT_POSTS

URL_PATH is optional. If omitted, the default is:

/projects/<normalized-name>/

Special cases can override this explicitly, for example:

  • /blogs/
  • /maestro/

URL_PATH is the source root navigator URL. Source roots cannot be nested inside one another: /prefix-1/ and /prefix-1/subprefix/ conflict, while /prefix/subprefix-1/ and /prefix/subprefix-2/ are valid siblings if /prefix/ is not itself configured as a source root.
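For instance, the first pair of URL_PATH values below is rejected because one root nests inside the other, while the second pair is valid as siblings. Only NAME and URL_PATH are shown; the source names are hypothetical and the other required keys are elided.

# Rejected: /notes/drafts/ is nested inside /notes/
SOURCES:
  - NAME: "notes"
    URL_PATH: "/notes/"
  - NAME: "drafts"
    URL_PATH: "/notes/drafts/"

# Accepted, provided /lab/ itself is not configured as a source root
SOURCES:
  - NAME: "compiler"
    URL_PATH: "/lab/compiler/"
  - NAME: "kernel"
    URL_PATH: "/lab/kernel/"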

POST_TAG is optional. It controls the badge shown on cards for materialized posts from that source. If omitted, the CLI derives the badge from NAME by replacing - and _ with spaces and uppercasing the result.

Examples:

POST_TAG: "BLOGS"
POST_TAG: "LANGUAGE DESIGN"

RECENT_POSTS is an optional list of glob patterns evaluated relative to the source repo root. Matching discovered source documents are marked for the homepage recent-posts section. A **/ segment also matches zero directories, so the pattern covers files at that level as well as deeper ones:

  • **/*.md includes both README.md and nested Markdown files
  • **/*.tex includes both README.tex and nested LaTeX files
  • thoughts/**/*.md includes both thoughts/post.md and deeper files

Examples:

RECENT_POSTS:
  - "**/*.md"
  - "**/*.tex"

RECENT_POSTS:
  - "thoughts/**/*.md"
  - "announcements/*.md"

If RECENT_POSTS is omitted or empty, that source repo does not contribute surfaced posts to the homepage recent-posts strip or to the surfaced-post ordering in navigator views.
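A SOURCES sketch with two hypothetical repos. The first overrides URL_PATH and POST_TAG and surfaces Markdown posts on the homepage. The second relies on the defaults: its root is /projects/<normalized-name>/, its badge is derived from NAME, and it surfaces no recent posts because RECENT_POSTS is omitted. The clone URLs and branch names are placeholders.

SOURCES:
  - NAME: "blogs"
    CLONE_URL: "https://git.example.com/you/blogs.git"
    BRANCH: "main"
    URL_PATH: "/blogs/"
    POST_TAG: "BLOGS"
    RECENT_POSTS:
      - "**/*.md"
  - NAME: "language-design"
    CLONE_URL: "https://git.example.com/you/language-design.git"
    BRANCH: "main"
    # URL_PATH omitted: defaults to /projects/<normalized-name>/
    # POST_TAG omitted: derived from NAME as "LANGUAGE DESIGN"
    # RECENT_POSTS omitted: nothing surfaced on the homepage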

Validation Rules

Validation is intentionally strict. The CLI will fail if it sees:

  • unknown config keys
  • wrong value types
  • empty required strings
  • RETAIN_DEPLOYMENTS < 1
  • RATINGS.DEFAULT_SCORE outside [0.0, 5.0]
  • RATINGS.MAX_RETRIES < 1
  • RATINGS.MAX_THREADS < 1
  • RATINGS.HTTP_TIMEOUT_SECONDS < 1
  • REMOTE.SSH_PORT < 1
  • enabled REMOTE local-role blocks without SSH_HOST, LIGHTHOUSE_CLI_CLONE_URL, or LIGHTHOUSE_CLI_REPO_PATH
  • enabled REMOTE blocks without ROLE or REMOTE_RUN_DIR
  • enabled CLOUDFLARE blocks without ZONE_ID or CLOUDFLARE_API_KEY
  • invalid URL_PATH formatting
  • nested or duplicate source root URL_PATH values
  • URL or generated-path conflicts
  • missing local document references
  • references to unmanaged .md or .tex targets
  • missing referenced local assets
  • existing runtime lock files for non-config-only operations

Error messages are designed to say:

  • what key or path failed
  • what type or shape was expected
  • what value was actually received

Command Notes

  • validate --config-only validates merged config only
  • validate validates config and the pre-build deployment inputs
  • validate --fresh validates using the same full input pipeline but ignores prior incremental rating metadata for the current run
  • build validates, syncs, materializes, rates incrementally, builds without deployment, and writes a state snapshot for future incremental runs
  • apply runs the full rating, build, state, and deployment path, then purges Cloudflare if CLOUDFLARE.ENABLED is true
  • clean supports --target lighthouse and --target harbor
  • clean --target lighthouse acquires the runtime lock, removes the static-site cache/repos directory with rm -rf, and recreates the empty cache directory
  • clean --target harbor removes the Harbor repo cache directory under the configured Harbor runtime tree and recreates it empty
  • apply --target harbor treats both Harbor itself and clone-backed docked apps as deploy inputs for change detection, so app-only repo updates trigger Harbor redeploys
  • check-deps validates that the current environment has the executables required for the configured feature set
  • local PORT rates incrementally, builds, writes state, and serves the site locally on PORT without deploying to /var/www/lighthouse. It ignores DEPLOY_ROOT.
  • remote build runs the full local materialize-and-build path, then packages the built site under LOCAL_CACHE_DIR
  • remote send uploads a packaged tarball into REMOTE_CACHE_DIR/incoming/
  • remote apply on ROLE=local verifies the staged remote tarball and triggers deployment over SSH
  • remote apply on ROLE=local also resolves and syncs portable content metadata before deployment
  • remote apply on ROLE=remote unpacks the staged tarball, activates a release, writes remote deploy state, rotates old releases, and purges Cloudflare if CLOUDFLARE.ENABLED is true