lighthouse Deployment Design | Peisong's Lighthouse

This document describes the implementation-facing design of the Lighthouse deployment pipeline.

Goals
Pipeline Stages
Runtime Layout
Source Discovery
Materialization Model
Deployment Model
Remote Artifact Deployment
Harbor Platform
State and Retention

Goals

The pipeline is designed to:

run unattended from cron under the lighthouse user
remain directly callable by hand through the same CLI
treat Git repositories as the source of truth for content
emit deterministic Jekyll-ready content with explicit permalinks
deploy through atomic release-directory symlink swaps
manage the server-local Harbor app platform through the same CLI

Pipeline Stages

The core apply path is:

load and validate YAML configuration
acquire the runtime lock
clone or update the site repository and source repositories
discover .md and .tex sources while honoring .lighthouseignore
validate URL mapping and local references
materialize _automated_posts/, generated directory pages, copied raw assets, generated RSS policy data, and resume artifacts
optionally generate incremental per-post ratings
run the configured Jekyll build command
copy the built site into releases/<timestamp>/
atomically repoint active and current
optionally purge Cloudflare cache
write a full state snapshot
clean old releases and matching state files

The remote artifact path is split into:

remote build on the local role
remote send on the local role
remote apply on the local role, which SSHes into the remote role
remote apply on the remote role, which deploys the staged tarball

The Harbor path is now integrated into the root deploy surface:

validate --target harbor, which validates Harbor config and database readiness
build --target harbor, which compiles the Harbor Go runtime and docked apps
apply --target harbor, which stages a Harbor release, repoints current, and writes or restarts the Harbor user service when the Harbor target changed
harbor local, which still builds Harbor and runs it in the foreground for e2e testing

Harbor change detection includes:

Harbor config
Harbor source repo state
docked app config
docked app source repo state, including clone-backed app checkouts

Including docked app state in the comparison ensures that root apply does not skip an update that touches only an app repository.
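One way to picture the change detection above is as a single fingerprint over the four inputs. The following is a minimal sketch under stated assumptions, not the CLI's actual code: the function names, parameter names, and the choice of SHA-256 over canonical JSON are all hypothetical, and Python is used only for illustration.

```python
import hashlib
import json


def harbor_fingerprint(harbor_config, harbor_commit, app_configs, app_commits):
    """Hash every input that should trigger a Harbor redeploy when it changes.

    All four parameters are hypothetical stand-ins for the inputs listed above:
    Harbor config, Harbor repo state, docked app config, docked app repo state.
    """
    payload = {
        "harbor_config": harbor_config,
        "harbor_commit": harbor_commit,
        "app_configs": app_configs,
        "app_commits": app_commits,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def harbor_changed(previous_fingerprint, current_fingerprint):
    """Treat a missing previous fingerprint (first deploy) as a change."""
    return previous_fingerprint is None or previous_fingerprint != current_fingerprint
```

Because the app commit map is part of the hashed payload, a new commit in a docked app repository alone is enough to change the fingerprint.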

Runtime Layout

The runtime tree under ~/.lighthouse/ is split into:

config/ for human-maintained YAML
cache/repos/ for checked-out repositories
remote-cache/ for local or remote artifact staging
states/ for deployment snapshots
logs/ for verbose run logs
lock for overlap prevention

This keeps build state and logs out of the site repository itself.
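Overlap prevention through the lock file could be implemented with an advisory, non-blocking file lock. The sketch below assumes a POSIX host and uses Python's standard fcntl module; the RuntimeLock class is hypothetical, and the source does not say which locking mechanism the CLI actually uses.

```python
import fcntl
import os


class RuntimeLock:
    """Hypothetical helper: exclusive advisory lock on a lock file."""

    def __init__(self, path):
        self.path = path
        self._fd = None

    def acquire(self):
        """Return True if the lock was taken, False if another run holds it."""
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            return False
        self._fd = fd
        return True

    def release(self):
        """Drop the lock and close the descriptor, if held."""
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None
```

A cron-triggered run that fails to acquire the lock can simply exit, which lets a manual run and a scheduled run share the same CLI without overlapping.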

Harbor uses a parallel runtime layout rooted under the CLI run dir and its configured deploy root:

<runtime>/harbor/config/
<runtime>/harbor/state/
<runtime>/harbor/logs/
<runtime>/harbor/run/
<runtime>/harbor/build/
<deploy-root>/releases/<timestamp>/
<deploy-root>/current

Harbor mutable platform state no longer lives in a steady-state JSON file. In production, Harbor owns a Postgres schema for:

users
direct permissions
preferences
maintenance state

Source Discovery

Each configured source repository:

is cloned from CLONE_URL
is updated from the configured BRANCH
is scanned recursively from repo root for supported document sources (*.md, *.tex)
honors .lighthouseignore with .gitignore-style inheritance

URL generation rules:

lowercase all source paths
remove the source suffix
preserve the remaining relative path
apply the repo-level URL_PATH
keep repo-root README.md and README.tex as ordinary posts under /readme/ so the repo root URL can serve the navigator directory view
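The URL rules above can be sketched as a small pure function. This is an illustration only: the function name, the trailing slash on the result, and the handling of an empty URL_PATH are assumptions not stated in this document.

```python
from pathlib import PurePosixPath

# Supported document suffixes, as listed in the discovery rules.
SOURCE_SUFFIXES = {".md", ".tex"}


def permalink_for(relative_path, url_path):
    """Map a repo-relative source path to a site permalink (hypothetical helper)."""
    rel = PurePosixPath(relative_path.lower())      # lowercase the source path
    if rel.suffix not in SOURCE_SUFFIXES:
        raise ValueError(f"unsupported source: {relative_path}")
    stem = rel.with_suffix("").as_posix()           # remove the source suffix
    prefix = url_path.strip("/")                    # repo-level URL_PATH
    parts = [part for part in (prefix, stem) if part]
    return "/" + "/".join(parts) + "/"


print(permalink_for("Notes/Intro.md", "/blog/"))   # /blog/notes/intro/
print(permalink_for("README.md", "/blog/"))        # /blog/readme/
```

Under these rules a repo-root README needs no special case: lowercasing and suffix removal already place it at readme/ beneath the repo URL, which leaves the repo root URL free for the navigator view.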

Materialization Model

For every discovered source document, the pipeline:

resolves the canonical title from front matter, headings, LaTeX title commands, or filename
reads last_modified_at from Git
assigns a stable publish_date from the first deployment state that saw the file
rewrites local document references to generated site permalinks
generates navigator pages for source roots and repository directories under generated-directories/
copies referenced local non-Markdown assets into assets/generated/raw/
writes generated RSS policy under _data/generated/rss.yml
writes navigator source metadata under _data/generated/navigator.yml
includes each generated post's rating total in the directory navigator metadata so the site can sort generated posts by rating
renders .tex sources into embedded HTML fragments through make4ht
writes a Jekyll source file into _automated_posts/
copies assets/generated/raw/ into the Jekyll build output after the build command finishes; sites should exclude that source subtree from Jekyll processing so raw files with front matter stay inert

The generated source includes explicit front matter for permalink, author, source metadata, publish date, and last-modified time.

If resume.tex exists in the cached site repo itself, the CLI also generates:

assets/generated/resume/resume.fragment.html
assets/generated/resume/resume.pdf

The site-owned /aboutme/my-resume/ page embeds that generated HTML fragment and offers the PDF as a direct download.

Deployment Model

Deployment root:

/var/www/lighthouse/
  releases/
    <timestamp>/
  active -> releases/<timestamp>
  current -> releases/<timestamp>

The freshly built site is copied into a new release directory. After basic release validation, active and current are atomically repointed to that release. If Cloudflare purging is enabled, the CLI then sends a purge_everything request for the configured zone.
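A common way to repoint a symlink atomically on POSIX systems is to create the new link under a temporary name and rename it over the old one. The sketch below assumes that technique; the function name and the temporary-name scheme are hypothetical, and the source does not confirm that the CLI does it this way.

```python
import os


def repoint_symlink(link_path, target):
    """Point link_path at target without a window where the link is missing."""
    tmp_path = f"{link_path}.tmp-{os.getpid()}"
    # Clear a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(target, tmp_path)
    # rename(2) replaces the old link in one step and does not follow it.
    os.replace(tmp_path, link_path)
```

Each call is atomic on its own. Repointing both active and current takes two such renames, so readers may briefly see the two links pointing at different releases.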

Remote Artifact Deployment

Remote deployment intentionally splits build-time and deploy-time responsibilities:

local role:
- sync repositories
- materialize content
- generate ratings
- generate resume artifacts
- run the configured build command
- package the final built site into a tarball
- bootstrap the remote lighthouse-cli checkout from the configured repo URL and repo path
- sync a derived remote-role config into the remote run dir
- sync portable content metadata (content-state.json) into the remote run dir
remote role:
- accept uploaded tarballs into remote-cache/incoming/
- unpack tarballs into remote-cache/unpacked/
- validate the extracted site tree
- activate a new release under releases/<timestamp>/
- rotate old releases and state snapshots
- optionally purge Cloudflare

The remote role never rebuilds from source in this mode. It only deploys staged artifacts.

Portable content metadata and remote deployment history are separated:

content-state.json carries publish dates, ratings, feed status, and related per-post metadata
states/state-*.json on the remote host track release activation and retention history

If local and remote content-state.json differ, the local role pauses for a manual resolve step. The operator must type local or remote to choose which side becomes authoritative before deployment continues.
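The manual resolve step can be sketched as a prompt loop that accepts only the two documented answers. The function name, the prompt text, and the injectable ask parameter are hypothetical; only the local and remote choices come from this document.

```python
def resolve_content_state(local_state, remote_state, ask=input):
    """Return the authoritative content state, prompting only on a mismatch."""
    if local_state == remote_state:
        return local_state
    while True:
        answer = ask("content-state.json differs; type 'local' or 'remote': ")
        choice = answer.strip().lower()
        if choice == "local":
            return local_state
        if choice == "remote":
            return remote_state
        # Any other answer is rejected and the operator is asked again.
```

Passing ask as a parameter keeps the function testable without a terminal; an unattended cron run would need a different policy, which this sketch does not cover.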

Harbor Platform

Harbor is intentionally separate from the Jekyll site pipeline.

Current v1 design:

server-local only
Go runtime
simple landing page listing docked apps
CLI text/table operator surface
embedded local OIDC flow for e2e guest vs authenticated testing
centralized role and permission resolution
Postgres-backed Harbor users, permissions, preferences, and maintenance state
validation of guest (read-only) and owner/admin (write) access through the production-facing schedule app

Harbor-managed apps expose:

a public localhost HTTP bind
a management HTTP API on a Unix socket

The Harbor runtime:

starts configured app processes
checks readiness through the management socket
reverse proxies /<slug>/...
injects platform auth context headers
supports global maintenance mode
sends drain/resume signals to apps through the management socket

Harbor users are seeded from HARBOR.OIDC.EMBEDDED_USERS exactly once, during database bootstrap. After lighthouse harbor migrate-db, Harbor user and platform state is mutable Postgres-backed state:

canonical usernames, display usernames, subjects, password hashes, roles, and direct permissions live in Harbor’s Postgres schema
lighthouse harbor add-user, set-password, add-permissions, delete-permissions, delete-user, list-permissions, maintenance on, and maintenance off operate on that database
the Harbor runtime reloads DB-backed user and platform state while serving requests so permission and preference changes take effect without a restart

The admin permission is treated as a wildcard and satisfies all app permission checks.
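The wildcard rule reduces to a one-line check. The sketch below is illustrative only: the function name and the example permission string schedule.write are hypothetical, and Harbor itself is a Go runtime, so this is not its code.

```python
def has_permission(granted, required):
    """Return True if the granted set satisfies the required app permission."""
    # "admin" acts as a wildcard and satisfies every app permission check.
    return "admin" in granted or required in granted


print(has_permission({"admin"}, "schedule.write"))          # True
print(has_permission({"schedule.read"}, "schedule.write"))  # False
```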

The Harbor build/apply path renders a standalone Harbor runtime config inside each release. Docked apps must support config-file loading; the Harbor config points to the app config file to pass at startup.

App metadata such as the Harbor landing-page description also lives in Harbor config, not in the runtime binary.

State and Retention

Every successful apply writes a full state snapshot:

~/.lighthouse/states/state-<timestamp>.json

That snapshot carries forward first-seen publication dates and source commit metadata so older state files can be retired without losing publication history for live posts.
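The carry-forward of first-seen publication dates can be sketched as a merge of the previous snapshot with the set of currently live posts. The function name and the flat path-to-date mapping are assumptions; the real snapshot schema is not described here.

```python
def carry_forward_publish_dates(previous_dates, live_paths, deployed_at):
    """Build the publish-date map for a new snapshot.

    previous_dates: path -> first-seen date from the prior snapshot.
    live_paths: source paths discovered in this run.
    deployed_at: date assigned to posts seen for the first time.
    """
    return {
        # Known posts keep their original date; new posts get this run's date.
        path: previous_dates.get(path, deployed_at)
        for path in sorted(live_paths)
    }
```

Because every snapshot is complete on its own, only the most recent one is needed to preserve dates, which is what allows older state files to be deleted.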

Remote artifact deployment also writes deploy-oriented state snapshots on the remote host. These snapshots record the applied artifact ID, the unpacked cache path, and the activated release directory, but do not try to mirror the local materialization state one-to-one.

Retention cleanup runs only after a successful deploy. Old releases and their matching state snapshots are deleted together. Logs are retained indefinitely.
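Pairing each old release with its state snapshot can be sketched as a planning step that runs before anything is deleted. The function name and the keep count are hypothetical, since the retention limit is not specified in this document, and the sketch assumes release timestamps sort chronologically as strings.

```python
def plan_retention(release_names, keep):
    """Return (release, state file) pairs that are old enough to delete."""
    if keep < 1:
        raise ValueError("keep must be at least 1 so the live release survives")
    ordered = sorted(release_names)   # oldest first, assuming sortable timestamps
    stale = ordered[:-keep]           # everything except the newest `keep` entries
    return [(name, f"state-{name}.json") for name in stale]


print(plan_retention(["20240103", "20240101", "20240102"], keep=2))
# [('20240101', 'state-20240101.json')]
```

Computing the plan first and deleting afterwards makes it straightforward to skip cleanup entirely when the deploy did not succeed.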
