This document describes the implementation-facing design of the Lighthouse deployment pipeline.

Goals

The pipeline is designed to:

  • run unattended from cron under the lighthouse user
  • remain directly callable by hand through the same CLI
  • treat Git repositories as the source of truth for content
  • emit deterministic Jekyll-ready content with explicit permalinks
  • deploy through atomic release-directory symlink swaps
  • manage the server-local Harbor app platform through the same CLI

Pipeline Stages

The core apply path is:

  1. load and validate YAML configuration
  2. acquire the runtime lock
  3. clone or update the site repository and source repositories
  4. discover .md and .tex sources while honoring .lighthouseignore
  5. validate URL mapping and local references
  6. materialize _automated_posts/, generated directory pages, copied raw assets, generated RSS policy data, and resume artifacts
  7. optionally generate incremental per-post ratings
  8. run the configured Jekyll build command
  9. copy the built site into releases/<timestamp>/
  10. atomically repoint active and current
  11. optionally purge Cloudflare cache
  12. write a full state snapshot
  13. clean old releases and matching state files
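
Step 2 exists so that a cron run and a manual run cannot overlap. The document does not say how the lock is implemented; the following is a minimal sketch that assumes an advisory flock on the lock file. The name runtime_lock and the fail-fast behavior are illustrative assumptions, not the CLI's confirmed design.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def runtime_lock(lock_path):
    """Hold an exclusive advisory lock for the duration of one run (hypothetical helper)."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes an overlapping run fail immediately with
        # BlockingIOError instead of waiting behind the first run.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        # Closing the descriptor releases the lock if it was acquired.
        os.close(fd)
```

A caller would wrap the remaining steps in "with runtime_lock(path):" and treat BlockingIOError as "another run is already in progress".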

The remote artifact path is split into:

  1. remote build on the local role
  2. remote send on the local role
  3. remote apply on the local role, which SSHes into the remote role
  4. remote apply on the remote role, which deploys the staged tarball

The Harbor path is integrated into the root deploy surface:

  1. validate --target harbor, which validates Harbor config and database readiness
  2. build --target harbor, which compiles the Harbor Go runtime and docked apps
  3. apply --target harbor, which stages a Harbor release, repoints current, and writes or restarts the Harbor user service when the Harbor target has changed
  4. harbor local, which still builds Harbor and runs it in the foreground for e2e testing

Harbor change detection includes:

  • Harbor config
  • Harbor source repo state
  • docked app config
  • docked app source repo state, including clone-backed app checkouts

That keeps app-only Harbor repo updates from being skipped by root apply.
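
One way to picture this check is a single fingerprint over all four inputs, compared against the fingerprint recorded by the previous apply. The sketch below is an illustration under assumptions: the function names, the dictionary layout, and the use of SHA-256 over JSON are hypothetical and are not taken from the Harbor code.

```python
import hashlib
import json


def harbor_fingerprint(harbor_config, harbor_commit, apps):
    """Digest of Harbor config, Harbor repo state, and per-app config and repo state.

    apps maps an app slug to {"config": ..., "commit": ...} (hypothetical layout).
    """
    payload = {
        "harbor_config": harbor_config,
        "harbor_commit": harbor_commit,
        "apps": apps,
    }
    # sort_keys gives a stable serialization, so equal inputs hash equally.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def harbor_changed(previous_fingerprint, current_fingerprint):
    """True when any of the four inputs differs from the last applied state."""
    return previous_fingerprint != current_fingerprint
```

Because each docked app's commit is part of the digest, a change that touches only an app checkout still produces a new fingerprint.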

Runtime Layout

The runtime tree under ~/.lighthouse/ is split into:

  • config/ for human-maintained YAML
  • cache/repos/ for checked-out repositories
  • remote-cache/ for local or remote artifact staging
  • states/ for deployment snapshots
  • logs/ for verbose run logs
  • lock for overlap prevention

This keeps build state and logs out of the site repository itself.

Harbor uses a parallel runtime layout rooted under the CLI run dir and its configured deploy root:

  • <runtime>/harbor/config/
  • <runtime>/harbor/state/
  • <runtime>/harbor/logs/
  • <runtime>/harbor/run/
  • <runtime>/harbor/build/
  • <deploy-root>/releases/<timestamp>/
  • <deploy-root>/current

Harbor mutable platform state no longer lives in a steady-state JSON file. In production, Harbor owns a Postgres schema for:

  • users
  • direct permissions
  • preferences
  • maintenance state

Source Discovery

Each configured source repository:

  • is cloned from CLONE_URL
  • is updated from the configured BRANCH
  • is scanned recursively from repo root for supported document sources (*.md, *.tex)
  • honors .lighthouseignore with .gitignore-style inheritance

URL generation rules:

  • lowercase all source paths
  • remove the source suffix
  • preserve the remaining relative path
  • apply the repo-level URL_PATH
  • keep repo-root README.md and README.tex as ordinary posts under /readme/ so the repo root URL can serve the navigator directory view
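
The rules above can be sketched as a small function. This is a hedged illustration: the name permalink_for, the trailing-slash convention, and the error for unsupported suffixes are assumptions. Note that a repo-root README needs no special case here, because stripping the suffix and lowercasing already yields readme.

```python
from pathlib import PurePosixPath

SOURCE_SUFFIXES = {".md", ".tex"}


def permalink_for(relative_path, url_path):
    """Map a repo-relative source path to a site permalink (hypothetical helper)."""
    rel = PurePosixPath(relative_path.lower())  # lowercase the whole path
    if rel.suffix not in SOURCE_SUFFIXES:
        raise ValueError(f"unsupported source type: {relative_path}")
    stem = rel.with_suffix("")  # drop .md / .tex, keep the directories
    prefix = url_path.strip("/")  # repo-level URL_PATH
    base = f"/{prefix}" if prefix else ""
    return f"{base}/{stem.as_posix()}/"
```

For example, with URL_PATH set to /blog/, Notes/Intro.MD maps to /blog/notes/intro/ and README.md maps to /blog/readme/.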

Materialization Model

For every discovered source document, the pipeline:

  • resolves the canonical title from front matter, headings, LaTeX title commands, or filename
  • reads last_modified_at from Git
  • assigns a stable publish_date from the first deployment state that saw the file
  • rewrites local document references to generated site permalinks
  • generates navigator pages for source roots and repository directories under generated-directories/
  • copies referenced local non-Markdown assets into assets/generated/raw/
  • writes generated RSS policy under _data/generated/rss.yml
  • writes navigator source metadata under _data/generated/navigator.yml
  • includes each generated post rating total in directory navigator metadata so the site can sort generated posts by rating
  • renders .tex sources into embedded HTML fragments through make4ht
  • writes a Jekyll source file into _automated_posts/
  • copies assets/generated/raw/ into the Jekyll build output after the build command finishes; sites should exclude that source subtree from Jekyll processing so raw files with front matter stay inert

The generated source includes explicit front matter for permalink, author, source metadata, publish date, and last-modified time.

If resume.tex exists in the cached site repo itself, the CLI also generates:

  • assets/generated/resume/resume.fragment.html
  • assets/generated/resume/resume.pdf

The site-owned /aboutme/my-resume/ page embeds that generated HTML fragment and offers the PDF as a direct download.

Deployment Model

Deployment root:

/var/www/lighthouse/
  releases/
    <timestamp>/
  active -> releases/<timestamp>
  current -> releases/<timestamp>

The freshly built site is copied into a new release directory. After basic release validation, active and current are atomically repointed to that release. If Cloudflare purging is enabled, the CLI then sends a purge_everything request for the configured zone.
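
A common way to repoint a symlink atomically is to create a temporary link next to it and rename it over the old one. The sketch below assumes that technique; the helper name repoint and the temporary-name scheme are hypothetical, and the CLI may do this differently.

```python
import os


def repoint(link_path, target):
    """Atomically point the symlink at link_path to target (hypothetical helper)."""
    tmp_path = f"{link_path}.tmp-{os.getpid()}"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)  # clear a leftover from an interrupted run
    os.symlink(target, tmp_path)
    # os.replace renames the temporary link over the old one in a single
    # step, so readers see either the old release or the new one.
    os.replace(tmp_path, link_path)
```

Each call swaps one link atomically. Repointing active and then current is therefore two atomic steps rather than one combined step.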

Remote Artifact Deployment

Remote deployment intentionally splits build-time and deploy-time responsibilities:

  • local role:
    • sync repositories
    • materialize content
    • generate ratings
    • generate resume artifacts
    • run the configured build command
    • package the final built site into a tarball
    • bootstrap the remote lighthouse-cli checkout from the configured repo URL and repo path
    • sync a derived remote-role config into the remote run dir
    • sync portable content metadata (content-state.json) into the remote run dir
  • remote role:
    • accept uploaded tarballs into remote-cache/incoming/
    • unpack tarballs into remote-cache/unpacked/
    • validate the extracted site tree
    • activate a new release under releases/<timestamp>/
    • rotate old releases and state snapshots
    • optionally purge Cloudflare

The remote role never rebuilds from source in this mode. It only deploys staged artifacts.

Portable content metadata and remote deployment history are separated:

  • content-state.json carries publish dates, ratings, feed status, and related per-post metadata
  • states/state-*.json on the remote host track release activation and retention history

If local and remote content-state.json differ, the local role pauses for a manual resolve step. The operator must type local or remote to choose which side becomes authoritative before deployment continues.
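
The resolve step can be sketched as a loop that accepts only the two documented answers. The function name, the prompt wording, and the injectable ask parameter are assumptions made for illustration and testing.

```python
def resolve_content_state(local_state, remote_state, ask=input):
    """Return the authoritative content state, prompting only on a mismatch."""
    if local_state == remote_state:
        return local_state  # nothing to resolve
    while True:
        answer = ask("content-state.json differs; type local or remote: ")
        choice = answer.strip().lower()
        if choice == "local":
            return local_state
        if choice == "remote":
            return remote_state
        # Any other answer is ignored and the prompt repeats.
```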

Harbor Platform

Harbor is intentionally separate from the Jekyll site pipeline.

Current v1 design:

  • server-local only
  • Go runtime
  • simple landing page listing docked apps
  • CLI text/table operator surface
  • embedded local OIDC flow for e2e guest vs authenticated testing
  • centralized role and permission resolution
  • Postgres-backed Harbor users, permissions, preferences, and maintenance state
  • validation of guest/read-only access and owner/admin write access through the production-facing schedule app

Harbor-managed apps expose:

  • a public localhost HTTP bind
  • a management HTTP API on a Unix socket

The Harbor runtime:

  • starts configured app processes
  • checks readiness through the management socket
  • reverse proxies /<slug>/...
  • injects platform auth context headers
  • supports global maintenance mode
  • sends drain/resume signals to apps through the management socket

Harbor users are seeded from HARBOR.OIDC.EMBEDDED_USERS exactly once, during database bootstrap. After lighthouse harbor migrate-db, Harbor user and platform state is mutable, Postgres-backed state:

  • canonical usernames, display usernames, subjects, password hashes, roles, and direct permissions live in Harbor’s Postgres schema
  • lighthouse harbor add-user, set-password, add-permissions, delete-permissions, delete-user, list-permissions, maintenance on, and maintenance off operate on that database
  • the Harbor runtime reloads DB-backed user and platform state while serving requests so permission and preference changes take effect without a restart

The admin permission is treated as a wildcard and satisfies all app permission checks.
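
As a minimal sketch of that rule, assuming permissions are plain strings and that a user's direct permissions arrive as a collection (both assumptions; has_permission is a hypothetical name):

```python
def has_permission(granted, required):
    """True if the user holds the required permission or the admin wildcard."""
    granted = set(granted)
    return "admin" in granted or required in granted
```

With this shape, a user holding only admin passes a check for any app permission, while a user holding only one app permission passes that single check.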

The Harbor build/apply path renders a standalone Harbor runtime config inside each release. Docked apps must support config-file loading; the Harbor config points to the app config file to pass at startup.

App metadata such as the Harbor landing-page description also lives in Harbor config, not in the runtime binary.

State and Retention

Every successful apply writes a full state snapshot:

~/.lighthouse/states/state-<timestamp>.json

That snapshot carries forward first-seen publication dates and source commit metadata so older state files can be retired without losing publication history for live posts.
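
The carry-forward can be pictured as copying each live post's first-seen date from the previous snapshot and stamping only newly discovered posts. The mapping layout and function name below are hypothetical; the real snapshot schema is not described in this document.

```python
def carry_forward_publish_dates(previous_dates, discovered_paths, now):
    """Build the publish-date map for the next snapshot.

    previous_dates maps a source path to its first-seen date from the
    last snapshot; posts not seen before receive `now`.
    """
    return {path: previous_dates.get(path, now) for path in discovered_paths}
```

Posts that are no longer discovered drop out of the new map, which is why only live posts keep their history once older snapshots are deleted.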

Remote artifact deployment also writes deploy-oriented state snapshots on the remote host. These snapshots record the applied artifact id, unpacked cache path, and activated release directory, but do not try to mirror the local materialization state one-to-one.

Retention cleanup runs only after a successful deploy. Old releases and their matching state snapshots are deleted together. Logs are retained indefinitely.
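
A sketch of the pairing logic, assuming release directories and state files share the same timestamp string and that timestamps sort chronologically as text. The keep parameter and the function name are hypothetical; the document does not state the retention count.

```python
def retention_plan(release_timestamps, keep):
    """Return (release_dir, state_file) pairs that are due for deletion."""
    if keep < 1:
        raise ValueError("keep must be at least 1 so the live release survives")
    newest_first = sorted(release_timestamps, reverse=True)
    expired = newest_first[keep:]
    # Each expired release is paired with its matching snapshot so that
    # the two are always removed together.
    return [(f"releases/{ts}", f"states/state-{ts}.json") for ts in expired]
```

The function only computes the plan. A caller would run it after a successful deploy and then delete each pair, leaving logs untouched.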