feat: initial AgentLab portfolio content

Architecture, overview, homelab build plan, agent handbook, ADRs,
and agent operating rules. All sensitive operational details sanitized
(real IPs, hostnames, client names replaced with generic placeholders).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AgentLab 2026-03-31 04:52:42 +00:00
parent 061a40e927
commit d5ef629a54
13 changed files with 1243 additions and 2 deletions


@@ -1,3 +1,17 @@
# Matt Hall — Portfolio
A custom-built collaborative AI multi-agent orchestration platform and the projects built with it.
## Projects
| Project | What it is |
|---------|-----------|
| [AgentLab](agentlab/) | Self-hosted multi-agent AI orchestration platform — infrastructure automation, private AI stack, and business control plane |
## About
I'm an MSP operator building infrastructure and AI tooling under the Therapon platform. This portfolio documents the architecture, decisions, and operating model behind the systems I build.
The full system runs on a MacBook Pro M5 (operator workstation), a VPS in the cloud, and an HP Z640 home server — all connected via WireGuard VPN and managed through a Git-mediated multi-agent workflow.
→ Start with the [AgentLab overview](agentlab/overview.md)

agentlab/README.md

@@ -0,0 +1,26 @@
# AgentLab
Self-hosted multi-agent AI orchestration platform for infrastructure automation, private AI inference, and business operations.
## Contents
| Document | What it covers |
|----------|---------------|
| [overview.md](overview.md) | Plain-language overview + how to replicate the pattern |
| [architecture.md](architecture.md) | Full system map — hardware, networking, AI stack, three-lane model |
| [homelab-build-plan.md](homelab-build-plan.md) | HP Z640 home server phased rebuild plan |
| [agent-handbook.md](agent-handbook.md) | Shared operating protocol for all AI agents |
| [decisions/](decisions/) | Architectural Decision Records |
| [rules/](rules/) | Agent operating rules — research protocol, README maintenance, proactive engineering |
## Quick summary
AgentLab is a controlled operating model for AI-assisted engineering:
- A Docker devcontainer is the bounded workspace — agents run inside it, not on the host
- A Git repo is the durable memory — all instructions, state, handoffs, and outputs live in it
- Three AI agents (Claude, Codex, Gemini) each have their own branch and lane of work
- A human promotion gate controls what reaches production
- Private data never leaves the machine — local Ollama handles the private lane
The name for the overall system direction is **Therapon** — an operator-owned control plane that doesn't depend on any single vendor or subscription stack.

agentlab/agent-handbook.md

@@ -0,0 +1,152 @@
# Agent Handbook
Defines the shared operating protocol for all agents working in this repo.
This handbook is the common layer beneath agent-specific overlays (CLAUDE.md, GEMINI.md, Codex baseline). It governs startup behavior, source-of-truth discipline, read/write boundaries, environment assumptions, first-run onboarding, cross-agent compatibility, and closeout behavior.
---
## Authority Order
When instructions differ, use this order:
1. `AGENTS.md`
2. `LLM-ENTRYPOINT.md`
3. Authoritative state docs: `docs/priorities.md`, `docs/open-loops.md`, `vps/CURRENT-STATE.md`
4. This handbook
5. Agent-specific overlays
Agent-specific overlays must not contradict higher-authority documents.
---
## Mandatory Startup
Before doing any task work, every agent must read the mandatory startup chain defined in `AGENTS.md` and `LLM-ENTRYPOINT.md`.
If any required file is missing or unreadable, stop and report it. Do not begin planning, editing, or execution on assumptions.
When a task depends on live state that may have changed during the session, agents must re-read the relevant state files rather than rely on stale context.
---
## Source Of Truth Rules
Agents must:
- use repo files as the primary source of truth
- distinguish verified facts from inference
- use `Known / Reasoned / Action` when the task is analytical or state-sensitive
- avoid memory-only answers when current repo state is relevant
- never claim a repo-local instruction or config file exists unless verified
---
## Read / Write Rules
Agents may read repo files and run allowed read-only commands without asking.
Agents must state intent and wait for explicit operator approval before:
- editing any file
- creating any file
- running commands that the repo marks as approval-required
- taking any action that could affect shared agent systems
No opportunistic cleanup. No "while I'm here" side edits. No silent growth of agent-local docs or config in arbitrary locations.
---
## Environment Requirements
### Docker / Devcontainer
Agents must know:
- the repo operates inside a devcontainer
- Docker Compose lifecycle actions require operator approval
- named volumes persist agent config across rebuilds
- bind mounts and container-local filesystem are not equivalent
Agents must not assume host files are writable or visible from the container unless the repo explicitly documents that.
### Git
Agents must know:
- git is part of the system's durable memory substrate
- `git status` is normal session hygiene
- commit, push, and branch operations require approval where repo rules say so
Agents must not invent their own branch model.
### Documentation
Agents must know:
- authoritative state docs are part of runtime operations, not optional reading
- new durable docs must follow the documentation structure standard
- legacy docs are not silently normalized during unrelated work
- every repo must have a human-readable `README.md`
---
## First-Run Onboarding For New Agents
When a new agent or model is installed or rebooted into the repo for the first time, it must:
1. Read the mandatory startup chain
2. Read this handbook
3. Read the existing agent-specific overlays for the other agents
4. Inspect its own native structure, config, memory, hook, and instruction mechanisms
5. Compare those mechanisms against the repo's shared rules
6. Produce a non-destructive compatibility report
7. Ask for approval before creating or editing any integration file
---
## Agent Self-Integration Rule
A new agent may propose its own overlay file, config reference doc, hook/config skeleton, and any shared-doc updates needed for compatibility.
A new agent may not create or edit those files without explicit approval.
Agent-local integration must be additive, minimal, and compatible with the rest of the system.
---
## Cross-Agent Compatibility
Agent-specific files must not:
- weaken shared safety boundaries
- redefine write permissions
- invent conflicting closeout behavior
- silently override shared startup requirements
- create undocumented config or docs in arbitrary locations
Shared rules win over local preferences.
---
## Current Agent Inventory
| Agent / Runtime | Status | Notes |
|---|---|---|
| Claude | Active | Repo-specific hook system present |
| Codex | Active | Config baseline, approval/sandbox settings, and instruction-size limits matter |
| Gemini | Active | Overlay active |
| Local private model | Planned | Host-local inference path not yet finalized |
---
## Verification Standard
An agent is not considered correctly onboarded until it demonstrates that it can:
- follow the startup chain
- identify the current priority from repo docs
- respect write approval boundaries
- identify its own local overlay/config files accurately
- follow the shared closeout workflow
- avoid inventing missing files or unsupported features

agentlab/architecture.md

@@ -0,0 +1,360 @@
# AgentLab System Architecture
> **Scope:** Full system map — hardware, networking, AI stack, three-lane operating model,
> and open architectural decisions.
---
## Contents
- [Hardware Inventory](#hardware-inventory)
- [Three-Lane Operating Model](#three-lane-operating-model)
- [Network Topology](#network-topology)
- [VLAN Scheme](#vlan-scheme)
- [VPS Platform Services](#vps-platform-services)
- [AI Stack](#ai-stack)
- [AgentLab Orchestrator](#agentlab-orchestrator)
- [Chat and Voice Access](#chat-and-voice-access)
- [Backup and Recovery](#backup-and-recovery)
- [Open Architectural Decisions](#open-architectural-decisions)
---
## Hardware Inventory
| Node | Type | Status | Role |
|------|------|--------|------|
| MacBook Pro M5 24 GB | Operator workstation | Active | Primary operator machine, Claude Code devcontainer, local Ollama (privacy), portable to client sites |
| HP Z640 64 GB RAM | Home server | Needs rebuild | Proxmox VE host — agent stack, media server, persistent Ollama inference (RTX 3060) |
| Intel NUC (home office) | Small server | Active | Proxmox Backup Server, VPS recovery testing lab |
| CHR01 | MikroTik Cloud Hosted Router | Active | WireGuard hub (all home lab spokes), SSTP hub for client MikroTik routers |
| CHR02 | MikroTik Cloud Hosted Router | Planned | Failover hub — design undefined |
| VPS01 | VPS | Active | Forgejo, Caddy reverse proxy, monitoring stack, WireGuard spoke |
| VPS02 | VPS | Partial | Warm standby for VPS01 — manual failover only, sync method undefined |
| Client routers | MikroTik RouterOS v6 | Active | SSTP tunnels to CHR01 ([MSP Client A] — multiple sites) |
| Android tablet | Mobile | Phase Later | Dashboard, demo device |
### MacBook Pro M5 — Workstation Baseline
| Item | Value |
|------|-------|
| Chip | Apple M5 |
| CPU cores | 10 total — 4 performance, 6 efficiency |
| Unified memory | 24 GB |
### HP Z640 — GPU Inventory
| GPU | VRAM | Role |
|-----|------|------|
| Intel Arc A310 (Sparkle) | 4 GB | Jellyfin VA-API transcoding |
| EVGA GeForce RTX 3060 XC 12 GB GDDR6 | 12 GB | Ollama local LLM inference |
### Intel NUC — Home Office Node
| Item | Value |
|------|-------|
| CPU | Intel i7 10th gen (~10 cores) |
| RAM | 8 GB (upgradeable) |
| Storage | 500 GB SSD + 2× 2 TB SSDs |
| Role | Proxmox Backup Server + VPS recovery lab |
### Proxmox Node Naming (Hitchhiker's Guide Theme)
Existing nodes: `deep-thought`, `ford`, `marvin`, `zaphod`, `slartibartfast`, `hactar`, `magrathea`
Planned: `trillian` (VMID 112) — Open WebUI + Ollama LXC on VLAN 60
---
## Three-Lane Operating Model
All three lanes require access to a local model. Sensitive or private data must never leave the device.
| Lane | Purpose | Primary devices | Frontier API allowed | Local model requirement |
|------|---------|----------------|----------------------|------------------------|
| **Enterprise / MSP** | ScreamSaver IT + AI-native MSP brand. Client data cleanup, CSV processing, password review, business docs | Mac + Z640 | Claude (Anthropic API) | Required — sensitive client data |
| **Personal projects** | Creator work | Mac + Z640 | Claude + Gemini | Required — privacy |
| **Private / personal** | Private inquiries, financial, health | Mac + Z640 | None | Hard requirement — no frontier API |
**Practical implementation:** Open WebUI (self-hosted) with per-conversation model switching.
Local Ollama serves the private lane. Claude API serves enterprise and personal project lanes.
```mermaid
graph TD
subgraph Lanes["Three-Lane Operating Model"]
direction TB
L1["Enterprise / MSP Lane\nClient data · CSV · business docs"]
L2["Personal Projects Lane\nCreator work · experiments"]
L3["Private / Personal Lane\nFinancial · health · personal"]
end
subgraph Models["Model Routing"]
FM1["Claude API\n(Anthropic Console)"]
FM2["Gemini API"]
LM["Local Ollama\nRTX 3060 / Mac Apple Silicon"]
end
L1 -->|"Allowed"| FM1
L2 -->|"Allowed"| FM1
L2 -->|"Allowed"| FM2
L3 -->|"Hard block — no frontier"| LM
L1 -->|"Required"| LM
L2 -->|"Required"| LM
style L3 fill:#7f1d1d,color:#fca5a5
style LM fill:#1e3a5f,color:#93c5fd
```
> **Open decision — OL-THREE-LANE-FORMALISE:** The three-lane model is a policy
> concept. It is not yet technically enforced. Routing, retention boundaries, and migration
> to permanent hardware are still undefined.
---
## Network Topology
### WireGuard Hub-and-Spoke
**Hub:** CHR01 (MikroTik Cloud Hosted Router)
**Spokes:** VPS01, Mac workstation, Z640 (VLAN 60), NUC, client office connections
**SSTP tunnels:** Client MikroTik routers (RouterOS v6) connect via SSTP to CHR01.
This enables Winbox/SSH access to client routers and WireGuard-forwarded connectivity
for client site access (e.g. POS RDP paths).
```mermaid
graph TB
subgraph Cloud["Cloud / VPS Layer"]
CHR01["CHR01\nMikroTik Cloud Hosted Router\nWireGuard hub · SSTP hub"]
VPS01["VPS01\nvps01.yourdomain.com\nForgejo · Caddy · Monitoring\nwg0: 10.0.12.20\nwg1: 10.33.33.1"]
CHR02["CHR02\n(Planned failover hub)"]
end
subgraph HomeLab["Home Lab"]
Z640["HP Z640\nProxmox VE\nOllama · RTX 3060\nVLAN 60 / 10.42.60.x"]
NUC["Intel NUC\nProxmox Backup Server\nVPS recovery lab"]
end
subgraph Operator["Operator"]
MAC["MacBook Pro M5\nClaude Code devcontainer\nLocal Ollama (privacy)"]
end
subgraph Clients["Client Sites (SSTP)"]
CR1["[MSP Client A]\nmultiple sites\nMikroTik RouterOS v6"]
CRN["Other client routers\n(MikroTik RouterOS v6)"]
end
CHR01 <-->|"WireGuard spoke"| VPS01
CHR01 <-->|"WireGuard spoke"| Z640
CHR01 <-->|"WireGuard spoke"| NUC
CHR01 <-->|"WireGuard spoke"| MAC
CHR01 <-->|"SSTP tunnel"| CR1
CHR01 <-->|"SSTP tunnel"| CRN
CHR01 -.->|"Planned failover"| CHR02
style CHR01 fill:#1a3a2a,color:#86efac
style CHR02 fill:#3f2a10,color:#fdba74,stroke-dasharray: 5 5
style VPS01 fill:#1e3a5f,color:#93c5fd
```
> **Open decision — OL-CHR-FAILOVER:** CHR01 is the single WireGuard and SSTP hub.
> Three failover options under consideration:
>
> 1. Keep CHR01 + add CHR02 failover (same cloud provider risk remains)
> 2. Move all tunnels to VPS01 (VPS01 becomes single point of failure for both services and tunnels)
> 3. **Split roles:** CHR01 handles client SSTP, VPS01 handles home lab WireGuard
### VPS Dual-Plane WireGuard Design
VPS01 runs two WireGuard interfaces:
| Interface | Subnet | Purpose |
|-----------|--------|---------|
| `wg0` | 10.0.12.x | Infrastructure lane — router reachability, SNMP polling, inter-site monitoring |
| `wg1` | 10.33.33.x | Operator/client access — private access to platform services, internal DNS |
Split DNS via Unbound on `wg1`: all `*.yourdomain.com` names resolve over WireGuard.
---
## VLAN Scheme
**Supernet:** `10.42.0.0/16`
**Convention:** VLAN ID = third octet. VLAN 60 = `10.42.60.0/24`.
| VLAN | Name | Subnet | Key hosts |
|------|------|--------|-----------|
| 60 | AI-Agents | 10.42.60.0/24 | `trillian` (VMID 112) — Open WebUI + Ollama LXC |
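The convention lends itself to a one-line helper in provisioning scripts (the function name is illustrative, not part of the repo):

```shell
# VLAN ID = third octet of the subnet, under the 10.42.0.0/16 supernet.
vlan_subnet() {
  local vlan_id="$1"
  echo "10.42.${vlan_id}.0/24"
}

# vlan_subnet 60  ->  10.42.60.0/24
```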
---
## VPS Platform Services
| Service | Status | Notes |
|---------|--------|-------|
| Caddy | Live | All vhosts deployed |
| WireGuard | Live | Dual-plane: wg0 infrastructure, wg1 operator/client |
| Forgejo | Live | `git.yourdomain.com` |
| Prometheus | Live | Scraping node + SNMP targets |
| Grafana | Live | `monitoring.yourdomain.com` |
| Alertmanager | Live | `alerts.yourdomain.com` — email pipeline confirmed (DKIM/SPF/DMARC pass) |
| snmp_exporter | Live | MikroTik hAP ax³ active target |
| node_exporter | Live | Native — VPS host metrics |
| Loki | Live | VPS01, 30-day retention |
| Grafana Alloy | Live | Shipping systemd journal + Docker logs to Loki |
| Restic | Partial | Backup targets exist — restore validation not yet complete |
| Business control plane | Direction reset | CRM, ticketing, PM selection reopened |
| FreeRADIUS captive portal | Not started | Phase 5 |
| FastAPI alert → ticket bridge | Not started | Depends on control-plane API design |
---
## AI Stack
```mermaid
graph TD
subgraph Users["Operator Access"]
MAC_UI["Mac — browser tab\nor home screen web app"]
PHONE["iPhone — WireGuard VPN\nbrowser shortcut"]
TABLET["Android tablet\n(Phase Later)"]
end
subgraph OpenWebUI["Open WebUI (trillian / VLAN 60)"]
direction TB
OW["Open WebUI\nSelf-hosted Docker container\nCaddy reverse proxy"]
end
subgraph ModelBackends["Model Backends"]
OLLAMA["Ollama\nRTX 3060 / 12 GB VRAM\nPrivate + enterprise lanes\n7B-13B class models"]
CLAUDE_API["Claude API\nAnthropic Console account\nEnterprise + personal project lanes"]
MAC_OLLAMA["Mac Ollama\nApple M5 / 24 GB unified\nPortable private inference"]
end
MAC_UI --> OW
PHONE --> OW
TABLET -.-> OW
OW --> OLLAMA
OW --> CLAUDE_API
OW -.-> MAC_OLLAMA
style OW fill:#1a3a2a,color:#86efac
style OLLAMA fill:#1e3a5f,color:#93c5fd
style CLAUDE_API fill:#3b1f6e,color:#c4b5fd
style MAC_OLLAMA fill:#3f2a10,color:#fdba74
```
### Model Routing Summary
| Use case | Model | Lane |
|----------|-------|------|
| Client data processing | Claude API | Enterprise |
| Business documentation | Claude API | Enterprise |
| Personal project work | Claude API or Gemini API | Personal projects |
| Private/personal queries | Local Ollama only | Private |
| Financial or health queries | Local Ollama only | Private — hard requirement |
---
## AgentLab Orchestrator
The orchestrator is a separate coordination layer above Open WebUI. It routes work between
Claude Code, Codex CLI, and Gemini CLI and manages the multi-agent session substrate.
> **Current status:** Orchestrator Phase 1 substantially complete. Shelved for full deployment until Z640 rebuild is complete.
```mermaid
graph TD
subgraph Orchestrator["AgentLab Orchestrator (Therapon)"]
SUPER["Supervisor\nClaude — agent branch\norchestrator.py"]
PLAN["Planner\nTask decomposition"]
WORK_C["Worker — Claude Code"]
WORK_X["Worker — Codex CLI\ncodex branch / worktree"]
WORK_G["Worker — Gemini CLI\ngemini branch / worktree"]
RESEARCH["Researcher\n(active)"]
VERIFY["Verifier\n(stub — not wired)"]
PRIV["Private-lane worker\n(stub — pending)"]
end
OP["Operator\niTerm2 + orchestrator profile"]
OP -->|"Prompt via orchestrator profile"| SUPER
SUPER --> PLAN
PLAN --> WORK_C
PLAN --> WORK_X
PLAN --> WORK_G
PLAN --> RESEARCH
PLAN -.->|"pending"| VERIFY
PLAN -.->|"pending"| PRIV
style SUPER fill:#1a3a2a,color:#86efac
style VERIFY fill:#3f2a10,color:#fdba74,stroke-dasharray: 5 5
style PRIV fill:#3f2a10,color:#fdba74,stroke-dasharray: 5 5
```
### Multi-Agent Git Model
| Branch | Owner | Purpose |
|--------|-------|---------|
| `main` | Human only | Production — never commit directly |
| `agent` | Claude (Supervisor) | Claude's working branch |
| `codex` | Codex CLI | Codex's working branch |
| `gemini` | Gemini CLI | Gemini's working branch |
Human promotes `agent` → `main` via `./tools/promote.sh <tag>` after review.
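The promotion gate might look roughly like the sketch below. The actual `tools/promote.sh` is not shown in this commit, so treat the function body as an illustrative assumption:

```shell
# Hypothetical sketch of the promotion gate: merge the reviewed agent
# branch into main and tag the promotion point. Not the real script.
promote() {
  local tag="${1:?usage: promote <tag>}"
  git checkout main
  git merge --no-ff agent -m "promote: agent -> main (${tag})"
  git tag -a "${tag}" -m "promoted agent branch at ${tag}"
  git push origin main "${tag}"
}
```

The `--no-ff` merge keeps a visible promotion point in history even when `main` has not diverged.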
---
## Chat and Voice Access
| Device | Interface | Notes |
|--------|-----------|-------|
| Mac | Open WebUI — browser tab or home screen web app | Primary |
| iPhone | Open WebUI — browser shortcut, WireGuard VPN on | VPN required |
| Android tablet | Open WebUI — Phase Later | Planned |
### Voice Input
| Device | Tool | Status |
|--------|------|--------|
| Mac | SuperWhisper — local Parakeet model via WhisperKit, Apple Neural Engine | Active — all audio on-device |
| iPhone | Needs research — Apple Dictate is insufficient | Open |
---
## Backup and Recovery
| Layer | Target | Status |
|-------|--------|--------|
| Local Restic | VPS disk — fast-restore cache | Exists — restore validation not complete |
| Offsite | Cloudflare R2 Standard | Active — intended primary DR copy |
| Third target (3-2-1 completion) | Home lab (Z640/NUC) | Not active — blocked by Z640 rebuild |
> **True 3-2-1 is not yet complete.** R2 is the current disaster-recovery copy.
> Restore path has not been validated end-to-end.
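An end-to-end restore validation pass might look like the sketch below. The repository location and restore target are placeholders, not the real Restic targets:

```shell
# Hypothetical restore-validation pass for a Restic repo; repository
# path and restore target are placeholders.
validate_restore() {
  local repo="$1" target="$2"
  restic -r "$repo" snapshots                   # confirm snapshots exist
  restic -r "$repo" check                       # verify repository integrity
  restic -r "$repo" restore latest --target "$target"
  ls -R "$target" | head                        # spot-check restored contents
}

# validate_restore /srv/restic-cache /tmp/restore-test
```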
---
## Open Architectural Decisions
| Topic | Summary | Status |
|-------|---------|--------|
| WireGuard hub failover | CHR01 is a single hub. Three options (CHR02 same-provider, move to VPS01, split roles). No decision made. | Open |
| VPS02 warm standby | Manual failover only. Sync method undefined. | Open |
| Three-lane model enforcement | Policy concept — not technically enforced. Routing and retention boundaries undefined. | Open |
| Business control-plane architecture | CRM, ticketing, PM selection reopened. Odoo not a settled answer. GLPI in scope for ticketing. Target: dashboard over multiple best-fit tools, not a monolithic app. | Open |
| Cross-agent verification loop | One agent answers → second independently verifies → discrepancies surface before operator acts. | Not started |
---
## Security Boundaries
| Boundary | Mechanism |
|----------|-----------|
| All external service access | WireGuard VPN required — no public-facing admin interfaces |
| Secrets management | Ansible vault — never in git |
| Agent execution boundary | No agent controls production execution directly — human Terminal gate for all Ansible runs |
| Private lane data | Never routed to frontier APIs — local Ollama only |
| SSH keys | Mac host only — not mounted in container |


@@ -0,0 +1,48 @@
# ADR-001 — CLI Tool Persistence Pattern
**Status:** Accepted
## Context
The AgentLab devcontainer uses Docker named volumes for config/auth persistence and bind mounts for the repo. Any tool installed manually into the container filesystem (e.g. via `npm install -g`) is lost on `docker compose down && up` because the writable container layer is discarded.
This caused Codex to fail silently after a container restart. The `codex-config` named volume correctly preserved `~/.codex/` (auth, config), but the binary was never added to `bootstrap.sh`. The result: auth survived, binary didn't, tool appeared broken.
Claude Code survived because its installer was added to `bootstrap.sh` at setup time. Codex was not.
## Decision
Every CLI tool that must survive container rebuilds requires **two** things — both are mandatory, neither is sufficient alone:
| Part | What | Where |
|------|------|-------|
| 1. Named volume | Persists auth/config (`~/.TOOLNAME/`) | `compose.yml` |
| 2. Bootstrap step | Installs the binary idempotently | `bootstrap.sh` |
Missing either part = tool is half-set-up and will silently break on next rebuild.
## Tool manifest
| Tool | Binary | Named volume | Bootstrap step |
|------|--------|--------------|----------------|
| Claude Code | `claude` | `claude-config → ~/.claude` | ✓ |
| Codex CLI | `codex` | `codex-config → ~/.codex` | ✓ |
| Gemini CLI | `gemini` | `gemini-config → ~/.gemini` | ✓ |
## Consequences
- `bootstrap.sh` is the single executable source of truth for binary installation.
- `compose.yml` is the single source of truth for named volumes.
- Any agent setting up a new CLI tool MUST update both files AND this ADR before the setup is considered complete.
## What does NOT need a bootstrap step
Tools installed in the **Dockerfile** (`RUN` instruction) are baked into the image and survive all rebuilds without a bootstrap step. Only tools installed at runtime (post-container-start) need a bootstrap step.
## Adding a new tool — checklist
1. Add named volume to `compose.yml` (`TOOLNAME-config → ~/.TOOLNAME`)
2. Add idempotent install step to `bootstrap.sh`
3. Add entry to the tool manifest table above
4. Update `docs/bootstrap.md` with the new tool
5. Run `bootstrap.sh` to verify the install succeeds
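An idempotent install step could be as simple as the sketch below. The helper name and the npm package are illustrative assumptions, not the repo's actual `bootstrap.sh`:

```shell
# Illustrative idempotent install guard: skips work when the binary
# already exists, so bootstrap.sh can be re-run safely after rebuilds.
ensure_tool() {
  local tool="$1" install_cmd="$2"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool already installed, skipping"
  else
    echo "installing $tool"
    eval "$install_cmd"
  fi
}

# Example (package name is an assumption):
# ensure_tool codex "npm install -g @openai/codex"
```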


@@ -0,0 +1,78 @@
# ADR-002: Documentation Structure and Naming Conventions
**Status:** Proposed — design agreed, migration pending dedicated session.
## Problem
The current `docs/` directory is a flat catch-all: files have no clear taxonomy, inconsistent naming, and no indication of what kind of document they are or what project they belong to. Key symptoms:
- `HANDOFF.md` — whose handoff? Only meaningful inside `vps/`
- Architecture docs, dictation rollout docs, and agent reference docs all live in the same flat directory with no relationship visible from the name alone
- Dated scratch files mixed with authoritative reference documents
- No way to know, from a filename alone, what kind of document it is or what project it belongs to
This causes agents and operators to either over-read (read everything to find the right document) or under-read (miss relevant documents entirely).
## Decision
### 1. Adopt Diátaxis as the documentation framework
[Diátaxis](https://diataxis.fr) defines four documentation types: tutorial, how-to, reference, and explanation. This repo adopts three of them plus a repo-specific **State** type, each serving a distinct need:
| Type | Purpose | Reader need |
|---|---|---|
| **Reference** | Look up a fact | "What is the current state?" |
| **How-to** | Follow a procedure | "How do I do X?" |
| **Explanation** | Understand why | "Why is it this way?" |
| **State** | Track live work | "What is in progress?" |
State is not a Diátaxis type but is operationally distinct in a living infrastructure repo — it changes frequently and must be authoritative.
### 2. Project-scoped structure
Every significant system is a project with its own subdirectory. Project state files are named with their project prefix so they are self-describing without directory context.
```
projects/
├── vps/
│ ├── VPS-CURRENT-STATE.md
│ ├── VPS-HANDOFF.md
│ └── VPS-PLAN.md
├── agentlab/
│ ├── AGENTLAB-CURRENT-STATE.md
│ └── AGENTLAB-HANDOFF.md
```
### 3. docs/ structure by Diátaxis type
```
docs/
├── reference/ # Look-up: current facts, inventories, specs
├── how-to/ # Procedures: step-by-step, operator runbooks
├── explanation/ # Why: architecture rationale, design context
├── decisions/ # ADRs: point-in-time architecture decisions
├── open-loops.md # Active cross-project work tracking
├── deploys/ # Timestamped deploy logs
└── archive/ # Historical records
```
### 4. Naming conventions
| Rule | Example |
|---|---|
| Project state files are prefixed with project name | `VPS-HANDOFF.md` |
| All filenames are lowercase-hyphenated | `agent-model-reference.md` |
| ADRs are numbered and describe the decision | `ADR-002-documentation-structure.md` |
| Service docs match the service name exactly | `prometheus.md`, `loki.md` |
| Avoid dates in non-archive filenames | ~~`2026-03-25-plan.md`~~ |
## Migration rules
1. Do not migrate files piecemeal — partial migration breaks the required reading chain
2. Update all cross-references atomically in the same commit as file moves
3. Run a grep for every moved filename before committing to catch stale references
4. Migration is a dedicated session — do not start during active build work
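Rule 3 can be scripted; a sketch (the helper name and example filename are illustrative):

```shell
# Illustrative pre-commit check for a docs migration: fail if any
# tracked file still references a moved filename.
check_stale_refs() {
  local old_name="$1"
  if git grep -n --fixed-strings "$old_name" -- . >/dev/null 2>&1; then
    echo "stale references to $old_name remain"
    return 1
  fi
  echo "no stale references to $old_name"
}

# Example: check_stale_refs "docs/old-plan.md"
```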
## Why not a flat structure
A flat `docs/` directory forces readers (human and agent) to scan all files to find the relevant document. Naming cannot communicate type, scope, or authority. The Diátaxis structure makes type and scope visible from the path alone: `docs/how-to/whisperkit-dictation-rollout.md` tells you exactly what it is before you open it.


@@ -0,0 +1,8 @@
# Architectural Decision Records
Point-in-time decisions that are hard to reverse or affect multiple parts of the system.
| ADR | Decision |
|-----|---------|
| [ADR-001-tool-persistence.md](ADR-001-tool-persistence.md) | Every CLI tool needs two things: a named volume (auth/config) AND a bootstrap install step (binary). Missing either = silent breakage on rebuild. |
| [ADR-002-documentation-structure.md](ADR-002-documentation-structure.md) | Diátaxis-based directory structure, project-scoped naming, and migration rules for the docs/ directory. |


@@ -0,0 +1,244 @@
# Home Lab Build Plan — HP Z640
## Hardware
| Component | Detail |
|---|---|
| **System** | HP Z640 Workstation |
| **CPU** | Intel Xeon (workstation class) |
| **RAM** | 64 GB ECC |
| **OS Storage** | 2× Samsung 850 EVO 500 GB — ZFS mirror (`rpool`) |
| **Data Storage** | 4× Seagate 2 TB — ZFS RAIDZ2 encrypted (data pool) |
| **GPU 1** | Intel Arc A310 (Sparkle) 4 GB — Jellyfin VA-API transcoding |
| **GPU 2** | EVGA GeForce RTX 3060 XC 12 GB GDDR6 — Ollama local LLM inference |
| **Current state** | Proxmox VE installed, organic/messy config — scheduled for clean rebuild |
---
## Phase Overview
```mermaid
gantt
title HP Z640 Rebuild — Phase Sequence
dateFormat YYYY-MM-DD
axisFormat Phase
section Prerequisite
Phase 0a — Pre-audit (SSH) :crit, p0a, 2026-04-01, 1d
Phase 0b — USB backup :crit, p0b, after p0a, 1d
section Core Build
Phase 1 — Proxmox clean install :crit, p1, after p0b, 2d
Phase 2 — Core infrastructure LXCs :p2, after p1, 2d
section Services
Phase 3 — Media stack :p3, after p2, 2d
Phase 4a — Networking + security :p4a, after p3, 1d
Phase 4b — Agent stack (trillian) :p4b, after p4a, 2d
section Automation
Phase 5 — IaC + automation :p5, after p4b, 3d
```
---
## Phase 0a: Pre-Audit
> **GATE — Nothing proceeds until this is complete.**
Capture the current state of the Z640 before any destructive action. The rebuild will wipe LXC and VM configuration.
**Scope:**
- [ ] ZFS pool layout (`rpool` mirror + data pool RAIDZ2) — names, health, encryption status
- [ ] VM and LXC inventory — all IDs, names, disk sizes, network config
- [ ] Arr stack config and data paths (Sonarr, Radarr, Prowlarr, etc.)
- [ ] Jellyfin config path and media library paths
- [ ] PBS datastore paths and retention config
- [ ] Network config — bridges, VLANs, IP assignments
- [ ] Cron jobs — all scheduled tasks
- [ ] Running services summary
---
## Phase 0b: USB Backup
> **GATE — USB backup must complete before Phase 1. No exceptions.**
Full backup of the ZFS data pool to external USB before any rebuild touches storage.
- [ ] Attach external USB drive to Z640
- [ ] Verify USB drive capacity (must exceed used space on data pool)
- [ ] Export pool snapshot and send to USB
```bash
# Capture used space first
zpool list
zfs list
# Send encrypted data pool to USB (adjust pool/dataset names from audit output)
zfs snapshot -r datapool@pre-rebuild
zfs send -R datapool@pre-rebuild | pv > /mnt/usb/datapool-pre-rebuild.zfs
# Verify the send itself succeeded: $? alone reports pv's status, not zfs send's
echo "zfs send exit code: ${PIPESTATUS[0]}"
```
---
## Phase 1: Proxmox Clean Install
> **GATE — Phase 0a audit complete. Phase 0b USB backup verified.**
Fresh Proxmox VE install. Import existing ZFS pools. Establish baseline network config.
- [ ] Download latest stable Proxmox VE ISO
- [ ] Write ISO to USB installer
- [ ] Boot Z640 from installer USB
- [ ] Install Proxmox VE — **do not touch the data pool disks**
- [ ] Import data pool:
```bash
zpool import -f datapool
zfs load-key datapool
zfs mount -a
```
- [ ] Verify pool health: `zpool status && zfs list`
### Network Config
VLAN scheme: `10.42.0.0/16` supernet. VLAN ID = third octet of the subnet.
| VLAN ID | Subnet | Purpose |
|---|---|---|
| 10 | 10.42.10.0/24 | Management |
| 20 | 10.42.20.0/24 | LAN / trusted devices |
| 60 | 10.42.60.0/24 | AI-Agents |
---
## Phase 2: Core Infrastructure LXCs
> **GATE — Proxmox clean install complete. ZFS pools healthy.**
### 2a — PBS LXC (Proxmox Backup Server)
- [ ] Create LXC for PBS (unprivileged, Debian base)
- [ ] Assign a datastore path on the data pool
- [ ] Configure PBS retention policy
- [ ] Register PBS in Proxmox
- [ ] Test backup of a throwaway LXC
### 2b — WireGuard LXC
- [ ] Create LXC for WireGuard
- [ ] Install WireGuard
- [ ] Configure as spoke to CHR01
### 2c — Monitoring LXC
- [ ] Create LXC for monitoring stack
- [ ] Install Prometheus + Grafana
- [ ] Add Proxmox node as scrape target
- [ ] Basic dashboard: CPU, RAM, ZFS pool health, network
---
## Phase 3: Media Stack
> **GATE — Phase 2 complete. ZFS data pool mounted and healthy.**
### 3a — Jellyfin LXC with Intel Arc A310
- [ ] Create LXC (privileged — required for GPU passthrough)
- [ ] Pass through Intel Arc A310 via IOMMU / device passthrough
- [ ] Install Jellyfin
- [ ] Bind-mount media library paths from ZFS data pool
- [ ] Configure VA-API hardware transcoding
```bash
# Verify VA-API inside LXC
vainfo
# Expected: shows Intel iHD driver, H264/HEVC encode/decode profiles
```
### 3b — Arr Stack LXCs or Docker
- [ ] Determine migration target: individual LXCs or single Docker LXC
- [ ] Restore arr config from paths captured in audit
- [ ] Verify indexer connectivity (Prowlarr)
- [ ] Verify download client connectivity
- [ ] Verify library scan in Sonarr/Radarr against restored media paths
---
## Phase 4a: Networking + Security
> **GATE — Media stack verified functional.**
- [ ] All LXCs assigned to correct VLANs
- [ ] Proxmox firewall rules: deny inter-VLAN by default, permit explicitly
- [ ] VLAN 60 (AI-Agents) isolated — only permitted outbound: DNS, HTTPS, WireGuard tunnel
- [ ] WireGuard tunnel to CHR01 confirmed up and passing traffic
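The "deny by default, permit explicitly" posture for VLAN 60 can be expressed in the Proxmox firewall rules files. A sketch only — directions and option syntax should be verified against the current Proxmox firewall docs, and the file is a fragment, not a complete config:

```shell
# VLAN 60 egress policy sketch: only DNS, HTTPS, and WireGuard outbound.
cat > /etc/pve/firewall/cluster.fw <<'EOF'
[OPTIONS]
enable: 1

[RULES]
OUT ACCEPT -source 10.42.60.0/24 -p udp -dport 53
OUT ACCEPT -source 10.42.60.0/24 -p tcp -dport 443
OUT ACCEPT -source 10.42.60.0/24 -p udp -dport 51820
OUT DROP   -source 10.42.60.0/24
EOF
```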
---
## Phase 4b: Agent Stack — Open WebUI (LXC: `trillian`, VMID 112, VLAN 60)
> **GATE — Phase 4a network config complete. VLAN 60 operational.**
**Goal:** Deploy Open WebUI backed by Ollama on the RTX 3060.
### Architecture
```mermaid
flowchart TD
User["User (VPN connected)"]
VPS01["VPS01\nCaddy reverse proxy\ntherapon.yourdomain.com"]
WG["WireGuard tunnel\nCHR01 ↔ trillian"]
Caddy["Caddy (trillian LXC)\nInternal reverse proxy"]
WebUI["Open WebUI\nDocker container"]
Ollama["Ollama\nDocker container"]
GPU["RTX 3060 XC 12 GB\nGPU passthrough"]
User --> VPS01
VPS01 --> WG
WG --> Caddy
Caddy --> WebUI
WebUI --> Ollama
Ollama --> GPU
```
### Tasks
- [ ] Create privileged LXC `trillian` — VMID 112, VLAN 60, Debian 12
- [ ] Pass through EVGA RTX 3060 via IOMMU
- [ ] Install Docker inside LXC
- [ ] Verify GPU visible inside LXC: `nvidia-smi`
- [ ] Deploy Ollama container with GPU passthrough
- [ ] Deploy Open WebUI container
- [ ] Configure Caddy reverse proxy
- [ ] Test end-to-end: VPN on, browser to internal URL, model inference working
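The two containers in the diagram above can be stood up with a pair of `docker run` invocations. A sketch: it assumes the NVIDIA container toolkit is already working inside `trillian`, and uses the upstream default images, ports, and volume names:

```shell
# Ollama with GPU access; models persist in a named volume.
docker run -d --name ollama --gpus all \
  -v ollama:/root/.ollama -p 11434:11434 \
  ollama/ollama

# Open WebUI pointed at the Ollama API on the same host.
docker run -d --name open-webui \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --add-host host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data -p 3000:8080 \
  ghcr.io/open-webui/open-webui:main
```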
---
## Phase 5: IaC + Automation
> **GATE — Full stack deployed and verified functional.**
- [ ] Configure Terraform Proxmox provider (`bpg/proxmox`)
- [ ] Write Terraform modules for LXC and VM templates
- [ ] Import existing LXCs into Terraform state
- [ ] Write Ansible playbooks for LXC configuration
- [ ] Deploy HashiCorp Vault LXC
- [ ] Migrate secrets from manual config to Vault
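The import step above — bringing hand-built LXCs under Terraform — has roughly this shape with `bpg/proxmox`. Resource name, node name, and VMID are placeholders, and the import ID format should be checked against the provider documentation:

```shell
terraform init
# bpg/proxmox container resource; import ID is commonly <node>/<vmid> —
# verify against the provider docs. "pve" and 112 are placeholders.
terraform import proxmox_virtual_environment_container.trillian pve/112
```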
---
## Future Considerations (Not in Scope)
| Item | Notes |
|------|-------|
| UPS (APC or similar) | Worthwhile — deferred beyond Phase 5 |
| Second NIC for dedicated storage network | Optional optimisation |
| GPU upgrade beyond RTX 3060 | Not needed at current model sizes |

# AgentLab Overview
## The Plain Version
This is a private AI-assisted control plane — a system that lets one person run a small IT business, manage infrastructure, and handle projects with AI agents doing the heavy lifting inside a safe, bounded workspace.
The simplest analogy: imagine a workshop where the bench is a server, the notebook is a Git repository, and instead of one assistant, there are three AI builders (Claude, Codex, and Gemini) who each read the same notebook before they start work and write everything they do back into it. The notebook is the memory — not the chat history, not anyone's head.
**What's actually running right now:**
- A VPS in the cloud running a proper monitoring stack — Prometheus, Grafana, Alertmanager — that watches network equipment and sends email alerts when things break.
- WireGuard VPN connecting the private services so they're not exposed to the public internet.
- Forgejo (a private Git server) running at `git.yourdomain.com` — the version control system for all the code and config.
- Loki + Grafana Alloy collecting logs from the server (system logs, Docker container logs) — live but still being made useful. The data is flowing; the analysis layer is the next step.
- The AI workspace itself: Claude, Codex, and Gemini running inside a Docker container, all reading from and writing to the same Git repo. Each one has its own rules, its own branch, and its own lane of work.
**What's coming next:**
- A proper orchestrator that routes tasks to the right AI agent automatically and knows when each one has run out of budget or hit its usage window.
- Better log-based monitoring — turning the Loki data into something you can actually act on: search logs from routers, correlate network events, alert on log patterns.
- Mobile/phone access — being able to check on the system or give it a task from your phone, not just from a laptop.
- A private dashboard that will eventually be the single operator view over infrastructure, clients, and projects.
- Business platform layer — ticketing, CRM, and project management tools, to be decided. The philosophy is: one tight control surface over multiple best-fit tools, not one giant app trying to do everything.
**The bigger picture:**
The name for this overall direction is Therapon. The idea is an operator-owned system — not dependent on any single vendor, not locked into a subscription SaaS stack — that a small team or solo operator can actually run and trust. The AI agents are the workers. The repo is the brain. The infrastructure is owned, not rented from someone else's control plane.
It's early. The foundation is real and working. The next chapter is turning a collection of good parts into a coherent, orchestrated system.
---
## What This System Is (Technical Detail)
AgentLab is a way to use AI coding agents inside a bounded development workspace instead of letting them operate directly on a host machine. In this repo, that workspace is a Docker devcontainer running Ubuntu, with the main Git repository mounted into it at `/workspace/vault-repo`. Claude Code, Codex, and Gemini run inside that container and work against the repo.
The important idea is not "use VS Code." VS Code is only one possible client for attaching to the container. You can also enter the container with `docker exec`, a terminal profile, JetBrains remote tooling, or any other workflow you prefer.
The repo acts as the control plane. Instructions, rules, state documents, handoff notes, plans, scripts, and generated outputs all live in Git. The agents do not rely on chat history as the source of truth. They read the repo, operate inside the container, and write changes back to the repo.
This is a practical middle ground between two weaker models:
- letting an agent run loosely on a laptop with unclear boundaries
- treating chat transcripts as the project memory
Instead, the durable memory is the repository, and the execution boundary is the container.
## Why It Works
### Persistence
The repo itself is bind-mounted from the host into the container, so the working tree survives container rebuilds. Tool authentication and per-agent config are stored in Docker named volumes rather than in the writable container layer:
- `claude-config` mounted at `~/.claude`
- `codex-config` mounted at `~/.codex`
- `gemini-config` mounted at `~/.gemini`
That means the agent binaries can be replaced or the container can be rebuilt without losing agent login state and settings.
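In `compose.yml` terms the pattern looks roughly like this (a fragment, not a complete file, shown as a heredoc so it stays copy-pasteable; the service name and container-side home path are placeholders):

```shell
# Sketch of the relevant compose.yml fragment — names mirror the volumes above.
cat <<'EOF'
services:
  agentlab:
    volumes:
      - ./vault-repo:/workspace/vault-repo      # bind mount: working tree
      - claude-config:/home/agent/.claude       # named volumes: auth + config
      - codex-config:/home/agent/.codex
      - gemini-config:/home/agent/.gemini

volumes:
  claude-config:
  codex-config:
  gemini-config:
EOF
```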
The repo also codifies an important lesson in [decisions/ADR-001-tool-persistence.md](decisions/ADR-001-tool-persistence.md): persisting auth is not enough. Each tool also needs a bootstrap install step so the binary is present after rebuilds.
### Isolation
The container is a bounded workspace. The agents can see what is mounted into `/workspace`, not the whole host. Secrets are intentionally kept outside the repo.
That gives you a cleaner security and operational model:
- the host stays thin
- the repo is the shared memory
- the container is the execution environment
- production or infrastructure actions can stay behind a human approval gate
### Rules and Required Reading
AgentLab relies on documented agent rules rather than trusting the model to "just behave." `AGENTS.md` defines cross-agent rules for Claude, Codex, and Gemini, and `LLM-ENTRYPOINT.md` defines the mandatory read order and execution boundaries.
The required reading order matters. It forces each session to rehydrate itself from the same durable context before acting. That reduces drift between sessions and between different agents.
The pattern:
1. Start from a single repo entrypoint document.
2. Read the current system plan and open loops.
3. Read current operational state and handoff docs.
4. Only then begin work.
### Hooks and Guardrails
For Claude Code specifically, this repo uses hook scripts and settings under `.claude/` to block dangerous operations and remind the agent about boundaries. The concept is portable: combine agent instructions with mechanical guardrails where the tool supports them.
### Bootstrap and Image Design
The workspace has two layers:
- the image layer, defined by the Dockerfile
- the runtime repair layer, defined by `bootstrap.sh`
The Dockerfile bakes in the base OS packages and the agent CLIs. `bootstrap.sh` provides an idempotent way to verify or restore what must exist in the runtime environment. That combination makes rebuilds predictable and recoverable.
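The "verify or restore" idea in `bootstrap.sh` reduces to an idempotent presence check per tool. A minimal sketch — the tool list is illustrative, and the reinstall step is stubbed out:

```shell
#!/bin/sh
# Idempotent check: report each required CLI as Found / Not found,
# and (re)install only what is missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "Found: $1"
  else
    echo "Not found: $1"
    return 1
  fi
}

for tool in git node claude codex gemini; do
  check_tool "$tool" || echo "-> would reinstall $tool here"
done
```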
## How To Replicate It on Windows or Linux
The portable version of this setup does not require macOS, Forgejo, or VS Code.
### 1. A Single Bounded Workspace Container
Use Docker Desktop on Windows or standard Docker Engine on Linux. Build one long-lived Ubuntu-based development container that sleeps when idle and serves as the workspace for human and agent sessions.
You can use a devcontainer definition or plain `docker compose` plus `docker exec`. VS Code with the Dev Containers extension is a common option, but it is optional.
### 2. A Git Repo as the Control Plane
Put your instructions, plans, handoffs, task notes, and helper scripts in the repo. Make the repo the thing every agent reads before it works. Avoid treating chat as the system of record.
For Windows or Linux, GitHub is the obvious replacement for Forgejo. The workflow stays the same.
### 3. Persistent Agent State
Create named volumes for each agent's auth and config directory. Also define a bootstrap script that reinstalls or verifies the required CLIs after rebuilds. Without that, you get partial persistence: config survives but binaries disappear.
### 4. A Clear Mount Model
Mount only what the agents need:
- main repo as read-write
- optional scratch/work directory as read-write
- optional intake/reference directory as read-only
- logs/output directory as read-write
Keep secrets out of the repo. Inject them through environment variables, Docker secrets, or host-managed files outside the mounted workspace.
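Translated to flags, the mount model above looks like this (a sketch; host paths, image name, and the env-file location are placeholders):

```shell
# Mount model: repo rw, scratch rw, intake ro, logs rw; secrets via env-file,
# never committed. Paths and image name are placeholders.
docker run -d --name agentlab \
  -v "$PWD/vault-repo:/workspace/vault-repo" \
  -v "$PWD/scratch:/workspace/scratch" \
  -v "$PWD/intake:/workspace/intake:ro" \
  -v "$PWD/logs:/workspace/logs" \
  --env-file ./secrets.env \
  agentlab-image sleep infinity
```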
### 5. CI That Mirrors the Repo Workflow
Use GitHub Actions (or equivalent) to automate linting, validation, and policy checks on pushes and pull requests so the repo remains trustworthy as the control plane.
## Key Design Principles
- **Container bounds the workspace** — agents can't reach outside it without explicit mounts
- **Git stores the memory and the rules** — not chat, not agent memory APIs
- **Agents read before acting** — mandatory read order on every session start
- **Persistence is designed, not assumed** — named volumes + bootstrap, not container overlay
- **Risky actions stay behind explicit boundaries** — human gate for production execution
- **Rules in the repo, not in the model** — documented operating contracts, enforced by hooks where possible

# Agent Operating Rules
> These rules are written for Claude Code but the concepts apply to any AI agent system.
Rules that govern how agents behave in this repo. These live in `.claude/rules/` and are automatically loaded by Claude Code at session start.
| Rule | What it covers |
|------|---------------|
| [tool-research-protocol.md](tool-research-protocol.md) | Mandatory research sequence before recommending any tool, library, or product — fetch official sources, verify repo health, check platform-specific performance |
| [proactive-engineering.md](proactive-engineering.md) | Pre-task consistency checks, post-fix propagation checks, the persistence layer question, new tool checklist |
| [readme-maintenance.md](readme-maintenance.md) | The hybrid README pattern (human README + agent entrypoint), required sections, when to update |

# Proactive Engineering Behaviour
> These rules are written for Claude Code but the concepts apply to any AI agent system.
## Before Any Task: System Consistency Check
Before starting any work, scan for known failure patterns and surface them explicitly. Do not wait for the operator to discover them. A good engineer notices the thing adjacent to the reported problem.
Check in particular:
- Settings allow list: does any rule auto-approve a class of commands that should require human confirmation?
- Hook chain: is every hook wired into settings AND executable?
- Tool manifest: does every tool listed have a named volume in compose.yml?
- compose.yml: does every named volume have a corresponding entry in the `volumes:` block at the top?
- Dockerfile: does every tool in the manifest appear in the Dockerfile?
## After Any Fix: Propagation Check
After fixing a problem, ask: "what else has this same underlying issue?"
Examples:
- Fixed a missing volume for Codex → check Gemini has one too
- Fixed a broken hook → check all hooks are executable and wired
- Fixed a stale doc → check all docs referencing the same system state
## The "What Layer Does This Live On?" Question
Before suggesting any install or configuration step inside a running container, explicitly state which Docker layer it targets:
| Layer | Survives | Notes |
|-------|----------|-------|
| Image (Dockerfile RUN) | Everything | Requires image rebuild to change |
| Named volume | Restart + rebuild, NOT `compose down -v` | Auth/config persistence |
| Bind mount | Everything | Host filesystem |
| Container overlay | Nothing — EPHEMERAL | Wrong layer for anything durable |
If the answer is "container overlay": stop. That is the wrong layer.
## New Tool Checklist (from ADR-001)
No new CLI tool is complete until ALL of these are done:
1. [ ] Binary baked into Dockerfile (RUN line)
2. [ ] Named volume defined in compose.yml volumes block
3. [ ] Named volume mounted to auth directory in service definition
4. [ ] Tool listed in bootstrap manifest
5. [ ] Bootstrap script has a "Found / Not found" check for the binary
6. [ ] Documentation updated with tool entry
Partial completion = broken. A tool with a binary but no volume loses auth on `compose down -v`. A tool with a volume but no binary vanishes on image rebuild.

# README Maintenance
> These rules are written for Claude Code but the concepts apply to any AI agent system.
## The Pattern: Human README + Agent Entrypoint
Every repo in this system follows a hybrid pattern:
- `README.md` — human-facing landing page (GitHub/Forgejo renders this first)
- `LLM-ENTRYPOINT.md` (or equivalent) — agent-facing startup chain
These are not duplicates. They serve different audiences and must both exist. Never collapse them into one file. Never let README.md become agent instructions.
## README.md Must Contain
1. **What this repo is** — one paragraph, plain English
2. **Current state table** — key services/components and their live status
3. **Key documents table** — links to the most important docs an operator needs
4. **Repo structure** — top-level directory layout with one-line descriptions
5. **Branch model** — who owns which branch and its purpose
6. **Agent entry point note** — a single line at the bottom: `*Agent entry point: LLM-ENTRYPOINT.md*`
## When to Update README.md
Update README.md when **state changes**, not when work happens:
| Trigger | README update required |
|---------|----------------------|
| A service goes live or is decommissioned | Yes — update current state table |
| A new key document is created | Yes — add to key documents table |
| Repo top-level structure changes | Yes — update structure section |
| Existing state entry becomes stale | Yes — update it |
| Routine commit (config fix, doc edit) | No |
| Bug fix or minor feature work | No |
**Rule:** If the commit message would reasonably include "add", "remove", or "complete" for a system component or major document, check whether the README current state table needs updating.
## When Creating a New Repo
Before the first meaningful commit, write a README.md with all six sections above. A stub README is not acceptable as a permanent state.
## What README.md Must NOT Contain
- Agent startup chains or read-order instructions (those go in LLM-ENTRYPOINT.md)
- Secrets, tokens, or credential paths
- Internal agent coordination details
- Architecture deep-dives (link to the relevant doc instead)

# Tool and Product Research Protocol
> These rules are written for Claude Code but the concepts apply to any AI agent system.
## The Core Rule
Before recommending any software tool, app, library, service, or vendor product, you MUST complete the research sequence below. This is not optional. Do not answer from training data alone. Training data is stale by definition.
This rule fires for: CLI tools, Mac apps, mobile apps, SaaS products, libraries, open source projects, cloud services, vendor hardware/software.
---
## Mandatory Research Sequence
### Step 1 — Official source only for pricing and features
Fetch the actual product or project page. Do not characterise pricing, free tier limits, or feature availability from third-party sites (AlternativeTo, ProductHunt, G2, review aggregators). These are frequently wrong or stale.
Required fetch: the official website or official GitHub README.
### Step 2 — Repo health (for OSS tools)
Before recommending any open source project:
- When was the last commit? Flag if >6 weeks ago.
- Are there open issues with no maintainer response in the last 30 days?
- Does the current release have unresolved regressions in open issues?
- Star/fork count — flag if under 500 stars (small project, fragility risk).
If any are red flags: disqualify. Do not recommend with a caveat — disqualify.
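The ">6 weeks since last commit" check is mechanical once you have the date. A sketch of just the freshness test, given an ISO commit date (fetching the date itself, e.g. from a forge API, is left out; GNU `date` is assumed):

```shell
# Flag a repo as STALE if its last commit is more than 42 days (6 weeks) old.
# Takes an ISO date (YYYY-MM-DD); GNU date assumed.
repo_freshness() {
  last_epoch=$(date -d "$1" +%s)
  now_epoch=$(date +%s)
  age_days=$(( (now_epoch - last_epoch) / 86400 ))
  if [ "$age_days" -gt 42 ]; then echo STALE; else echo OK; fi
}

repo_freshness 2020-01-01   # STALE
```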
### Step 3 — Target OS / platform verification
Before recommending any tool for a specific operating system or platform:
- Search explicitly for `[tool] [OS] slow` and `[tool] [OS] not working` in the issue tracker and any available community sources.
- For cross-platform / Electron apps targeting macOS: treat macOS support as **unverified until confirmed**. Cross-platform does not mean works-everywhere.
- For GPU-accelerated tools: confirm the packaged binary has acceleration compiled in for the target platform (Metal on macOS, CUDA on Linux/Windows). A binary without GPU support will be unusably slow on workloads that require it.
- If real-world performance on the target OS cannot be confirmed: say so explicitly before recommending.
### Step 4 — Counter-argument
After forming a recommendation, explicitly state the strongest argument against it. One paragraph. Not "on the other hand" — a genuine stress test of the recommendation.
If the counter-argument is strong enough to change the recommendation, change it.
---
## Epistemic Labelling
When making factual claims in a response, tag the source:
- `[verified: URL]` — fetched and confirmed in this session
- `[training data — unverified]` — from model training, not verified against current sources
- `[inferred]` — logical inference, not directly sourced
This makes epistemic status visible and forces you to notice when you are asserting from unverified training data.
---
## The Right Answer First
If a well-maintained commercial option clearly solves the problem and OSS alternatives are fragile, unverified, or platform-mismatched — say so directly and early. Do not send the operator through OSS alternatives to avoid mentioning a paid option. Time wasted on broken tools costs more than the price of the right tool.