AgentLab System Architecture
Scope: Full system map — hardware, networking, AI stack, three-lane operating model, and open architectural decisions.
Contents
- Hardware Inventory
- Three-Lane Operating Model
- Network Topology
- VLAN Scheme
- VPS Platform Services
- AI Stack
- AgentLab Orchestrator
- Chat and Voice Access
- Backup and Recovery
- Open Architectural Decisions
Hardware Inventory
| Node | Type | Status | Role |
|---|---|---|---|
| MacBook Pro M5 24 GB | Operator workstation | Active | Primary operator machine, Claude Code devcontainer, local Ollama (privacy), portable to client sites |
| HP Z640 64 GB RAM | Home server | Needs rebuild | Proxmox VE host — agent stack, media server, persistent Ollama inference (RTX 3060) |
| Intel NUC (home office) | Small server | Active | Proxmox Backup Server, VPS recovery testing lab |
| CHR01 | MikroTik Cloud Hosted Router | Active | WireGuard hub (all home lab spokes), SSTP hub for client MikroTik routers |
| CHR02 | MikroTik Cloud Hosted Router | Planned | Failover hub — design undefined |
| VPS01 | VPS | Active | Forgejo, Caddy reverse proxy, monitoring stack, WireGuard spoke |
| VPS02 | VPS | Partial | Warm standby for VPS01 — manual failover only, sync method undefined |
| Client routers | MikroTik RouterOS v6 | Active | SSTP tunnels to CHR01 ([MSP Client A] — multiple sites) |
| Android tablet | Mobile | Phase Later | Dashboard, demo device |
MacBook Pro M5 — Workstation Baseline
| Item | Value |
|---|---|
| Chip | Apple M5 |
| CPU cores | 10 total — 4 performance, 6 efficiency |
| Unified memory | 24 GB |
HP Z640 — GPU Inventory
| GPU | VRAM | Role |
|---|---|---|
| Intel Arc A310 (Sparkle) | 4 GB | Jellyfin VA-API transcoding |
| EVGA GeForce RTX 3060 XC (GDDR6) | 12 GB | Ollama local LLM inference |
Intel NUC — Home Office Node
| Item | Value |
|---|---|
| CPU | Intel Core i7, 10th gen (exact model to confirm; the i7 used in 10th-gen NUCs, the i7-10710U, is 6 cores / 12 threads) |
| RAM | 8 GB (upgradeable) |
| Storage | 500 GB SSD + 2× 2 TB SSDs |
| Role | Proxmox Backup Server + VPS recovery lab |
Proxmox Node Naming (Hitchhiker's Guide Theme)
Existing nodes: deep-thought, ford, marvin, zaphod, slartibartfast, hactar, magrathea
Planned: trillian (VMID 112) — Open WebUI + Ollama LXC on VLAN 60
Three-Lane Operating Model
All three lanes require access to a local model. Sensitive or private data must stay on operator-controlled hardware (Mac or Z640) and must never be sent to a frontier API.
| Lane | Purpose | Primary devices | Frontier API allowed | Local model requirement |
|---|---|---|---|---|
| Enterprise / MSP | ScreamSaver IT + AI-native MSP brand. Client data cleanup, CSV processing, password review, business docs | Mac + Z640 | Claude (Anthropic API) | Required — sensitive client data |
| Personal projects | Creator work | Mac + Z640 | Claude + Gemini | Required — privacy |
| Private / personal | Private inquiries, financial, health | Mac + Z640 | None | Hard requirement — no frontier API |
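Practical implementation: Open WebUI (self-hosted) with per-conversation model switching. Local Ollama serves the private lane and any sensitive data in the other two lanes. The Claude API serves the enterprise and personal-projects lanes; the Gemini API is also permitted for personal projects.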
```mermaid
graph TD
    subgraph Lanes["Three-Lane Operating Model"]
        direction TB
        L1["Enterprise / MSP Lane\nClient data · CSV · business docs"]
        L2["Personal Projects Lane\nCreator work · experiments"]
        L3["Private / Personal Lane\nFinancial · health · personal"]
    end
    subgraph Models["Model Routing"]
        FM1["Claude API\n(Anthropic Console)"]
        FM2["Gemini API"]
        LM["Local Ollama\nRTX 3060 / Mac Apple Silicon"]
    end
    L1 -->|"Allowed"| FM1
    L2 -->|"Allowed"| FM1
    L2 -->|"Allowed"| FM2
    L3 -->|"Hard block — no frontier"| LM
    L1 -->|"Required"| LM
    L2 -->|"Required"| LM
    style L3 fill:#7f1d1d,color:#fca5a5
    style LM fill:#1e3a5f,color:#93c5fd
```
Open decision — OL-THREE-LANE-FORMALISE: The three-lane model is a policy concept. It is not yet technically enforced. Routing, retention boundaries, and migration to permanent hardware are still undefined.
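Because the lanes are not yet enforced in code, here is a minimal sketch of what enforcement could look like: the lane table expressed as a default-deny allow-list that a router consults before dispatching a prompt. All names (`Lane`, `Backend`, `LANE_POLICY`, `check_route`) and the `sensitive` flag are hypothetical; nothing like this exists in the current stack.

```python
from enum import Enum


class Lane(Enum):
    ENTERPRISE = "enterprise"
    PERSONAL_PROJECTS = "personal_projects"
    PRIVATE = "private"


class Backend(Enum):
    CLAUDE_API = "claude_api"
    GEMINI_API = "gemini_api"
    LOCAL_OLLAMA = "local_ollama"


# Hypothetical allow-list mirroring the three-lane table; anything absent is denied.
LANE_POLICY = {
    Lane.ENTERPRISE: {Backend.CLAUDE_API, Backend.LOCAL_OLLAMA},
    Lane.PERSONAL_PROJECTS: {Backend.CLAUDE_API, Backend.GEMINI_API, Backend.LOCAL_OLLAMA},
    Lane.PRIVATE: {Backend.LOCAL_OLLAMA},
}


def check_route(lane: Lane, backend: Backend, sensitive: bool = False) -> None:
    """Raise PermissionError if the lane/backend pair violates policy.

    Sensitive data in any lane is pinned to the local model, regardless of
    what the lane would otherwise allow.
    """
    if sensitive and backend is not Backend.LOCAL_OLLAMA:
        raise PermissionError(f"sensitive {lane.value} data must stay on local Ollama")
    if backend not in LANE_POLICY[lane]:
        raise PermissionError(f"{backend.value} is not allowed in the {lane.value} lane")
```

Raising rather than silently falling back to another backend keeps a misrouted private prompt from ever reaching a frontier API; the caller has to pick a permitted backend explicitly.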
Network Topology
WireGuard Hub-and-Spoke
Hub: CHR01 (MikroTik Cloud Hosted Router)
Spokes: VPS01, Mac workstation, Z640 (VLAN 60), NUC, client office connections
SSTP tunnels: Client MikroTik routers (RouterOS v6) connect via SSTP to CHR01. This enables Winbox/SSH access to client routers and WireGuard-forwarded connectivity for client site access (e.g. POS RDP paths).
```mermaid
graph TB
    subgraph Cloud["Cloud / VPS Layer"]
        CHR01["CHR01\nMikroTik Cloud Hosted Router\nWireGuard hub · SSTP hub"]
        VPS01["VPS01\nvps01.yourdomain.com\nForgejo · Caddy · Monitoring\nwg0: 10.0.12.20\nwg1: 10.33.33.1"]
        CHR02["CHR02\n(Planned failover hub)"]
    end
    subgraph HomeLab["Home Lab"]
        Z640["HP Z640\nProxmox VE\nOllama · RTX 3060\nVLAN 60 / 10.42.60.x"]
        NUC["Intel NUC\nProxmox Backup Server\nVPS recovery lab"]
    end
    subgraph Operator["Operator"]
        MAC["MacBook Pro M5\nClaude Code devcontainer\nLocal Ollama (privacy)"]
    end
    subgraph Clients["Client Sites (SSTP)"]
        CR1["[MSP Client A]\nmultiple sites\nMikroTik RouterOS v6"]
        CRN["Other client routers\n(MikroTik RouterOS v6)"]
    end
    CHR01 <-->|"WireGuard spoke"| VPS01
    CHR01 <-->|"WireGuard spoke"| Z640
    CHR01 <-->|"WireGuard spoke"| NUC
    CHR01 <-->|"WireGuard spoke"| MAC
    CHR01 <-->|"SSTP tunnel"| CR1
    CHR01 <-->|"SSTP tunnel"| CRN
    CHR01 -.->|"Planned failover"| CHR02
    style CHR01 fill:#1a3a2a,color:#86efac
    style CHR02 fill:#3f2a10,color:#fdba74,stroke-dasharray: 5 5
    style VPS01 fill:#1e3a5f,color:#93c5fd
```
Open decision — OL-CHR-FAILOVER: CHR01 is the single WireGuard and SSTP hub. Three failover options under consideration:
- Keep CHR01 + add CHR02 failover (same cloud provider risk remains)
- Move all tunnels to VPS01 (VPS01 becomes single point of failure for both services and tunnels)
- Split roles: CHR01 handles client SSTP, VPS01 handles home lab WireGuard
VPS Dual-Plane WireGuard Design
VPS01 runs two WireGuard interfaces:
| Interface | Subnet | Purpose |
|---|---|---|
| wg0 | 10.0.12.x | Infrastructure lane: router reachability, SNMP polling, inter-site monitoring |
| wg1 | 10.33.33.x | Operator/client access: private access to platform services, internal DNS |
Split DNS via Unbound on wg1: all *.yourdomain.com names resolve over WireGuard.
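To reason about which plane a given peer or service address belongs to, a small helper built on Python's standard `ipaddress` module is enough. This minimal sketch assumes both planes are /24s (the table only gives `10.0.12.x` and `10.33.33.x`) and that the planes must never overlap; `PLANES` and `plane_for` are hypothetical helpers, not part of the deployed configuration.

```python
import ipaddress
from typing import Optional

# Assumed /24 masks for the two WireGuard planes on VPS01.
PLANES = {
    "wg0": ipaddress.ip_network("10.0.12.0/24"),   # infrastructure lane
    "wg1": ipaddress.ip_network("10.33.33.0/24"),  # operator/client access
}

# The dual-plane design only works if the subnets are disjoint.
assert not PLANES["wg0"].overlaps(PLANES["wg1"])


def plane_for(address: str) -> Optional[str]:
    """Return the WireGuard interface whose subnet contains address, or None."""
    ip = ipaddress.ip_address(address)
    for iface, net in PLANES.items():
        if ip in net:
            return iface
    return None
```

For example, `plane_for("10.0.12.20")` identifies VPS01's own wg0 address, while a VLAN 60 host address returns `None` because it belongs to neither plane.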
VLAN Scheme
Supernet: 10.42.0.0/16
Convention: VLAN ID = third octet. VLAN 60 = 10.42.60.0/24.
| VLAN | Name | Subnet | Key hosts |
|---|---|---|---|
| 60 | AI-Agents | 10.42.60.0/24 | trillian (VMID 112) — Open WebUI + Ollama LXC |
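The "VLAN ID = third octet" convention can be checked mechanically. A minimal sketch using the standard `ipaddress` module (the function names are hypothetical): note that 802.1Q allows VLAN IDs 1 to 4094 while an octet only holds 0 to 255, so the convention covers VLANs 1 to 255 only.

```python
import ipaddress

SUPERNET = ipaddress.ip_network("10.42.0.0/16")


def vlan_subnet(vlan_id: int) -> ipaddress.IPv4Network:
    """Map a VLAN ID to its /24 inside 10.42.0.0/16 (VLAN ID = third octet)."""
    # Octets stop at 255, so larger 802.1Q IDs cannot follow the convention.
    if not 1 <= vlan_id <= 255:
        raise ValueError(f"VLAN {vlan_id} cannot be encoded in the third octet")
    return ipaddress.ip_network(f"10.42.{vlan_id}.0/24")


def vlan_of(address: str) -> int:
    """Recover the VLAN ID from a host address inside the supernet."""
    ip = ipaddress.ip_address(address)
    if ip not in SUPERNET:
        raise ValueError(f"{address} is outside {SUPERNET}")
    vlan_id = ip.packed[2]  # third octet as an int
    if vlan_id == 0:
        raise ValueError("third octet 0 does not map to a valid VLAN")
    return vlan_id
```

For trillian on VLAN 60, `vlan_subnet(60)` yields `10.42.60.0/24`, and any address in that range maps back to 60.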
VPS Platform Services
| Service | Status | Notes |
|---|---|---|
| Caddy | Live | All vhosts deployed |
| WireGuard | Live | Dual-plane: wg0 infrastructure, wg1 operator/client |
| Forgejo | Live | git.yourdomain.com |
| Prometheus | Live | Scraping node + SNMP targets |
| Grafana | Live | monitoring.yourdomain.com |
| Alertmanager | Live | alerts.yourdomain.com — email pipeline confirmed (DKIM/SPF/DMARC pass) |
| snmp_exporter | Live | MikroTik hAP ax³ active target |
| node_exporter | Live | Native — VPS host metrics |
| Loki | Live | VPS01, 30-day retention |
| Grafana Alloy | Live | Shipping systemd journal + Docker logs to Loki |
| Restic | Partial | Backup targets exist — restore validation not yet complete |
| Business control plane | Direction reset | CRM, ticketing, PM selection reopened |
| FreeRADIUS captive portal | Not started | Phase 5 |
| FastAPI alert → ticket bridge | Not started | Depends on control-plane API design |
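Since the alert-to-ticket bridge is blocked on the control-plane API, the translation step can still be prototyped independently of FastAPI. This minimal sketch turns an Alertmanager webhook payload (which carries an `alerts` list whose entries have `status`, `labels`, `annotations`, `startsAt` and `fingerprint`) into ticket drafts. The ticket fields (`action`, `dedupe_key`, `title`, ...) are hypothetical placeholders until the ticketing system is chosen; a FastAPI POST handler could simply pass the parsed JSON body to this function.

```python
from typing import Any, Dict, List


def alerts_to_tickets(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn an Alertmanager webhook payload into ticket drafts.

    Firing alerts become "open" requests; resolved alerts become "close"
    requests keyed by the alert fingerprint, so a bridge could match them
    to the ticket it opened earlier. Field names are placeholders.
    """
    tickets = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        tickets.append({
            "action": "open" if alert.get("status") == "firing" else "close",
            "dedupe_key": alert.get("fingerprint"),
            "title": f"[{labels.get('severity', 'unknown')}] {labels.get('alertname', 'alert')}",
            "host": labels.get("instance"),
            "body": annotations.get("description") or annotations.get("summary", ""),
            "started_at": alert.get("startsAt"),
        })
    return tickets
```

Keying on the fingerprint rather than the alert name lets repeated notifications for the same alert update one ticket instead of opening duplicates.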
AI Stack
```mermaid
graph TD
    subgraph Users["Operator Access"]
        MAC_UI["Mac — browser tab\nor home screen web app"]
        PHONE["iPhone — WireGuard VPN\nbrowser shortcut"]
        TABLET["Android tablet\n(Phase Later)"]
    end
    subgraph OpenWebUI["Open WebUI (trillian / VLAN 60)"]
        direction TB
        OW["Open WebUI\nSelf-hosted Docker container\nCaddy reverse proxy"]
    end
    subgraph ModelBackends["Model Backends"]
        OLLAMA["Ollama\nRTX 3060 / 12 GB VRAM\nPrivate + enterprise lanes\n7B–13B class models"]
        CLAUDE_API["Claude API\nAnthropic Console account\nEnterprise + personal project lanes"]
        MAC_OLLAMA["Mac Ollama\nApple M5 / 24 GB unified\nPortable private inference"]
    end
    MAC_UI --> OW
    PHONE --> OW
    TABLET -.-> OW
    OW --> OLLAMA
    OW --> CLAUDE_API
    OW -.-> MAC_OLLAMA
    style OW fill:#1a3a2a,color:#86efac
    style OLLAMA fill:#1e3a5f,color:#93c5fd
    style CLAUDE_API fill:#3b1f6e,color:#c4b5fd
    style MAC_OLLAMA fill:#3f2a10,color:#fdba74
```
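The "7B–13B class" ceiling on the RTX 3060 follows from simple arithmetic: weight memory is roughly parameters × bits per weight / 8. This back-of-envelope sketch assumes quantized weights and a flat `overhead_gb` allowance for KV cache and runtime buffers; both the overhead figure and the function names are assumptions, and real quantization formats use slightly more than their nominal bit width.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float = 12.0, overhead_gb: float = 2.0) -> bool:
    """Rough fit check: weights plus an assumed flat overhead must fit in VRAM.

    overhead_gb is a placeholder; actual KV-cache use grows with context length.
    """
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb
```

Under these assumptions a 13B model at 4 bits needs about 6.5 GB for weights and fits in 12 GB, while the same model at 8 bits (13 GB of weights alone) does not, which is why the 3060 tops out around the 13B class.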
Model Routing Summary
| Use case | Model | Lane |
|---|---|---|
| Client data processing | Claude API | Enterprise |
| Business documentation | Claude API | Enterprise |
| Personal project work | Claude API or Gemini API | Personal projects |
| Private/personal queries | Local Ollama only | Private |
| Financial or health queries | Local Ollama only | Private — hard requirement |
AgentLab Orchestrator
The orchestrator is a separate coordination layer above Open WebUI. It routes work between Claude Code, Codex CLI, and Gemini CLI and manages the multi-agent session substrate.
Current status: Orchestrator Phase 1 is substantially complete. Full deployment is shelved until the Z640 rebuild is finished.
```mermaid
graph TD
    subgraph Orchestrator["AgentLab Orchestrator (Therapon)"]
        SUPER["Supervisor\nClaude — agent branch\norchestrator.py"]
        PLAN["Planner\nTask decomposition"]
        WORK_C["Worker — Claude Code"]
        WORK_X["Worker — Codex CLI\ncodex branch / worktree"]
        WORK_G["Worker — Gemini CLI\ngemini branch / worktree"]
        RESEARCH["Researcher\n(active)"]
        VERIFY["Verifier\n(stub — not wired)"]
        PRIV["Private-lane worker\n(stub — pending)"]
    end
    OP["Operator\niTerm2 + orchestrator profile"]
    OP -->|"Prompt via orchestrator profile"| SUPER
    SUPER --> PLAN
    PLAN --> WORK_C
    PLAN --> WORK_X
    PLAN --> WORK_G
    PLAN --> RESEARCH
    PLAN -.->|"pending"| VERIFY
    PLAN -.->|"pending"| PRIV
    style SUPER fill:#1a3a2a,color:#86efac
    style VERIFY fill:#3f2a10,color:#fdba74,stroke-dasharray: 5 5
    style PRIV fill:#3f2a10,color:#fdba74,stroke-dasharray: 5 5
```
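The planner-to-worker fan-out, including the two stubbed roles, can be sketched as a small dispatcher. This is a hypothetical illustration (`Worker`, `WORKERS`, `dispatch` are invented names), not the code in orchestrator.py; it assumes the planner has already decomposed the prompt into (task, worker) pairs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Worker:
    name: str
    wired: bool  # False for stubs (verifier, private-lane worker)


# Hypothetical registry mirroring the diagram above.
WORKERS: Dict[str, Worker] = {
    "claude_code": Worker("Claude Code", wired=True),
    "codex": Worker("Codex CLI", wired=True),
    "gemini": Worker("Gemini CLI", wired=True),
    "researcher": Worker("Researcher", wired=True),
    "verifier": Worker("Verifier", wired=False),
    "private": Worker("Private-lane worker", wired=False),
}


@dataclass
class Dispatch:
    assignments: Dict[str, List[str]] = field(default_factory=dict)
    pending: List[str] = field(default_factory=list)


def dispatch(subtasks: List[Tuple[str, str]],
             workers: Optional[Dict[str, Worker]] = None) -> Dispatch:
    """Send each (task, worker_key) pair to its worker.

    Tasks aimed at an unknown or unwired worker are parked as pending
    instead of being re-routed, so private-lane work never falls through
    to a frontier-model worker.
    """
    registry = WORKERS if workers is None else workers
    result = Dispatch()
    for task, key in subtasks:
        worker = registry.get(key)
        if worker is None or not worker.wired:
            result.pending.append(task)
            continue
        result.assignments.setdefault(key, []).append(task)
    return result
```

Parking rather than re-routing is the conservative choice while the private-lane worker is a stub: the operator sees the pending task instead of having it silently handed to Claude, Codex, or Gemini.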
Multi-Agent Git Model
| Branch | Owner | Purpose |
|---|---|---|
| main | Human only | Production (no direct commits) |
| agent | Claude (Supervisor) | Claude's working branch |
| codex | Codex CLI | Codex's working branch |
| gemini | Gemini CLI | Gemini's working branch |
The human promotes agent → main with `./tools/promote.sh <tag>` after review.
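The ownership table translates directly into a push policy that a server-side git hook could apply. This minimal sketch assumes each push can be attributed to an actor name (`human`, `claude`, `codex`, `gemini`); that mapping, `BRANCH_OWNERS`, `may_push` and `may_promote` are all hypothetical, and the sketch does not describe what promote.sh itself does.

```python
# Hypothetical ownership map mirroring the branch table.
BRANCH_OWNERS = {
    "main": "human",
    "agent": "claude",
    "codex": "codex",
    "gemini": "gemini",
}


def may_push(actor: str, branch: str) -> bool:
    """Return True if actor may push directly to branch.

    main never accepts direct pushes, not even from the human; it only
    changes through promotion. Agents may push to their own branch only,
    and unknown branches are denied by default.
    """
    if branch == "main":
        return False
    return BRANCH_OWNERS.get(branch) == actor


def may_promote(actor: str) -> bool:
    """Only the human owner of main may run a promotion."""
    return actor == BRANCH_OWNERS["main"]
```

Separating "push" from "promote" keeps the review gate explicit: even the human's changes reach main through the same promotion path as agent work.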
Chat and Voice Access
| Device | Interface | Notes |
|---|---|---|
| Mac | Open WebUI — browser tab or home screen web app | Primary |
| iPhone | Open WebUI — browser shortcut, WireGuard VPN on | VPN required |
| Android tablet | Open WebUI — Phase Later | Planned |
Voice Input
| Device | Tool | Status |
|---|---|---|
| Mac | SuperWhisper — local Parakeet model via WhisperKit, Apple Neural Engine | Active — all audio on-device |
| iPhone | Needs research; Apple Dictation is insufficient | Open |
Backup and Recovery
| Layer | Target | Status |
|---|---|---|
| Local Restic | VPS disk — fast-restore cache | Exists — restore validation not complete |
| Offsite | Cloudflare R2 Standard | Active — intended primary DR copy |
| Third target (3-2-1 completion) | Home lab (Z640/NUC) | Not active — blocked by Z640 rebuild |
True 3-2-1 is not yet complete: R2 is the current disaster-recovery copy, and the restore path has not been validated end to end.
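The 3-2-1 gap can be made explicit with a small check: three independent copies, on at least two media types, with at least one offsite. This minimal sketch treats copies that share a failure domain as one copy, so the local Restic repository (on the same VPS disk as production) does not count separately. `Copy`, `check_321`, and all failure-domain and media labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Copy:
    name: str
    failure_domain: str  # copies on the same disk/site count once
    media: str
    offsite: bool
    active: bool = True


def check_321(copies: List[Copy]) -> Dict[str, bool]:
    """Evaluate the three 3-2-1 conditions over active copies only."""
    live = [c for c in copies if c.active]
    return {
        "three_copies": len({c.failure_domain for c in live}) >= 3,
        "two_media": len({c.media for c in live}) >= 2,
        "one_offsite": any(c.offsite for c in live),
    }


# Assumed labels for the layout described in the table above.
current_layout = [
    Copy("production data", "vps01-disk", "vps-block-storage", offsite=False),
    Copy("local restic", "vps01-disk", "vps-block-storage", offsite=False),
    Copy("cloudflare r2", "r2", "object-storage", offsite=True),
    Copy("home lab", "homelab", "local-ssd", offsite=True, active=False),
]
```

With the home lab target inactive, only the "three copies" condition fails, matching the status above. A passing check still says nothing about restores, which need their own end-to-end test.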
Open Architectural Decisions
| Topic | Summary | Status |
|---|---|---|
| WireGuard hub failover | CHR01 is a single hub. Three options (CHR02 same-provider, move to VPS01, split roles). No decision made. | Open |
| VPS02 warm standby | Manual failover only. Sync method undefined. | Open |
| Three-lane model enforcement | Policy concept — not technically enforced. Routing and retention boundaries undefined. | Open |
| Business control-plane architecture | CRM, ticketing, PM selection reopened. Odoo not a settled answer. GLPI in scope for ticketing. Target: dashboard over multiple best-fit tools, not a monolithic app. | Open |
| Cross-agent verification loop | One agent answers → second independently verifies → discrepancies surface before operator acts. | Not started |
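The verification loop has not been started, but its core contract fits in a few lines. In this minimal sketch the second agent answers the same question without seeing the first answer, and any mismatch is flagged for the operator rather than auto-resolved. `cross_verify`, `Verification` and the default string comparison are hypothetical; a real loop would need a semantic comparison, possibly another model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verification:
    primary: str
    independent: str
    discrepancy: bool


def _normalise(text: str) -> str:
    """Case- and whitespace-insensitive form used by the default comparison."""
    return " ".join(text.lower().split())


def cross_verify(question: str,
                 primary: Callable[[str], str],
                 verifier: Callable[[str], str],
                 same: Optional[Callable[[str, str], bool]] = None) -> Verification:
    """Ask two agents the same question independently, then compare.

    The verifier never receives the primary answer, so agreement is not
    anchored on it. Discrepancies are surfaced, never silently resolved.
    """
    compare = same or (lambda a, b: _normalise(a) == _normalise(b))
    a = primary(question)
    b = verifier(question)
    return Verification(primary=a, independent=b, discrepancy=not compare(a, b))
```

Keeping the verifier blind to the first answer is the design point: an agent asked to "check" an answer tends to agree with it, while an independent answer exposes real disagreement.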
Security Boundaries
| Boundary | Mechanism |
|---|---|
| All external service access | WireGuard VPN required — no public-facing admin interfaces |
| Secrets management | Ansible Vault; secrets are never committed to git |
| Agent execution boundary | No agent executes against production directly; a human runs every Ansible play from a terminal (human gate) |
| Private lane data | Never routed to frontier APIs — local Ollama only |
| SSH keys | Mac host only — not mounted in container |