# AgentLab System Architecture

> **Scope:** Full system map — hardware, networking, AI stack, three-lane operating model,
> and open architectural decisions.

---

## Contents

- [Hardware Inventory](#hardware-inventory)
- [Three-Lane Operating Model](#three-lane-operating-model)
- [Network Topology](#network-topology)
- [VLAN Scheme](#vlan-scheme)
- [VPS Platform Services](#vps-platform-services)
- [AI Stack](#ai-stack)
- [AgentLab Orchestrator](#agentlab-orchestrator)
- [Chat and Voice Access](#chat-and-voice-access)
- [Backup and Recovery](#backup-and-recovery)
- [Open Architectural Decisions](#open-architectural-decisions)
- [Security Boundaries](#security-boundaries)

---

## Hardware Inventory

| Node | Type | Status | Role |
|------|------|--------|------|
| MacBook Pro M5 24 GB | Operator workstation | Active | Primary operator machine, Claude Code devcontainer, local Ollama (privacy), portable to client sites |
| HP Z640 64 GB RAM | Home server | Needs rebuild | Proxmox VE host — agent stack, media server, persistent Ollama inference (RTX 3060) |
| Intel NUC (home office) | Small server | Active | Proxmox Backup Server, VPS recovery testing lab |
| CHR01 | MikroTik Cloud Hosted Router | Active | WireGuard hub (all home lab spokes), SSTP hub for client MikroTik routers |
| CHR02 | MikroTik Cloud Hosted Router | Planned | Failover hub — design undefined |
| VPS01 | VPS | Active | Forgejo, Caddy reverse proxy, monitoring stack, WireGuard spoke |
| VPS02 | VPS | Partial | Warm standby for VPS01 — manual failover only, sync method undefined |
| Client routers | MikroTik RouterOS v6 | Active | SSTP tunnels to CHR01 ([MSP Client A] — multiple sites) |
| Android tablet | Mobile | Phase Later | Dashboard, demo device |

### MacBook Pro M5 — Workstation Baseline

| Item | Value |
|------|-------|
| Chip | Apple M5 |
| CPU cores | 10 total — 4 performance, 6 efficiency |
| Unified memory | 24 GB |

### HP Z640 — GPU Inventory

| GPU | VRAM | Role |
|-----|------|------|
| Intel Arc A310 (Sparkle) | 4 GB | Jellyfin VA-API transcoding |
| EVGA GeForce RTX 3060 XC 12 GB GDDR6 | 12 GB | Ollama local LLM inference |

### Intel NUC — Home Office Node

| Item | Value |
|------|-------|
| CPU | Intel i7 10th gen (~10 cores) |
| RAM | 8 GB (upgradeable) |
| Storage | 500 GB SSD + 2× 2 TB SSDs |
| Role | Proxmox Backup Server + VPS recovery lab |

### Proxmox Node Naming (Hitchhiker's Guide Theme)

Existing nodes: `deep-thought`, `ford`, `marvin`, `zaphod`, `slartibartfast`, `hactar`, `magrathea`

Planned: `trillian` (VMID 112) — Open WebUI + Ollama LXC on VLAN 60

---

## Three-Lane Operating Model

All three lanes require access to a local model. Sensitive or private data must never leave the device.

| Lane | Purpose | Primary devices | Frontier API allowed | Local model requirement |
|------|---------|----------------|----------------------|------------------------|
| **Enterprise / MSP** | ScreamSaver IT + AI-native MSP brand. Client data cleanup, CSV processing, password review, business docs | Mac + Z640 | Claude (Anthropic API) | Required — sensitive client data |
| **Personal projects** | Creator work | Mac + Z640 | Claude + Gemini | Required — privacy |
| **Private / personal** | Private inquiries, financial, health | Mac + Z640 | None | Hard requirement — no frontier API |

**Practical implementation:** Open WebUI (self-hosted) with per-conversation model switching. Local Ollama serves the private lane. Claude API serves enterprise and personal project lanes.

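The lane table above can be read as an allow-list per lane. The following is a minimal Python sketch of that policy, not the project's implementation: the backend identifiers (`ollama-local`, `claude-api`, `gemini-api`), the `sensitive` flag, and the choice to fall back to the local model instead of raising an error are all assumptions made for illustration.

```python
from enum import Enum


class Lane(Enum):
    ENTERPRISE = "enterprise"
    PERSONAL_PROJECTS = "personal_projects"
    PRIVATE = "private"


# Hypothetical backend identifiers; the document names the services, not these strings.
LOCAL_BACKEND = "ollama-local"

# Allow-list derived from the "Frontier API allowed" column of the lane table.
ALLOWED_BACKENDS = {
    Lane.ENTERPRISE: {LOCAL_BACKEND, "claude-api"},
    Lane.PERSONAL_PROJECTS: {LOCAL_BACKEND, "claude-api", "gemini-api"},
    Lane.PRIVATE: {LOCAL_BACKEND},  # hard requirement: no frontier API
}


def select_backend(lane: Lane, preferred: str, sensitive: bool = False) -> str:
    """Return the backend a request should use under the lane policy.

    Sensitive data always stays on the local model. A preferred backend
    that the lane does not allow is replaced by the local model
    (assumed fallback behaviour, chosen here for simplicity).
    """
    if sensitive:
        return LOCAL_BACKEND
    if preferred in ALLOWED_BACKENDS[lane]:
        return preferred
    return LOCAL_BACKEND
```

For example, `select_backend(Lane.PRIVATE, "claude-api")` returns `"ollama-local"`, while `select_backend(Lane.ENTERPRISE, "claude-api")` returns `"claude-api"` unless `sensitive=True` is passed. A stricter variant could raise an exception instead of silently falling back, so that a misrouted request is visible to the operator.
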
```mermaid
graph TD
    subgraph Lanes["Three-Lane Operating Model"]
        direction TB
        L1["Enterprise / MSP Lane\nClient data · CSV · business docs"]
        L2["Personal Projects Lane\nCreator work · experiments"]
        L3["Private / Personal Lane\nFinancial · health · personal"]
    end

    subgraph Models["Model Routing"]
        FM1["Claude API\n(Anthropic Console)"]
        FM2["Gemini API"]
        LM["Local Ollama\nRTX 3060 / Mac Apple Silicon"]
    end

    L1 -->|"Allowed"| FM1
    L2 -->|"Allowed"| FM1
    L2 -->|"Allowed"| FM2
    L3 -->|"Hard block — no frontier"| LM
    L1 -->|"Required"| LM
    L2 -->|"Required"| LM

    style L3 fill:#7f1d1d,color:#fca5a5
    style LM fill:#1e3a5f,color:#93c5fd
```

> **Open decision — OL-THREE-LANE-FORMALISE:** The three-lane model is a policy
> concept. It is not yet technically enforced. Routing, retention boundaries, and migration
> to permanent hardware are still undefined.

---

## Network Topology

### WireGuard Hub-and-Spoke

**Hub:** CHR01 (MikroTik Cloud Hosted Router)

**Spokes:** VPS01, Mac workstation, Z640 (VLAN 60), NUC, client office connections

**SSTP tunnels:** Client MikroTik routers (RouterOS v6) connect via SSTP to CHR01. This enables Winbox/SSH access to client routers and WireGuard-forwarded connectivity for client site access (e.g. POS RDP paths).

```mermaid
graph TB
    subgraph Cloud["Cloud / VPS Layer"]
        CHR01["CHR01\nMikroTik Cloud Hosted Router\nWireGuard hub · SSTP hub"]
        VPS01["VPS01\nvps01.yourdomain.com\nForgejo · Caddy · Monitoring\nwg0: 10.0.12.20\nwg1: 10.33.33.1"]
        CHR02["CHR02\n(Planned failover hub)"]
    end

    subgraph HomeLab["Home Lab"]
        Z640["HP Z640\nProxmox VE\nOllama · RTX 3060\nVLAN 60 / 10.42.60.x"]
        NUC["Intel NUC\nProxmox Backup Server\nVPS recovery lab"]
    end

    subgraph Operator["Operator"]
        MAC["MacBook Pro M5\nClaude Code devcontainer\nLocal Ollama (privacy)"]
    end

    subgraph Clients["Client Sites (SSTP)"]
        CR1["[MSP Client A]\nmultiple sites\nMikroTik RouterOS v6"]
        CRN["Other client routers\n(MikroTik RouterOS v6)"]
    end

    CHR01 <-->|"WireGuard spoke"| VPS01
    CHR01 <-->|"WireGuard spoke"| Z640
    CHR01 <-->|"WireGuard spoke"| NUC
    CHR01 <-->|"WireGuard spoke"| MAC
    CHR01 <-->|"SSTP tunnel"| CR1
    CHR01 <-->|"SSTP tunnel"| CRN
    CHR01 -.->|"Planned failover"| CHR02

    style CHR01 fill:#1a3a2a,color:#86efac
    style CHR02 fill:#3f2a10,color:#fdba74,stroke-dasharray: 5 5
    style VPS01 fill:#1e3a5f,color:#93c5fd
```

> **Open decision — OL-CHR-FAILOVER:** CHR01 is the single WireGuard and SSTP hub.
> Three failover options under consideration:
>
> 1. Keep CHR01 + add CHR02 failover (same cloud provider risk remains)
> 2. Move all tunnels to VPS01 (VPS01 becomes single point of failure for both services and tunnels)
> 3. **Split roles:** CHR01 handles client SSTP, VPS01 handles home lab WireGuard

### VPS Dual-Plane WireGuard Design

VPS01 runs two WireGuard interfaces:

| Interface | Subnet | Purpose |
|-----------|--------|---------|
| `wg0` | 10.0.12.x | Infrastructure lane — router reachability, SNMP polling, inter-site monitoring |
| `wg1` | 10.33.33.x | Operator/client access — private access to platform services, internal DNS |

Split DNS via Unbound on `wg1`: all `*.yourdomain.com` resolves over WireGuard.

---

## VLAN Scheme

**Supernet:** `10.42.0.0/16`

**Convention:** VLAN ID = third octet. VLAN 60 = `10.42.60.0/24`.

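The convention above maps a VLAN ID directly onto a `/24` inside the supernet. As a small illustration, the Python sketch below derives the subnet with the standard-library `ipaddress` module. The function name `vlan_subnet` and the accepted ID range of 1 to 255 are assumptions: the document only states the third-octet rule, which by itself cannot represent VLAN IDs above 255.

```python
import ipaddress

# Supernet as stated in the VLAN scheme.
SUPERNET = ipaddress.ip_network("10.42.0.0/16")


def vlan_subnet(vlan_id: int) -> ipaddress.IPv4Network:
    """Return the /24 for a VLAN ID under the 'VLAN ID = third octet' rule.

    The 1..255 range is an assumption; IDs outside it do not fit in one octet.
    """
    if not 1 <= vlan_id <= 255:
        raise ValueError(f"VLAN ID {vlan_id} does not fit the third-octet convention")
    # Shift the VLAN ID into the third octet of the supernet base address.
    base = ipaddress.IPv4Address(int(SUPERNET.network_address) + (vlan_id << 8))
    return ipaddress.ip_network(f"{base}/24")


if __name__ == "__main__":
    print(vlan_subnet(60))  # 10.42.60.0/24
```

A helper like this can double as a check when a new VLAN is added, since the result is always a subnet of `10.42.0.0/16`.
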
| VLAN | Name | Subnet | Key hosts |
|------|------|--------|-----------|
| 60 | AI-Agents | 10.42.60.0/24 | `trillian` (VMID 112) — Open WebUI + Ollama LXC |

---

## VPS Platform Services

| Service | Status | Notes |
|---------|--------|-------|
| Caddy | Live | All vhosts deployed |
| WireGuard | Live | Dual-plane: wg0 infrastructure, wg1 operator/client |
| Forgejo | Live | `git.yourdomain.com` |
| Prometheus | Live | Scraping node + SNMP targets |
| Grafana | Live | `monitoring.yourdomain.com` |
| Alertmanager | Live | `alerts.yourdomain.com` — email pipeline confirmed (DKIM/SPF/DMARC pass) |
| snmp_exporter | Live | MikroTik hAP ax³ active target |
| node_exporter | Live | Native — VPS host metrics |
| Loki | Live | VPS01, 30-day retention |
| Grafana Alloy | Live | Shipping systemd journal + Docker logs to Loki |
| Restic | Partial | Backup targets exist — restore validation not yet complete |
| Business control plane | Direction reset | CRM, ticketing, PM selection reopened |
| FreeRADIUS captive portal | Not started | Phase 5 |
| FastAPI alert → ticket bridge | Not started | Depends on control-plane API design |

---

## AI Stack

```mermaid
graph TD
    subgraph Users["Operator Access"]
        MAC_UI["Mac — browser tab\nor home screen web app"]
        PHONE["iPhone — WireGuard VPN\nbrowser shortcut"]
        TABLET["Android tablet\n(Phase Later)"]
    end

    subgraph OpenWebUI["Open WebUI (trillian / VLAN 60)"]
        direction TB
        OW["Open WebUI\nSelf-hosted Docker container\nCaddy reverse proxy"]
    end

    subgraph ModelBackends["Model Backends"]
        OLLAMA["Ollama\nRTX 3060 / 12 GB VRAM\nPrivate + enterprise lanes\n7B–13B class models"]
        CLAUDE_API["Claude API\nAnthropic Console account\nEnterprise + personal project lanes"]
        MAC_OLLAMA["Mac Ollama\nApple M5 / 24 GB unified\nPortable private inference"]
    end

    MAC_UI --> OW
    PHONE --> OW
    TABLET -.-> OW
    OW --> OLLAMA
    OW --> CLAUDE_API
    OW -.-> MAC_OLLAMA

    style OW fill:#1a3a2a,color:#86efac
    style OLLAMA fill:#1e3a5f,color:#93c5fd
    style CLAUDE_API fill:#3b1f6e,color:#c4b5fd
    style MAC_OLLAMA fill:#3f2a10,color:#fdba74
```

### Model Routing Summary

| Use case | Model | Lane |
|----------|-------|------|
| Client data processing | Claude API | Enterprise |
| Business documentation | Claude API | Enterprise |
| Personal project work | Claude API or Gemini API | Personal projects |
| Private/personal queries | Local Ollama only | Private |
| Financial or health queries | Local Ollama only | Private — hard requirement |

---

## AgentLab Orchestrator

The orchestrator is a separate coordination layer above Open WebUI. It routes work between Claude Code, Codex CLI, and Gemini CLI and manages the multi-agent session substrate.

> **Current status:** Orchestrator Phase 1 substantially complete. Shelved for full deployment until Z640 rebuild is complete.

```mermaid
graph TD
    subgraph Orchestrator["AgentLab Orchestrator (Therapon)"]
        SUPER["Supervisor\nClaude — agent branch\norchestrator.py"]
        PLAN["Planner\nTask decomposition"]
        WORK_C["Worker — Claude Code"]
        WORK_X["Worker — Codex CLI\ncodex branch / worktree"]
        WORK_G["Worker — Gemini CLI\ngemini branch / worktree"]
        RESEARCH["Researcher\n(active)"]
        VERIFY["Verifier\n(stub — not wired)"]
        PRIV["Private-lane worker\n(stub — pending)"]
    end

    OP["Operator\niTerm2 + orchestrator profile"]

    OP -->|"Prompt via orchestrator profile"| SUPER
    SUPER --> PLAN
    PLAN --> WORK_C
    PLAN --> WORK_X
    PLAN --> WORK_G
    PLAN --> RESEARCH
    PLAN -.->|"pending"| VERIFY
    PLAN -.->|"pending"| PRIV

    style SUPER fill:#1a3a2a,color:#86efac
    style VERIFY fill:#3f2a10,color:#fdba74,stroke-dasharray: 5 5
    style PRIV fill:#3f2a10,color:#fdba74,stroke-dasharray: 5 5
```

### Multi-Agent Git Model

| Branch | Owner | Purpose |
|--------|-------|---------|
| `main` | Human only | Production — never commit directly |
| `agent` | Claude (Supervisor) | Claude's working branch |
| `codex` | Codex CLI | Codex's working branch |
| `gemini` | Gemini CLI | Gemini's working branch |

Human promotes `agent` → `main` via `./tools/promote.sh ` after review.

---

## Chat and Voice Access

| Device | Interface | Notes |
|--------|-----------|-------|
| Mac | Open WebUI — browser tab or home screen web app | Primary |
| iPhone | Open WebUI — browser shortcut, WireGuard VPN on | VPN required |
| Android tablet | Open WebUI — Phase Later | Planned |

### Voice Input

| Device | Tool | Status |
|--------|------|--------|
| Mac | SuperWhisper — local Parakeet model via WhisperKit, Apple Neural Engine | Active — all audio on-device |
| iPhone | Needs research — Apple Dictation is insufficient | Open |

---

## Backup and Recovery

| Layer | Target | Status |
|-------|--------|--------|
| Local Restic | VPS disk — fast-restore cache | Exists — restore validation not complete |
| Offsite | Cloudflare R2 Standard | Active — intended primary DR copy |
| Third target (3-2-1 completion) | Home lab (Z640/NUC) | Not active — blocked by Z640 rebuild |

> **True 3-2-1 is not yet complete.** R2 is the current disaster-recovery copy.
> Restore path has not been validated end-to-end.

---

## Open Architectural Decisions

| Topic | Summary | Status |
|-------|---------|--------|
| WireGuard hub failover | CHR01 is a single hub. Three options (CHR02 same-provider, move to VPS01, split roles). No decision made. | Open |
| VPS02 warm standby | Manual failover only. Sync method undefined. | Open |
| Three-lane model enforcement | Policy concept — not technically enforced. Routing and retention boundaries undefined. | Open |
| Business control-plane architecture | CRM, ticketing, PM selection reopened. Odoo not a settled answer. GLPI in scope for ticketing. Target: dashboard over multiple best-fit tools, not a monolithic app. | Open |
| Cross-agent verification loop | One agent answers → second independently verifies → discrepancies surface before operator acts. | Not started |

---

## Security Boundaries

| Boundary | Mechanism |
|----------|-----------|
| All external service access | WireGuard VPN required — no public-facing admin interfaces |
| Secrets management | Ansible Vault — never in git |
| Agent execution boundary | No agent controls production execution directly — human Terminal gate for all Ansible runs |
| Private lane data | Never routed to frontier APIs — local Ollama only |
| SSH keys | Mac host only — not mounted in container |