Architecture, overview, homelab build plan, agent handbook, ADRs, and agent operating rules. All sensitive operational details sanitized (real IPs, hostnames, client names replaced with generic placeholders). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Home Lab Build Plan — HP Z640
Hardware
| Component | Detail |
|---|---|
| System | HP Z640 Workstation |
| CPU | Intel Xeon (workstation class) |
| RAM | 64 GB ECC |
| OS Storage | 2× Samsung 850 EVO 500 GB — ZFS mirror (rpool) |
| Data Storage | 4× Seagate 2 TB — ZFS RAIDZ2 encrypted (data pool) |
| GPU 1 | Intel Arc A310 (Sparkle) 4 GB — Jellyfin VA-API transcoding |
| GPU 2 | EVGA GeForce RTX 3060 XC 12 GB GDDR6 — Ollama local LLM inference |
| Current state | Proxmox VE installed, organic/messy config — scheduled for clean rebuild |
Phase Overview
gantt
title HP Z640 Rebuild — Phase Sequence
dateFormat YYYY-MM-DD
axisFormat Phase
section Prerequisite
Phase 0a — Pre-audit (SSH) :crit, p0a, 2026-04-01, 1d
Phase 0b — USB backup :crit, p0b, after p0a, 1d
section Core Build
Phase 1 — Proxmox clean install :crit, p1, after p0b, 2d
Phase 2 — Core infrastructure LXCs :p2, after p1, 2d
section Services
Phase 3 — Media stack :p3, after p2, 2d
Phase 4a — Networking + security :p4a, after p3, 1d
Phase 4b — Agent stack (trillian) :p4b, after p4a, 2d
section Automation
Phase 5 — IaC + automation :p5, after p4b, 3d
Phase 0a: Pre-Audit
GATE — Nothing proceeds until this is complete.
Capture the current state of the Z640 before any destructive action. The rebuild will wipe LXC and VM configuration.
Scope:
- ZFS pool layout (rpool mirror + data pool RAIDZ2): names, health, encryption status
- VM and LXC inventory: all IDs, names, disk sizes, network config
- Arr stack config and data paths (Sonarr, Radarr, Prowlarr, etc.)
- Jellyfin config path and media library paths
- PBS datastore paths and retention config
- Network config — bridges, VLANs, IP assignments
- Cron jobs — all scheduled tasks
- Running services summary
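The host-level part of the scope above can be sketched as a small capture script. This is a minimal sketch, not the plan's actual audit tooling: `AUDIT_DIR`, the output file names, and the `capture` and `run_audit` helpers are assumptions made for illustration. Service-specific paths (arr stack, Jellyfin, PBS) depend on how the current install was laid out and are not covered here.

```bash
# Assumed output location; override with AUDIT_DIR=/some/path
AUDIT_DIR="${AUDIT_DIR:-/root/z640-audit}"

# Run a command and save stdout+stderr to $AUDIT_DIR/<name>.txt.
# A failing or missing command is reported but does not stop the audit.
capture() {
  local name="$1"
  shift
  mkdir -p "$AUDIT_DIR"
  if "$@" >"$AUDIT_DIR/$name.txt" 2>&1; then
    echo "ok      $name"
  else
    echo "FAILED  $name (see $AUDIT_DIR/$name.txt)"
  fi
}

# Read-only commands only; nothing here modifies the host.
run_audit() {
  capture zpool-status  zpool status -v
  capture zfs-list      zfs list -o name,used,avail,mountpoint,encryption,keystatus
  capture lxc-list      pct list
  capture vm-list       qm list
  capture guest-configs grep -r "" /etc/pve/lxc/ /etc/pve/qemu-server/
  capture interfaces    cat /etc/network/interfaces
  capture ip-addr       ip -br addr
  capture root-crontab  crontab -l
  capture cron-dirs     ls -la /etc/cron.d /etc/cron.daily /etc/cron.weekly
  capture services      systemctl list-units --type=service --state=running --no-pager
}
```

Calling `run_audit` on the Proxmox host and copying `$AUDIT_DIR` off the machine gives a record that survives the reinstall. Any line reported as `FAILED` is worth reading before the gate is considered passed.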
Phase 0b: USB Backup
GATE — USB backup must complete before Phase 1. No exceptions.
Full backup of the ZFS data pool to external USB before any rebuild touches storage.
- Attach external USB drive to Z640
- Verify USB drive capacity (must exceed used space on data pool)
- Export pool snapshot and send to USB
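# Capture used space first
zpool list
zfs list
# Snapshot every dataset in the pool; zfs send -R expects the snapshot on all descendants
zfs snapshot -r datapool@pre-rebuild
# Make the pipeline report a zfs send failure, not only the exit status of pv
set -o pipefail
# Send encrypted data pool to USB (adjust pool/dataset names from audit output)
# -w sends a raw stream: data stays encrypted on the USB drive, and -R requires it for encrypted datasets.
# Restoring a raw stream needs the original encryption key.
zfs send -R -w datapool@pre-rebuild | pv > /mnt/usb/datapool-pre-rebuild.zfs
# Verify send completed without error
echo "Exit code: $?"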
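The capacity check from the list above can be made explicit before the send starts. The following is a sketch under assumptions: the helper names are hypothetical, the 10 percent safety margin is an arbitrary choice, and `mount_free_bytes` relies on GNU `df`.

```bash
# Succeeds (exit 0) when $1 bytes fit into $2 free bytes,
# keeping a safety margin of $3 percent (default 10, an arbitrary choice).
backup_fits() {
  local used="$1"
  local avail="$2"
  local margin="${3:-10}"
  [ $(( used + used * margin / 100 )) -le "$avail" ]
}

# Used bytes of the pool's root dataset (children and snapshots included).
pool_used_bytes() {
  zfs list -Hp -o used "$1"
}

# Free bytes on the filesystem that holds the given path (GNU df).
mount_free_bytes() {
  df --output=avail -B1 "$1" | tail -n 1 | tr -d ' '
}
```

A possible gate would be `backup_fits "$(pool_used_bytes datapool)" "$(mount_free_bytes /mnt/usb)" || echo "USB drive too small"`. The used-space figure only approximates the stream size; `zfs send -nv` with the same arguments prints an estimate without sending anything.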
Phase 1: Proxmox Clean Install
GATE — Phase 0a audit complete. Phase 0b USB backup verified.
Fresh Proxmox VE install. Import existing ZFS pools. Establish baseline network config.
- Download latest stable Proxmox VE ISO
- Write ISO to USB installer
- Boot Z640 from installer USB
- Install Proxmox VE — do not touch the data pool disks
- Import data pool:
zpool import -f datapool
zfs load-key datapool
zfs mount -a
- Verify pool health:
zpool status && zfs list
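Because Phase 2 is gated on healthy pools, the verification step can be turned into a pass/fail check instead of a visual inspection. A minimal sketch, with hypothetical function names; it relies on `zpool status -x` printing `all pools are healthy` when nothing is wrong and on the `keystatus` property reading `available` once the key is loaded.

```bash
# Returns 0 only when the text captured from `zpool status -x` reports no problems.
# The text is passed in as $1 so the comparison itself has no side effects.
pools_healthy() {
  [ "$1" = "all pools are healthy" ]
}

# Returns 0 when the named dataset reports its encryption key as loaded.
key_loaded() {
  [ "$(zfs get -H -o value keystatus "$1")" = "available" ]
}
```

For example: `pools_healthy "$(zpool status -x)" && key_loaded datapool && echo "Phase 1 gate: PASS"`.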
Network Config
VLAN scheme: 10.42.0.0/16 supernet. VLAN ID = third octet of the subnet.
| VLAN ID | Subnet | Purpose |
|---|---|---|
| 10 | 10.42.10.0/24 | Management |
| 20 | 10.42.20.0/24 | LAN / trusted devices |
| 60 | 10.42.60.0/24 | AI-Agents |
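The "VLAN ID = third octet" rule is simple enough to encode once and reuse in later scripts. A small sketch (the function name `vlan_subnet` is an assumption):

```bash
# Map a VLAN ID to its /24 under the 10.42.0.0/16 supernet.
vlan_subnet() {
  local id="$1"
  case "$id" in
    ''|*[!0-9]*|0*)
      echo "VLAN ID must be a positive number without leading zeros: '$id'" >&2
      return 1
      ;;
  esac
  # The ID becomes the third octet, so it must fit in one octet
  # even though 802.1Q itself allows IDs up to 4094.
  if [ "$id" -gt 255 ]; then
    echo "VLAN ID $id does not fit in one octet" >&2
    return 1
  fi
  echo "10.42.${id}.0/24"
}
```

`vlan_subnet 60` prints `10.42.60.0/24`, matching the AI-Agents row in the table. The scheme caps the plan at VLAN IDs 1 to 255.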
Phase 2: Core Infrastructure LXCs
GATE — Proxmox clean install complete. ZFS pools healthy.
2a — PBS LXC (Proxmox Backup Server)
- Create LXC for PBS (unprivileged, Debian base)
- Assign a datastore path on the data pool
- Configure PBS retention policy
- Register PBS in Proxmox
- Test backup of a throwaway LXC
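Creating the PBS container from the CLI might look like the sketch below. Everything specific here is an assumption for illustration: the VMID, the template name, the `local-zfs` storage ID, the sizing, the `vmbr0` bridge, placing PBS on VLAN 10, and the datastore directory. The `run` helper prints the command by default, so nothing is created until `DRY_RUN=0` is set.

```bash
# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = VMID, $2 = container template volume, $3 = host directory for the datastore
create_pbs_lxc() {
  local vmid="$1"
  local template="$2"
  local datastore_dir="$3"
  run pct create "$vmid" "$template" \
    --hostname pbs \
    --unprivileged 1 \
    --cores 2 --memory 4096 \
    --rootfs local-zfs:16 \
    --net0 "name=eth0,bridge=vmbr0,tag=10,ip=dhcp" \
    --mp0 "${datastore_dir},mp=/mnt/datastore"
}
```

For example, `create_pbs_lxc 110 local:vztmpl/<debian-template> /datapool/pbs` prints the full `pct create` line for review. One thing to check with an unprivileged container: the bind-mounted directory is seen through the container's UID mapping, so its ownership on the host may need adjusting before PBS can write to it.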
2b — WireGuard LXC
- Create LXC for WireGuard
- Install WireGuard
- Configure as spoke to CHR01
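A spoke configuration for `wg-quick` can be rendered from a handful of values. This is a sketch only: the tunnel addressing, the endpoint, and the key placeholders in the usage line are hypothetical, and the real values come from the CHR01 side.

```bash
# $1 = spoke tunnel address (CIDR), $2 = hub public key,
# $3 = hub endpoint (host:port), $4 = networks routed through the tunnel
render_wg_spoke() {
  local address="$1"
  local hub_pubkey="$2"
  local hub_endpoint="$3"
  local allowed_ips="$4"
  # PersistentKeepalive keeps NAT mappings open, since the spoke always dials out.
  cat <<EOF
[Interface]
Address = ${address}
PrivateKey = <spoke-private-key>

[Peer]
PublicKey = ${hub_pubkey}
Endpoint = ${hub_endpoint}
AllowedIPs = ${allowed_ips}
PersistentKeepalive = 25
EOF
}
```

Usage could be `render_wg_spoke 10.99.0.2/32 '<CHR01-public-key>' '<CHR01-host>:51820' 10.99.0.0/24 > /etc/wireguard/wg0.conf`, followed by `chmod 600` on the file and `systemctl enable --now wg-quick@wg0`. The private key placeholder is left in the output on purpose so the key never passes through shell history.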
2c — Monitoring LXC
- Create LXC for monitoring stack
- Install Prometheus + Grafana
- Add Proxmox node as scrape target
- Basic dashboard: CPU, RAM, ZFS pool health, network
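Adding the Proxmox node as a scrape target comes down to one job entry in `prometheus.yml`. The sketch below assumes an exporter such as node_exporter is already listening on the host; the job name, address, and port in the usage line are illustrative, not taken from the plan.

```bash
# Emit one scrape job, indented to sit under the `scrape_configs:` key.
# $1 = job name, $2 = target as host:port
render_scrape_job() {
  local job="$1"
  local target="$2"
  cat <<EOF
  - job_name: "${job}"
    static_configs:
      - targets: ["${target}"]
EOF
}
```

For example, `render_scrape_job proxmox-node 10.42.10.5:9100` produces a block to paste under `scrape_configs:`, and `promtool check config /etc/prometheus/prometheus.yml` validates the file before Prometheus is reloaded. CPU, RAM, and network panels work from node-level metrics alone; ZFS pool health may need an additional exporter or a textfile collector, depending on the exporter version.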
Phase 3: Media Stack
GATE — Phase 2 complete. ZFS data pool mounted and healthy.
3a — Jellyfin LXC with Intel Arc A310
- Create LXC (privileged — required for GPU passthrough)
- Pass through Intel Arc A310 by bind-mounting its /dev/dri device nodes into the LXC (IOMMU/VFIO passthrough applies only to VMs)
- Install Jellyfin
- Bind-mount media library paths from ZFS data pool
- Configure VA-API hardware transcoding
# Verify VA-API inside LXC
vainfo
# Expected: shows Intel iHD driver, H264/HEVC encode/decode profiles
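For `vainfo` to find the card, the container needs access to the host's DRM device nodes. A common way to grant that in a Proxmox LXC config is sketched below; it assumes a cgroup v2 host (Proxmox VE 7 or later) and that the Arc A310 shows up under `/dev/dri` on the host. The function name is hypothetical.

```bash
# Emit the two config lines that expose /dev/dri to a container.
# DRM devices (card*, renderD*) use character device major 226.
render_dri_passthrough() {
  cat <<'EOF'
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
EOF
}
```

The output would be appended to `/etc/pve/lxc/<vmid>.conf` with the container stopped. After a restart, `ls -l /dev/dri` inside the container should list a `renderD` node before `vainfo` is tried. With two GPUs in the host, it is worth confirming which `renderD` node belongs to the Arc card.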
3b — Arr Stack LXCs or Docker
- Determine migration target: individual LXCs or single Docker LXC
- Restore arr config from paths captured in audit
- Verify indexer connectivity (Prowlarr)
- Verify download client connectivity
- Verify library scan in Sonarr/Radarr against restored media paths
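Before checking indexers and download clients in the UI, a quick API smoke test can confirm that each restored app is up and accepts its API key. This sketch assumes default ports and that the apps are reachable by plain HTTP; the host and key are whatever the restore produced. It does not test indexer connectivity itself.

```bash
# Status endpoint for each app, using the default port.
arr_status_url() {
  local app="$1"
  local host="$2"
  case "$app" in
    sonarr)   echo "http://${host}:8989/api/v3/system/status" ;;
    radarr)   echo "http://${host}:7878/api/v3/system/status" ;;
    prowlarr) echo "http://${host}:9696/api/v1/system/status" ;;
    *) echo "unknown app: $app" >&2; return 1 ;;
  esac
}

# Returns 0 when the app answers the status call with the given API key.
arr_is_up() {
  local app="$1"
  local host="$2"
  local api_key="$3"
  local url
  url="$(arr_status_url "$app" "$host")" || return 1
  curl -fsS --max-time 10 -H "X-Api-Key: ${api_key}" "$url" >/dev/null
}
```

For example, `arr_is_up sonarr 10.42.20.30 "$SONARR_KEY" && echo "sonarr ok"`, where the address and variable are placeholders.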
Phase 4a: Networking + Security
GATE — Media stack verified functional.
- All LXCs assigned to correct VLANs
- Proxmox firewall rules: deny inter-VLAN by default, permit explicitly
- VLAN 60 (AI-Agents) isolated — only permitted outbound: DNS, HTTPS, WireGuard tunnel
- WireGuard tunnel to CHR01 confirmed up and passing traffic
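The VLAN 60 isolation rule can be expressed as a per-guest Proxmox firewall file. The following is a sketch of that policy, not a tested ruleset: the WireGuard port (51820 here) is an assumption, and the file only takes effect if the firewall is enabled at datacenter level and on the guest's network interface.

```bash
# Emit a guest firewall file: drop everything by default,
# then permit outbound DNS, HTTPS, and the WireGuard tunnel.
# $1 = WireGuard UDP port (assumed default: 51820)
render_agent_fw() {
  local wg_port="${1:-51820}"
  cat <<EOF
[OPTIONS]
enable: 1
policy_in: DROP
policy_out: DROP

[RULES]
OUT ACCEPT -p udp -dport 53
OUT ACCEPT -p tcp -dport 53
OUT ACCEPT -p tcp -dport 443
OUT ACCEPT -p udp -dport ${wg_port}
EOF
}
```

For trillian this would be written to `/etc/pve/firewall/112.fw`. Starting from default-deny in both directions matches the "deny by default, permit explicitly" rule above; any additional inbound access then has to be added as an explicit `IN` rule.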
Phase 4b: Agent Stack — Open WebUI (LXC: trillian, VMID 112, VLAN 60)
GATE — Phase 4a network config complete. VLAN 60 operational.
Goal: Deploy Open WebUI backed by Ollama on the RTX 3060.
Architecture
flowchart TD
User["User (VPN connected)"]
VPS01["VPS01\nCaddy reverse proxy\ntherapon.yourdomain.com"]
WG["WireGuard tunnel\nCHR01 ↔ trillian"]
Caddy["Caddy (trillian LXC)\nInternal reverse proxy"]
WebUI["Open WebUI\nDocker container"]
Ollama["Ollama\nDocker container"]
GPU["RTX 3060 XC 12 GB\nGPU passthrough"]
User --> VPS01
VPS01 --> WG
WG --> Caddy
Caddy --> WebUI
WebUI --> Ollama
Ollama --> GPU
Tasks
- Create privileged LXC trillian: VMID 112, VLAN 60, Debian 12
- Pass through EVGA RTX 3060 to the LXC (NVIDIA driver on the host plus device-node bind mounts; IOMMU/VFIO passthrough applies only to VMs)
- Install Docker inside LXC
- Verify GPU visible inside LXC: nvidia-smi
- Deploy Ollama container with GPU passthrough
- Deploy Open WebUI container
- Configure Caddy reverse proxy
- Test end-to-end: VPN on, browser to internal URL, model inference working
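The two container deployments in the task list could look like the sketch below. The network name `agents`, the volume names, and the published port 3000 are assumptions; `--gpus=all` additionally assumes the NVIDIA Container Toolkit is installed inside the LXC and that `nvidia-smi` already works there. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```bash
# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

deploy_agent_stack() {
  # A user-defined network lets Open WebUI reach Ollama by container name.
  run docker network create agents
  # Ollama is not published on the host; only containers on the network can reach it.
  run docker run -d --name ollama --restart unless-stopped \
    --network agents --gpus=all \
    -v ollama:/root/.ollama \
    ollama/ollama
  # Open WebUI listens on 8080 in the container; bind it to localhost for Caddy.
  run docker run -d --name open-webui --restart unless-stopped \
    --network agents -p 127.0.0.1:3000:8080 \
    -e OLLAMA_BASE_URL=http://ollama:11434 \
    -v open-webui:/app/backend/data \
    ghcr.io/open-webui/open-webui:main
}
```

With this layout the internal Caddy instance only needs a `reverse_proxy 127.0.0.1:3000` directive for the site. Keeping Ollama unpublished means the only path to the model is through Open WebUI.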
Phase 5: IaC + Automation
GATE — Full stack deployed and verified functional.
- Configure Terraform Proxmox provider (bpg/proxmox)
- Write Terraform modules for LXC and VM templates
- Import existing LXCs into Terraform state
- Write Ansible playbooks for LXC configuration
- Deploy HashiCorp Vault LXC
- Migrate secrets from manual config to Vault
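Importing the hand-built LXCs means one `terraform import` per container. The helper below generates those commands for review. It is a sketch under assumptions: the node name `pve` in the usage line is hypothetical, and the `proxmox_virtual_environment_container` resource type and the `node/vmid` import ID follow the bpg/proxmox provider's documented convention as best recalled, so both should be checked against the provider version in use.

```bash
# Print the import command for one container.
# $1 = Proxmox node name, $2 = VMID, $3 = Terraform resource name
tf_import_cmd() {
  local node="$1"
  local vmid="$2"
  local name="$3"
  echo "terraform import proxmox_virtual_environment_container.${name} ${node}/${vmid}"
}
```

`tf_import_cmd pve 112 trillian` prints the command for the agent LXC. A matching `resource` block has to exist in the configuration before the import runs, and a `terraform plan` afterwards shows how far the written configuration drifts from the live container.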
Future Considerations (Not in Scope)
| Item | Notes |
|---|---|
| UPS (APC or similar) | Worthwhile — deferred beyond Phase 5 |
| Second NIC for dedicated storage network | Optional optimisation |
| GPU upgrade beyond RTX 3060 | Not needed at current model sizes |