portfolio/agentlab/homelab-build-plan.md
AgentLab d5ef629a54 feat: initial AgentLab portfolio content
Architecture, overview, homelab build plan, agent handbook, ADRs,
and agent operating rules. All sensitive operational details sanitized
(real IPs, hostnames, client names replaced with generic placeholders).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 04:52:42 +00:00

Home Lab Build Plan — HP Z640

Hardware

| Component | Detail |
|---|---|
| System | HP Z640 Workstation |
| CPU | Intel Xeon (workstation class) |
| RAM | 64 GB ECC |
| OS Storage | 2× Samsung 850 EVO 500 GB — ZFS mirror (rpool) |
| Data Storage | 4× Seagate 2 TB — ZFS RAIDZ2 encrypted (data pool) |
| GPU 1 | Intel Arc A310 (Sparkle) 4 GB — Jellyfin VA-API transcoding |
| GPU 2 | EVGA GeForce RTX 3060 XC 12 GB GDDR6 — Ollama local LLM inference |
| Current state | Proxmox VE installed, organic/messy config — scheduled for clean rebuild |

Phase Overview

gantt
    title HP Z640 Rebuild — Phase Sequence
    dateFormat  YYYY-MM-DD
    axisFormat  Phase

    section Prerequisite
    Phase 0a — Pre-audit (SSH)          :crit, p0a, 2026-04-01, 1d
    Phase 0b — USB backup               :crit, p0b, after p0a, 1d

    section Core Build
    Phase 1 — Proxmox clean install     :crit, p1, after p0b, 2d
    Phase 2 — Core infrastructure LXCs  :p2, after p1, 2d

    section Services
    Phase 3 — Media stack               :p3, after p2, 2d
    Phase 4a — Networking + security    :p4a, after p3, 1d
    Phase 4b — Agent stack (trillian)   :p4b, after p4a, 2d

    section Automation
    Phase 5 — IaC + automation          :p5, after p4b, 3d

Phase 0a: Pre-Audit

GATE — Nothing proceeds until this is complete.

Capture the current state of the Z640 before any destructive action. The rebuild will wipe LXC and VM configuration.

Scope:

  • ZFS pool layout (rpool mirror + data pool RAIDZ2) — names, health, encryption status
  • VM and LXC inventory — all IDs, names, disk sizes, network config
  • Arr stack config and data paths (Sonarr, Radarr, Prowlarr, etc.)
  • Jellyfin config path and media library paths
  • PBS datastore paths and retention config
  • Network config — bridges, VLANs, IP assignments
  • Cron jobs — all scheduled tasks
  • Running services summary
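
The scope above could be captured in one pass over SSH. A minimal sketch, assuming the output goes to a hypothetical directory `/root/pre-audit`; `AUDIT_DIR`, `capture` and `run_audit` are illustrative names. It covers the ZFS, inventory, network, cron and services items only. Arr, Jellyfin and PBS paths still need to be recorded by hand.

```shell
#!/bin/sh
# Pre-audit sketch: save the output of each inventory command to its own file.
# AUDIT_DIR is an assumed location; change it to suit.
AUDIT_DIR="${AUDIT_DIR:-/root/pre-audit}"

# capture NAME COMMAND [ARGS...]
# Runs COMMAND and stores stdout+stderr in $AUDIT_DIR/NAME.txt.
# A missing command is noted in the file instead of aborting the audit.
capture() {
    name="$1"; shift
    mkdir -p "$AUDIT_DIR" || return 1
    if command -v "$1" >/dev/null 2>&1; then
        # Record a non-zero exit status in the file rather than stopping
        "$@" >"$AUDIT_DIR/$name.txt" 2>&1 || echo "exit status: $?" >>"$AUDIT_DIR/$name.txt"
    else
        echo "skipped: $1 not found" >"$AUDIT_DIR/$name.txt"
    fi
}

run_audit() {
    capture zpool-status  zpool status -v
    capture zfs-datasets  zfs list -o name,used,avail,mountpoint,encryption,keystatus
    capture lxc-inventory pct list
    capture vm-inventory  qm list
    capture network       cat /etc/network/interfaces
    capture ip-addresses  ip -br addr
    capture cron-root     crontab -l
    capture cron-system   ls -l /etc/cron.d /etc/cron.daily /etc/cron.weekly
    capture services      systemctl list-units --type=service --state=running --no-pager

    # Per-guest config (disk sizes, network) for every container and VM ID
    for id in $(pct list 2>/dev/null | awk 'NR>1 {print $1}'); do
        capture "lxc-$id" pct config "$id"
    done
    for id in $(qm list 2>/dev/null | awk 'NR>1 {print $1}'); do
        capture "vm-$id" qm config "$id"
    done
}
```

Run `run_audit` as root on the Z640, then copy `$AUDIT_DIR` off the host before Phase 0b.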

Phase 0b: USB Backup

GATE — USB backup must complete before Phase 1. No exceptions.

Full backup of the ZFS data pool to external USB before any rebuild touches storage.

  • Attach external USB drive to Z640
  • Verify USB drive capacity (must exceed used space on data pool)
  • Snapshot the data pool and send the snapshot stream to the USB drive
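# Capture used space first
zpool list
zfs list

# Snapshot recursively so child datasets are included in the replication stream
# (adjust pool/dataset names from audit output)
zfs snapshot -r datapool@pre-rebuild

# Send the data pool to USB. -w (raw) keeps encrypted datasets encrypted in the stream;
# drop it only if a plaintext backup is intended.
# pipefail (bash) makes the pipeline fail if zfs send fails, not only pv.
set -o pipefail
zfs send -R -w datapool@pre-rebuild | pv > /mnt/usb/datapool-pre-rebuild.zfs

# Verify send completed without error (0 = both zfs send and pv succeeded)
echo "Exit code: $?"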
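
The capacity check in the task list can be scripted rather than eyeballed. A minimal sketch, assuming the pool is named `datapool` and the USB drive is mounted at `/mnt/usb` (both taken from the commands above, so adjust them to the audit output); `fits` and `check_usb_capacity` are illustrative names. The stream size will not match the pool's used space exactly, so treat this as a lower bound and leave headroom.

```shell
#!/bin/sh
# Sketch: confirm the USB target has more free space than the pool uses.
POOL="${POOL:-datapool}"
USB_MOUNT="${USB_MOUNT:-/mnt/usb}"

# fits USED_BYTES AVAIL_BYTES -> succeeds only if AVAIL is strictly greater than USED
fits() {
    [ "$2" -gt "$1" ]
}

check_usb_capacity() {
    # -H: no header, -p: exact byte counts instead of human-readable sizes
    used=$(zfs list -Hp -o used "$POOL") || return 1
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available"
    avail_kb=$(df -Pk "$USB_MOUNT" | awk 'NR==2 {print $4}')
    avail=$((avail_kb * 1024))
    if fits "$used" "$avail"; then
        echo "OK: $avail bytes free for $used bytes used"
    else
        echo "FAIL: USB drive too small ($avail free, $used needed)" >&2
        return 1
    fi
}
```

Call `check_usb_capacity` before taking the snapshot; a non-zero return means the gate is not met.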

Phase 1: Proxmox Clean Install

GATE — Phase 0a audit complete. Phase 0b USB backup verified.

Fresh Proxmox VE install. Import existing ZFS pools. Establish baseline network config.

  • Download latest stable Proxmox VE ISO
  • Write ISO to USB installer
  • Boot Z640 from installer USB
  • Install Proxmox VE — do not touch the data pool disks
  • Import data pool:
zpool import -f datapool
zfs load-key datapool
zfs mount -a
  • Verify pool health: zpool status && zfs list
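
The health check feeds the Phase 2 gate, so it helps to have it return a pass/fail status. A minimal sketch; `is_online` and `check_pools` are illustrative names, and treating anything other than `ONLINE` as a failure is an assumption about how strict the gate should be.

```shell
#!/bin/sh
# Sketch: gate Phase 2 on every imported pool reporting ONLINE.

# is_online HEALTH -> succeeds only for the ONLINE state (not DEGRADED, FAULTED, ...)
is_online() {
    [ "$1" = "ONLINE" ]
}

check_pools() {
    status=0
    # -H drops the header so the output is one pool name per line
    for pool in $(zpool list -H -o name); do
        health=$(zpool list -H -o health "$pool")
        if is_online "$health"; then
            echo "$pool: ONLINE"
        else
            echo "$pool: $health (investigate before continuing)" >&2
            status=1
        fi
    done
    return "$status"
}
```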

Network Config

VLAN scheme: 10.42.0.0/16 supernet. VLAN ID = third octet of the subnet.

| VLAN ID | Subnet | Purpose |
|---|---|---|
| 10 | 10.42.10.0/24 | Management |
| 20 | 10.42.20.0/24 | LAN / trusted devices |
| 60 | 10.42.60.0/24 | AI-Agents |
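
The "VLAN ID = third octet" rule is simple enough to encode as a helper for later scripts. A minimal sketch; `vlan_subnet` is an illustrative name, and the 1 to 255 range limit is an assumption that follows from the third octet being a single byte.

```shell
#!/bin/sh
# vlan_subnet VLAN_ID -> prints the /24 for that VLAN under the 10.42.0.0/16 supernet
vlan_subnet() {
    vlan="$1"
    # Reject empty or non-numeric input
    case "$vlan" in
        ''|*[!0-9]*) echo "invalid VLAN ID: $vlan" >&2; return 1 ;;
    esac
    # The third octet is one byte, so IDs above 255 have no subnet in this scheme
    if [ "$vlan" -lt 1 ] || [ "$vlan" -gt 255 ]; then
        echo "VLAN ID out of range for this scheme: $vlan" >&2
        return 1
    fi
    echo "10.42.$vlan.0/24"
}

vlan_subnet 60   # prints 10.42.60.0/24
```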

Phase 2: Core Infrastructure LXCs

GATE — Proxmox clean install complete. ZFS pools healthy.

2a — PBS LXC (Proxmox Backup Server)

  • Create LXC for PBS (unprivileged, Debian base)
  • Assign a datastore path on the data pool
  • Configure PBS retention policy
  • Register PBS in Proxmox
  • Test backup of a throwaway LXC

2b — WireGuard LXC

  • Create LXC for WireGuard
  • Install WireGuard
  • Configure as spoke to CHR01
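
A spoke configuration for `wg-quick` has one `[Interface]` and one `[Peer]` (the hub). A minimal sketch that renders such a file; `render_wg_spoke` is an illustrative name, and every value in the usage line (tunnel address, endpoint, port, allowed range) is a placeholder, not something taken from the real CHR01 setup.

```shell
#!/bin/sh
# render_wg_spoke PRIVATE_KEY TUNNEL_ADDR HUB_PUBLIC_KEY HUB_ENDPOINT ALLOWED_IPS
# Prints a wg-quick style config with a single peer (the hub).
render_wg_spoke() {
    cat <<EOF
[Interface]
PrivateKey = $1
Address = $2

[Peer]
# CHR01 (hub)
PublicKey = $3
Endpoint = $4
AllowedIPs = $5
# Keep NAT mappings alive so the hub can always reach this spoke
PersistentKeepalive = 25
EOF
}
```

Example with placeholder values: `render_wg_spoke "$(cat /etc/wireguard/private.key)" 10.99.0.2/32 "<CHR01 public key>" chr01.example.com:51820 10.42.0.0/16 > /etc/wireguard/wg0.conf`, followed by `wg-quick up wg0`.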

2c — Monitoring LXC

  • Create LXC for monitoring stack
  • Install Prometheus + Grafana
  • Add Proxmox node as scrape target
  • Basic dashboard: CPU, RAM, ZFS pool health, network

Phase 3: Media Stack

GATE — Phase 2 complete. ZFS data pool mounted and healthy.

3a — Jellyfin LXC with Intel Arc A310

  • Create LXC (privileged — required for GPU passthrough)
  • Expose the Intel Arc A310 to the LXC via device passthrough (bind-mount the host's /dev/dri nodes); IOMMU is only needed for PCI passthrough to a VM
  • Install Jellyfin
  • Bind-mount media library paths from ZFS data pool
  • Configure VA-API hardware transcoding
# Verify VA-API inside LXC
vainfo
# Expected: shows Intel iHD driver, H264/HEVC encode/decode profiles
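
For `vainfo` to find the card, the container needs access to the host's DRM device nodes. One common way to do that is two raw LXC lines in the container's Proxmox config. A minimal sketch, assuming the host already has a working driver for the A310 and that `CTID` holds the Jellyfin container ID (not yet assigned in this plan); `dri_passthrough_lines` is an illustrative name.

```shell
#!/bin/sh
# dri_passthrough_lines -> prints the LXC config lines that expose /dev/dri
# 226 is the character-device major number used by DRM (card* and renderD* nodes).
dri_passthrough_lines() {
    cat <<'EOF'
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
EOF
}

# On the Proxmox host, with CTID set to the Jellyfin container's ID:
#   dri_passthrough_lines >> "/etc/pve/lxc/$CTID.conf"
#   pct reboot "$CTID"
# Then, inside the container, /dev/dri/renderD* should exist before running vainfo.
```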

3b — Arr Stack LXCs or Docker

  • Determine migration target: individual LXCs or single Docker LXC
  • Restore arr config from paths captured in audit
  • Verify indexer connectivity (Prowlarr)
  • Verify download client connectivity
  • Verify library scan in Sonarr/Radarr against restored media paths

Phase 4a: Networking + Security

GATE — Media stack verified functional.

  • All LXCs assigned to correct VLANs
  • Proxmox firewall rules: deny inter-VLAN by default, permit explicitly
  • VLAN 60 (AI-Agents) isolated — only permitted outbound: DNS, HTTPS, WireGuard tunnel
  • WireGuard tunnel to CHR01 confirmed up and passing traffic
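
The VLAN 60 policy maps naturally onto a per-guest Proxmox firewall file. A minimal sketch that renders one for trillian (VMID 112), assuming the firewall is enabled at datacenter level, the container's `net0` has `firewall=1`, and WireGuard uses the default UDP port 51820; `render_agent_fw` is an illustrative name. Check the rule syntax against the Proxmox firewall documentation for the installed version before applying.

```shell
#!/bin/sh
# render_agent_fw [WG_PORT] -> prints a default-deny firewall file for a VLAN 60 guest
# Outbound is limited to DNS, HTTPS and the WireGuard tunnel; all inbound is dropped.
render_agent_fw() {
    wg_port="${1:-51820}"
    cat <<EOF
[OPTIONS]
enable: 1
policy_in: DROP
policy_out: DROP

[RULES]
OUT ACCEPT -p udp -dport 53 # DNS
OUT ACCEPT -p tcp -dport 53 # DNS over TCP
OUT ACCEPT -p tcp -dport 443 # HTTPS
OUT ACCEPT -p udp -dport $wg_port # WireGuard tunnel
EOF
}

# On the Proxmox host:
#   render_agent_fw > /etc/pve/firewall/112.fw
```

These rules allow the listed ports to any destination; adding `-dest` to each rule would narrow them to specific resolvers or the tunnel endpoint.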

Phase 4b: Agent Stack — Open WebUI (LXC: trillian, VMID 112, VLAN 60)

GATE — Phase 4a network config complete. VLAN 60 operational.

Goal: Deploy Open WebUI backed by Ollama on the RTX 3060.

Architecture

flowchart TD
    User["User (VPN connected)"]
    VPS01["VPS01\nCaddy reverse proxy\ntherapon.yourdomain.com"]
    WG["WireGuard tunnel\nCHR01 ↔ trillian"]
    Caddy["Caddy (trillian LXC)\nInternal reverse proxy"]
    WebUI["Open WebUI\nDocker container"]
    Ollama["Ollama\nDocker container"]
    GPU["RTX 3060 XC 12 GB\nGPU passthrough"]

    User --> VPS01
    VPS01 --> WG
    WG --> Caddy
    Caddy --> WebUI
    WebUI --> Ollama
    Ollama --> GPU

Tasks

  • Create privileged LXC trillian — VMID 112, VLAN 60, Debian 12
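  • Expose the EVGA RTX 3060 to the LXC via device passthrough (host NVIDIA driver plus bind-mounted /dev/nvidia* nodes); IOMMU is only needed for PCI passthrough to a VM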
  • Install Docker inside LXC
  • Verify GPU visible inside LXC: nvidia-smi
  • Deploy Ollama container with GPU passthrough
  • Deploy Open WebUI container
  • Configure Caddy reverse proxy
  • Test end-to-end: VPN on, browser to internal URL, model inference working
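
The two container deployments can be sketched as plain `docker run` commands. This assumes the NVIDIA Container Toolkit is installed inside the LXC (required for `--gpus=all`), a user-defined Docker network named `agentnet` (hypothetical) so Open WebUI can reach Ollama by container name, and Caddy proxying to `127.0.0.1:3000`. `DRY_RUN`, `run` and the `deploy_*` names are illustrative; with `DRY_RUN=1` (the default) the commands are only printed.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command instead of executing it
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_network() {
    run docker network create agentnet
}

deploy_ollama() {
    # No published port: only containers on agentnet can reach the API on 11434
    run docker run -d --name ollama --restart unless-stopped \
        --network agentnet \
        --gpus=all \
        -v ollama:/root/.ollama \
        ollama/ollama
}

deploy_open_webui() {
    # Published on loopback only, so the local Caddy instance is the sole entry point
    run docker run -d --name open-webui --restart unless-stopped \
        --network agentnet \
        -e OLLAMA_BASE_URL=http://ollama:11434 \
        -v open-webui:/app/backend/data \
        -p 127.0.0.1:3000:8080 \
        ghcr.io/open-webui/open-webui:main
}
```

Run `create_network`, `deploy_ollama` and `deploy_open_webui` in that order with `DRY_RUN=0`. `docker exec ollama nvidia-smi` and `docker exec ollama ollama list` are quick checks that the GPU and the Ollama service are visible from inside the container.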

Phase 5: IaC + Automation

GATE — Full stack deployed and verified functional.

  • Configure Terraform Proxmox provider (bpg/proxmox)
  • Write Terraform modules for LXC and VM templates
  • Import existing LXCs into Terraform state
  • Write Ansible playbooks for LXC configuration
  • Deploy HashiCorp Vault LXC
  • Migrate secrets from manual config to Vault
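
Importing the hand-built LXCs is one `terraform import` per container. A minimal sketch that builds the command, assuming the bpg/proxmox container resource type `proxmox_virtual_environment_container` and an import ID of the form `node/vmid`; confirm both against the provider documentation for the pinned version. The node name `pve` is a placeholder and `import_cmd` is an illustrative name.

```shell
#!/bin/sh
# import_cmd NODE VMID RESOURCE_NAME -> prints a terraform import command
# The matching resource block must already exist in the Terraform configuration.
import_cmd() {
    echo "terraform import proxmox_virtual_environment_container.$3 $1/$2"
}

import_cmd pve 112 trillian
# prints: terraform import proxmox_virtual_environment_container.trillian pve/112
```

After each import, `terraform plan` should show no changes once the resource block matches the real container.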

Future Considerations (Not in Scope)

| Item | Notes |
|---|---|
| UPS (APC or similar) | Worthwhile — deferred beyond Phase 5 |
| Second NIC for dedicated storage network | Optional optimisation |
| GPU upgrade beyond RTX 3060 | Not needed at current model sizes |