# Home Lab Build Plan — HP Z640

## Hardware

| Component | Detail |
|---|---|
| **System** | HP Z640 Workstation |
| **CPU** | Intel Xeon (workstation class) |
| **RAM** | 64 GB ECC |
| **OS Storage** | 2× Samsung 850 EVO 500 GB — ZFS mirror (`rpool`) |
| **Data Storage** | 4× Seagate 2 TB — ZFS RAIDZ2 encrypted (data pool) |
| **GPU 1** | Intel Arc A310 (Sparkle) 4 GB — Jellyfin VA-API transcoding |
| **GPU 2** | EVGA GeForce RTX 3060 XC 12 GB GDDR6 — Ollama local LLM inference |
| **Current state** | Proxmox VE installed, organic/messy config — scheduled for clean rebuild |

---

## Phase Overview

```mermaid
gantt
    title HP Z640 Rebuild — Phase Sequence
    dateFormat YYYY-MM-DD
    axisFormat Phase
    section Prerequisite
    Phase 0a — Pre-audit (SSH)           :crit, p0a, 2026-04-01, 1d
    Phase 0b — USB backup                :crit, p0b, after p0a, 1d
    section Core Build
    Phase 1 — Proxmox clean install      :crit, p1, after p0b, 2d
    Phase 2 — Core infrastructure LXCs   :p2, after p1, 2d
    section Services
    Phase 3 — Media stack                :p3, after p2, 2d
    Phase 4a — Networking + security     :p4a, after p3, 1d
    Phase 4b — Agent stack (trillian)    :p4b, after p4a, 2d
    section Automation
    Phase 5 — IaC + automation           :p5, after p4b, 3d
```

---

## Phase 0a: Pre-Audit

> **GATE — Nothing proceeds until this is complete.**

Capture the current state of the Z640 before any destructive action. The rebuild will wipe LXC and VM configuration.

**Scope:**

- [ ] ZFS pool layout (`rpool` mirror + data pool RAIDZ2) — names, health, encryption status
- [ ] VM and LXC inventory — all IDs, names, disk sizes, network config
- [ ] Arr stack config and data paths (Sonarr, Radarr, Prowlarr, etc.)
- [ ] Jellyfin config path and media library paths
- [ ] PBS datastore paths and retention config
- [ ] Network config — bridges, VLANs, IP assignments
- [ ] Cron jobs — all scheduled tasks
- [ ] Running services summary

---

## Phase 0b: USB Backup

> **GATE — USB backup must complete before Phase 1.
> No exceptions.**

Full backup of the ZFS data pool to external USB before any rebuild touches storage.

- [ ] Attach external USB drive to Z640
- [ ] Verify USB drive capacity (must exceed used space on data pool)
- [ ] Export pool snapshot and send to USB

```bash
# Fail the pipeline if zfs send fails, not just the last command (pv)
set -o pipefail

# Capture used space first
zpool list
zfs list

# Send the encrypted data pool to USB as a raw stream (-w) so it stays
# encrypted at rest on the backup drive. Snapshot recursively (-r) so the
# replication send (-R) has a snapshot on every descendant dataset.
# Adjust pool/dataset names from audit output.
zfs snapshot -r datapool@pre-rebuild
zfs send -Rw datapool@pre-rebuild | pv > /mnt/usb/datapool-pre-rebuild.zfs

# Verify send completed without error
echo "Exit code: $?"
```

---

## Phase 1: Proxmox Clean Install

> **GATE — Phase 0a audit complete. Phase 0b USB backup verified.**

Fresh Proxmox VE install. Import existing ZFS pools. Establish baseline network config.

- [ ] Download latest stable Proxmox VE ISO
- [ ] Write ISO to USB installer
- [ ] Boot Z640 from installer USB
- [ ] Install Proxmox VE — **do not touch the data pool disks**
- [ ] Import data pool:

```bash
zpool import -f datapool
zfs load-key datapool
zfs mount -a
```

- [ ] Verify pool health: `zpool status && zfs list`

### Network Config

VLAN scheme: `10.42.0.0/16` supernet. VLAN ID = third octet of the subnet.

| VLAN ID | Subnet | Purpose |
|---|---|---|
| 10 | 10.42.10.0/24 | Management |
| 20 | 10.42.20.0/24 | LAN / trusted devices |
| 60 | 10.42.60.0/24 | AI-Agents |

---

## Phase 2: Core Infrastructure LXCs

> **GATE — Proxmox clean install complete.
> ZFS pools healthy.**

### 2a — PBS LXC (Proxmox Backup Server)

- [ ] Create LXC for PBS (unprivileged, Debian base)
- [ ] Assign a datastore path on the data pool
- [ ] Configure PBS retention policy
- [ ] Register PBS in Proxmox
- [ ] Test backup of a throwaway LXC

### 2b — WireGuard LXC

- [ ] Create LXC for WireGuard
- [ ] Install WireGuard
- [ ] Configure as spoke to CHR01

### 2c — Monitoring LXC

- [ ] Create LXC for monitoring stack
- [ ] Install Prometheus + Grafana
- [ ] Add Proxmox node as scrape target
- [ ] Basic dashboard: CPU, RAM, ZFS pool health, network

---

## Phase 3: Media Stack

> **GATE — Phase 2 complete. ZFS data pool mounted and healthy.**

### 3a — Jellyfin LXC with Intel Arc A310

- [ ] Create LXC (privileged — required for GPU passthrough)
- [ ] Pass through Intel Arc A310 via IOMMU / device passthrough
- [ ] Install Jellyfin
- [ ] Bind-mount media library paths from ZFS data pool
- [ ] Configure VA-API hardware transcoding

```bash
# Verify VA-API inside LXC
vainfo
# Expected: shows Intel iHD driver, H264/HEVC encode/decode profiles
```

### 3b — Arr Stack LXCs or Docker

- [ ] Determine migration target: individual LXCs or single Docker LXC
- [ ] Restore arr config from paths captured in audit
- [ ] Verify indexer connectivity (Prowlarr)
- [ ] Verify download client connectivity
- [ ] Verify library scan in Sonarr/Radarr against restored media paths

---

## Phase 4a: Networking + Security

> **GATE — Media stack verified functional.**

- [ ] All LXCs assigned to correct VLANs
- [ ] Proxmox firewall rules: deny inter-VLAN by default, permit explicitly
- [ ] VLAN 60 (AI-Agents) isolated — only permitted outbound: DNS, HTTPS, WireGuard tunnel
- [ ] WireGuard tunnel to CHR01 confirmed up and passing traffic

---

## Phase 4b: Agent Stack — Open WebUI (LXC: `trillian`, VMID 112, VLAN 60)

> **GATE — Phase 4a network config complete. VLAN 60 operational.**

**Goal:** Deploy Open WebUI backed by Ollama on the RTX 3060.
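A minimal compose sketch of the two containers — image tags, the published port, and volume names are assumptions to verify at deployment time, not a final config:

```yaml
# Hypothetical docker-compose.yml for trillian — check tags/ports before use
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama:/root/.ollama          # model storage persists across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia          # expose the RTX 3060 to the container
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"                   # Caddy proxies to this published port
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:
```

The GPU reservation requires the NVIDIA Container Toolkit on the Docker host; without it, `docker compose up` fails to start the `ollama` service.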
### Architecture

```mermaid
flowchart TD
    User["User (VPN connected)"]
    VPS01["VPS01<br/>Caddy reverse proxy<br/>therapon.yourdomain.com"]
    WG["WireGuard tunnel<br/>CHR01 ↔ trillian"]
    Caddy["Caddy (trillian LXC)<br/>Internal reverse proxy"]
    WebUI["Open WebUI<br/>Docker container"]
    Ollama["Ollama<br/>Docker container"]
    GPU["RTX 3060 XC 12 GB<br/>GPU passthrough"]
    User --> VPS01
    VPS01 --> WG
    WG --> Caddy
    Caddy --> WebUI
    WebUI --> Ollama
    Ollama --> GPU
```

### Tasks

- [ ] Create privileged LXC `trillian` — VMID 112, VLAN 60, Debian 12
- [ ] Pass through EVGA RTX 3060 via IOMMU
- [ ] Install NVIDIA driver, then Docker and the NVIDIA Container Toolkit inside LXC
- [ ] Verify GPU visible inside LXC: `nvidia-smi`
- [ ] Deploy Ollama container with GPU passthrough
- [ ] Deploy Open WebUI container
- [ ] Configure Caddy reverse proxy
- [ ] Test end-to-end: VPN on, browser to internal URL, model inference working

---

## Phase 5: IaC + Automation

> **GATE — Full stack deployed and verified functional.**

- [ ] Configure Terraform Proxmox provider (`bpg/proxmox`)
- [ ] Write Terraform modules for LXC and VM templates
- [ ] Import existing LXCs into Terraform state
- [ ] Write Ansible playbooks for LXC configuration
- [ ] Deploy HashiCorp Vault LXC
- [ ] Migrate secrets from manual config to Vault

---

## Future Considerations (Not in Scope)

| Item | Notes |
|------|-------|
| UPS (APC or similar) | Worthwhile — deferred beyond Phase 5 |
| Second NIC for dedicated storage network | Optional optimisation |
| GPU upgrade beyond RTX 3060 | Not needed at current model sizes |
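As a starting point for the Phase 5 Terraform work, a hedged sketch of the `bpg/proxmox` provider wiring — the endpoint, node name, and template path are placeholders, and attribute names should be checked against the provider documentation:

```hcl
terraform {
  required_providers {
    proxmox = {
      source = "bpg/proxmox"
    }
  }
}

provider "proxmox" {
  endpoint  = "https://10.42.10.2:8006/"  # placeholder management IP
  api_token = var.proxmox_api_token       # keep out of source control; Vault later
}

# Sketch of importing trillian as a managed LXC resource
resource "proxmox_virtual_environment_container" "trillian" {
  node_name = "z640"   # placeholder node name
  vm_id     = 112

  initialization {
    hostname = "trillian"
  }

  operating_system {
    template_file_id = "local:vztmpl/..."  # fill in the actual Debian 12 template
    type             = "debian"
  }
}
```

Existing LXCs would then be pulled into state with `terraform import` rather than recreated, matching the "Import existing LXCs into Terraform state" task above.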