# AgentLab Overview

## The Plain Version

This is a private AI-assisted control plane: a system that lets one person run a small IT business, manage infrastructure, and handle projects with AI agents doing the heavy lifting inside a safe, bounded workspace.

The simplest analogy: imagine a workshop where the bench is a server, the notebook is a Git repository, and instead of one assistant, there are three AI builders (Claude, Codex, and Gemini) who each read the same notebook before they start work and write everything they do back into it. The notebook is the memory, not the chat history and not anyone's head.

**What's actually running right now:**

- A VPS in the cloud running a proper monitoring stack (Prometheus, Grafana, Alertmanager) that watches network equipment and sends email alerts when things break.
- WireGuard VPN connecting the private services so they're not exposed to the public internet.
- Forgejo (a private Git server) running at `git.yourdomain.com`, the version control system for all the code and config.
- Loki + Grafana Alloy collecting logs from the server (system logs, Docker container logs). This is live but still being made useful: the data is flowing; the analysis layer is the next step.
- The AI workspace itself: Claude, Codex, and Gemini running inside a Docker container, all reading from and writing to the same Git repo. Each one has its own rules, its own branch, and its own lane of work.

**What's coming next:**

- A proper orchestrator that routes tasks to the right AI agent automatically and knows when each one has run out of budget or hit its usage window.
- Better log-based monitoring: turning the Loki data into something you can actually act on, such as searching logs from routers, correlating network events, and alerting on log patterns.
- Mobile/phone access: being able to check on the system or give it a task from your phone, not just from a laptop.
- A private dashboard that will eventually be the single operator view over infrastructure, clients, and projects.
- Business platform layer: ticketing, CRM, and project management tools, to be decided. The philosophy is: one tight control surface over multiple best-fit tools, not one giant app trying to do everything.

**The bigger picture:**

The name for this overall direction is Therapon. The idea is an operator-owned system (not dependent on any single vendor, not locked into a subscription SaaS stack) that a small team or solo operator can actually run and trust. The AI agents are the workers. The repo is the brain. The infrastructure is owned, not rented from someone else's control plane.

It's early. The foundation is real and working. The next chapter is turning a collection of good parts into a coherent, orchestrated system.

---

## What This System Is (Technical Detail)

AgentLab is a way to use AI coding agents inside a bounded development workspace instead of letting them operate directly on a host machine.

In this repo, that workspace is a Docker devcontainer running Ubuntu, with the main Git repository mounted into it at `/workspace/vault-repo`. Claude Code, Codex, and Gemini run inside that container and work against the repo.

The important idea is not "use VS Code." VS Code is only one possible client for attaching to the container. You can also enter the container with `docker exec`, a terminal profile, JetBrains remote tooling, or any other workflow you prefer.

The repo acts as the control plane. Instructions, rules, state documents, handoff notes, plans, scripts, and generated outputs all live in Git. The agents do not rely on chat history as the source of truth. They read the repo, operate inside the container, and write changes back to the repo.
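As a rough illustration of the "any client can attach" point, the sketch below builds a `docker exec` command that opens a shell in the workspace at the repo mount point. The container name `agentlab-dev` and the choice of `bash` are assumptions made for the example, not values taken from this repo.

```python
from __future__ import annotations

import shlex

# Hypothetical container name, chosen only for illustration.
CONTAINER_NAME = "agentlab-dev"
# Mount point of the main repo inside the container, as described above.
REPO_PATH = "/workspace/vault-repo"


def attach_command(
    container: str = CONTAINER_NAME,
    workdir: str = REPO_PATH,
    shell: str = "bash",
) -> list[str]:
    """Build an argv list that opens an interactive shell in the workspace."""
    return [
        "docker", "exec",
        "--interactive", "--tty",   # keep stdin open and allocate a terminal
        "--workdir", workdir,       # start in the mounted repo
        container,
        shell,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(attach_command()))
```

Whichever client is used to attach, the container stays the execution boundary and the repo stays the record of what was done.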
This is a practical middle ground between two weaker models:

- letting an agent run loosely on a laptop with unclear boundaries
- treating chat transcripts as the project memory

Instead, the durable memory is the repository, and the execution boundary is the container.

## Why It Works

### Persistence

The repo itself is bind-mounted from the host into the container, so the working tree survives container rebuilds.

Tool authentication and per-agent config are stored in Docker named volumes rather than in the writable container layer:

- `claude-config` mounted at `~/.claude`
- `codex-config` mounted at `~/.codex`
- `gemini-config` mounted at `~/.gemini`

That means the agent binaries can be replaced or the container can be rebuilt without losing agent login state and settings.

The repo also codifies an important lesson in [decisions/ADR-001-tool-persistence.md](decisions/ADR-001-tool-persistence.md): persisting auth is not enough. Each tool also needs a bootstrap install step so the binary is present after rebuilds.

### Isolation

The container is a bounded workspace. The agents can see what is mounted into `/workspace`, not the whole host. Secrets are intentionally kept outside the repo.

That gives you a cleaner security and operational model:

- the host stays thin
- the repo is the shared memory
- the container is the execution environment
- production or infrastructure actions can stay behind a human approval gate

### Rules and Required Reading

AgentLab relies on documented agent rules rather than trusting the model to "just behave." `AGENTS.md` defines cross-agent rules for Claude, Codex, and Gemini, and `LLM-ENTRYPOINT.md` defines the mandatory read order and execution boundaries.

The required reading order matters. It forces each session to rehydrate itself from the same durable context before acting. That reduces drift between sessions and between different agents.

The pattern:

1. Start from a single repo entrypoint document.
2. Read the current system plan and open loops.
3. Read current operational state and handoff docs.
4. Only then begin work.

### Hooks and Guardrails

For Claude Code specifically, this repo uses hook scripts and settings under `.claude/` to block dangerous operations and remind the agent about boundaries. The concept is portable: combine agent instructions with mechanical guardrails where the tool supports them.

### Bootstrap and Image Design

The workspace has two layers:

- the image layer, defined by the Dockerfile
- the runtime repair layer, defined by `bootstrap.sh`

The Dockerfile bakes in the base OS packages and the agent CLIs. `bootstrap.sh` provides an idempotent way to verify or restore what must exist in the runtime environment. That combination makes rebuilds predictable and recoverable.

## How To Replicate It on Windows or Linux

The portable version of this setup does not require macOS, Forgejo, or VS Code.

### 1. A Single Bounded Workspace Container

Use Docker Desktop on Windows or standard Docker Engine on Linux. Build one long-lived Ubuntu-based development container that sleeps when idle and serves as the workspace for human and agent sessions.

You can use a devcontainer definition or plain `docker compose` plus `docker exec`. VS Code with the Dev Containers extension is a common option, but it is optional.

### 2. A Git Repo as the Control Plane

Put your instructions, plans, handoffs, task notes, and helper scripts in the repo. Make the repo the thing every agent reads before it works. Avoid treating chat as the system of record.

For Windows or Linux, GitHub is the obvious replacement for Forgejo. The workflow stays the same.

### 3. Persistent Agent State

Create named volumes for each agent's auth and config directory. Also define a bootstrap script that reinstalls or verifies the required CLIs after rebuilds. Without that, you get partial persistence: config survives but binaries disappear.

### 4. A Clear Mount Model

Mount only what the agents need:

- main repo as read-write
- optional scratch/work directory as read-write
- optional intake/reference directory as read-only
- logs/output directory as read-write

Keep secrets out of the repo. Inject them through environment variables, Docker secrets, or host-managed files outside the mounted workspace.

### 5. CI That Mirrors the Repo Workflow

Use GitHub Actions (or equivalent) to automate linting, validation, and policy checks on pushes and pull requests so the repo remains trustworthy as the control plane.

## Key Design Principles

- **Container bounds the workspace**: agents can't reach outside it without explicit mounts
- **Git stores the memory and the rules**: not chat, not agent memory APIs
- **Agents read before acting**: mandatory read order on every session start
- **Persistence is designed, not assumed**: named volumes + bootstrap, not container overlay
- **Risky actions stay behind explicit boundaries**: human gate for production execution
- **Rules in the repo, not in the model**: documented operating contracts, enforced by hooks where possible
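
To make the mount model and named volumes described above more concrete, here is a minimal sketch that turns them into `docker run --mount` arguments. The host paths under `/srv/agentlab`, the container home directory `/home/dev`, the container name, the image tag, and the `sleep infinity` idle command are all assumptions made for illustration. Only the volume names and `/workspace/vault-repo` come from this document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    kind: str                # "bind" for host paths, "volume" for named volumes
    source: str              # host path or volume name
    target: str              # path inside the container
    read_only: bool = False

    def to_flag(self) -> str:
        """Render the value passed to `docker run --mount`."""
        parts = [f"type={self.kind}", f"source={self.source}", f"target={self.target}"]
        if self.read_only:
            parts.append("readonly")
        return ",".join(parts)


HOME = "/home/dev"  # hypothetical home directory of the container user

MOUNTS = [
    # Bind mounts: host paths are hypothetical placeholders.
    Mount("bind", "/srv/agentlab/vault-repo", "/workspace/vault-repo"),
    Mount("bind", "/srv/agentlab/scratch", "/workspace/scratch"),
    Mount("bind", "/srv/agentlab/intake", "/workspace/intake", read_only=True),
    Mount("bind", "/srv/agentlab/logs", "/workspace/logs"),
    # Named volumes for per-agent auth and config.
    Mount("volume", "claude-config", f"{HOME}/.claude"),
    Mount("volume", "codex-config", f"{HOME}/.codex"),
    Mount("volume", "gemini-config", f"{HOME}/.gemini"),
]


def docker_run_args(image: str, mounts: list[Mount]) -> list[str]:
    """Build an argv list for a long-lived, idle workspace container."""
    args = ["docker", "run", "--detach", "--name", "agentlab-dev"]
    for mount in mounts:
        args += ["--mount", mount.to_flag()]
    # Keep the container alive while idle so sessions can attach with `docker exec`.
    args += [image, "sleep", "infinity"]
    return args


if __name__ == "__main__":
    print(" ".join(docker_run_args("agentlab-workspace:latest", MOUNTS)))
```

Keeping the mounts in one declared list means the container boundary can be reviewed in Git like any other rule. Secrets are deliberately absent from the list, and the same data could just as well be expressed in a devcontainer definition or a `docker compose` file.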