Case study
Self-Hosted AI & Homelab Platform
A production-grade homelab — GitOps from bare metal to local AI, and the platform that serves this very site.
- Talos
- OpenShift
- ArgoCD
- Proxmox
- Local LLM
- Cloudflare Tunnel
Problem
The best way to stay sharp on platform engineering is to run a real platform — one with the same rigour as production, where the only person on call is you. The goal: a homelab that’s a genuine proving ground for Kubernetes, GPUs, AI and security, not a pile of containers.
Constraints
- Run it like production — GitOps, backups, observability, no snowflake config.
- Secure by default — nothing exposed that doesn’t need to be.
- Reproducible — rebuild a node from code, not from memory.
Design
Proxmox provides the hypervisor layer, with PCIe passthrough (GPU and storage) into single-node Talos and OpenShift clusters. Everything is managed through ArgoCD GitOps: the cluster state lives in git and reconciles itself. On top sit local LLM inference on a Blackwell-class GPU, split-horizon DNS via Pi-hole, a VPN with 2FA/SSO, and a Prometheus / Grafana observability stack. ZFS handles storage tiering; restic ships NAS-backed backups. Public services reach the internet through a Cloudflare Tunnel, which is exactly how this site is served.
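To make "the cluster state lives in git and reconciles itself" concrete, here is a minimal sketch of a reconcile step in Python. It is not ArgoCD's implementation, only an illustration of the idea: compare the desired state (what git declares) with the live state (what is running) and derive the actions that close the gap. The `State` type, the function names and the resource names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resource name -> declared spec.
State = Dict[str, dict]


def plan(desired: State, live: State) -> List[Tuple[str, str]]:
    """Return the (action, name) pairs needed to make live match desired."""
    actions: List[Tuple[str, str]] = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))   # declared in git, not running
        elif live[name] != spec:
            actions.append(("update", name))   # running, but drifted from git
    for name in live:
        if name not in desired:
            actions.append(("prune", name))    # running, but not declared in git
    return actions


def reconcile(desired: State, live: State) -> State:
    """Apply the plan to a copy of the live state and return the result."""
    result = dict(live)
    for action, name in plan(desired, live):
        if action == "prune":
            del result[name]
        else:
            result[name] = desired[name]
    return result


if __name__ == "__main__":
    desired = {"grafana": {"replicas": 1}, "pihole": {"replicas": 1}}
    live = {"grafana": {"replicas": 2}, "stray": {}}
    print(plan(desired, live))
    print(reconcile(desired, live) == desired)
```

In this model, running `reconcile` repeatedly is safe: once the live state matches git, the plan is empty and nothing changes.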
Security & reliability decisions
- GitOps as the source of truth — drift is reconciled, not chased.
- 2FA / SSO and segmented access — least privilege across the lab.
- Back up state, not just volumes — restores are drilled, not hoped for.
- Outbound-only public exposure — a tunnel, not an open port.
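One way a restore drill could be checked is sketched below. It assumes a backup has already been restored into a scratch directory by whatever tool is in use, and then compares SHA-256 digests of the original tree against the restored one. The function names and the directory layout are hypothetical, and this is not a description of the drill actually run in this lab.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(source: Path, restored: Path) -> List[str]:
    """Return a list of discrepancies; an empty list means the drill passed."""
    src, dst = tree_digest(source), tree_digest(restored)
    problems: List[str] = []
    for rel in sorted(src.keys() | dst.keys()):
        if rel not in dst:
            problems.append(f"missing: {rel}")      # in source, absent after restore
        elif rel not in src:
            problems.append(f"unexpected: {rel}")   # appeared only in the restore
        elif src[rel] != dst[rel]:
            problems.append(f"changed: {rel}")      # contents differ
    return problems
```

A scheduled job could call `verify_restore` after each test restore and alert when the returned list is not empty. The sketch reads whole files into memory, so a real drill over large volumes would want chunked hashing.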
Outcome
A homelab that behaves like a platform: rebuildable from code, observable, backed up, and secure enough to host a public site on. It’s where new patterns get proven before they go anywhere near real infrastructure — and it’s running right now, under this page.
Future improvements
Next steps: continue migrating workloads so that a single GitOps workflow covers every cluster, and harden the edge-to-cluster path as more public services come online.