
Case study

Self-Hosted AI & Homelab Platform

A production-grade homelab — GitOps from bare metal to local AI, and the platform that serves this very site.

Owner / Operator · Ongoing

  • Talos
  • OpenShift
  • ArgoCD
  • Proxmox
  • Local LLM
  • Cloudflare Tunnel
Diagram: Proxmox (PCIe passthrough) → Talos / OpenShift (single-node clusters) → ArgoCD (GitOps reconcile) → Local AI, DNS, observability (LLM, Pi-hole, Prometheus) → Cloudflare Tunnel (outbound only) → Internet (www.bztmon.com)
Bare metal → GitOps clusters → services, exposed outbound-only via a tunnel

Problem

The best way to stay sharp in platform engineering is to run a real platform: one held to the same rigour as production, where the only person on call is you. The goal was a homelab that serves as a genuine proving ground for Kubernetes, GPUs, AI and security, not a pile of containers.

Constraints

  • Run it like production — GitOps, backups, observability, no snowflake config.
  • Secure by default — nothing exposed that doesn’t need to be.
  • Reproducible — rebuild a node from code, not from memory.

Design

Proxmox provides the hypervisor layer, with PCIe passthrough (GPU and storage) into single-node Talos and OpenShift clusters. Everything is deployed through ArgoCD GitOps: cluster state lives in git, and ArgoCD continuously reconciles the running clusters against it. The service layer includes local LLM inference on a Blackwell-class GPU, split-horizon DNS via Pi-hole, a VPN with 2FA/SSO, and a Prometheus / Grafana observability stack. ZFS handles storage tiering, and restic ships backups to a NAS. Public services reach the internet through a Cloudflare Tunnel, which is exactly how this site is served.
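The reconcile idea behind this setup can be modelled in a few lines. The sketch below is a deliberately simplified toy, not how ArgoCD is implemented: resources are plain dicts keyed by hypothetical names such as `deploy/pihole`, and the image tags are made up. It shows the core loop: diff the state declared in git against the live state, then create what is missing, overwrite what drifted, and prune what git no longer declares.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical model: each resource is a dict of its spec, keyed by a name
# such as "deploy/pihole". Real GitOps tools diff full Kubernetes manifests.
Manifests = Dict[str, dict]


@dataclass
class Plan:
    create: List[str] = field(default_factory=list)
    update: List[str] = field(default_factory=list)
    prune: List[str] = field(default_factory=list)


def diff(desired: Manifests, live: Manifests) -> Plan:
    """Compare the state declared in git with what is actually running."""
    plan = Plan()
    for name, spec in sorted(desired.items()):
        if name not in live:
            plan.create.append(name)
        elif live[name] != spec:
            plan.update.append(name)  # drift: the live object was changed by hand
    plan.prune = sorted(name for name in live if name not in desired)
    return plan


def reconcile(desired: Manifests, live: Manifests) -> Plan:
    """One reconcile pass: mutate `live` in place until it matches `desired`."""
    plan = diff(desired, live)
    for name in plan.create + plan.update:
        live[name] = dict(desired[name])  # git wins over manual edits
    for name in plan.prune:
        del live[name]  # remove resources no longer declared in git
    return plan


if __name__ == "__main__":
    git = {
        "deploy/pihole": {"image": "pihole:v1", "replicas": 1},
        "deploy/grafana": {"image": "grafana:v1", "replicas": 1},
    }
    cluster = {
        "deploy/pihole": {"image": "pihole:v1", "replicas": 3},  # hand-edited
        "deploy/old-app": {"image": "old-app:v1", "replicas": 1},  # gone from git
    }
    print(reconcile(git, cluster))
    # → Plan(create=['deploy/grafana'], update=['deploy/pihole'], prune=['deploy/old-app'])
    print(diff(git, cluster))
    # → Plan(create=[], update=[], prune=[])
```

Running the pass twice is the point: the second diff is empty, so re-applying is idempotent, and any manual change is simply overwritten on the next pass rather than tracked down by hand.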

Security & reliability decisions

  • GitOps as the source of truth — drift is reconciled, not chased.
  • 2FA / SSO and segmented access — least privilege across the lab.
  • Back up state, not just volumes — restores are drilled, not hoped for.
  • Outbound-only public exposure — a tunnel, not an open port.
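One way to make "restores are drilled" concrete is to restore into a scratch directory and compare checksums against the source. This is a minimal sketch under stated assumptions: `shutil.copytree` stands in for a real `restic restore` into a target directory, and the file names (`etcd/snapshot.db`, `values.yaml`) are hypothetical. It only checks file contents; it does not replace restic's own repository integrity check (`restic check`).

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def verify_restore(source: Path, restored: Path) -> List[str]:
    """Return paths that are missing, extra, or different after a restore."""
    want, got = manifest(source), manifest(restored)
    return sorted(p for p in want.keys() | got.keys() if want.get(p) != got.get(p))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical state directory to back up.
        src = Path(tmp, "state")
        (src / "etcd").mkdir(parents=True)
        (src / "etcd" / "snapshot.db").write_bytes(b"cluster state")
        (src / "values.yaml").write_text("replicas: 1\n")

        # Stand-in for restoring a snapshot into a scratch target: a plain copy.
        dst = Path(tmp, "restored")
        shutil.copytree(src, dst)
        print(verify_restore(src, dst))  # → []

        # Simulate a silently corrupted file in the restored copy.
        (dst / "values.yaml").write_text("replicas: 3\n")
        print(verify_restore(src, dst))  # → ['values.yaml']
```

Comparing per-file digests catches missing, extra, and silently corrupted files alike, which a successful exit code from the backup job alone would not prove.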

Outcome

A homelab that behaves like a platform: rebuildable from code, observable, backed up, and secure enough to host a public site on. It’s where new patterns get proven before they go anywhere near real infrastructure — and it’s running right now, under this page.

Future improvements

Continue migrating workloads toward a single GitOps workflow spanning all clusters, and harden the edge-to-cluster path as more public services come online.