
Case study

IaC Fleet Automation

Stood up identical edge sites from code — every store comes up the same way, every time.

Automation Engineer · 2025 – Present

  • Ansible
  • AWX
  • GitOps
  • ACR / NVCR
  • Image pre-pull
  • Secrets mgmt
[Diagram: Git source of truth (vars + code) · AWX / Ansible (single-touch) · Edge fleet (templated per-store) · ACR / NVCR mirror (air-gapped images)]
One source of truth → AWX/Ansible → identical edge nodes, even air-gapped

Problem

A fleet only behaves like a fleet if every node is built the same way. Hand-configuring GPU drivers, CNI, image caches and secrets per site is slow, error-prone, and impossible to audit — and at the edge, half the sites can’t reach the internet when you need them to.

Constraints

  • Repeatability over cleverness — the same playbook must produce the same node anywhere.
  • Air-gapped reality — disconnected edge sites still have to build from local images.
  • No secrets in code — credentials delivered at deploy time, never committed.
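
The "no secrets in code" constraint can be illustrated with a minimal Python sketch. It assumes a hypothetical convention in which committed variables hold only placeholders of the form `secret:<name>`, and the real values are looked up when the deploy runs. The `secret:` prefix, the `resolve_secrets` helper, and the in-memory `fake_store` are all illustrative assumptions, not the project's actual mechanism.

```python
from typing import Any, Callable, Optional

SECRET_PREFIX = "secret:"  # hypothetical placeholder convention, not from the project


def resolve_secrets(node: Any, lookup: Callable[[str], Optional[str]]) -> Any:
    """Return a copy of `node` with every "secret:<name>" string replaced by
    lookup(name). Raises KeyError if a referenced secret is unavailable."""
    if isinstance(node, dict):
        return {key: resolve_secrets(value, lookup) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_secrets(value, lookup) for value in node]
    if isinstance(node, str) and node.startswith(SECRET_PREFIX):
        name = node[len(SECRET_PREFIX):]
        value = lookup(name)
        if value is None:
            raise KeyError(f"secret {name!r} not available at deploy time")
        return value
    return node


# What lives in git: a reference to the secret, never the secret itself.
committed = {"registry": {"user": "puller", "password": "secret:registry_password"}}

# Stand-in for the managed store; a real deploy would query that store instead.
fake_store = {"registry_password": "value-from-managed-store"}

deployed = resolve_secrets(committed, fake_store.get)
```

Because the function builds a new structure instead of mutating its input, the committed variables stay free of credentials even after a deploy has resolved them.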

Design

Ansible playbooks, orchestrated by AWX and wired through a single source-of-truth pipeline, own the whole node build: GPU operator install, templated network attachments, container image pre-pull, and secrets injected from a managed store. Company- and site-specific variables are layered on top of a shared base so one playbook set serves the whole fleet.
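
The variable layering described above can be sketched in Python. This illustrates the idea only (shared base, then company, then site, with later layers winning) and is not Ansible's own precedence logic. By default, Ansible replaces a dictionary defined in several places instead of merging it. The `layer_vars` helper and all variable names and values are hypothetical.

```python
from copy import deepcopy
from typing import Any, Mapping


def layer_vars(*layers: Mapping[str, Any]) -> dict[str, Any]:
    """Merge layers left to right. Nested mappings are merged key by key;
    any other value from a later layer replaces the earlier one."""
    merged: dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
                merged[key] = layer_vars(merged[key], value)
            else:
                merged[key] = deepcopy(value)
    return merged


# Hypothetical values, for illustration only.
base = {"gpu_operator": {"enabled": True, "driver_channel": "stable"}, "cni": {"mtu": 1500}}
company = {"cni": {"mtu": 9000}}
site = {"site_id": "store-0001", "gpu_operator": {"driver_channel": "pinned"}}

node_vars = layer_vars(base, company, site)
```

Here `node_vars` keeps `gpu_operator.enabled` from the base, takes `cni.mtu` from the company layer, and takes `site_id` and `gpu_operator.driver_channel` from the site layer, while the three input layers are left unmodified.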

For disconnected sites, air-gapped registry workflows mirror images across Azure Container Registry and NVCR and pre-stage them locally, so a build never depends on a live internet path at the moment it matters.
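
One piece of such a mirroring workflow is working out where each upstream image should land in the local registry. The Python sketch below assumes the simplest possible mapping, swapping the upstream registry host for the mirror host and keeping the repository path. The helper names, the mirror hostname, and the image references are hypothetical. The project may well use a different naming scheme.

```python
def mirror_ref(image: str, mirror_host: str) -> str:
    """Rewrite a fully qualified image reference to point at the mirror.
    The repository path, tag, and digest are left untouched."""
    host, sep, path = image.partition("/")
    looks_like_host = "." in host or ":" in host or host == "localhost"
    if not sep or not looks_like_host:
        raise ValueError(f"image reference must include a registry host: {image!r}")
    return f"{mirror_host}/{path}"


def mirror_plan(images: list[str], mirror_host: str) -> list[tuple[str, str]]:
    """Pair every upstream reference with its mirrored counterpart."""
    return [(source, mirror_ref(source, mirror_host)) for source in images]


# Hypothetical image list and mirror host, for illustration only.
upstream = [
    "nvcr.io/nvidia/example-app:1.0",
    "example.azurecr.io/platform/agent:2.3",
]
plan = mirror_plan(upstream, "registry.store.local:5000")
```

Each `(source, target)` pair can then drive whatever copy and pre-pull step the site uses. Rejecting references without an explicit registry host avoids silently resolving an image against a default public registry that a disconnected site cannot reach.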

Security & reliability decisions

  • Secrets management at deploy time — nothing sensitive in git.
  • Pre-staged, mirrored images — supply chain stays available and pinned, even offline.
  • AWX job-level reporting — every run is visible and auditable.

Outcome

New edge sites are provisioned from code with consistent results, manual build steps are removed wherever they can be automated, and the whole fleet is reproducible: an IaC-first build instead of a runbook.

Future improvements

Tighten the loop from commit to provisioned site, and fold image-mirror freshness into the same pipeline so air-gapped caches are never silently stale.
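
A freshness check of this kind could start as a comparison between the digests pinned in git and the digests actually present in a site's cache. The Python sketch below assumes that "stale" means "missing, or present with a different digest than the pin". The `stale_images` helper and the sample data are hypothetical.

```python
def stale_images(pinned: dict[str, str], cached: dict[str, str]) -> dict[str, str]:
    """Return {image: reason} for every pinned image whose cached copy is
    missing or carries a digest different from the pinned one."""
    report: dict[str, str] = {}
    for image, wanted_digest in pinned.items():
        cached_digest = cached.get(image)
        if cached_digest is None:
            report[image] = "missing"
        elif cached_digest != wanted_digest:
            report[image] = "digest mismatch"
    return report


# Hypothetical digests, shortened for readability.
pinned = {
    "nvidia/example-app:1.0": "sha256:aaa",
    "platform/agent:2.3": "sha256:bbb",
    "platform/router:0.9": "sha256:ccc",
}
cached = {
    "nvidia/example-app:1.0": "sha256:aaa",
    "platform/agent:2.3": "sha256:old",
}

report = stale_images(pinned, cached)
```

Failing the pipeline whenever `report` is non-empty is one way to turn a silently stale cache into a visible, auditable job result.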