Single-Touch Edge AI Platform

Problem

Edge AI at retail scale lives or dies on repeatability. A computer-vision workload that runs perfectly in a lab has to come up the same way in a store with no on-site engineer, flaky connectivity, and a GPU that may not be ready the instant Kubernetes wants to schedule against it. The starting point was a high-level design and a pile of manual steps — exactly the gap between “it works” and “it ships.”

Constraints

No hands at the edge. Deployment has to be hands-off and idempotent: a single trigger starts it, and re-running it converges on the same state.
GPU timing. Inference pods must never schedule before the GPU device plugin is healthy, or they crash-loop and poison the rollout.
Heterogeneous stores. Per-site variables (network, hardware, identity) have to be handled without forking the platform for every location.

Design

I turned the high-level design into concrete, low-level deployments driven by CD pipelines. The application is packaged as containers and shipped to a store-edge Kubernetes cluster via Helm, with manifests that declare the desired end state. Per-store configuration is injected from a single source of truth, so one pipeline produces a correct deployment for any site.
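As a rough illustration of the single-source-of-truth idea, the sketch below layers a small per-store delta over shared defaults to produce one complete set of values per site. Everything named here is hypothetical (the registry URL, the store IDs, the keys, and the `render_store_values` helper); the real platform's schema and tooling may look quite different.

```python
import copy
import json

# Hypothetical platform-wide defaults; every key and value is illustrative.
BASE_VALUES = {
    "image": {"repository": "registry.example.com/vision-inference", "tag": "1.0.0"},
    "gpu": {"resource": "nvidia.com/gpu", "count": 1},
    "network": {"proxy": None},
}

# Hypothetical per-store deltas: each store lists only what differs from the defaults.
STORE_OVERRIDES = {
    "store-0001": {"network": {"proxy": "http://proxy.store-0001.example.com:3128"}},
    "store-0002": {"gpu": {"count": 2}},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` layered over `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_store_values(store_id: str) -> dict:
    """Build the full values for one store; an unregistered store fails loudly."""
    if store_id not in STORE_OVERRIDES:
        raise KeyError(f"no configuration registered for store {store_id!r}")
    return deep_merge(BASE_VALUES, STORE_OVERRIDES[store_id])


if __name__ == "__main__":
    # Print the rendered values for one store as sorted, indented JSON.
    print(json.dumps(render_store_values("store-0002"), indent=2, sort_keys=True))
```

Because the merge never mutates the defaults and always starts from the same base, rendering the same store twice yields the same result, which is what lets a single pipeline serve every site. The rendered dict could then be serialized to a values file for the pipeline's Helm step.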

The load-bearing piece is readiness gating: Bash/Shell probes and Kubernetes watchdogs confirm the GPU device plugin is up before inference pods are allowed to run, and pod lifecycle management keeps the workload honest from there.
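The gating logic can be sketched as a bounded polling loop. The version below is written in Python rather than the Bash/Shell the platform uses, and it rests on assumptions that are not in the original design: that the device plugin advertises GPUs under the NVIDIA resource name `nvidia.com/gpu` in the node's `status.allocatable`, and that a caller-supplied `fetch_node` function (hypothetical) returns the node object as a dict, for example by parsing the output of `kubectl get node <name> -o json`.

```python
import time
from typing import Callable

# Assumption: GPUs are exposed under the NVIDIA device plugin's resource name.
GPU_RESOURCE = "nvidia.com/gpu"


def allocatable_gpus(node: dict, resource: str = GPU_RESOURCE) -> int:
    """Read the GPU count a node advertises in status.allocatable (0 if absent or unparsable)."""
    raw = node.get("status", {}).get("allocatable", {}).get(resource, "0")
    try:
        return int(raw)
    except (TypeError, ValueError):
        return 0


def wait_for_gpu(
    fetch_node: Callable[[], dict],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the node advertises at least one GPU; return False once the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        if allocatable_gpus(fetch_node()) > 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated node snapshots: the plugin registers its GPU on the third poll.
    snapshots = iter([
        {"status": {"allocatable": {"cpu": "8"}}},
        {"status": {"allocatable": {"cpu": "8", "nvidia.com/gpu": "0"}}},
        {"status": {"allocatable": {"cpu": "8", "nvidia.com/gpu": "1"}}},
    ])
    ready = wait_for_gpu(lambda: next(snapshots), timeout_s=10, interval_s=0, sleep=lambda _s: None)
    print("GPU ready" if ready else "GPU not ready")
```

Run as a gate ahead of the inference container, a check like this would exit non-zero on timeout so the pod waits instead of crash-looping against a GPU that is not there yet. Injecting `sleep` and `clock` keeps the loop testable without real delays.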

Security & reliability decisions

Init-gated GPU readiness. The single biggest reliability win; no more pods racing the GPU at boot.
Single source of truth for config. Drift can’t creep in store-to-store.
Spec-driven, documented-as-code. The deployment is the documentation.

Outcome

A high-level idea becomes a real, repeatable deployment on a single press. New edge sites come up consistently, GPUs come online reliably, and the manual runbook is gone — replaced by a pipeline anyone on the team can trigger.

Future improvements

Push more of the per-store delta into declarative policy, and extend the readiness model to cover the full inference dependency chain (model artifacts, egress, downstream sinks) as a single health gate.
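One way the single health gate might look is an aggregator that runs a named check for each dependency and reports ready only when all of them pass. This is a sketch of a possible future shape, not something the platform does today; the check names and the `evaluate_gate` helper are hypothetical, and the lambdas stand in for real probes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GateResult:
    ready: bool
    failures: List[str]


def evaluate_gate(checks: Dict[str, Callable[[], bool]]) -> GateResult:
    """Run every check; the gate is ready only if all pass. A check that raises counts as failed."""
    failures: List[str] = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failures.append(name)
    return GateResult(ready=not failures, failures=failures)


if __name__ == "__main__":
    # Hypothetical probes for the dependency chain; real ones would test actual resources.
    result = evaluate_gate({
        "gpu_device_plugin": lambda: True,
        "model_artifacts": lambda: True,
        "egress": lambda: False,
        "downstream_sinks": lambda: True,
    })
    print(result)
```

Running every check instead of stopping at the first failure means a single evaluation names all the broken dependencies at once, which matters at a site with no engineer on hand to rerun it.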