Init-gating GPU readiness on Kubernetes

The most common way a GPU workload fails at the edge isn’t the model, the driver, or the network. It’s timing. Kubernetes is eager: unless the pod says otherwise, it will happily schedule your inference pod the moment a node is Ready, which is often before the NVIDIA device plugin has advertised nvidia.com/gpu. The pod starts, can’t see a GPU, crash-loops, and now your rollout is poisoned across the fleet.

The fix is to make readiness explicit. Don’t trust node-Ready; gate on the GPU.

Gate the schedule, not just the start

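A resource request is the first line of defence: a pod that requests a GPU won’t schedule until the plugin advertises capacity. For an extended resource such as nvidia.com/gpu, setting the limit is enough, because the request defaults to the limit: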

resources:
  limits:
    nvidia.com/gpu: 1
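
To see whether the plugin has advertised capacity yet, one option is to read the node's allocatable resources. The following is a minimal sketch, assuming kubectl access to the cluster; gpu_allocatable and gpu_advertised are hypothetical helper names, not kubectl features.

```shell
#!/usr/bin/env bash
# Sketch: report how many nvidia.com/gpu units a node currently advertises.
# gpu_allocatable and gpu_advertised are hypothetical helpers for illustration.
gpu_allocatable() {
  local node="$1" count
  # The dots inside the resource name must be escaped in the JSONPath key.
  count="$(kubectl get node "$node" \
    -o jsonpath='{.status.allocatable.nvidia\.com/gpu}' 2>/dev/null)" || count=""
  # An empty result means the resource is not registered on the node (yet).
  echo "${count:-0}"
}

# Succeeds only once at least one GPU is advertised on the node.
gpu_advertised() {
  [ "$(gpu_allocatable "$1")" -ge 1 ]
}
```

Calling gpu_advertised with a node name returns success once the scheduler could place a GPU pod there, which is the same condition the resource limit waits on.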

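But on a single-GPU edge node that’s recovering from a reboot, you still want a hard check before the workload does anything expensive. An init container that blocks until the device is actually visible keeps the main container honest. Give the init container the same nvidia.com/gpu limit so the runtime exposes the device to it; init containers finish before the main container starts, so this does not require a second GPU: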

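#!/usr/bin/env bash
set -euo pipefail
# Block until the driver lists GPU 0, or fail loudly after a bound (roughly 30 x 5 s).
# This confirms the device is visible; it does not run a health check.
for i in $(seq 1 30); do
  # Capture the output first: piping straight into `grep -q` under pipefail can
  # report a spurious failure if nvidia-smi gets SIGPIPE when grep exits early.
  if gpus="$(nvidia-smi -L 2>/dev/null)" && grep -q '^GPU 0:' <<<"$gpus"; then
    echo "GPU ready"; exit 0
  fi
  echo "waiting for GPU ($i/30)"; sleep 5
done
echo "GPU never became ready" >&2
exit 1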

Why this is the win

Once readiness is gated, the whole class of “pod started before the GPU” failures disappears — and it disappears the same way on every node. That consistency is the real prize at the edge, where no one is standing next to the box to nurse a bad rollout.

The principle generalises: at the edge, design the dependency, don’t hope for it. The GPU is just the first dependency worth gating; egress paths and model artifacts are next.
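
The same bounded-wait pattern can be lifted into a reusable gate for those other dependencies. This is a sketch only: wait_for is a hypothetical helper, and the path and URL in the usage comments are placeholders, not anything from a real deployment.

```shell
#!/usr/bin/env bash
# Sketch: bounded wait for any dependency, generalising the GPU gate.
# Usage: wait_for <label> <attempts> <delay-seconds> <command> [args...]
wait_for() {
  local label="$1" attempts="$2" delay="$3" i=1
  shift 3
  while [ "$i" -le "$attempts" ]; do
    # "$@" is the readiness probe; exit status 0 means the dependency is ready.
    if "$@" >/dev/null 2>&1; then
      echo "$label ready"
      return 0
    fi
    echo "waiting for $label ($i/$attempts)"
    sleep "$delay"
    i=$((i + 1))
  done
  echo "$label never became ready" >&2
  return 1
}

# Example probes (placeholder path and URL, adjust to your environment):
#   wait_for "model artifact" 30 5 test -s /models/model.onnx
#   wait_for "egress" 30 5 curl -fsS --max-time 5 https://example.com/healthz
```

Keeping the probe as an arbitrary command means each dependency gets the same bound, the same log lines, and the same loud failure, whichever node it runs on.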