Sometimes when I give live demos with Kubernetes clusters I want to make sure that the container images I’m going to use are already pulled onto all of the nodes in my cluster. The last thing I want is for a Pod
to be created to then sit in a Pending
state while an image is pulled, especially given how large containers can be in the Data Science space.
I could run my demo through once ahead of time and clean up any resources, but if my cluster has multiple nodes I can’t guarantee the same Pods will land on the same nodes.
To avoid this I use a little DaemonSet
that will ensure all images are pulled onto every node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: prepuller
spec:
selector:
matchLabels:
name: prepuller
template:
metadata:
labels:
name: prepuller
spec:
# Configure an init container for each image you want to pull
initContainers:
- name: prepuller-1
# Set the image you want to pull
image: ORG/IMAGE:TAG
# Use a known command that will exit successfully immediately
# Any no-op command will do but YMMV with scratch based containers
command: ["sh", "-c", "'true'"]
# - name: prepuller-2
# image: ...
# command: ["sh", "-c", "'true'"]
# etc...
# Use the pause container to ensure the Pod goes into a `Running` phase
# but doesn't take up resource on the cluster
containers:
- name: pause
image: gcr.io/google_containers/pause:3.2
resources:
limits:
cpu: 1m
memory: 8Mi
requests:
cpu: 1m
memory: 8Mi
Once you’ve applied this DaemonSet
you can watch the pre-puller Pods and once they are all in a Running
phase you know the images have been pulled.
Using a DaemonSet
is also neat because if you scale the cluster and add new nodes the images will be pulled automatically.