I needed GPU support in kind, so I added it. I’m also prone to yak shaving so it’s quick, dirty and not going upstream.
When developing tools for Kubernetes I like to use kind which runs a whole cluster inside a single Docker container. I especially like using it via pytest-kind which makes running Python unit tests against a Kubernetes cluster a breeze.
Today as of kind 0.17.0 there is no support for passing GPUs through to the Kubernetes cluster and attempts made in kubernetes-sigs/kind#1886 were rejected. It seems there is a desire to add this support to kind in the future, but disagreements on how to implement it. Sadly I don’t have time to dive into that and try and implement a robust solutions that would be accepted by the kind maintainers, so I decided to quickly hack together a version that I could use right away.
My PR adds a
gpus config option to kind nodes which passes the
--gpus=all flag to Docker. So all you need to use it is the NVIDIA drivers, Docker and the NVIDIA runtime. If you can run this quick test you’re all set.
$ docker run --rm --gpus=all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2 [Vector addition of 50000 elements] Copy input data from the host memory to the CUDA device CUDA kernel launch with 196 blocks of 256 threads Copy output data from the CUDA device to the host memory Test PASSED Done
Installing my fork
$ git clone https://github.com/jacobtomlinson/kind.git $ cd kind $ git branch gpu && git pull origin gpu $ make install $ kind version kind (@jacobtomlinson's patched GPU edition) v0.18.0-alpha.69+a32cb054c819a1 go1.19.3 linux/amd64
Creating a GPU kind cluster
Create a kind cluster template YAML and specify
gpus: true in the
control-plane node’s config.
# kind-gpu.yaml kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 name: gpu-test nodes: - role: control-plane gpus: true
Then create the cluster.
$ kind create cluster --config kind-gpu.yaml
I like to use the awesome kubectx command to switch my context over to my new kind cluster.
$ kubectx kind-gpu-test
Installing NVIDIA plugins
In order for us to be able to schedule GPUs in our cluster we need the NVIDIA plugins installed. We can do this via the NVIDIA Operator. Let’s install that with helm.
As our host machine already has NVIDIA drivers installed we need to disable the driver install step.
$ helm install --repo https://nvidia.github.io/gpu-operator nvidia/gpu-operator \ --wait --generate-name \ --create-namespace -n gpu-operator \ --set driver.enabled=false
Now that we have things set up let’s test that we can schedule a pod with a GPU, we can use the vector add example from earlier.
# gpu-pod.yaml apiVersion: v1 kind: Pod metadata: name: vectoradd spec: restartPolicy: OnFailure containers: - name: vectoradd image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2 resources: limits: nvidia.com/gpu: 1
$ kubectl apply -f gpu-pod.yaml
If we run
kubectl get pods -w we should see our pod go from
Then if we check the logs we should see the same output as before.
$ kubectl logs vectoradd [Vector addition of 50000 elements] Copy input data from the host memory to the CUDA device CUDA kernel launch with 196 blocks of 256 threads Copy output data from the CUDA device to the host memory Test PASSED Done
I built a quick version of
kind with GPU support so that I can quickly test the thing I’m actually meant to be working on. Ideally I would like to help add
this feature into upstream kind for everyone to use, but I don’t have the time right now. Maybe I’ll find some time another day.
Sadly this fork of kind will slowly go out of date unless this feature is added upstream. These are the consequences of forking a project, you own the maintenance, and I do not intend on maintaining this.
You’re welcome to use my forked version, but here be dragons.