Sometimes I want to oversubscribe the GPUs in my Kubernetes cluster. This is especially useful when I'm developing, but it could also work for light workloads where you have ample GPU memory and don't mind the occasional failure.
Newer NVIDIA Data Center GPUs support MIG, which allows partitioning a GPU at the hardware level.
You might also find MPS interesting, which allows multiple processes to share a GPU at the CUDA level. However, Kubernetes support for this still seems to be a work in progress.
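For a sense of what that hardware partitioning looks like: MIG can be configured with nvidia-smi on a supported GPU (an A100, say) before the device plugin advertises the slices. This is just a rough sketch for comparison, not something we'll use in this post, and the exact profile names and GPU index depend on your hardware:
$ nvidia-smi -i 0 -mig 1                 # enable MIG mode on GPU 0
$ nvidia-smi mig -cgi 1g.5gb,1g.5gb -C   # create two small GPU instances plus compute instances
$ nvidia-smi mig -lgi                    # list the GPU instances that were created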
In this post we are going to use time slicing to share our GPUs between Pods. This works by running many CUDA processes on the same GPU and giving them equal time slices of computation.
Ok enough warning you that this might not be a good idea. Let’s do it!
Prep
First we need a working Kubernetes cluster with some GPUs.
Kubernetes Cluster
I'm going to run one on my workstation with my patched version of kind, but you can get yours however you prefer.
$ cat << EOF > kind-gpu.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: gpu-cluster
nodes:
- role: control-plane
  gpus: true
EOF
$ kind create cluster --config kind-gpu.yaml
Creating cluster "gpu-cluster" ...
✓ Ensuring node image (kindest/node:v1.23.1) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-gpu-cluster"
You can now use your cluster with:
kubectl cluster-info --context kind-gpu-cluster
Thanks for using kind! 😊
$ kubectl get nodes
NAME                        STATUS   ROLES                  AGE   VERSION
gpu-cluster-control-plane   Ready    control-plane,master   74s   v1.23.1
NVIDIA Operator
Next we need to install the NVIDIA Operator, which can install NVIDIA drivers and the Kubernetes GPU device plugin.
kind
On my workstation, which already has NVIDIA drivers installed, I'll set a flag to skip the driver install.
$ helm install --repo https://helm.ngc.nvidia.com/nvidia gpu-operator \
--wait --generate-name \
--create-namespace -n gpu-operator \
--set driver.enabled=false
NAME: gpu-operator-1675786339
LAST DEPLOYED: Tue Feb 7 16:12:27 2023
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
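If you want to double check the release itself before poking at Pods, helm will list it (the name will be whatever --generate-name produced, gpu-operator-1675786339 in my case):
$ helm list -n gpu-operator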
Once the operator has successfully installed you should see all of the Pods it created in a Running phase, except for the two validator Pods which should be Completed.
$ kubectl get pods --namespace gpu-operator
NAME                                                              READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-t6pwg                                       1/1     Running     0          2m1s
gpu-operator-1675786339-node-feature-discovery-master-bc94sqsn7   1/1     Running     0          2m33s
gpu-operator-1675786339-node-feature-discovery-worker-l9rc5       1/1     Running     0          2m33s
gpu-operator-5cf698664-kswsm                                      1/1     Running     0          2m33s
nvidia-container-toolkit-daemonset-z2nst                          1/1     Running     0          2m1s
nvidia-cuda-validator-4cgbx                                       0/1     Completed   0          80s
nvidia-dcgm-exporter-9jkdt                                        1/1     Running     0          2m1s
nvidia-device-plugin-daemonset-lgd7l                              1/1     Running     0          2m1s
nvidia-device-plugin-validator-lz9qm                              0/1     Completed   0          59s
nvidia-operator-validator-vn9w4                                   1/1     Running     0          2m1s
Ok we should be good to run some GPU workloads.
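Before we oversubscribe anything it's worth noting what the node currently advertises. On my two-GPU workstation I'd expect this to report 2 for both capacity and allocatable at this point:
$ kubectl describe node gpu-cluster-control-plane | grep -E "(nvidia.com/gpu:|Capacity:|Allocatable:)"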
Some workload
Now that we have a GPU-capable Kubernetes cluster, let's throw too much work at it. My workstation has two GPUs in it, so let's create a Deployment that needs four GPUs.
$ cat << EOF | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workload
spec:
  replicas: 4
  selector:
    matchLabels:
      app: gpu-workload
  template:
    metadata:
      labels:
        app: gpu-workload
    spec:
      containers:
      - name: pause
        image: gcr.io/google_containers/pause
        resources:
          limits:
            nvidia.com/gpu: 1
EOF
deployment.apps/gpu-workload created
Now if we have a look at our Pods we should see two Running and two Pending, because we have hit our GPU capacity.
$ kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
gpu-workload-5cf65846c5-72vz2   1/1     Running   0          13s
gpu-workload-5cf65846c5-ngtzf   1/1     Running   0          13s
gpu-workload-5cf65846c5-g6brs   0/1     Pending   0          13s
gpu-workload-5cf65846c5-qjhtc   0/1     Pending   0          13s
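If you're curious why the last two are stuck, describing one of them (the name here is just one of my Pending Pods from above) should show the scheduler reporting something along the lines of Insufficient nvidia.com/gpu in its events:
$ kubectl describe pod gpu-workload-5cf65846c5-g6brs | grep -A 5 Events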
Enabling time-slicing
To enable time-slicing we are going to follow the guide in the official docs.
First we need to create a configuration for the NVIDIA Device Plugin to use. Here we create a new config profile called default which allows each GPU to be sliced into four.
$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  default: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
EOF
configmap/time-slicing-config created
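If you want to confirm what was actually stored you can read it back:
$ kubectl get configmap time-slicing-config -n gpu-operator -o yaml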
Then we need to patch the NVIDIA Operator's ClusterPolicy to tell it to use this config.
$ kubectl patch clusterpolicy/cluster-policy \
-n gpu-operator --type merge \
-p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config"}}}}'
Then we need to tell all of our nodes to use the default profile (although you can configure this on a per-node basis too).
$ kubectl patch clusterpolicy/cluster-policy \
-n gpu-operator --type merge \
-p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "default"}}}}'
Now if we give the device plugin a minute to restart and check our Pods again, we should see everything in a Running phase.
$ kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
gpu-workload-5cf65846c5-72vz2   1/1     Running   0          10m
gpu-workload-5cf65846c5-g6brs   1/1     Running   0          10m
gpu-workload-5cf65846c5-ngtzf   1/1     Running   0          10m
gpu-workload-5cf65846c5-qjhtc   1/1     Running   0          10m
We can also check the node and see that it now appears to have eight GPUs instead of the actual two, because each one is being time sliced into four.
$ kubectl describe node gpu-cluster-control-plane | grep -E "(nvidia.com/gpu:|Capacity:|Allocatable:)"
Capacity:
  nvidia.com/gpu: 8
Allocatable:
  nvidia.com/gpu: 8
Closing
I work a bunch on various Kubernetes deployment tools, so I often want to have many Pods running in my cluster that all use GPUs. Being able to oversubscribe the GPUs in my workstation means I can try more stuff out locally. I'm not actually going to run any large workloads on this cluster; if I did, I might run into memory errors and other problems from GPU sharing.
But this solves my problem!