Jacob Tomlinson

Running Jupyter in your Dask Kubernetes cluster

2 minute read #dask, #kubernetes, #jupyter

Did you know that the Dask scheduler has a --jupyter flag that will start a Jupyter server running within the Dask Dashboard?

Screenshot of the Dask dashboard showing a link to Jupyter in the top right

If you set `--jupyter` the dashboard will have a link to Jupyter in the top-right. You can also manually navigate to http://[scheduler ip]:[dashboard port]/jupyter/lab.

Dask Kubernetes

When launching Dask clusters on Kubernetes with dask-kubernetes you can also set this flag in your cluster config to run Jupyter in your KubeCluster.

The jupyterlab package needs to be installed in your container image. If you’re using the default Dask images you can install it at runtime by setting the EXTRA_PIP_PACKAGES environment variable to jupyterlab.

from dask_kubernetes.operator import KubeCluster, make_cluster_spec

# Create a cluster spec ("jupyter-example" is a placeholder name, pick your own)
spec = make_cluster_spec(
    name="jupyter-example",
    env={"EXTRA_PIP_PACKAGES": "jupyterlab"},
)

# Append the --jupyter flag to the scheduler command args
spec["spec"]["scheduler"]["spec"]["containers"][0]["args"].append("--jupyter")

# Create your cluster
cluster = KubeCluster(custom_cluster_spec=spec)
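If you build specs in more than one place, it can help to wrap the flag-appending step in a small helper. The sketch below is only an illustration: `with_scheduler_flag` is a hypothetical name, and the nested layout of `example_spec` is a hand-written stand-in that assumes the scheduler args live under `spec -> scheduler -> spec -> containers[0] -> args`. Check the spec your version of dask-kubernetes generates before relying on that path.

```python
import copy


def with_scheduler_flag(spec: dict, flag: str) -> dict:
    """Return a copy of a cluster spec with `flag` in the scheduler args.

    Hypothetical helper. Assumes the scheduler container args are at
    spec["spec"]["scheduler"]["spec"]["containers"][0]["args"].
    """
    new_spec = copy.deepcopy(spec)  # leave the caller's spec untouched
    args = new_spec["spec"]["scheduler"]["spec"]["containers"][0]["args"]
    if flag not in args:  # avoid adding the same flag twice
        args.append(flag)
    return new_spec


# Hand-written stand-in for the scheduler part of a cluster spec
# (an assumption for illustration, not real make_cluster_spec output).
example_spec = {
    "spec": {
        "scheduler": {
            "spec": {
                "containers": [
                    {
                        "name": "scheduler",
                        "args": ["dask-scheduler", "--host", "0.0.0.0"],
                    }
                ]
            }
        }
    }
}

updated = with_scheduler_flag(example_spec, "--jupyter")
print(updated["spec"]["scheduler"]["spec"]["containers"][0]["args"])
```

Returning a copy keeps the original spec reusable for clusters that should not run Jupyter, and the membership check makes calling the helper twice harmless.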

Now that your cluster is up and running, let’s find out where Jupyter is running.

>>> print(cluster.dashboard_link)

The dashboard link shows the Dask dashboard being port forwarded to localhost. In this example it is on port 56952, so we can access Jupyter at http://localhost:56952/jupyter/lab.
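Rather than copying the port by hand, you could derive the Jupyter URL from the dashboard link. This is a minimal sketch using only the standard library: `jupyter_link_from_dashboard` is a hypothetical helper name, and it assumes the dashboard link either ends in `/status` or has no page path at all.

```python
from urllib.parse import urlsplit, urlunsplit


def jupyter_link_from_dashboard(dashboard_link: str) -> str:
    """Turn a Dask dashboard URL into the matching /jupyter/lab URL.

    Hypothetical helper. Assumes the dashboard link ends in "/status"
    (or has no page path); any prefix before that is kept.
    """
    parts = urlsplit(dashboard_link)
    path = parts.path.rstrip("/")
    if path.endswith("/status"):
        path = path[: -len("/status")]  # drop the dashboard page name
    return urlunsplit((parts.scheme, parts.netloc, path + "/jupyter/lab", "", ""))


# With a real cluster you would pass cluster.dashboard_link instead
# of this literal, which reuses the port from the example above.
print(jupyter_link_from_dashboard("http://localhost:56952/status"))
```

Keeping any path prefix means the helper should also work when the dashboard is served behind a proxy path, although that case has not been checked against a live deployment.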

Screenshot of Jupyter lab running on the Dask Dashboard port

The Jupyter environment is also pre-configured to connect to the Dask cluster, so in your notebooks all you need to do is create a dask.distributed.Client.

from dask.distributed import Client

client = Client()

Screenshot of connecting a Client to the cluster with no configuration
