Jacob Tomlinson

High Throughput Computing with Dask: Intro Tutorial

CECAM Dask on HPC Seminar Series | Online | Tutorial | 120 minutes

Abstract

High-throughput (task-based) computing is a flexible approach to parallelization. It involves splitting a problem into loosely coupled tasks. A scheduler then orchestrates the parallel execution of those tasks, allowing programs to scale their resource usage adaptively. Because individual tasks may themselves be parallelized using MPI or OpenMP, the high-throughput approach can enable new levels of scalability.
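As a rough illustration of the pattern described above, the sketch below splits a job into independent tasks and hands them to a scheduler. It uses Python's standard-library thread pool as a stand-in for a real task scheduler, not Dask itself, and the `simulate` and `run_all` names are hypothetical helpers invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def simulate(seed: int) -> int:
    """Hypothetical independent task: a cheap stand-in for one simulation run."""
    return sum((seed * k) % 7 for k in range(1000))


def run_all(seeds, max_workers=4):
    """Submit one task per seed and collect results as each task finishes."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The pool plays the role of the scheduler: it decides which
        # worker runs which task, and when.
        futures = {pool.submit(simulate, s): s for s in seeds}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    out = run_all(range(8))
    print(sorted(out.items()))
```

Because the tasks share no state, the number of workers can be changed without touching the task code, which is the property that lets a scheduler scale resource usage up or down.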

Dask is a powerful Python tool for task-based computing. The Dask library was originally developed to provide parallel and out-of-core versions of common routines from data analysis packages such as NumPy and Pandas. However, the flexibility and usefulness of the underlying scheduler have led to extensions that enable users to write custom task-based algorithms and to execute those algorithms on high-performance computing (HPC) resources.
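A custom task-based algorithm in Dask might look something like the following minimal sketch, which uses `dask.delayed` to build a small task graph and then runs it on the local threaded scheduler. The `load`, `process`, `combine`, and `run` functions are hypothetical placeholders made up for this illustration; the tutorial itself may use different examples.

```python
from dask import delayed


@delayed
def load(i):
    """Hypothetical loader: returns the i-th chunk of ten integers."""
    return list(range(i * 10, (i + 1) * 10))


@delayed
def process(chunk):
    """Per-chunk work that does not depend on any other chunk."""
    return sum(x * x for x in chunk)


@delayed
def combine(parts):
    """Reduction step that depends on every processed chunk."""
    return sum(parts)


def run(n_chunks=4):
    # Calling delayed functions only records tasks in a graph;
    # nothing executes until .compute() is called.
    partials = [process(load(i)) for i in range(n_chunks)]
    total = combine(partials)
    return total.compute(scheduler="threads")


if __name__ == "__main__":
    print(run(4))
```

The graph is described independently of where it runs. Here it executes on local threads, but the same graph could in principle be submitted to a distributed Dask cluster, which is how this style of code is usually moved onto HPC resources.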

This workshop is a series of virtual seminars and tutorials on tools in the Dask HPC ecosystem. All sessions will be held online, with a live Zoom session for some registered participants and a live YouTube stream for the public.