#python
Tag
-
Python version epochs are broken
In PEP 440 Python introduced Version Epochs as a mechanism to allow projects to change their versioning scheme. Unfortunately I can’t see how a project could actually make use of this without confusing its users.
-
A beginner's guide to managing Kubernetes resources in Python with kr8s
Managing Kubernetes resources with Python has never been easier thanks to the kr8s Kubernetes client for Python.
-
Running Dask on Databricks
Databricks is a very popular data analytics platform used by data scientists, engineers, and businesses around the world. It was founded by the creators of Apache Spark, a powerful open-source data processing engine, and builds on top of Spark to provide a comprehensive analytics platform.
-
Running Dask workloads on multiple cluster backends with zero code changes using dask-ctl
Sometimes you want to write some code using Dask which can then be run against multiple different cluster backends. For example, for local testing you might want to use LocalCluster, but in production use KubeCluster. Or perhaps you want to easily switch between an on-premise HPC with SLURMRunner or the cloud with Coiled.
-
EffVer: Version your code by the effort required to upgrade
Version numbers are hard to get right. Semantic Versioning (SemVer) communicates backward compatibility via version numbers, which often leads to a false sense of security and broken promises. Calendar Versioning (CalVer) sits at the other extreme, communicating almost no useful information at all.
-
How to get typer to show help by default
I love using typer for creating CLI tools in Python. It makes creating complex trees of subcommands really straightforward.
-
Comparison of kr8s vs other Python libraries for Kubernetes
I’ve been working on kr8s for a while now and one of my core goals is to build a Python library for Kubernetes that is as simple and readable as possible and produces the most maintainable code. It should enable folks to write dumb code when working with Kubernetes.
-
Livestream notes: Replacing aiohttp with httpx in kr8s
This post will be updated with notes from the livestream throughout the day. Today I will be streaming some open source code refactoring. Come and join in on Twitch! Don’t forget to say hi in the chat 😊.
-
Introducing kr8s, a new Kubernetes client library for Python inspired by kubectl
For the last few months I’ve been tinkering with a new Kubernetes client library for Python called kr8s.
-
Debugging Data Science workflows at scale
May 12, 2023 15 minute read #python, #dask, #kubernetes, #apache-beam, #google-cloud, #google-kubernetes-engine
The more we scale up our workloads the more we run into bugs that only appear at scale. Reproducing these bugs can be expensive, time-consuming and error-prone. In order to report a bug on a GitHub repo you generally need to isolate the bug and come up with a minimal reproducer so that the maintainer can investigate. But what if a minimal reproducer requires hundreds of servers to isolate and replicate?
-
Sometimes I regret using CalVer
Over the last few years, many open-source Python projects that I work on have switched to CalVer. I’ve felt some pain around this, particularly in Dask and its subprojects. I want to unpack some of my thoughts and feelings around this trend.
-
Using Dask on KubeFlow with the Dask Kubernetes Operator
Kubeflow is a popular Machine Learning and MLOps platform built on Kubernetes for designing and running Machine Learning pipelines for training models and providing inference services. It has a notebook service that lets you launch interactive Jupyter servers (and more) on your Kubernetes cluster as well as a pipeline service with a DSL library written in Python for designing and building repeatable workflows. It also has tools for hyperparameter tuning and running model inference servers, everything you need to build a robust ML service.
-
How to set environment variables on your Dask workers
When working with Dask clusters you often need the remote worker environment to match your local environment. This generally means having the same packages and data available.
-
Branding your open source Python package
Having a brand can help give your open source project some legitimacy, and you don’t need to be a designer to see these benefits. However it is important to understand that you do not need to add branding to your project in order for it to be successful, and adding branding can even harm your project.
-
The evolution of a Dask Distributed user
This week was the 2021 Dask Summit and one of the workshops that we ran covered many deployment options for Dask Distributed.
-
Building a contributor community for your open source project
With our open source project published on GitHub we probably want to allow folks to contribute changes. Some users of the project may find bugs, or desire extra features and will open issues to tell you. Users who have the skills required to make that change can open a Pull Request on GitHub to propose it. As the maintainer you can then review and merge those changes.
-
Communicating with your open source community
Once your open source Python project has users and a community you will likely want to communicate with them in an official capacity. Perhaps you want to tell them about a new release, show a use case where someone is using your tool or solicit feedback on an upcoming feature.
-
Building a user community for your open source project
Now that our open source Python project exists and users can install it we will want to turn our attention to sustainability, reach and ongoing maintenance. By putting it out there and gaining users you are opening yourself up to questions, bug reports and feature requests.
-
Documenting Python projects with Sphinx and Read the Docs
In part four of this series we discussed documenting our code as we went along by adding docstrings throughout our project. In this post we will see that effort pay off by building a documentation site using Sphinx which will leverage all of our existing docstrings.
-
Automating releases of Python packages with GitHub Actions
In this post we will cover automatically packaging and releasing our project when a new git tag is pushed to GitHub.
-
Testing and Continuous Integration for Python packages with GitHub Actions
In this post we will cover automatically running our tests when we push new code to GitHub, and when contributors raise Pull Requests against our project.
-
Awaitable Objects and Async Context Managers in Python
Python objects are synchronous by default. When working with asyncio, if we create an object the __init__ is a regular function and we cannot do any async work in it.
-
Test driven development in Python
What is test driven development (TDD)?
Test driven development is a style of development where you write your tests before you write your code.
-
Testing your Python package
In this post we will cover testing our code.
Testing
There are many great resources out there for learning about testing software. In this post I’m going to try to focus on simple examples that you can use to get started quickly. Once you have a good foundation for your tests you can then dive into mocking, replaying HTTP requests or even hypothesis testing.
-
Documenting your Python code
This post will cover documenting our code. Specifically adding documentation within the code itself.
Docstrings
Right now our code is undocumented, so if the user inspects our function they will only see the interface (the way you call it) but with no other context. We can use IPython to quickly inspect this.
-
Running Dask tutorials
Aug 21, 2020 20 minute read #python, #dask, #distributed-computing, #open-source, #community, #tutorials Archive
Originally published on the Dask blog on August 21st, 2020.
For the last couple of months we’ve been running community tutorials every three weeks or so. The response from the community has been great and we’ve had 50-100 people at each 90 minute session.
-
The current state of distributed Dask clusters
Originally published on the Dask blog on July 23rd, 2020.
Dask enables you to build up a graph of the computation you want to perform and then executes it in parallel for you. This is great for making best use of your computer’s hardware. It is also great when you want to expand beyond the limits of a single machine.
-
Publishing open source Python packages on GitHub, PyPI and Conda Forge
In this post we will cover making our code available to people. This is the bit where we open the source! We will push our code to a code posting platform and then package up our library and submit it to a couple of repositories to make it easy for people to install.
-
Versioning and formatting your Python code
In this post, we will cover a few project hygiene things that we may want to put into place to make our lives easier in the future.
-
Testing static sites with Lighthouse CI and GitHub Actions
Feb 13, 2020 7 minute read #python, #github, #tutorial, #github-actions, #static-sites, #lighthouse-ci
When you build a website you want pages to load as quickly as possible for users. Google has a tool called PageSpeed Insights which you can run on your website to see various metrics about the page. I’ve used it in the past while working on my blog and other sites.
-
Creating an open source Python project from scratch
Have you had a great idea for an open-source Python library that you think people will find useful, but you don’t know where to begin in creating and publishing it?
-
Creating GitHub Actions in Python
Note: This post is also available in Go flavour.
GitHub Actions provide a way to automate your software development workflows on GitHub. This includes traditional CI/CD tasks on all three major operating systems such as running test suites, building applications and publishing packages. But it also includes automated greetings for new contributors, labelling pull requests based on the files changed, or even creating cron jobs to perform scheduled tasks.
-
Cleaning up conda environments
Often when I’m developing or debugging in Python I end up creating throwaway conda environments. They exist to test some package installation or combination of packages, and once I’ve finished I will probably never use them again.
-
ChatOps - Automation via chat
Originally published on the Met Office Informatics Lab blog on December 19th, 2017.
This article is a companion to a workshop on using chat to automate ops workflows. This is a static version of a Jupyter Notebook which you can download here.
-
Getting started with VMware's ESXi/vSphere API in Python
In 2013 VMware released their Python library for accessing the ESXi/vSphere API on GitHub. This is great, however it isn’t the easiest library in the world to use. This quick guide will show you how to connect to an ESXi host or vSphere cluster and get some info about a virtual machine.
-
How to easy_install and pip through a proxy
If you’re trying to install a Python package using easy_install or pip and you connect to the internet via a proxy you’ll need to make a few changes to your setup.
-
Python script: Recursively remove empty folders/directories
So as part of a script I’m writing I needed the ability to recursively remove empty folders/directories from a filesystem. After a bit of googling I found this very useful script by Eneko Alonso. However the script isn’t really in a usable state for what I want so I decided to make a few changes to it and publish it on GitHub.
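For the last post in this list, here is a minimal sketch of the general idea. It is not Eneko Alonso's script or the version published on GitHub; the function name remove_empty_dirs and its remove_root parameter are made up for illustration, and it only relies on the standard library's os.walk, os.listdir and os.rmdir.

```python
import os


def remove_empty_dirs(root, remove_root=False):
    """Delete every empty directory below root and return the removed paths.

    Hypothetical helper for illustration only, not the script from the post.
    """
    removed = []
    # Walk bottom-up so child directories are handled before their parents.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root and not remove_root:
            continue
        # Re-list the directory: the names reported by os.walk were collected
        # before any children were removed, so they may be out of date.
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

Calling remove_empty_dirs on a folder leaves the top-level folder itself in place unless remove_root=True is passed. Directories that only contain other empty directories are removed too, because their children are deleted first.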