Run flows on Kubernetes
Learn how to run flows on Kubernetes using containers.
This guide explains how to run flows on Kubernetes. Though much of the guide is general to any Kubernetes cluster, it focuses on Amazon Elastic Kubernetes Service (EKS). Prefect is tested against Kubernetes 1.26.0 and newer minor versions.
Prerequisites
- A Prefect Cloud account
- A cloud provider (AWS, GCP, or Azure) account
- Python and Prefect installed
- Helm installed
- Kubernetes CLI (kubectl) installed
- Admin access for Prefect Cloud and your cloud provider. You can reduce these permissions after setup.
Create a cluster
If you already have one, skip ahead to the next section.
One easy way to get set up with a cluster in EKS is with eksctl.
Node pools can be backed by either EC2 instances or Fargate.
Choose Fargate so there’s less to manage.
The following command takes around 15 minutes and must not be interrupted:
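As a minimal sketch, assuming eksctl's `create cluster` subcommand with its `--fargate` and `--name` flags, the call can be assembled and run from Python. The helper names and the cluster name `my-prefect-cluster` are hypothetical placeholders.

```python
import shlex
import subprocess


def eksctl_create_cluster_cmd(cluster_name: str) -> list[str]:
    """Build the argv for creating a Fargate-backed EKS cluster with eksctl."""
    return ["eksctl", "create", "cluster", "--fargate", "--name", cluster_name]


def create_cluster(cluster_name: str) -> None:
    """Run eksctl and raise CalledProcessError if it exits non-zero."""
    subprocess.run(eksctl_create_cluster_cmd(cluster_name), check=True)


# "my-prefect-cluster" is a hypothetical name; substitute your own.
print(shlex.join(eksctl_create_cluster_cmd("my-prefect-cluster")))
# prints: eksctl create cluster --fargate --name my-prefect-cluster
```

Calling `create_cluster(...)` starts the actual provisioning, so call it only once you are ready to wait for it to finish.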
Create a container registry
Besides a cluster, the other critical resource is a container registry. A registry is not strictly required, but in most cases you’ll want to use custom images and/or have more control over where images are stored. If you already have a registry, skip ahead to the next section.
Create a registry using the AWS CLI and authenticate the docker daemon to that registry:
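A sketch of these two steps, assuming Amazon ECR and the AWS CLI's `ecr create-repository` and `ecr get-login-password` subcommands. The function names, account ID, region, and repository name are hypothetical placeholders.

```python
import subprocess


def ecr_registry_host(account_id: str, region: str) -> str:
    """Return the host name of an account's private ECR registry in a region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def create_repo_and_login(image_name: str, account_id: str, region: str) -> None:
    """Create an ECR repository, then log the local Docker daemon in to it."""
    subprocess.run(
        ["aws", "ecr", "create-repository",
         "--repository-name", image_name, "--region", region],
        check=True,
    )
    # Same effect as piping `aws ecr get-login-password` into
    # `docker login --password-stdin` in a shell.
    password = subprocess.run(
        ["aws", "ecr", "get-login-password", "--region", region],
        check=True, capture_output=True, text=True,
    ).stdout
    subprocess.run(
        ["docker", "login", "--username", "AWS", "--password-stdin",
         ecr_registry_host(account_id, region)],
        input=password, text=True, check=True,
    )


# Hypothetical account ID and region, shown only to illustrate the host format.
print(ecr_registry_host("123456789012", "us-east-1"))
# prints: 123456789012.dkr.ecr.us-east-1.amazonaws.com
```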
Create a Kubernetes work pool
Work pools allow you to manage deployment infrastructure. This section shows you how to configure the default values for your Kubernetes base job template. These values can be overridden by individual deployments.
Switch to the Prefect Cloud UI to create a new Kubernetes work pool. (Alternatively, you could use the Prefect CLI to create a work pool.)
- Click on the Work Pools tab on the left sidebar
- Click the + button at the top of the page
- Select Kubernetes as the work pool type
- Click Next to configure the work pool settings
- Set the namespace field to prefect
If you set a different namespace, use your selected namespace instead of prefect in all commands below.
You may come back to this page to configure the work pool options at any time.
Configure work pool options
Here are some popular configuration options.
Environment Variables
Add environment variables to set when starting a flow run.
If you are using a Prefect-maintained image and haven’t overwritten the image’s entrypoint, you can specify Python packages to install at runtime with {"EXTRA_PIP_PACKAGES":"my_package"}.
For example, {"EXTRA_PIP_PACKAGES":"pandas==1.2.3"} installs pandas version 1.2.3.
Alternatively, you can specify package installation in a custom Dockerfile, which allows you to use image caching.
As shown below, Prefect can help create a Dockerfile with your flow code and the packages specified in a requirements.txt file baked in.
Namespace
Set the Kubernetes namespace in which jobs are created, such as prefect. If not set, the namespace is default.
Image
Specify the Docker container image for created jobs.
If not set, the latest Prefect 3 image is used (for example, prefecthq/prefect:3-latest).
You can override this on each deployment through job_variables.
Image Pull Policy
Select from the dropdown options to specify when to pull the image.
When using the IfNotPresent policy, make sure to use unique image tags. Otherwise, nodes may keep running an old cached image.
Finished Job TTL
Number of seconds before finished jobs are automatically cleaned up by the Kubernetes controller. Set to 60 so completed flow runs are cleaned up after a minute.
Pod Watch Timeout Seconds
Number of seconds for pod creation to complete before timing out. Consider setting to 300, especially if using a serverless type node pool, as these tend to have longer startup times.
Kubernetes cluster config
Specify a KubernetesClusterConfig block to configure the Kubernetes cluster for job creation. In most cases, leave the cluster config blank since the worker should already have appropriate access and permissions. We recommend using this setting when deploying a worker to a cluster that differs from the one executing the flow runs.
Advanced Settings
Modify the default base job template to add other fields or delete existing fields.
Select the Advanced tab and edit the JSON representation of the base job template.
For example, to set a CPU request, add the following section under variables:
Next, add the following to the first containers item under job_configuration:
Running deployments with this work pool will request the specified CPU.
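The two edits above can be sketched as a transformation on the template's JSON. This assumes variable definitions live under `variables.properties` and containers under `job_configuration.job_manifest.spec.template.spec.containers`. The trimmed template, the helper name, and the `500m` default are illustrative assumptions, not the full template shown in the UI.

```python
import copy
import json

# Trimmed stand-in for a Kubernetes work pool base job template (assumption:
# only the keys touched below are shown; a real template has many more).
BASE_JOB_TEMPLATE = {
    "variables": {"type": "object", "properties": {}},
    "job_configuration": {
        "job_manifest": {
            "spec": {"template": {"spec": {"containers": [{"image": "{{ image }}"}]}}}
        }
    },
}


def add_cpu_request(template: dict, default: str = "500m") -> dict:
    """Return a copy of the template that exposes a cpu_request variable."""
    updated = copy.deepcopy(template)
    # 1. Declare the variable so it appears as a work pool setting.
    updated["variables"]["properties"]["cpu_request"] = {
        "title": "CPU Request",
        "description": "The CPU allocation to request for this pod.",
        "default": default,
        "type": "string",
    }
    # 2. Reference the variable from the first container's resource requests.
    pod_spec = updated["job_configuration"]["job_manifest"]["spec"]["template"]["spec"]
    first = pod_spec["containers"][0]
    first.setdefault("resources", {}).setdefault("requests", {})["cpu"] = "{{ cpu_request }}"
    return updated


print(json.dumps(add_cpu_request(BASE_JOB_TEMPLATE), indent=2))
```

The printed JSON shows where each fragment belongs. In practice you would paste the two fragments into the Advanced tab rather than generate the whole template.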
After configuring the work pool settings, move to the next screen.
Give the work pool a name and save.
Your new Kubernetes work pool should appear in the list of work pools.
Create a Prefect Cloud API key
If you already have a Prefect Cloud API key, you can skip these steps.
To create a Prefect Cloud API key:
- Log in to the Prefect Cloud UI.
- Click on your profile avatar picture in the top right corner.
- Click on your name to go to your profile settings.
- In the left sidebar, click on API Keys.
- Click the + button to create a new API key.
- Securely store the API key, ideally using a password manager.
Deploy a worker using Helm
After you create a cluster and work pool, the next step is to deploy a worker. The worker sets up the necessary Kubernetes infrastructure to run your flows. The recommended method for deploying a worker is with the Prefect Helm Chart.
Add the Prefect Helm repository
Add the Prefect Helm repository to your Helm client:
Create a namespace
Create a new namespace in your Kubernetes cluster to deploy the Prefect worker:
Create a Kubernetes secret for the Prefect API key
Configure Helm chart values
Create a values.yaml file to customize the Prefect worker configuration.
Add the following contents to the file:
These settings ensure that the worker connects to the proper account, workspace, and work pool.
View your Account ID and Workspace ID in your browser URL when logged into Prefect Cloud. For example: <https://app.prefect.cloud/account/abc-my-account-id-is-here/workspaces/123-my-workspace-id-is-here>.
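As an illustration, the IDs can be pulled out of that URL and placed into a minimal values file. The keys `worker.cloudApiConfig.accountId`, `worker.cloudApiConfig.workspaceId`, and `worker.config.workPool` are assumed to match the worker chart's values schema, so check them against the chart version you install. The function names and the work pool name `my-k8s-pool` are hypothetical.

```python
from urllib.parse import urlparse


def ids_from_cloud_url(url: str) -> tuple[str, str]:
    """Extract (account_id, workspace_id) from a Prefect Cloud browser URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    account_id = parts[parts.index("account") + 1]
    # Accept either "workspace" or "workspaces" as the path segment.
    ws_index = next(i for i, p in enumerate(parts) if p in ("workspace", "workspaces"))
    return account_id, parts[ws_index + 1]


def render_values_yaml(account_id: str, workspace_id: str, work_pool: str) -> str:
    """Render a minimal values.yaml for the worker chart (keys are assumptions)."""
    return (
        "worker:\n"
        "  cloudApiConfig:\n"
        f"    accountId: {account_id}\n"
        f"    workspaceId: {workspace_id}\n"
        "  config:\n"
        f"    workPool: {work_pool}\n"
    )


url = (
    "https://app.prefect.cloud/account/abc-my-account-id-is-here"
    "/workspaces/123-my-workspace-id-is-here"
)
account_id, workspace_id = ids_from_cloud_url(url)
# "my-k8s-pool" is a hypothetical work pool name.
print(render_values_yaml(account_id, workspace_id, "my-k8s-pool"))
```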
Create a Helm release
Install the Prefect worker using the Helm chart with your custom values.yaml file:
Verify deployment
Check the status of your Prefect worker deployment:
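The Helm steps in this section can be collected into one sketch. The chart repository URL, the `prefect/prefect-worker` chart name, the release name `prefect-worker`, and the secret name `prefect-api-key` with its `key` entry are assumptions about the Prefect Helm chart's conventions. The function name is hypothetical.

```python
import shlex

NAMESPACE = "prefect"  # should match the work pool's namespace field


def worker_install_commands(api_key: str, namespace: str = NAMESPACE) -> list[list[str]]:
    """Return the argv lists for deploying and checking a worker, in order."""
    return [
        # Add the Prefect Helm repository and refresh the local index.
        ["helm", "repo", "add", "prefect", "https://prefecthq.github.io/prefect-helm"],
        ["helm", "repo", "update"],
        # Create the namespace the worker runs in.
        ["kubectl", "create", "namespace", namespace],
        # Store the Prefect Cloud API key as a Kubernetes secret.
        ["kubectl", "create", "secret", "generic", "prefect-api-key",
         f"--namespace={namespace}", f"--from-literal=key={api_key}"],
        # Install the worker chart with the custom values file.
        ["helm", "install", "prefect-worker", "prefect/prefect-worker",
         f"--namespace={namespace}", "-f", "values.yaml"],
        # Verify that the worker pod is running.
        ["kubectl", "get", "pods", "-n", namespace],
    ]


for argv in worker_install_commands("<your-prefect-cloud-api-key>"):
    print(shlex.join(argv))
```

To execute the commands instead of printing them, pass each list to `subprocess.run(argv, check=True)`.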
Define a flow
Start simple with a flow that just logs a message.
In a directory named flows, create a file named hello.py with the following contents:
Run the flow locally with python hello.py to verify that it works.
Use the tags context manager to tag the flow run as local.
This step is not required, but does add some helpful metadata.
Define a Prefect deployment
Prefect has two recommended options for creating a deployment with dynamic infrastructure.
You can define a deployment in a Python script using flow.deploy or in a prefect.yaml definition file.
The prefect.yaml file currently allows for more customization in terms of push and pull steps.
To learn about the Python deployment creation method with flow.deploy, see Workers.
The prefect.yaml file is used by the prefect deploy command to deploy your flows.
As part of that process, it also builds and pushes your image.
Create a new file named prefect.yaml with the following contents:
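The sketch below writes out a prefect.yaml consistent with the description in this section. The step names come from Prefect and prefect-docker. The `prefect-docker>=0.4.0` pin, the work pool name `my-k8s-pool`, the `Arthur` parameter value, and the working directory `/opt/prefect/flows` are assumptions to adjust for your project.

```python
from pathlib import Path

# Sketch of a prefect.yaml; placeholder values are marked as assumptions.
PREFECT_YAML = """\
name: flows

build:
- prefect_docker.deployments.steps.build_docker_image:
    id: build-image
    requires: prefect-docker>=0.4.0  # assumption: version pin
    image_name: "{{ $PREFECT_IMAGE_NAME }}"  # read from the environment
    tag: latest
    dockerfile: auto
    platform: linux/amd64

push:
- prefect_docker.deployments.steps.push_docker_image:
    requires: prefect-docker>=0.4.0
    image_name: "{{ build-image.image_name }}"
    tag: "{{ build-image.tag }}"

pull:
- prefect.deployments.steps.set_working_directory:
    directory: /opt/prefect/flows  # assumption: adjust to your image layout

definitions:
  work_pool: &common_work_pool
    name: my-k8s-pool  # assumption: hypothetical work pool name
    job_variables:
      image: "{{ build-image.image }}"

deployments:
- name: default
  entrypoint: "flows/hello.py:hello"
  work_pool: *common_work_pool
- name: arthur
  entrypoint: "flows/hello.py:hello"
  parameters:
    name: Arthur  # assumption: the flow accepts a name parameter
  work_pool: *common_work_pool
"""


def write_prefect_yaml(directory: str = ".") -> Path:
    """Write the sketch to <directory>/prefect.yaml and return its path."""
    path = Path(directory) / "prefect.yaml"
    path.write_text(PREFECT_YAML)
    return path
```

Call `write_prefect_yaml()` from the project root, or copy the YAML text into the file by hand.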
We define two deployments of the hello flow: default and arthur.
By specifying dockerfile: auto, Prefect automatically creates a Dockerfile that installs the packages listed in requirements.txt and copies over the current directory.
You can pass a custom Dockerfile instead with dockerfile: Dockerfile or dockerfile: path/to/Dockerfile.
We are specifically building for the linux/amd64 platform.
This specification is often necessary when images are built on Macs with M series chips but run on cloud provider instances.
Deployment specific build, push, and pull
You can override the build, push, and pull steps for each deployment. This allows for more custom behavior, such as specifying a different image for each deployment.
Define your requirements in a requirements.txt file:
The directory should now look something like this:
Tag images with a Git SHA
If your code is stored in a GitHub repository, it’s good practice to tag your images with the Git SHA of the code used to build them.
Do this in the prefect.yaml file with a few minor modifications, since it’s not yet an option with the Python deployment creation method.
Use the run_shell_script command to grab the SHA and pass it to the tag parameter of build_docker_image:
Set the SHA as a tag for easy identification in the UI:
Authenticate to Prefect
Before deploying the flows to Prefect, you need to authenticate through the Prefect CLI.
You also need to ensure that all of your flow’s dependencies are present at deploy time.
This example uses a virtual environment to ensure consistency across environments.
Deploy the flows
You’re ready to deploy your flows, which also builds your images.
The image name determines its registry.
You configured the prefect.yaml file to get the image name from the PREFECT_IMAGE_NAME environment variable, so set that first:
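A sketch of setting the variable for an ECR-hosted image, assuming the registry host format `<account>.dkr.ecr.<region>.amazonaws.com`. The function names, account ID, region, and repository name are hypothetical placeholders.

```python
import os
import subprocess


def ecr_image_name(account_id: str, region: str, repository: str) -> str:
    """Return the full image name, including the ECR registry host."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}"


def deploy_all() -> None:
    """Run `prefect deploy --all`; the child process inherits os.environ."""
    subprocess.run(["prefect", "deploy", "--all"], check=True)


# Hypothetical values; substitute your own account, region, and repository.
os.environ["PREFECT_IMAGE_NAME"] = ecr_image_name(
    "123456789012", "us-east-1", "prefect-flows"
)
print(os.environ["PREFECT_IMAGE_NAME"])
# prints: 123456789012.dkr.ecr.us-east-1.amazonaws.com/prefect-flows
```

A variable set through `os.environ` is visible only to this Python process and its children, which is why `deploy_all()` launches the deploy from the same script. In a shell session you would export the variable instead.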
To deploy your flows, ensure your Docker daemon is running. Deploy all the flows with prefect deploy --all or deploy them individually by name: prefect deploy -n hello/default or prefect deploy -n hello/arthur.
Run the flows
Once the deployments are successfully created, you can run them from the UI or the CLI:
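For the CLI route, a small sketch using the `prefect deployment run` command with the two deployment names from this guide. The helper name is hypothetical.

```python
import shlex
import subprocess

DEPLOYMENTS = ["hello/default", "hello/arthur"]


def run_command(deployment: str) -> list[str]:
    """Build the argv that starts a flow run for a <flow>/<deployment> name."""
    return ["prefect", "deployment", "run", deployment]


def run_all() -> None:
    """Start one flow run per deployment, stopping at the first failure."""
    for deployment in DEPLOYMENTS:
        subprocess.run(run_command(deployment), check=True)


for deployment in DEPLOYMENTS:
    print(shlex.join(run_command(deployment)))
```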
You can now check the status of your two deployments in the UI.