Measure reliability with Service Level Agreements
Service Level Agreements (SLAs) are a way to define the performance expectations for your flows and to enable automated alerts when those expectations are not met.
Prerequisites
- Prefect Client Version 3.1.12 or later
- Prefect Cloud account (SLAs are only available in Prefect Cloud)
Service Level Agreements
Service Level Agreements (SLAs) help you set and monitor performance standards for your data stack. By establishing specific thresholds for flow runs on your deployments, you can automatically detect when your system isn’t meeting expectations. When you set up an SLA, you define specific performance criteria, such as a maximum runtime of 10 minutes for a flow. If a flow run exceeds this threshold, the system generates an alert event. You can then use these events to trigger automated responses through automations, such as sending notifications to your team or initiating other corrective actions.
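The threshold check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Prefect's implementation: the function name `check_runtime_sla`, the layout of the event dictionary, and the sample run IDs are all hypothetical.

```python
from datetime import timedelta
from typing import Optional


def check_runtime_sla(
    flow_run_id: str,
    runtime: timedelta,
    max_runtime: timedelta = timedelta(minutes=10),
) -> Optional[dict]:
    """Return a violation event if the run exceeded max_runtime, else None.

    Hypothetical helper for illustration only; Prefect Cloud performs
    this evaluation for you once an SLA is attached to a deployment.
    """
    if runtime <= max_runtime:
        return None
    return {
        # Event name taken from the trigger setup described in this guide.
        "event": "prefect.sla.sla-violation",
        # Assumed resource ID format, chosen to match "prefect.flow-run.*".
        "resource_id": f"prefect.flow-run.{flow_run_id}",
        "exceeded_by_seconds": (runtime - max_runtime).total_seconds(),
    }


# An 8-minute run stays within the 10-minute limit; a 12-minute run does not.
on_time = check_runtime_sla("run-a", timedelta(minutes=8))
late = check_runtime_sla("run-b", timedelta(minutes=12))
```

Here `on_time` is `None`, while `late` is an event dictionary reporting that the run exceeded the limit by 120 seconds.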
Defining SLAs
To define an SLA, add it to a deployment in one of three ways: in the prefect.yaml file, through the .deploy method, or with the CLI.
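As a rough sketch of what an SLA definition contains, the snippet below models a time-to-completion SLA as plain Python data and serializes it to JSON. The class `SlaDefinition` and its fields (`name`, `duration` in seconds, `severity`) are assumptions made for illustration and are not taken from this guide; check the Prefect documentation for the exact schema that prefect.yaml, .deploy, and the CLI accept.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SlaDefinition:
    """Hypothetical stand-in for an SLA attached to a deployment."""

    name: str
    duration: int  # assumed: maximum allowed runtime, in seconds
    severity: str  # assumed: a label such as "high" used when alerting


# A 10-minute runtime limit, matching the example used earlier in this guide.
sla = SlaDefinition(name="ten-minute-runtime", duration=600, severity="high")

# A JSON form is convenient for passing a definition on a command line
# or embedding it in a configuration file.
sla_json = json.dumps(asdict(sla))
print(sla_json)
```

Keeping the definition as plain data makes it easy to reuse the same values across whichever of the three methods you choose.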
Monitoring SLAs
You can monitor SLAs in the Prefect Cloud UI. On the runs page, the SLA status appears in the top-level metrics.
Setting up an automation
To notify a team or take other actions when an SLA is violated, use the automations feature. To create the automation, first create a trigger:
- Choose trigger type ‘Custom’.
- Choose any event matching: prefect.sla.sla-violation
- For “From the Following Resources” choose: prefect.flow-run.*
After creating the trigger, add the actions the automation should take when an SLA is violated, such as notifying a team.
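To make the trigger conditions concrete, here is a minimal sketch of how an event might be tested against them, assuming the trailing `*` in the resource filter acts as a wildcard. The helper `trigger_matches`, the event dictionary layout, and the sample resource IDs are hypothetical; Prefect Cloud does the actual matching.

```python
from fnmatch import fnmatchcase

EXPECTED_EVENT = "prefect.sla.sla-violation"
RESOURCE_PATTERN = "prefect.flow-run.*"


def trigger_matches(event: dict) -> bool:
    """Return True when an event satisfies both trigger conditions.

    Hypothetical helper; the event dictionary layout is an assumption.
    """
    return event.get("event") == EXPECTED_EVENT and fnmatchcase(
        event.get("resource_id", ""), RESOURCE_PATTERN
    )


events = [
    # SLA violation on a flow run: both conditions hold.
    {"event": "prefect.sla.sla-violation", "resource_id": "prefect.flow-run.abc123"},
    # Right resource, but a different event name.
    {"event": "prefect.flow-run.Completed", "resource_id": "prefect.flow-run.abc123"},
    # Right event name, but the resource is not a flow run.
    {"event": "prefect.sla.sla-violation", "resource_id": "prefect.deployment.xyz"},
]
matched = [trigger_matches(e) for e in events]
```

Only the first event passes both checks, so only it would fire the automation in this sketch.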