What’s the best practice for Data Retention Policy on Prefect deployment runs?
For reference, in Apache Airflow this is commonly implemented as yet another garbage-collector DAG:
I’m sure that Prefect either has a built-in mechanism for this or encourages a common idiom for rotating, archiving, or deleting artifacts from old runs.
We have persistent storage on Azure Blob Storage (Azure’s S3 equivalent) where we store artifacts (e.g. output files and images) from Machine Learning (Kedro) runs.
The space can pile up pretty quickly across runs, and running out of storage would render our Prefect deployments non-operational.
What kind of policies are recommended for evicting data from old runs?
I don’t want to run out of space, and I want the Prefect pipelines to remain operational.
I know that some of you will say: “_It depends_”, so for the sake of this example let’s imagine that I have a dedicated 256 GB of storage.
Should I set a threshold (e.g. 70% full) that acts as a trigger for evicting (deleting) artifacts from old runs?
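To make the threshold idea concrete, here is a minimal sketch of the selection logic I have in mind. Everything in it is an assumption on my part (the watermark values, the `select_evictions` name, and the `(name, last_modified, size)` tuples, which in practice would come from listing blobs with the `azure-storage-blob` SDK); the actual delete calls are left out:

```python
from datetime import datetime, timedelta, timezone

# Assumed budget and watermarks (from the 256 GB example above).
CAPACITY_BYTES = 256 * 1024**3   # total dedicated storage
HIGH_WATERMARK = 0.70            # start evicting at 70% full
LOW_WATERMARK = 0.50             # evict down to 50% to avoid re-triggering every run

def select_evictions(blobs, capacity=CAPACITY_BYTES,
                     high=HIGH_WATERMARK, low=LOW_WATERMARK):
    """Pick the oldest blobs to delete once usage exceeds the high
    watermark, freeing space until usage drops below the low watermark.

    `blobs` is a list of (name, last_modified, size_bytes) tuples —
    in practice built from ContainerClient.list_blobs() properties.
    """
    used = sum(size for _, _, size in blobs)
    if used <= capacity * high:
        return []                # under the trigger threshold: nothing to do
    target = capacity * low
    evictions = []
    # Oldest first, so artifacts from recent runs survive.
    for name, modified, size in sorted(blobs, key=lambda b: b[1]):
        if used <= target:
            break
        evictions.append(name)
        used -= size
    return evictions
```

The two-watermark design (trigger at 70%, evict down to 50%) is deliberate: evicting only back to the trigger level would make the cleanup fire on almost every subsequent run.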
Also, when should this run: as the first (prerequisite) subflow of my bigger flow, or as yet another Prefect deployment on a recurring schedule?
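If the answer is the separate-deployment route, I imagine the body of that cleanup is a simple age-based sweep like the sketch below. The function name and the 30-day window are hypothetical, and the blob listing/deletion against `azure-storage-blob` is only indicated in comments; the same function could presumably also be wrapped as a prerequisite subflow:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # hypothetical retention window; tune to run cadence

def expired(blobs, now=None, retention_days=RETENTION_DAYS):
    """Return the names of blobs older than the retention window.

    `blobs` is a list of (name, last_modified) pairs — in practice the
    name and last_modified properties from ContainerClient.list_blobs().
    The caller would then call delete_blob() on each returned name.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [name for name, modified in blobs if modified < cutoff]
```

Run as its own scheduled deployment, this keeps cleanup decoupled from the ML pipeline (a failed cleanup doesn’t block a run, and vice versa), which is why I lean toward that option over a prerequisite subflow.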