Hello! I’m investigating using Prefect and was hoping for some advice from anyone with a similar use case. We need to provision and configure a set of cloud infrastructure for each customer, some steps of which depend on the previous step being completed.
We run all our orchestration on a k8s cluster. Currently, we have a homegrown task-runner solution that uses Celery to queue an ordered list of tasks, which is picked up by runners (each runner is a single pod but can handle multiple tasks simultaneously). A runner picks up the entire list at once (I think the Prefect equivalent would be a “flow”). Our main goals for switching are:
• better support for DAGs, since not all tasks rely on others
• better asyncio support: we spend a lot of time busy-waiting for other things to complete, so raw compute isn’t really important
• improved visibility into task progress, plus the ability to retry in a more granular fashion
I was thinking about the following:
• one Prefect server deployment connected to a persistent database in RDS
• one or more “process”-type worker pods to execute tasks (since tasks aren’t super expensive, they don’t need dedicated pods)
• task flows are baked into the worker pods; they don’t change often enough to justify spinning up an S3 bucket or Git solution
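To make the workload concrete, here is a rough plain-asyncio sketch of the shape of one customer's provisioning run. The step names (`create_network`, `create_database`, `create_bucket`) and the `retry` helper are made up for illustration; our real steps mostly wait on cloud APIs. My assumption is that each step would become a Prefect task and `provision_customer` a flow, with retries handled by Prefect rather than a helper like this.

```python
import asyncio


async def retry(step, *args, attempts=3, delay=0.0):
    """Re-run a single step on failure instead of restarting the whole list."""
    for attempt in range(1, attempts + 1):
        try:
            return await step(*args)
        except Exception:
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)


# Hypothetical steps; the sleeps stand in for waiting on a cloud API.
async def create_network(customer):
    await asyncio.sleep(0)
    return f"{customer}-net"


async def create_database(customer, network):
    await asyncio.sleep(0)
    return f"{customer}-db@{network}"


async def create_bucket(customer):
    await asyncio.sleep(0)
    return f"{customer}-bucket"


async def provision_customer(customer):
    # The bucket does not depend on the network, so start it right away.
    bucket_job = asyncio.create_task(retry(create_bucket, customer))
    # The database needs the network, so these two run in order.
    network = await retry(create_network, customer)
    database = await retry(create_database, customer, network)
    bucket = await bucket_job
    return {"network": network, "database": database, "bucket": bucket}


if __name__ == "__main__":
    print(asyncio.run(provision_customer("acme")))
```

The part I care about is that the independent step overlaps with the dependent chain, and that a failure only re-runs the one step that failed.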