# best-practices
Hello everyone, I have been exploring the documentation for Prefect 2 and mostly love how intuitive the concepts and design are. A few questions come up when I think about how I would set up Prefect as a scalable orchestration tool. I think I have some grasp of them, but I would really appreciate help clarifying them to make sure I am not going in the wrong direction.

I am aiming for a setup that can handle an ever-increasing number of flows with different compute needs (some heavy, some light). What would be the best approach (or combination of approaches)?

1. Should I increase the number of agents picking work from each work queue?
2. Should I set up task runners to run in a pre-existing Dask/Ray cluster and increase/decrease the compute of the cluster?
3. Should I set the infrastructure to run flows in ephemeral Kubernetes pods and increase/decrease the k8s cluster compute according to need?

On a side note:

4. If I have 2 (or more) agents monitoring the same queue, once the first agent has picked a flow from the queue, no other agent will pick it, right?

Thanks in advance 🙂
1. This should not improve performance in most cases.
2. That’s an option! We support adaptive Dask clusters. This is really a question of whether or not you’d prefer to maintain a separate cluster and whether cluster startup times are a concern.
3. Generally, yes, this is an effective way to scale, especially if your tasks are running on the flow’s pod.
4. Yep!
Regarding 2: for our use case (dev-fin-ops startup), we provision Dask clusters on Fargate per flow to prevent scheduler memory contention from multiple flows’ tasks. I’d say it adds 4-6 minutes per flow execution. Once the cluster is spun up, it’s highly parallelized. Our flows are pretty beefy, so this trade-off makes sense, but it does take a while to get running. When we used an external Dask cluster, we ran into scheduler memory issues and constantly had to allocate more memory to the task running the Dask scheduler.
I think if you tweak the prefetch seconds on your agent, you can get your runs to start on time.
For example, if you set the prefetch to 10 minutes, your infra would spin up 10 minutes before the flow’s scheduled start time, and then it’d wait until the start time to actually execute.
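To make the timing concrete, here is a rough back-of-the-envelope sketch of that idea in plain Python. It is not Prefect code: `estimated_start` is a hypothetical helper, the 6-minute startup is just the upper end of the 4-6 minutes mentioned above, and the model ignores the agent's polling interval and any queueing delay.

```python
from datetime import datetime, timedelta


def estimated_start(scheduled_start, prefetch, infra_startup):
    """Estimate when a run begins executing (simplified, hypothetical model)."""
    # The agent can submit the run at most `prefetch` before its scheduled start.
    earliest_submit = scheduled_start - prefetch
    # The infrastructure (e.g. a per-flow Dask cluster) is ready after `infra_startup`.
    infra_ready = earliest_submit + infra_startup
    # If the infrastructure is ready early, the run waits for its scheduled start.
    return max(scheduled_start, infra_ready)


scheduled = datetime(2023, 1, 1, 9, 0)
startup = timedelta(minutes=6)  # assumed infrastructure startup time

on_time = estimated_start(scheduled, timedelta(minutes=10), startup)
late = estimated_start(scheduled, timedelta(minutes=1), startup)

print(on_time - scheduled)  # 0:00:00
print(late - scheduled)  # 0:05:00
```

Under these assumptions, a prefetch window at least as long as the infra startup time hides the startup delay entirely, while a shorter window leaves the run late by the difference.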