# ask-marvin
y
@Marvin How many tasks can run in one flow? hundreds? thousands? more?
m
Wouldn't it be lovely if everything had a clear, simple answer? Instead, it seems I have to do a bit of digging for you. I'll check the documentation and relevant GitHub issues to find the limits on the number of tasks that can run in one flow. Hang tight.

Let's proceed step by step:
1. I'll search the Prefect documentation for any mention of limits on the number of tasks in a flow.
2. I'll look at GitHub issues for discussions or experiences related to the number of tasks in a flow.

Here we go.

Ah, of course, nothing ever works perfectly the first time. Let's try this again. I'll focus on the documentation, since the GitHub issues didn't yield any relevant results. Here we go again.

Well, isn't this just typical? It seems I'm encountering unexpected errors with the documentation search. Allow me a moment while I find another way to get this information. Let's try a Prefect code example related to flow configuration and task limits. Surely, this should yield something useful.

It appears we've retrieved a basic example of a Prefect flow, but it doesn't directly answer the question of how many tasks can run in a flow. Since we're having such wonderful luck with automated searches, I'll summarize from general knowledge and give you some practical guidance.

In practice, the number of tasks that can run in a single Prefect flow depends on several factors:
1. System resources: CPU, memory, and network capacity.
2. Concurrency model: is the flow using a Dask-based task runner (`DaskTaskRunner` from `prefect-dask`; `DaskExecutor` in Prefect 1) or a local one?
3. Task complexity: lightweight tasks can be far more numerous than resource-intensive ones.

Prefect is designed to handle thousands of tasks per flow, especially when configured with a scalable execution backend like Dask. However, there is no strictly defined limit; the practical ceiling depends on your resources and infrastructure.
For more concrete numbers, you might consider running performance tests tailored to your specific environment. For detailed technical guidelines, I'd recommend consulting the official documentation or reaching out on the Prefect Discourse forum. Hopefully, this was somewhat helpful, even if everything around here isn't.
w
Perhaps these two threads have more information for you:
https://prefect-community.slack.com/archives/C048ZHT5U3U/p1718184354820239
https://prefect-community.slack.com/archives/CL09KU1K7/p1717155276268269

A common response you will see is "Prefect Cloud supports 2000 calls per minute"; beyond that, nobody really talks about limits for Prefect OSS. I have some flows that run thousands of (IO-bound) tasks, and they run fine, but I do see a HUGE degradation in performance compared with running the same solution with, for example, a thread pool.
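For comparison, a plain thread-pool baseline of the kind mentioned here might look like the sketch below. It uses only the standard library; `fake_io_task` and `run_with_thread_pool` are hypothetical names, and the "IO" is simulated with `time.sleep`, so the timings say nothing about Prefect itself. Timing the same workload written as Prefect tasks in your own environment would be one way to run the environment-specific performance test suggested earlier in the thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(item: int, delay: float = 0.01) -> int:
    """Hypothetical stand-in for an IO-bound unit of work (e.g. an S3 or HTTP call).

    It only sleeps to simulate network latency, then returns a trivial result.
    """
    time.sleep(delay)
    return item * 2


def run_with_thread_pool(items, max_workers: int = 32):
    """Fan the IO-bound work out over a fixed-size thread pool.

    Returns (results, elapsed_seconds). Results keep the input order
    because Executor.map yields them in the order the inputs were given.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fake_io_task, items))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # 1,000 simulated 10 ms calls spread over 50 threads.
    results, elapsed = run_with_thread_pool(range(1_000), max_workers=50)
    print(f"{len(results)} tasks in {elapsed:.2f}s")
```

Because the work is IO-bound, the threads spend most of their time waiting, so wall-clock time shrinks roughly with `max_workers` until the real bottleneck (network, remote API limits) takes over.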
y
@William Jamir Thank you for this! I’m really trying to understand the recommended approach here regarding the number of Tasks you should have in a flow.

For context: your mission is to write a job that downloads about 300,000 files from S3 and does some processing on each file. This should be parallelized to some degree so that it finishes in a reasonable time.

From reading about Airflow and the best practices accumulated around the web, Airflow should not do the work itself; instead, it should tell other systems what to do and monitor them. So you would have an Airflow job with a single Task that runs a single Python script on some remote compute (let’s say ECS). The script uses techniques like `asyncio.gather()` to handle parallelism and process the files quickly.

Would you say the same philosophy applies to Prefect? Or, since Prefect is more “advanced” and “capable”, could I actually have thousands of Tasks running concurrently to achieve the parallelism, instead of “hiding” it in a single script with something like `asyncio.gather()`?
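The single-script pattern described in the question (parallelism handled inside one script with `asyncio.gather()`) might look roughly like this. It is a minimal standard-library sketch under stated assumptions: `download_and_process` and `process_all` are hypothetical names, the S3 call is simulated with `asyncio.sleep` rather than a real async S3 client, and the `asyncio.Semaphore` cap is an added assumption so that the files are not all fetched at once.

```python
import asyncio


async def download_and_process(key: str) -> int:
    """Hypothetical per-file work: fetch one object and process it.

    A real version would call an async S3 client (assumed, not shown);
    here network latency is only simulated with asyncio.sleep.
    """
    await asyncio.sleep(0.001)
    return len(key)  # placeholder "processing" result


async def process_all(keys, max_concurrency: int = 100):
    """Run download_and_process for every key, with at most
    max_concurrency calls in flight, and return results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(key: str) -> int:
        # Only max_concurrency coroutines can be inside this block at once.
        async with semaphore:
            return await download_and_process(key)

    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(bounded(k) for k in keys))


if __name__ == "__main__":
    keys = [f"data/file-{i:06d}.csv" for i in range(10_000)]
    results = asyncio.run(process_all(keys, max_concurrency=200))
    print(len(results))
```

`gather` wraps every coroutine in a task up front, so with 300,000 keys it may be worth feeding the keys in slices to keep memory bounded; the semaphore only limits how many downloads run at the same time, not how many tasks exist.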