Justin Trautmann

04/21/2023, 9:42 AM
Hello community, hello Prefect team. We were recently developing a new flow with very granular tasks, which eventually resulted in a flow with 50k+ task runs. Since we have no blocking statements like wait or result in our flow, the flow submits all of these tasks at the very beginning. Because the PrefectClient has a connection pool of only 16 connections and a pool timeout of 30s, this led to a large number of pool timeout errors. Increasing the timeout prevented the timeout errors, but task submission still took very long, and it seemed like the agent was occupied solely with submitting tasks and started far fewer tasks at a time than expected. We eventually refactored our flow to use less granular tasks and to parallelize over multiple flow runs using a flow of deployments, in order to decrease the number of tasks per flow. So I was wondering if you have any recommendations or experience with large numbers of tasks. How much can we expect Prefect to handle? What's a sustainable task submission rate that wouldn't overwhelm the Prefect client, and are there any special best practices for massive flows? Thanks a lot.
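For anyone curious, the fan-out pattern we ended up with looks roughly like this (the deployment name "child-flow/chunked" and the chunk size of 1000 are placeholders, not our real values):
```
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent_flow(items: list):
    # Coarser chunks mean far fewer task runs per child flow run.
    chunks = [items[i : i + 1000] for i in range(0, len(items), 1000)]
    for chunk in chunks:
        # Each call creates a separate flow run from the deployment;
        # timeout=0 returns immediately instead of waiting for completion.
        run_deployment(
            name="child-flow/chunked",
            parameters={"items": chunk},
            timeout=0,
        )
```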

Deceivious

04/21/2023, 9:50 AM
following

Zanie

04/21/2023, 6:24 PM
We’re definitely thinking about this but I’m not sure I have concrete recommendations yet.
You may want to try using a Dask task runner for that kind of scaled concurrency — that way you’ll have multiprocessing workers.
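Roughly something like this (the task body and worker count are placeholders, and you'd need the prefect-dask collection installed):
```
from prefect import flow, task
from prefect_dask import DaskTaskRunner

@task
def process(i: int) -> int:
    return i * 2

# Tasks submitted here execute in Dask worker processes rather than
# in the flow run process itself.
@flow(task_runner=DaskTaskRunner(cluster_kwargs={"n_workers": 4}))
def big_flow():
    futures = process.map(range(100))
    return [f.result() for f in futures]
```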
I’m looking at a solution to the “submitting” problem in https://github.com/PrefectHQ/prefect/pull/8914
What version are you running?
Also note you can bump the default timeout with PREFECT_API_REQUEST_TIMEOUT, although I agree that is not the ideal solution here.
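For example, from Python (the 120s value is arbitrary; you can also use `prefect config set PREFECT_API_REQUEST_TIMEOUT=120` or set it as an environment variable):
```
from prefect.settings import PREFECT_API_REQUEST_TIMEOUT, temporary_settings

# Raise the client's HTTP request timeout for everything run inside
# this context; 120 seconds is just an example value.
with temporary_settings(updates={PREFECT_API_REQUEST_TIMEOUT: 120}):
    my_flow()  # hypothetical flow call
```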

Justin Trautmann

04/21/2023, 6:54 PM
Hey Zanie, we are already using the Ray task runner and made use of PREFECT_API_REQUEST_TIMEOUT, but we saw that things would get stuck at some point, which is when we decided to refactor for fewer tasks. We're running on Prefect 2.9.0 and Prefect Cloud. Could you please elaborate on how jitter would solve the submission problem?
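For reference, our runner setup is essentially the stock one, roughly (the cluster address here is a placeholder; this needs the prefect-ray collection):
```
from prefect import flow
from prefect_ray import RayTaskRunner

# Connect to an existing Ray cluster; with no address given,
# Ray starts a local cluster instead.
@flow(task_runner=RayTaskRunner(address="ray://head-node:10001"))
def ray_flow():
    ...
```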

Zanie

04/21/2023, 7:01 PM
The problem (which I'm looking into solving with jitter) is that all of the task state transitions from PENDING -> RUNNING get queued after all of the task run creation requests, which means every task run must be created before any of them start running. Adding jitter means that some of the state transitions get proposed, and those tasks can start running, while creation continues in the background.
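As a toy illustration only (this is not the actual change in the PR above, and the function names are made up):
```
import asyncio
import random

async def create_with_jitter(create_task_run, max_jitter: float = 1.0):
    # Wait a random slice of time before each creation request so that
    # already-created task runs can slip their PENDING -> RUNNING
    # transitions into the queue instead of waiting for all creations.
    await asyncio.sleep(random.uniform(0, max_jitter))
    return await create_task_run()
```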
I’m surprised you’re seeing this problem with Ray, since:
• Submission happens in the flow run process, but once a task is submitted, its state transitions happen in a Ray worker process
• The Ray worker process should have a unique client and connection pool

Rachelle Greenfield

04/27/2023, 9:01 PM
following as well. I was looking for what I was doing wrong with spawning a lot of granular tasks. I had kind of assumed they would go into a queue and get picked off by workers, but I think my mental model was incorrect lol
I’ll take a look at the Ray task runner.