I'm doing some concurrency testing using a simple workflow with a mapped task that starts (dummy) AWS Batch jobs.
Running locally (flow.run()), I manage to hit the AWS API limits. When I run the same flow via Prefect Cloud and an agent, the bottleneck seems to be somewhere else, because I never see more than about 100 tasks running concurrently.
In both cases I use a LocalDaskExecutor with 500 threads, and the EC2 instance running the flow/agent is not at all busy (it only waits for Batch jobs to finish, after all).
Any ideas what could cause the slowdown using Cloud?
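One way to put a number on "about 100 tasks running concurrently" is to count overlapping task bodies directly. The following is a minimal stdlib sketch, not Prefect code: `ConcurrencyGauge`, `dummy_job` and `measure_peak` are hypothetical names, and the sleep stands in for submitting a Batch job and waiting on it. The same gauge could presumably wrap the body of the mapped task to compare a local run with a Cloud run.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyGauge:
    """Tracks how many callers are inside the guarded block at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self._current += 1
            self.peak = max(self.peak, self._current)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._current -= 1
        return False  # never swallow exceptions


gauge = ConcurrencyGauge()


def dummy_job(seconds):
    # Hypothetical stand-in for "start a Batch job and wait"; it only sleeps.
    with gauge:
        time.sleep(seconds)


def measure_peak(num_tasks, num_workers, seconds=0.2):
    """Run num_tasks dummy jobs on a thread pool and return the peak overlap."""
    gauge.peak = 0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(dummy_job, [seconds] * num_tasks))
    return gauge.peak
```

If the peak reported inside the task body stays near 100 under Cloud but approaches the configured thread count locally, the limit sits in front of the task body rather than in the Batch calls themselves.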
Kevin Kho
11/22/2021, 10:57 PM
This might be relevant. He said that there is a limit of 100 jobs you can wait for. Is this what you are running into?
Wieger Opmeer
11/22/2021, 11:06 PM
The jobs do not get started in the first place, so I don't think so. Also note that this does not occur when running locally.
Kevin Kho
11/22/2021, 11:07 PM
Did you specify the number of threads in the executor, as in LocalDaskExecutor(num_workers=…)?
Wieger Opmeer
11/22/2021, 11:08 PM
Yup, num_workers=500
Wieger Opmeer
11/22/2021, 11:10 PM
I think I'll try to get a DaskExecutor going with a local cluster of multiple processes and a bunch of threads per process, to see if that makes a difference.
There could be some inter-thread locking going on.
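A rough sketch of what that multi-process setup could look like in Prefect 1.x. The split of 500 threads over 4 processes is an arbitrary assumption, and `local_cluster_kwargs` and `build_executor` are hypothetical helper names. `DaskExecutor` without an address starts a temporary `distributed.LocalCluster` and forwards `cluster_kwargs` to it.

```python
def local_cluster_kwargs(total_threads=500, n_processes=4):
    """Split a thread budget evenly over worker processes (rounding up)."""
    if total_threads < 1 or n_processes < 1:
        raise ValueError("total_threads and n_processes must be positive")
    threads_per_worker = -(-total_threads // n_processes)  # ceiling division
    return {
        "n_workers": n_processes,
        "threads_per_worker": threads_per_worker,
        "processes": True,  # separate processes, so no shared GIL between workers
    }


def build_executor(total_threads=500, n_processes=4):
    # Imported lazily so the helper above works without Prefect installed.
    from prefect.executors import DaskExecutor

    # No address is given, so a temporary LocalCluster is created per flow run.
    return DaskExecutor(
        cluster_kwargs=local_cluster_kwargs(total_threads, n_processes)
    )
```

It would be attached with something like `flow.executor = build_executor()` before registering the flow. Whether this changes the observed ceiling is exactly what the experiment would show.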
Kevin Kho
11/22/2021, 11:13 PM
The execution should be the same. Just making sure: you don’t have any concurrency limits set, right?
Wieger Opmeer
11/22/2021, 11:13 PM
Nope.. triple checked that by now
Theo Platt
12/06/2021, 7:31 PM
@Wieger Opmeer sounds like an AWS API limiter you hit when running the flow on an AWS instance as opposed to a local instance? Maybe the AWS instance just gets higher throughput than your local instance due to networking and thus hits the AWS API limit for max calls per 10 second period (or whatever they limit it to!)
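If API throttling does turn out to be part of the picture, a common mitigation is to retry throttled calls with exponential backoff and jitter. This is a generic stdlib sketch, not the thread author's code: `ThrottledError` and `call_with_backoff` are hypothetical names, and how a real client signals a rate limit (for example, which exception and error code) is an assumption that would need to be mapped in.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a rate-limit error from the API client."""


def call_with_backoff(fn, max_attempts=6, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep):
    """Call fn(), retrying on ThrottledError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

The jitter spreads retries out so that hundreds of threads throttled at the same moment do not all retry in the same instant.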