Hi there - I have a flow with a few root nodes (nodes with no dependencies) and noticed that they didn’t all kick off when the flow started - a few of them waited several hours to start. It looks suspiciously like they waited for other root nodes to complete even though they don’t depend on them. I’m using the LocalDaskExecutor with threads. Is there a max parallelization parameter of some kind that I’m unaware of?
Kevin Kho
09/28/2021, 1:55 PM
Hey @Kevin Weiler, the answer is kind of (there will be different defaults). If you test it on your machine, I think the default is 4 threads, but if you run it on ECS, the default will be 2 threads. You can bump it up explicitly with LocalDaskExecutor(num_workers=4) if you know you have more hardware.
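To illustrate why independent root nodes can appear to wait on each other, here is a minimal sketch that uses Python's standard-library ThreadPoolExecutor as a stand-in for the executor's thread pool. It is not Prefect or Dask code; the function name run_root_tasks and its parameters are hypothetical and exist only for this demonstration. The point is that a bounded pool caps how many tasks run at once, so the remaining tasks queue until a worker frees up.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_root_tasks(num_tasks, num_workers, task_seconds=0.05):
    """Run independent tasks on a bounded pool; return the peak concurrency seen."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(_):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(task_seconds)  # stand-in for the real work of a root node
        with lock:
            running -= 1

    # The pool size plays the role of num_workers: extra tasks wait in a queue.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(task, range(num_tasks)))
    return peak


if __name__ == "__main__":
    # Six independent tasks, two workers: at most two run at the same time.
    print("peak concurrency:", run_root_tasks(num_tasks=6, num_workers=2))
```

With six tasks and two workers, the peak concurrency never exceeds two, which matches the symptom of some root nodes starting only after others finish.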
Kevin Weiler
09/28/2021, 1:57 PM
Ah nice - yep 4 workers track. Will this default also apply to processes?
Kevin Kho
09/28/2021, 1:58 PM
Yes, you can use num_workers with processes.
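For reference, a configuration sketch of both variants, assuming the Prefect 1.x-era API (prefect.executors.LocalDaskExecutor with a scheduler argument) that was current when this thread was written. The flow name, the root task, and the worker count of 8 are placeholders, not values from the thread.

```python
# Assumes Prefect 1.x (0.15-era API); not applicable to Prefect 2 and later.
from prefect import Flow, task
from prefect.executors import LocalDaskExecutor


@task
def root(n):
    # Hypothetical root task with no upstream dependencies.
    return n


# Thread-based scheduler with an explicit worker count.
thread_executor = LocalDaskExecutor(scheduler="threads", num_workers=8)

# Process-based scheduler; num_workers is passed the same way.
process_executor = LocalDaskExecutor(scheduler="processes", num_workers=8)

# Attach whichever executor fits the workload; the count of 8 is a placeholder.
with Flow("root-nodes-demo", executor=thread_executor) as flow:
    for i in range(8):
        root(i)
```

Setting num_workers at or above the number of root nodes should let them all start together, provided the machine has the hardware to back it.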
Kevin Weiler
09/28/2021, 9:25 PM
sorry @Kevin Kho - forgot to thank you again for this - it worked. Cheers!