matta

02/17/2021, 4:11 AM
Is there a way to force "depth-first execution"? Like, "only process 4 chunks at a time" or something like that? I have a Flow where I'm downloading a bunch of files, uploading them to a Snowflake stage, and then deleting the folders. Right now it tries to download everything at once and I think it's blowing up the container's storage. Is there a way to make it go "down" before going "out" (thus clearing disk space), or maybe just be generally smarter about disk space? Thanks!

Chris White

02/17/2021, 4:20 AM
Hey @matta! Unfortunately there is not a great way to achieve this at the moment; assuming you’re using a Dask Executor, that decision is made entirely by the Dask scheduler, which optimizes for various things including worker memory, data movement, and network latency. More often than not Dask will execute depth-first, but it can’t be guaranteed.
Maybe you could download + upload to Snowflake within a single task, so that each file is written to a temporary disk location and removed as soon as the upload completes?
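A minimal sketch of that pattern, assuming the Prefect 0.x `@task` API and a hypothetical `upload_to_snowflake_stage` helper (e.g. a PUT through snowflake-connector-python); the URL handling is illustrative, not from the thread:
```python
import tempfile

import requests
from prefect import task


@task
def download_and_stage(url: str) -> None:
    # Download and upload inside one task, so each file only exists
    # on disk for the duration of this single task run.
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = f"{tmpdir}/staged_file"
        resp = requests.get(url, stream=True)
        resp.raise_for_status()
        with open(local_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
        upload_to_snowflake_stage(local_path)  # hypothetical helper
    # TemporaryDirectory removes the folder (and the file) on exit,
    # so disk space is reclaimed before the next task runs.
```
Mapped over a list of URLs (e.g. `download_and_stage.map(urls)`), disk usage is then bounded by the number of tasks running concurrently rather than by the total number of files.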

matta

02/17/2021, 4:25 AM
Cool, I'll do that, thanks!
update: Repartitioning right before saving to disk fixed everything, for reasons I won't pretend to understand.
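For reference, a minimal sketch of repartitioning a Dask dataframe right before the write; the paths and partition count are assumptions, and this is one plausible reading of the fix (fewer, larger partitions mean fewer intermediate chunks materialized at once), not a confirmed explanation:
```python
import dask.dataframe as dd

df = dd.read_parquet("downloads/*.parquet")  # assumed input location

# Consolidate into fewer, larger partitions immediately before writing,
# so fewer intermediate chunks are held on disk at any one time.
df = df.repartition(npartitions=8)

df.to_parquet("output/")
```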