I would like to transform the above code so that file 1 is extracted and uploaded before moving on to file 2. I know I could wrap this functionality into one function, but I would prefer to keep them separate.
Kevin Kho
07/30/2021, 3:31 PM
Hey @Arran, are you using a parallel executor like the LocalDaskExecutor? With the LocalExecutor, this can't be changed. The Dask executors do prefer depth-first execution, but it's not something the executor always guarantees, and it can't be forced to happen every time.
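For reference, a minimal sketch of attaching the LocalDaskExecutor to a Prefect 1.x flow (the flow name and scheduler settings here are illustrative, not from the thread):

from prefect import Flow
from prefect.executors import LocalDaskExecutor

with Flow("etl") as flow:
    ...  # extract and upload tasks defined elsewhere

# A threads-based local Dask scheduler; num_workers caps how many
# tasks run at once, which also bounds peak memory use.
flow.executor = LocalDaskExecutor(scheduler="threads", num_workers=4)
flow.run()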
Arran
07/30/2021, 3:43 PM
I'm using LocalDaskExecutor. I just got to 14,000 items iterated over and then Python crashed, which I'm assuming is because of the size of my RAM, and I'm hoping that completing these one at a time might free up some of that space.
Arran
07/30/2021, 3:45 PM
btw @Kevin Kho, I saw you on a YouTube video earlier as part of the Dask Summit. Just like to add my thanks for a really well put together demo.
Kevin Kho
07/30/2021, 3:45 PM
Ah, if the data is big, it will be held in memory. If memory is a concern, some users write the data somewhere and then return the location of that data. upload would then accept the location instead of the data, so the data won't be held in memory.
Kevin Kho
07/30/2021, 3:45 PM
Thank you!
Arran
07/30/2021, 3:46 PM
Would the data be held in memory even if it were to carry out both tasks before moving to the next iteration?
Kevin Kho
07/30/2021, 3:47 PM
I believe so. Think of it as a variable lying around in memory.
Kevin Kho
07/30/2021, 3:48 PM
This is the pseudocode for removing it:
import gc

from prefect import task
from prefect.engine.result import Result  # stand-in for a concrete class like LocalResult

@task
def abc(x):
    res = Result()
    res = res.write(x, location=...)  # write returns a Result with the final location
    del x
    gc.collect()  # free the in-memory copy right away
    return res.location

@task
def abc2(location):
    res = Result().read(location)  # res.value holds the data read back
But you might not need the delete step. Some users find that passing the location instead helps enough already.
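A sketch of how abc and abc2 might be wired together so only locations flow between the tasks (the flow setup and parameter name are illustrative, assuming Prefect 1.x mapping):

from prefect import Flow, Parameter

with Flow("extract-upload") as flow:
    urls = Parameter("urls")
    # abc persists each item and returns its location; abc2 reads from
    # that location, so only small path strings pass between task runs
    locations = abc.map(urls)
    abc2.map(locations)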
Arran
07/30/2021, 3:50 PM
Yeah, that's what I thought. Wasn't sure if it would work differently in this kind of programming paradigm; I'm quite fresh to it, as you can probably tell 😁