I would like to transform the above code so that file 1 is extracted and uploaded before moving on to file 2. I know I could wrap this functionality into one function, but I would prefer to keep them separate.
Kevin Kho
07/30/2021, 3:31 PM
Hey @Arran, are you using a parallel executor like the LocalDaskExecutor? With the LocalExecutor, this can't be changed. The Dask executors do prefer depth-first execution, but it's not something the executor always guarantees, and it can't be forced to happen every time.
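For reference, a minimal sketch of attaching the LocalDaskExecutor to a Prefect 1.x flow (the flow name and scheduler settings here are illustrative, not from the thread):

from prefect import Flow
from prefect.executors import LocalDaskExecutor

with Flow("etl") as flow:
    ...  # extract and upload tasks defined elsewhere

# A threads-based local Dask scheduler; num_workers caps how many
# tasks run at once, which also bounds peak memory use.
flow.executor = LocalDaskExecutor(scheduler="threads", num_workers=4)
flow.run()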
Arran
07/30/2021, 3:43 PM
I'm using LocalDaskExecutor. I just got to 14,000 items iterated over and then Python crashed, which I'm assuming is because of the size of my RAM, and I'm hoping that completing these one at a time might free up some of that space.
Arran
07/30/2021, 3:45 PM
btw @Kevin Kho, I saw you on a YouTube video earlier as part of the Dask Summit. Just like to add my thanks for a really well put together demo.
Kevin Kho
07/30/2021, 3:45 PM
Ah, if the data is big, it will be held in memory. If memory is a concern, some users write the data somewhere and then return the location of that data. upload would then accept the location instead of the data, so the data won't be held in memory.
Kevin Kho
07/30/2021, 3:45 PM
Thank you!
Arran
07/30/2021, 3:46 PM
Would the data be held in memory even if it were to carry out both tasks before moving to the next iteration?
Kevin Kho
07/30/2021, 3:47 PM
I believe so. Think of it as a variable lying around in memory.
Kevin Kho
07/30/2021, 3:48 PM
This is the pseudocode for removing it:
import gc

from prefect import task
from prefect.engine.result import Result  # stand-in for a concrete class like LocalResult

@task
def abc(x):
    res = Result()
    res = res.write(x, location=...)  # write returns a Result with the final location
    del x
    gc.collect()  # free the in-memory copy right away
    return res.location

@task
def abc2(location):
    res = Result().read(location)  # res.value holds the data read back
But you might not need the delete step. Some users find that passing the location instead helps enough already.
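A sketch of how abc and abc2 might be wired together so only locations flow between the tasks (the flow setup and parameter name are illustrative, assuming Prefect 1.x mapping):

from prefect import Flow, Parameter

with Flow("extract-upload") as flow:
    urls = Parameter("urls")
    # abc persists each item and returns its location; abc2 reads from
    # that location, so only small path strings pass between task runs
    locations = abc.map(urls)
    abc2.map(locations)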
Arran
07/30/2021, 3:50 PM
Yeah, that's what I thought. Wasn't sure if it would work differently in this kind of programming paradigm; I'm quite fresh to it, as you can probably tell 😁