# ask-community
c
Hello! Conceptual question: how do you create a flow that starts parallel tasks from a generator? I have a large file that I want to transform in chunks because it is too big to load into memory, so I wrote a generator that yields chunks of the file. My first thought was to use the LocalDaskExecutor and map my transform task over the generator, but because generators aren't subscriptable, I get
TypeError: Cannot map over unsubscriptable object of type <class 'generator'>:
Is there something I'm missing conceptually, or am I framing my problem incorrectly? I can provide code samples of what I've tried if that would help. Thanks!
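One possible workaround: map over a small, subscriptable list of chunk boundaries instead of the chunks themselves, and have each worker read only its own slice of the file. This is a minimal plain-Python sketch, not Prefect's API. `concurrent.futures` stands in for task mapping plus an executor, and the helper names (`chunk_offsets`, `transform_chunk`, `run`) and the newline-aligned byte-range scheme are hypothetical choices for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_offsets(path, chunk_size):
    """Return a list of (start, end) byte ranges, each ending on a line boundary.

    Only the offsets are held in memory, so the list stays small, and unlike
    a generator it is subscriptable, which is what mapping needs.
    """
    size = os.path.getsize(path)
    offsets = []
    with open(path, "rb") as f:
        start = 0
        while start < size:
            # Jump roughly chunk_size bytes ahead, then finish the current line.
            f.seek(min(start + chunk_size, size))
            f.readline()
            end = f.tell()
            offsets.append((start, end))
            start = end
    return offsets


def transform_chunk(path, span):
    """Hypothetical transform: read one byte range and count its lines."""
    start, end = span
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return data.count(b"\n")


def run(path, chunk_size=1 << 20, workers=4):
    """Compute the spans up front, then process one chunk per worker call."""
    spans = chunk_offsets(path, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda span: transform_chunk(path, span), spans))
```

In a flow, the same split would be one upstream task returning `chunk_offsets(...)` and a mapped task taking one span each. Threads keep the sketch simple. For CPU-heavy transforms, a process-based executor is likely a better fit.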
m
I think you might be able to do this with a Dask Bag instead of a Generator? https://examples.dask.org/bag.html
z
We're considering supporting `yield`/generators in tasks, but there's not even a design document yet.
c
@matta Thanks! I looked into bags; that would work if I had already split the large file into smaller files, but it didn't work starting from a single file. @Zanie I'll check that out! Digging around the Dask docs some more, it seems like you'd have to use the Dask Advanced API and async to implement generators.
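If the chunks really do have to come from a generator, another option is to submit each chunk to a pool as soon as the generator yields it, while capping how many chunks are in flight so memory stays bounded. This is a plain-Python sketch under that assumption, not something Prefect or Dask provides. `read_chunks`, `transform`, `process_stream`, and the `max_in_flight` limit are hypothetical names.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def read_chunks(lines, size):
    """Hypothetical generator yielding lists of up to `size` lines."""
    it = iter(lines)
    while chunk := list(islice(it, size)):
        yield chunk


def transform(chunk):
    """Hypothetical per-chunk transform: total characters in the chunk."""
    return sum(len(line) for line in chunk)


def process_stream(chunks, fn, workers=4, max_in_flight=8):
    """Apply fn to chunks pulled lazily from a generator, in parallel.

    At most max_in_flight chunks are submitted but unfinished at any time.
    Results come back in completion order, not input order.
    """
    results = []
    pending = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in chunks:
            if len(pending) >= max_in_flight:
                # Block until at least one chunk finishes before reading more.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                results.extend(f.result() for f in done)
            pending.add(pool.submit(fn, chunk))
        # Drain whatever is still running once the generator is exhausted.
        done, _ = wait(pending)
        results.extend(f.result() for f in done)
    return results
```

With a real file, something like `process_stream(read_chunks(open(path), 10_000), transform)` would stream the file line by line. Only about `max_in_flight` chunks are held in memory at once.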