Brad
05/10/2020, 11:28 PM
`yield` out the data files for further mapping. Obviously I can’t do this in Prefect, but I also don’t want to do an enormous reduce because the amount of data is too large. What I currently have is just a simple map over the files, but I’m really not getting the parallelism or granularity I’d like. I’m using Dask, so I thought about just grabbing a worker client and submitting tasks from tasks, but then I lose the benefits of having Prefect tasks. Is there anything anyone can suggest?
emre
05/11/2020, 8:32 AM
`FilterTask`s to get multiple lists made of single formats. Then each one can be its own DAG branch, being mapped over its matching parser. You could also write a custom multi-way filter in order to partition all files into their own format lists in a single pass.
Jeremiah
05/11/2020, 1:50 PM
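For reference, the single-pass multi-way filter emre mentions could look roughly like the sketch below. It is plain Python, not Prefect API, and it assumes a file's format can be read off its extension, which the thread does not confirm; `partition_by_format` and the sample file names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List


def partition_by_format(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths by lowercased extension in a single pass.

    Assumption: the extension identifies the format. Swap in whatever
    format detection the real pipeline uses.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        # Files without an extension end up under the empty-string key.
        ext = PurePath(path).suffix.lower().lstrip(".")
        groups[ext].append(path)
    return dict(groups)


# Hypothetical input: each resulting list could feed its own mapped parser.
files = ["a.csv", "b.json", "c.CSV", "d.parquet"]
by_format = partition_by_format(files)
print(by_format)
```

Wrapped in a Prefect `@task`, a function like this would hand back one list per format, and each list could then be mapped over its matching parser as its own branch.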