Prefect Community #prefect-server

Tom Forbes

05/17/2021, 8:41 PM

The above is roughly pseudocode, but the main question was: how would you structure this? You can probably do most of it with df.apply directly - is that what you would do? We’d rather use Prefect for this directly if possible. But I’m not clear on how Prefect works with Dask - would a mapping task like this be the way to go? Would this scale to a high number of tasks (millions?), or would you perhaps map over the Dask partitions instead?

Kevin Kho

05/17/2021, 8:57 PM

Hi @Tom Forbes! I’ll find someone on the team who can better answer this.

Tom Forbes

05/17/2021, 9:06 PM

Thank you! I tried to map over the partitions, but it didn’t work.

Kevin Kho

05/17/2021, 9:17 PM

OK, so when using Prefect to orchestrate Dask, users normally create a Dask cluster and do the work there. The Prefect + Dask combination works best if you use only Dask or only Prefect for parallelization, not both. For your specific example, you should probably use df.apply. Prefect map works best for non-Dask objects, like mapping over a Python list.

Kevin Kho

05/17/2021, 9:18 PM

Added reading of dask dataframe with Prefect

Kevin Kho

05/17/2021, 9:24 PM

If you need observability of each image individually and really need the Prefect map, you need to structure your code without Dask and stick to Prefect alone (maybe by passing in the images directly).

Tom Forbes

05/18/2021, 10:59 AM

Ok, thank you @Kevin Kho 🙏
