# prefect-server
t
The above is roughly pseudocode, but the main question was: how would you structure this? You can probably do most of it with `df.apply` directly - is that what you would do? We’d rather use Prefect for this directly if possible. But I’m not clear on how Prefect works with Dask - would a mapping task like this be the way to go? Would this scale to a high number of tasks (millions?), or would you perhaps map over the Dask partitions instead?
k
Hi @Tom Forbes! I’ll find someone on the team who can better answer this.
t
Thank you! I tried to map over the partitions but it didn’t work
k
Ok, so when using Prefect to orchestrate Dask, users normally create a Dask cluster and do the work there. Prefect + Dask works best if you use only Dask for parallelization or only Prefect for parallelization, not both at once. For your specific example, you should probably use `df.apply`. Prefect `map` works best for non-Dask objects, like mapping over a Python list.
Added reading of dask dataframe with Prefect
If you need observability of those images individually, and really need the Prefect `map`, you need to structure your code without Dask and just stick to Prefect (maybe pass in the images as a list).
t
Ok, thank you @Kevin Kho 🙏