Thread
#prefect-server
    Tom Forbes

    1 year ago
    The above is roughly pseudocode, but the main question was: how would you structure this? You can probably do most of it with `df.apply` directly - is that what you would do? We’d rather use Prefect for this directly if possible. But I’m not clear on how Prefect works with Dask - would a mapping task like this be the way to go? Would this scale to a high number of tasks (millions?), or would you perhaps map over the Dask partitions instead?
    Kevin Kho

    1 year ago
    Hi @Tom Forbes! I’ll find someone on the team who can better answer this.
    Tom Forbes

    1 year ago
    Thank you! I tried to map over the partitions but it didn’t work
    Kevin Kho

    1 year ago
    Ok, so when using Prefect to orchestrate Dask, users normally create a Dask cluster and do the work there. The Prefect + Dask combination works best if you use only Dask or only Prefect for parallelization, not both at once. For your specific example, you should probably use `df.apply`. Prefect `map` works best for non-Dask objects, like mapping over a Python list.
    Added reading of dask dataframe with Prefect
    If you need observability into each image individually and really need the Prefect `map`, you need to structure your code without Dask and just stick to Prefect (maybe pass in the images).
    Tom Forbes

    1 year ago
    Ok, thank you @Kevin Kho 🙏