# ask-community
d
Hi all! Just browsed through the `prefect`, `dask` and `ray` documentation and my head is spinning a bit... I'll be needing to run the same simulation multiple times, in parallel (embarrassingly so), based on different parametrizations. I'll be doing this at our users' request, so users say go and we queue, say, 70 simulations for that user. In the meantime another user comes along, etc. The simulations will be run within a docker container, so I'm thinking to deploy to Kubernetes, with pods (and underlying nodes) scaling automatically with the number of queued jobs. So far so good. However, I have a hard time figuring out whether each single simulation should be a sub-workflow, so that it can be scheduled onto pods via a Kubernetes work pool by Prefect alone, or whether a simulation should be a task, which is then scheduled by `dask` onto a distributed `dask` cluster backed by Kubernetes. At which point I'm wondering why I need Prefect in the first place. So I guess my general question is - excuse the maybe a bit confrontational wording - what benefit does prefect provide over plain `dask`, e.g.? Thanks a bunch!
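(For reference, the "plain `dask`" baseline being asked about might look roughly like this - a sketch with a hypothetical `run_simulation` function and a made-up scheduler address for a Kubernetes-backed cluster:)

```python
from dask.distributed import Client

def run_simulation(params):
    ...  # placeholder for the actual simulation

# Fan ~70 parametrized simulations out over a distributed cluster
client = Client("tcp://dask-scheduler:8786")  # assumed cluster address
futures = client.map(run_simulation, [{"run_id": i} for i in range(70)])
results = client.gather(futures)
```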
n
hi @Damian Birchler - in short, dask helps you do stuff in parallel; prefect helps you do stuff robustly (retries, caching etc) on whatever infra (deployments) you want, in a way you can observe (UI).

I would suggest a change to the premise of this question:

> what benefit does prefect provide over plain `dask`

since they aren't meant to serve the same purpose. for example, we have a `prefect-dask` integration if you want to do a bunch of tasks in parallel:
```python
from prefect import flow, task
from prefect_dask import DaskTaskRunner

@task
def one_simulation(x): ...

@flow(task_runner=DaskTaskRunner())
def many_simulations(n: int):
    one_simulation.map(range(n)).result()
```
does that help?
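As a side note, `DaskTaskRunner()` with no arguments spins up a temporary local cluster; to match the Kubernetes setup described above, it can instead be pointed at an existing Dask cluster. A minimal sketch, where the scheduler address is an assumption:

```python
from prefect import flow
from prefect_dask import DaskTaskRunner

# Connect to an already-running (e.g. Kubernetes-backed) Dask cluster
# instead of creating a temporary local one.
@flow(task_runner=DaskTaskRunner(address="tcp://dask-scheduler:8786"))
def many_simulations_on_cluster(n: int):
    ...
```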
d
Yes, that helps, thanks... I did check out `dask` a bit, too, and they did have a similar (or so I thought) interface, so that didn't help clear up my confusion. But, ok, robustness seems to be a main differentiator?
by interface i meant UI
n
in my experience the dask dashboard is more like a profiler, whereas the prefect UI shows you how your work connects to other work - like maybe you grab stuff from s3, then fan it out on dask, then reduce and send data to another container for processing. the dask dashboard just knows about your dask nodes/workers. but yeah, to reiterate:

> But, ok, robustness seems to be a main differentiator?

prefect and dask are not generally meant to serve the same purpose
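To ground the earlier point about prefect helping you "do stuff robustly (retries, caching etc)", here is a minimal sketch of what those knobs look like on a task - the specific values are illustrative:

```python
from datetime import timedelta

from prefect import task
from prefect.tasks import task_input_hash

# Retries on failure, plus input-based caching so a re-run
# skips simulations that already completed successfully.
@task(
    retries=3,
    retry_delay_seconds=30,
    cache_key_fn=task_input_hash,
    cache_expiration=timedelta(days=1),
)
def one_simulation(params: dict):
    ...
```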
d
i see... I still need to do a bit more research on what the different purposes are, or rather, which serves my purpose better. 🙂
n
I would say if you want to deploy a pure-python workflow in a compute-provider-agnostic fashion, then prefect can help! separately, if you want actual parallelism within that workflow, dask can help!

reading your original message, the thing that I'd want to know is:

> The simulations will be run within a docker container, so I'm thinking to deploy to Kubernetes

would each simulation be in its own container? or can you run many in one container? either way this sounds like a pretty typical prefect workload (I think you're in the right place!), it would just be a core design question with your workflow.

here's a couple resources in case they're helpful:
• getting started w prefect series
• an example which sounds roughly similar to what you might end up with (define 2 deployments, one for handling a single simulation and one for batching inputs and dispatching that other deployment N times, so that each deployment's infra like k8s or docker etc can be configured independently) - see the sketch below
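A rough sketch of that two-deployment pattern, assuming hypothetical flow and deployment names; `run_deployment` with `timeout=0` fires off a run without waiting for it to finish:

```python
from prefect import flow
from prefect.deployments import run_deployment

@flow
def one_simulation_flow(params: dict):
    ...  # a single simulation; deployed on its own (e.g. Kubernetes) work pool

@flow
def dispatch_simulations(param_sets: list[dict]):
    # batch inputs and dispatch the other deployment once per parametrization
    for params in param_sets:
        run_deployment(
            name="one-simulation-flow/k8s",  # hypothetical deployment name
            parameters={"params": params},
            timeout=0,  # return immediately instead of blocking on the run
        )
```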
d
Both would be possible. However, many simulations in one container would require a bit more bookkeeping on my part (or so I figured), so I thought I'd go with one per container/subflow(/task?) and let Prefect (/`dask`) take care of bookkeeping for me.