# ask-community
d
Hi all! Just browsed through the `prefect`, `dask` and `ray` documentation and my head is spinning a bit... I'll be needing to run the same simulation multiple times, in parallel (embarrassingly so), based on different parametrizations. I'll be doing this at our users' request, so users say go and we queue, say, 70 simulations for that user. In the meantime another user comes along, etc. The simulations will be run within a docker container, so I'm thinking to deploy to Kubernetes, with pods (and underlying nodes) scaling automatically with the number of queued jobs. So far so good. However, I have a hard time figuring out whether each single simulation should be a sub-workflow, so that it can be scheduled onto pods via a Kubernetes work pool by Prefect alone, or whether a simulation should be a task, which is then scheduled by `dask` onto a distributed `dask` cluster backed by Kubernetes. At which point I'm wondering why I need Prefect in the first place. So I guess my general question is - excuse the maybe a bit confrontational wording - what benefit does prefect provide over plain `dask`, e.g.? Thanks a bunch!
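(For reference, the "plain `dask`" baseline being asked about might look roughly like this - a sketch with a hypothetical `run_simulation` function and a made-up scheduler address for a Kubernetes-backed cluster:)

```python
from dask.distributed import Client

def run_simulation(params):
    ...  # placeholder for the actual simulation

# Fan ~70 parametrized simulations out over a distributed cluster
client = Client("tcp://dask-scheduler:8786")  # assumed cluster address
futures = client.map(run_simulation, [{"run_id": i} for i in range(70)])
results = client.gather(futures)
```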
n
hi @Damian Birchler - in short, dask helps you do stuff in parallel; prefect helps you do stuff robustly (retries, caching etc) on whatever infra (deployments) you want, in a way you can observe (UI).

I would suggest a change to the premise of this question:

> what benefit does prefect provide over plain `dask`

since they aren't meant to serve the same purpose. for example, we have a `prefect-dask` integration if you want to do a bunch of tasks in parallel:
```python
from prefect import flow, task
from prefect_dask import DaskTaskRunner

@task
def one_simulation(x): ...

@flow(task_runner=DaskTaskRunner())
def many_simulations(n: int):
    one_simulation.map(range(n)).result()
```
does that help?
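As a side note, `DaskTaskRunner()` with no arguments spins up a temporary local cluster; to match the Kubernetes setup described above, it can instead be pointed at an existing Dask cluster. A minimal sketch, where the scheduler address is an assumption:

```python
from prefect import flow
from prefect_dask import DaskTaskRunner

# Connect to an already-running (e.g. Kubernetes-backed) Dask cluster
# instead of creating a temporary local one.
@flow(task_runner=DaskTaskRunner(address="tcp://dask-scheduler:8786"))
def many_simulations_on_cluster(n: int):
    ...
```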
d
Yes, that helps, thanks... I did check out `dask` a bit, too, and they did have a similar (or so I thought) interface, so that didn't help clear up my confusion. But, ok, robustness seems to be a main differentiator?
by interface i meant UI
n
in my experience the dask dashboard is more like a profiler, whereas the prefect UI shows you how your work connects to other work - like maybe you grab stuff from s3, then fan it out on dask, then reduce and send data to another container for processing. the dask dashboard just knows about your dask nodes/workers. but yeah, to reiterate:

> But, ok, robustness seems to be a main differentiator?

prefect and dask are not generally meant to serve the same purpose
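To ground the earlier point about prefect helping you "do stuff robustly (retries, caching etc)", here is a minimal sketch of what those knobs look like on a task - the specific values are illustrative:

```python
from datetime import timedelta

from prefect import task
from prefect.tasks import task_input_hash

# Retries on failure, plus input-based caching so a re-run
# skips simulations that already completed successfully.
@task(
    retries=3,
    retry_delay_seconds=30,
    cache_key_fn=task_input_hash,
    cache_expiration=timedelta(days=1),
)
def one_simulation(params: dict):
    ...
```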
d
i see... I still need to do a bit more research on what the different purposes are, or rather, which serves my purpose better. 🙂
n
I would say if you want to deploy a pure-python workflow in a compute-provider-agnostic fashion, then prefect can help! separately, if you want actual parallelism within that workflow, dask can help!

reading your original message, the thing that I'd want to know is:

> The simulations will be run within a docker container, so I'm thinking to deploy to Kubernetes

would each simulation be in its own container? or can you run many in one container? either way this sounds like a pretty typical prefect workload (I think you're in the right place!), it would just be a core design question with your workflow.

here's a couple resources in case they're helpful:
• getting started w prefect series
• an example which sounds roughly similar to what you might end up with (define 2 deployments, one for handling a single simulation and one for batching inputs and dispatching that other deployment N times, so that each deployment's infra like k8s or docker etc can be configured independently) - see the sketch below
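A rough sketch of that two-deployment pattern, assuming hypothetical flow and deployment names; `run_deployment` with `timeout=0` fires off a run without waiting for it to finish:

```python
from prefect import flow
from prefect.deployments import run_deployment

@flow
def one_simulation_flow(params: dict):
    ...  # a single simulation; deployed on its own (e.g. Kubernetes) work pool

@flow
def dispatch_simulations(param_sets: list[dict]):
    # batch inputs and dispatch the other deployment once per parametrization
    for params in param_sets:
        run_deployment(
            name="one-simulation-flow/k8s",  # hypothetical deployment name
            parameters={"params": params},
            timeout=0,  # return immediately instead of blocking on the run
        )
```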
d
Both would be possible. However, many simulations in one container would require a bit more bookkeeping on my part (or so I figured), so I thought I'd go with one per container/subflow(/task?) and let Prefect (/`dask`) take care of bookkeeping for me.