Damian Birchler
03/21/2025, 1:58 PM
… prefect, dask and ray documentation and my head is spinning a bit... I'll need to run the same simulation many times in parallel (embarrassingly so), each time with a different parametrization. I'll be doing this at our users' request: a user says go and we queue, say, 70 simulations for that user. In the meantime another user comes along, and so on. The simulations will be run within a docker container, so I'm thinking to deploy to Kubernetes, with pods (and underlying nodes) scaling automatically with the number of queued jobs. So far so good.
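To make the shape of this workload concrete: every request fans out into independent jobs, one per parametrization, and jobs from different users end up in one shared queue. The stdlib-only sketch below assumes the parametrizations come from a grid of parameter values. `SimulationJob`, `expand_request` and the `alpha`/`seed` parameters are hypothetical, and the in-memory deque only stands in for whatever actually holds the queued runs (a Prefect work pool, or the Dask scheduler).

```python
from collections import deque
from dataclasses import dataclass
from itertools import product


@dataclass
class SimulationJob:
    """One simulation run: who asked for it and which parametrization to use.

    Hypothetical structure for illustration; not part of Prefect or Dask.
    """
    user: str
    params: dict


def expand_request(user: str, grid: dict[str, list]) -> list[SimulationJob]:
    """Expand one user's request into one job per combination of grid values."""
    names = list(grid)
    return [
        SimulationJob(user=user, params=dict(zip(names, values)))
        for values in product(*(grid[name] for name in names))
    ]


# One shared FIFO queue; requests from different users are appended in arrival order.
queue: deque[SimulationJob] = deque()

# Hypothetical parameters: 7 values of "alpha" x 10 random seeds = 70 simulations.
queue.extend(expand_request("user-a", {"alpha": [0.1 * i for i in range(1, 8)], "seed": list(range(10))}))
# Another user comes along while the first batch is still queued.
queue.extend(expand_request("user-b", {"alpha": [0.5], "seed": [1, 2, 3]}))

print(len(queue))  # 73
```

Because no job depends on another, the order within the queue only affects fairness between users, not correctness.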
However, I'm having a hard time figuring out whether each individual simulation should be a sub-workflow (subflow), so that Prefect alone schedules it onto pods via a Kubernetes work pool, or whether each simulation should be a task that dask then schedules onto a distributed dask cluster backed by Kubernetes. At that point I'm wondering why I need prefect in the first place.
So I guess my general question is (excuse the somewhat confrontational wording): what benefit does prefect provide over plain dask, for example?
Thanks a bunch!

Nate
03/21/2025, 2:39 PM
> what benefit does prefect provide over plain dask
… since they aren't meant to serve the same purpose. for example, we have a prefect-dask integration if you want to do a bunch of tasks in parallel:
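from prefect import flow, task
from prefect_dask import DaskTaskRunner


@task
def one_simulation(x):
    ...  # placeholder: run one simulation for input x


@flow(task_runner=DaskTaskRunner())  # temporary local Dask cluster by default
def many_simulations(n: int):
    # one task run per input, executed in parallel on the Dask cluster;
    # .result() waits for all of them and returns their results
    return one_simulation.map(range(n)).result()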
does that help?

Damian Birchler
03/21/2025, 2:41 PM
… dask a bit, too, and they have a similar interface (or so I thought), so that didn't help clear up my confusion. But OK, robustness seems to be a main differentiator?

Damian Birchler
03/21/2025, 2:41 PM

Nate
03/21/2025, 2:46 PM

Damian Birchler
03/21/2025, 2:48 PM

Nate
03/21/2025, 2:56 PM
> The simulations will be run within a docker container, so I'm thinking to deploy to Kubernetes
would each simulation be in its own container? or can you run many in one container? either way this sounds like a pretty typical prefect workload (I think you're in the right place!), it would just be a core design question for your workflow. here's a couple of resources in case they're helpful:
• getting started w/ prefect series
• an example which sounds roughly similar to what you might end up with (define 2 deployments, one for handling a single simulation and one for batching inputs and dispatching that other deployment N times, so that each deployment's infra like k8s or docker etc can be configured independently)
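The two-deployment pattern from the last bullet can be sketched as a dispatcher that splits one user's request into batches and triggers the single-simulation deployment once per batch. In Prefect the trigger would typically be `prefect.deployments.run_deployment(name=..., parameters=..., timeout=0)`, where `timeout=0` returns as soon as the flow run is created instead of waiting for it to finish. Here the trigger is injected as a plain callable so the sketch runs without a Prefect server. The deployment name `simulate/k8s`, the child parameter `parametrizations` and the helper names are hypothetical.

```python
from typing import Callable, Iterator


def chunk(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def dispatch_simulations(
    parametrizations: list[dict],
    trigger: Callable[[str, dict], object],
    deployment: str = "simulate/k8s",  # hypothetical "<flow name>/<deployment name>"
    batch_size: int = 1,
) -> int:
    """Dispatcher deployment's job: trigger one child run per batch of parametrizations.

    `trigger(name, parameters)` stands in for e.g.
    prefect.deployments.run_deployment(name=name, parameters=parameters, timeout=0).
    batch_size=1 means one simulation per child run (and container);
    larger values run several simulations in one container.
    """
    runs = 0
    for batch in chunk(parametrizations, batch_size):
        # Hypothetical child parameter: the single-simulation flow accepts `parametrizations`.
        trigger(deployment, {"parametrizations": batch})
        runs += 1
    return runs


# Usage: record the triggers instead of creating real flow runs.
calls = []
params = [{"alpha": a} for a in (0.1, 0.2, 0.3, 0.4, 0.5)]
print(dispatch_simulations(params, lambda name, p: calls.append((name, p))))  # 5
print(dispatch_simulations(params, lambda name, p: calls.append((name, p)), batch_size=2))  # 3
```

`batch_size` is where the "one simulation per container or many per container" question shows up: assuming the child deployment runs on a Kubernetes work pool, each triggered flow run gets its own Kubernetes job, so larger batches mean fewer, longer-lived pods.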
Damian Birchler
03/21/2025, 2:59 PM