# ask-community
Sultan Orazbayev:
Hello, if anyone is using Prefect on a SLURM cluster, I am interested in connecting to learn about the experience.
Kevin:
Not SLURM, but @An Hoang got it working on an LSFCluster as his DaskExecutor. Maybe he can chime in.
Sultan Orazbayev:
Thanks, Kevin! Will be interesting to know more... I see a fairly straightforward attempt at a solution using `dask_jobqueue` (which supports SLURMCluster), but I am a bit unsure about the coordination for composite workflows (flows of flows).
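For concreteness, here is roughly what I am picturing (a sketch, assuming Prefect 1.x and dask_jobqueue; the queue name and resource values are illustrative):
```python
from prefect.executors import DaskExecutor

# Spin up a SLURMCluster per flow run via dask_jobqueue;
# cluster_kwargs are passed through to dask_jobqueue.SLURMCluster.
executor = DaskExecutor(
    cluster_class="dask_jobqueue.SLURMCluster",
    cluster_kwargs={
        "queue": "normal",       # SLURM partition (illustrative)
        "cores": 8,              # cores per SLURM job
        "memory": "16GB",        # memory per SLURM job
        "walltime": "01:00:00",  # per-job time limit
    },
    adapt_kwargs={"minimum": 1, "maximum": 10},  # adaptive scaling
)
```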
Anna:
@Sultan Orazbayev the Dask executor is responsible for submitting task runs to Dask for execution, but it doesn’t change how a flow run is executed. Therefore, flow of flows should work the same way regardless of which executor you choose.
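For example, a parent flow can kick off a registered child flow with the built-in orchestration tasks (a sketch; the flow and project names are placeholders):
```python
from prefect import Flow
from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

with Flow("parent-flow") as parent_flow:
    # start a registered child flow and block until it finishes,
    # failing the parent if the child fails
    child_run_id = create_flow_run(
        flow_name="child-flow", project_name="my-project"
    )
    wait_for_flow_run(child_run_id, raise_final_state=True)
```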
Sultan Orazbayev:
Thanks, Anna! I hope that it will work out of the box, but I am a bit worried about handling subflows that hit job constraints (in terms of time or memory)... it might be a non-issue, though.
And thank you again, Anna, for the detailed answer on StackOverflow about handling dask config on failed tasks.
Anna:
You're very welcome. No need to worry: the flow run gets executed based on what you set in the run configuration. For example, a KubernetesRun will start the flow run as a Kubernetes job, and you can control the resource requirements for the flow run using run config arguments such as `cpu_request` and `memory_request`, e.g.:
```python
from prefect import Flow
from prefect.run_configs import KubernetesRun

# FLOW_NAME and STORAGE are defined elsewhere in your project
with Flow(
    FLOW_NAME,
    storage=STORAGE,
    run_config=KubernetesRun(
        labels=["k8s"],
        cpu_request=0.5,
        memory_request="2Gi",
    ),
) as flow:
    ...  # define your tasks here
```
And for task run execution, you can use Dask. This would offload the heavy computation to Dask, and only things like state changes would be executed on your flow run infrastructure.
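For instance, you could attach a Dask executor to the same flow (a sketch; the scheduler address is a placeholder):
```python
from prefect.executors import DaskExecutor

# task runs are offloaded to Dask; the flow run itself still runs
# on the infrastructure defined by the run config above
flow.executor = DaskExecutor(address="tcp://dask-scheduler:8786")
```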
An Hoang:
Hey @Sultan Orazbayev, my institution has LSF and is transitioning to SLURM, so let me know if you have any questions. You can wrap the flow that uses the Dask cluster inside another flow that runs serially (calling `dask_flow.run()` inside the serial flow). You won't get a detailed report on the Dask tasks, but if your flow/cluster is stable this can be an option. I usually run the inner flow by itself only for debugging.
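Roughly like this (a sketch, assuming the inner flow is importable as `dask_flow`; the module name is hypothetical):
```python
from prefect import Flow, task

from my_flows import dask_flow  # hypothetical module holding the inner flow


@task
def run_inner_flow():
    # run the Dask-backed flow in-process and surface its final state
    state = dask_flow.run()
    if state.is_failed():
        raise RuntimeError("inner Dask flow failed")


with Flow("serial-wrapper") as serial_flow:
    run_inner_flow()
```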
Sultan Orazbayev:
Interesting, thank you, An! One concern I have (could be a non-issue, though) is how to handle failure due to exceeding the allocated time or memory limit for the job. There are [several ideas suggested here](https://stackoverflow.com/a/70332185/10693596), and the serial wrapper around a Dask flow also makes sense.
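For example, one of the suggested ideas would be task-level timeouts plus retries (a sketch; the numbers are illustrative):
```python
from datetime import timedelta

from prefect import task


# fail the task after an hour and retry up to twice, so it errors
# inside Prefect rather than being killed at the SLURM job limit
@task(timeout=3600, max_retries=2, retry_delay=timedelta(minutes=5))
def heavy_computation():
    ...
```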