# ask-community
Sultan Orazbayev:
Hello, if anyone is using Prefect on a SLURM cluster, I am interested in connecting to learn about the experience.
Kevin:
Not SLURM, but @An Hoang got it working on an LSFCluster as his DaskExecutor. Maybe he can chime in.
Sultan Orazbayev:
Thanks, Kevin! Will be interesting to know more... I see a fairly straightforward attempt at a solution using `dask_jobqueue` (which supports SLURMCluster), but I am a bit unsure about the coordination for composite workflows (flows of flows).
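For concreteness, here is roughly what I am picturing (a sketch, assuming Prefect 1.x and dask_jobqueue; the queue name and resource values are illustrative):
```python
from prefect.executors import DaskExecutor

# Spin up a SLURMCluster per flow run via dask_jobqueue;
# cluster_kwargs are passed through to dask_jobqueue.SLURMCluster.
executor = DaskExecutor(
    cluster_class="dask_jobqueue.SLURMCluster",
    cluster_kwargs={
        "queue": "normal",       # SLURM partition (illustrative)
        "cores": 8,              # cores per SLURM job
        "memory": "16GB",        # memory per SLURM job
        "walltime": "01:00:00",  # per-job time limit
    },
    adapt_kwargs={"minimum": 1, "maximum": 10},  # adaptive scaling
)
```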
Anna:
@Sultan Orazbayev the Dask executor is responsible for submitting task runs to Dask for execution, but it doesn’t change how a flow run is executed. Therefore, flow of flows should work the same way regardless of which executor you choose.
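For example, a parent flow can kick off a registered child flow with the built-in orchestration tasks (a sketch; the flow and project names are placeholders):
```python
from prefect import Flow
from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

with Flow("parent-flow") as parent_flow:
    # start a registered child flow and block until it finishes,
    # failing the parent if the child fails
    child_run_id = create_flow_run(
        flow_name="child-flow", project_name="my-project"
    )
    wait_for_flow_run(child_run_id, raise_final_state=True)
```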
Sultan Orazbayev:
Thanks, Anna! I hope that it will work out of the box, but I am a bit worried about handling subflows that hit job constraints (in terms of time or memory)... it might be a non-issue, though.
And thank you again, Anna, for the detailed answer on StackOverflow about handling dask config on failed tasks.
Anna:
You're very welcome. No need to worry: the flow run gets executed based on what you set in the run configuration. For example, a KubernetesRun will start the flow run as a Kubernetes job, and you can control the resource requirements for the flow run using run config arguments such as `cpu_request` and `memory_request`, e.g.:
```python
from prefect import Flow
from prefect.run_configs import KubernetesRun

# FLOW_NAME and STORAGE are defined elsewhere in your project
with Flow(
    FLOW_NAME,
    storage=STORAGE,
    run_config=KubernetesRun(
        labels=["k8s"],
        cpu_request=0.5,
        memory_request="2Gi",
    ),
) as flow:
    ...  # define your tasks here
```
And for task run execution, you can use Dask. This would offload the heavy computation to Dask, and only things like state changes would be executed on your flow run infrastructure.
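For instance, you could attach a Dask executor to the same flow (a sketch; the scheduler address is a placeholder):
```python
from prefect.executors import DaskExecutor

# task runs are offloaded to Dask; the flow run itself still runs
# on the infrastructure defined by the run config above
flow.executor = DaskExecutor(address="tcp://dask-scheduler:8786")
```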
An Hoang:
Hey @Sultan Orazbayev, my institution has LSF and is transitioning to SLURM, so let me know if you have any questions. You can wrap the flow that uses the Dask cluster inside another flow that runs serially (calling `dask_flow.run()` inside the serial flow). You won't get a detailed report on the Dask tasks, but if your flow/cluster is stable this can be an option. I usually run the inner flow by itself only for debugging.
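Roughly like this (a sketch, assuming the inner flow is importable as `dask_flow`; the module name is hypothetical):
```python
from prefect import Flow, task

from my_flows import dask_flow  # hypothetical module holding the inner flow


@task
def run_inner_flow():
    # run the Dask-backed flow in-process and surface its final state
    state = dask_flow.run()
    if state.is_failed():
        raise RuntimeError("inner Dask flow failed")


with Flow("serial-wrapper") as serial_flow:
    run_inner_flow()
```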
Sultan Orazbayev:
Interesting, thank you, An! One concern I have (could be a non-issue, though) is how to handle failure due to exceeding the allocated time or memory limit for the job. There are [several ideas suggested here](https://stackoverflow.com/a/70332185/10693596), and the serial wrapper around a Dask flow also makes sense.
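For example, one of the suggested ideas would be task-level timeouts plus retries (a sketch; the numbers are illustrative):
```python
from datetime import timedelta

from prefect import task


# fail the task after an hour and retry up to twice, so it errors
# inside Prefect rather than being killed at the SLURM job limit
@task(timeout=3600, max_retries=2, retry_delay=timedelta(minutes=5))
def heavy_computation():
    ...
```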