Is there a SLURM executor?
# ask-community
h
Is there a SLURM executor?
z
Hi @Hugo Slepicka, there is not I’m sorry. I’d love to hear a bit more about your use case to see if it’s an executor we’d want to add!
k
Hi @Hugo Slepicka, is SLURM mainly used for queues? Do you use it with Dask?
h
@Zach Angell, @Kevin Kho Our use case is to use Prefect to orchestrate tasks alongside the cluster we have on campus. We could spawn a Dask scheduler and Dask clients, but we don't want to impose Dask. Example: a task writes a SLURM batch script, submits it to SLURM via sbatch, monitors its progress, and consumes its output when finished.
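The write, submit, and monitor steps described above could look roughly like the sketch below, in plain Python without Prefect. This is only an illustration under assumptions: the helper names (`render_batch_script`, `parse_job_id`, `submit`, `job_state`), the job name, and the walltime are hypothetical, and it relies on `sbatch` printing `Submitted batch job <id>` and on `squeue` being on the PATH.

```python
import re
import subprocess


def render_batch_script(command, job_name="prefect-task", walltime="00:10:00"):
    """Build the text of a small batch script (values are placeholders)."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={walltime}",
        "#SBATCH --output=slurm-%j.out",  # %j expands to the job ID
        command,
        "",
    ])


def parse_job_id(sbatch_stdout):
    """Extract the job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"could not find a job ID in: {sbatch_stdout!r}")
    return int(match.group(1))


def submit(script_path):
    """Submit a script file with sbatch and return the job ID."""
    completed = subprocess.run(
        ["sbatch", script_path], capture_output=True, text=True, check=True
    )
    return parse_job_id(completed.stdout)


def job_state(job_id):
    """Return the queue state (e.g. PENDING, RUNNING), or a fallback string
    once the job is no longer listed by squeue."""
    completed = subprocess.run(
        ["squeue", "-h", "-j", str(job_id), "-o", "%T"],
        capture_output=True, text=True,
    )
    return completed.stdout.strip() or "NOT_IN_QUEUE"
```

A task body could call `submit`, poll `job_state` until the job leaves the queue, and then read the `slurm-<id>.out` file.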
k
Could you give me a small example how you would do it with just Python but not Prefect so I can get a better picture if that’s possible?
@Hugo Slepicka, I think this might be possible after looking at `dask.jobqueue.SLURMCluster`. Have you tried it? I outlined my thoughts here
a
+1 for a tutorial on writing something like a SLURM or LSF cluster for HPC. I'm using a `DaskExecutor` connected to a Dask cluster from `dask.jobqueue.LSFCluster` (similar to `dask.jobqueue.SLURMCluster`) but am encountering a lot of problems, so that would help HPC users test and triage to see whether an issue is in the pipeline code, the HPC, or the Dask cluster.
I think a `SlurmExecutor` would have to persist all data in files, then submit tasks via command-line jobs (no Python API, similar to LSF IIRC).
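To make the "persist all data in files" idea concrete, here is a minimal sketch of a file-based handoff. It is not an actual executor: the function name `run_via_files`, the JSON layout, and the worker script are all hypothetical, and the worker is launched with the local Python interpreter where a real setup would wrap the same command in a batch script and submit it with sbatch.

```python
import json
import subprocess
import sys
from pathlib import Path

# Source of the job that would run on a compute node: it reads its
# inputs from one file and writes its result to another.
WORKER_SOURCE = """
import json, sys
with open(sys.argv[1]) as f:
    inputs = json.load(f)
with open(sys.argv[2], "w") as f:
    json.dump({"result": sum(inputs["values"])}, f)
"""


def run_via_files(values, workdir):
    """Persist inputs, run the worker as a separate process, read the output."""
    workdir = Path(workdir)
    in_path = workdir / "inputs.json"
    out_path = workdir / "outputs.json"
    script_path = workdir / "worker.py"

    in_path.write_text(json.dumps({"values": values}))
    script_path.write_text(WORKER_SOURCE)

    # Stand-in for the scheduler: on a cluster this command line would go
    # into a batch script instead of being run directly.
    subprocess.run(
        [sys.executable, str(script_path), str(in_path), str(out_path)],
        check=True,
    )
    return json.loads(out_path.read_text())["result"]
```

The point of the design is that the submitting process and the job share nothing but paths on a common filesystem, which is what a command-line-only scheduler allows.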
k
I don't think we have anyone on the team with SLURM experience. Would you like to take a stab at adding it to the docs?
p
Hey, just pulled up this old thread, I'm having an issue with submitting jobs to a SLURMCluster, was there an example created?
```python
import prefect
from prefect import task, Flow
from prefect.executors import DaskExecutor
from dask_jobqueue import SLURMCluster


def SLURM_exec():
    # Build the cluster that DaskExecutor will use for this flow run.
    cluster = SLURMCluster()
    logger = prefect.context.get("logger")
    logger.debug("Dask cluster started")
    logger.debug(f"see dashboard {cluster.dashboard_link}")
    return cluster


@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("Hello!")


with Flow("example", executor=DaskExecutor(cluster_class=SLURM_exec)) as flow:
    hello_task()
```
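For comparison, a hedged sketch of an alternative setup: instead of a factory function, `DaskExecutor` can be given the cluster class as an import path plus `cluster_kwargs` and `adapt_kwargs`. Every resource value below (queue name, cores, memory, walltime, worker counts) is a placeholder assumption, not something from this thread. The `adapt_kwargs` matter because, as far as I know, a `SLURMCluster` created without `n_workers` starts with zero workers until it is scaled or set to adapt.

```python
# Placeholder resource settings; adjust to the cluster's partitions and limits.
CLUSTER_KWARGS = {
    "queue": "normal",       # hypothetical partition name
    "cores": 1,
    "memory": "2GB",
    "walltime": "00:30:00",
}

# Let Dask request between 1 and 4 worker jobs from SLURM as load changes.
ADAPT_KWARGS = {"minimum": 1, "maximum": 4}


def make_executor():
    # Imported lazily so the settings above can be inspected without Prefect.
    from prefect.executors import DaskExecutor

    return DaskExecutor(
        cluster_class="dask_jobqueue.SLURMCluster",
        cluster_kwargs=CLUSTER_KWARGS,
        adapt_kwargs=ADAPT_KWARGS,
    )
```

The flow would then be declared with `Flow("example", executor=make_executor())`.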
Or should I start a new thread?
k
An has mentioned some stuff around this
p
Sorry, keep on having to hop onto another network. Yes, I saw those posts, thanks @Kevin Kho -
I'm seeing this log:
```
20:48:37 INFO agent Submitted for execution: PID: 2061
20:48:39 INFO CloudFlowRunner Beginning Flow run for 'example'
20:48:40 INFO DaskExecutor Creating a new Dask cluster with `None.SLURM_exec`...
20:48:41 INFO DaskExecutor The Dask dashboard is available at http://xxx:46778/status
```
I'm seeing the job get started, but no workers getting started.
Ah, ok, got it. I'd set the network interface in the Dask config file to `ens160` and spotted this:
```
ValueError: 'ens160' is not a valid network interface. Valid network interfaces are: ['lo', 'ethbond0', 'ibbond0', 'idrac', 'em1', 'em2', 'ib0', 'ib1']
```
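One way to catch this kind of mismatch before starting a cluster is to check the configured interface name against what the node actually exposes. The sketch below uses the standard library's `socket.if_nameindex()`; the helper names are hypothetical, and Dask's own validation may list interfaces slightly differently, so treat it as a pre-flight check only. Interface names can also differ between the login node and the compute nodes.

```python
import socket


def available_interfaces():
    """Names of the network interfaces visible on this machine."""
    return [name for _, name in socket.if_nameindex()]


def choose_interface(preferred, available):
    """Return the first preferred interface that actually exists."""
    for name in preferred:
        if name in available:
            return name
    raise ValueError(
        f"none of {preferred} found; available interfaces: {available}"
    )
```

With the list from the error above, `choose_interface(["ens160", "ib0"], [...])` would skip `ens160` and fall back to `ib0`.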
I'll try to get a minimal example up for you guys.
k
Oh sorry I didn’t respond. But this definitely seems out of my wheelhouse 😅. Yes we’d appreciate material around this for sure