    Hugo Slepicka
    11 months ago
    Is there a SLURM executor?

    Zach Angell
    11 months ago
    Hi @Hugo Slepicka, there isn't, I'm sorry. I'd love to hear a bit more about your use case to see if it's an executor we'd want to add!

    Kevin Kho
    11 months ago
    Hi @Hugo Slepicka, is SLURM mainly for queues? Do you use it with Dask?

    Hugo Slepicka
    11 months ago
    @Zach Angell, @Kevin Kho Our use case is to use Prefect to orchestrate tasks alongside the cluster we have on campus. We could spawn a Dask scheduler and Dask clients, but we don't want to impose Dask. Example: a task writes a SLURM batch script, submits it via sbatch, monitors its progress, and consumes its output when finished.
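    Something like this, roughly (the script contents, paths, and the polling interval are just placeholders):
    import subprocess
    import time
    from pathlib import Path

    from prefect import task

    @task
    def run_sbatch(script_text: str, workdir: str = "."):
        # Write the batch script and hand it to SLURM
        script = Path(workdir) / "job.sbatch"
        script.write_text(script_text)
        submitted = subprocess.run(
            ["sbatch", "--parsable", str(script)],
            check=True, capture_output=True, text=True,
        )
        job_id = submitted.stdout.strip().split(";")[0]

        # Poll squeue until the job leaves the queue
        while subprocess.run(
            ["squeue", "-h", "-j", job_id], capture_output=True, text=True
        ).stdout.strip():
            time.sleep(30)

        # Consume whatever the job wrote to its default output file
        return (Path(workdir) / f"slurm-{job_id}.out").read_text()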

    Kevin Kho
    11 months ago
    Could you give me a small example of how you would do it with just Python (without Prefect) so I can get a better picture of whether that's possible?
    @Hugo Slepicka, I think this might be possible after looking at dask.jobqueue.SLURMCluster. Have you tried it? I outlined my thoughts here
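    e.g. something along these lines (the queue/cores/memory values are placeholders for whatever your cluster needs):
    from dask_jobqueue import SLURMCluster
    from prefect.executors import DaskExecutor

    # Prefect builds the cluster at flow-run time from the class + kwargs,
    # and adaptive scaling asks SLURM for workers as tasks arrive
    executor = DaskExecutor(
        cluster_class=SLURMCluster,
        cluster_kwargs={"queue": "normal", "cores": 4, "memory": "8GB"},
        adapt_kwargs={"minimum": 1, "maximum": 10},
    )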

    An Hoang
    10 months ago
    +1 for a tutorial on writing something like a SLURM or LSF cluster setup for HPC. I'm using a DaskExecutor connected to a Dask cluster from dask.jobqueue.LSFCluster (similar to dask.jobqueue.SLURMCluster) but am encountering a lot of problems, so that would help HPC users test and triage whether the issue is in the pipeline code, the HPC, or the Dask cluster.
    I think a SlurmExecutor would have to persist all data in files, then submit tasks via command-line jobs (no Python API, similar to LSF IIRC), roughly like the sketch below.
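    Purely as a sketch of that shape (none of this is an existing Prefect API; the runner script, paths, and sbatch options are made up):
    import pickle
    import subprocess
    from pathlib import Path

    def submit_as_slurm_job(fn, args, workdir: Path) -> str:
        # Persist the callable and its inputs so the batch job can load them
        (workdir / "inputs.pkl").write_bytes(pickle.dumps((fn, args)))

        # Generate a batch script that runs a (hypothetical) runner which
        # unpickles inputs.pkl, calls fn(*args), and pickles the result
        script = workdir / "task.sbatch"
        script.write_text(
            "#!/bin/bash\n"
            "#SBATCH --job-name=prefect-task\n"
            f"python run_pickled_task.py {workdir}\n"
        )
        job_id = subprocess.run(
            ["sbatch", "--parsable", str(script)],
            check=True, capture_output=True, text=True,
        ).stdout.strip()
        return job_id  # the result would later be read from workdir / "result.pkl"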

    Kevin Kho
    10 months ago
    I don't think we have anyone on the team with SLURM experience. Would you like to take a stab at adding it to the docs?

    Philip MacMenamin
    7 months ago
    Hey, I just pulled up this old thread. I'm having an issue with submitting jobs to a SLURMCluster; was an example ever created?
    import prefect
    from prefect import task, Flow
    import dask
    import dask_jobqueue
    
    
    from dask_jobqueue import SLURMCluster
    from prefect.executors import DaskExecutor
    
    
    
    def SLURM_exec():
        cluster = SLURMCluster()
        logger = prefect.context.get("logger")
        logger.debug("Dask cluster started")
        logger.debug(f"see dashboard {cluster.dashboard_link}")
        return cluster
    
    @task
    def hello_task():
        logger = prefect.context.get("logger")
        logger.info("Hello!")
    
    with Flow("example", executor=DaskExecutor(cluster_class = SLURM_exec)) as flow:
        hello_task()
    Or should I start a new thread?

    Kevin Kho
    7 months ago
    An has mentioned some stuff around this

    Philip MacMenamin
    7 months ago
    Sorry, I keep having to hop onto another network. Yes, I saw those posts, thanks @Kevin Kho.
    I'm seeing this log:
    20:48:37 INFO    agent            Submitted for execution: PID: 2061
    20:48:39 INFO    CloudFlowRunner  Beginning Flow run for 'example'
    20:48:40 INFO    DaskExecutor     Creating a new Dask cluster with `None.SLURM_exec`...
    20:48:41 INFO    DaskExecutor     The Dask dashboard is available at http://xxx:46778/status
    I'm seeing the job get started, but no workers starting.
    Ah, ok, got it. I'd set the network interface in the dask config file as ens160 and spotted this:
    ValueError: 'ens160' is not a valid network interface. Valid network interfaces are: ['lo', 'ethbond0', 'ibbond0', 'idrac', 'em1', 'em2', 'ib0', 'ib1']
    I'll try to get a minimal example up for you guys.
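    For reference, the interface can also be set directly on the cluster, and dask-jobqueue won't submit any worker jobs until the cluster is scaled (or given adapt); the cores/memory/interface values below are placeholders:
    from dask_jobqueue import SLURMCluster

    def SLURM_exec():
        # interface must be one that actually exists on the nodes (e.g. ib0 from the list above)
        cluster = SLURMCluster(cores=4, memory="8GB", interface="ib0")
        cluster.scale(jobs=2)  # ask SLURM for worker jobs; none are requested otherwise
        return cluster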

    Kevin Kho
    7 months ago
    Oh, sorry I didn't respond, but this definitely seems out of my wheelhouse 😅. Yes, we'd appreciate material around this for sure.