
Andrew Rosen

06/29/2023, 4:41 PM
Hi all. I am interested in using Prefect to run workflows on an HPC machine with the Slurm job scheduler. However, there isn't much documentation on how to use prefect-dask to achieve this. Does anyone have a minimal working example of how to dispatch individual tasks in a flow as Slurm jobs?

Daniel

06/29/2023, 9:15 PM
It took me a while, but I've got it working now. If you're not using deployments, you can do this:
import asyncio

from prefect import flow, task
from prefect_dask.task_runners import DaskTaskRunner
from dask_jobqueue import SLURMCluster


# "jobs" is consumed by make_cluster below; the rest are passed to SLURMCluster.
cluster_kwargs = {
    "jobs": 3,
    "cores": 1,
    "memory": "1G",
    "walltime": "00:01:00",  # HH:MM:SS
}


async def make_cluster(jobs=1, **kwargs):
    # Start a Dask cluster whose workers run as Slurm jobs, then scale it
    # out to the requested number of jobs.
    cluster = await SLURMCluster(**kwargs)
    cluster.scale(jobs)
    return cluster


cluster = asyncio.run(make_cluster(**cluster_kwargs))


@task
def log_task(name: str):
    return name.upper()


# Point the Dask task runner at the running cluster; each submitted task
# executes on the Slurm-backed Dask workers.
@flow(task_runner=DaskTaskRunner(address=cluster.scheduler_address))
def log_flow(names: list):
    futures = []
    for name in names:
        futures.append(log_task.submit(name))
    return [f.result() for f in futures]


if __name__ == "__main__":
    print(log_flow(["a", "b"]))
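(On a real cluster you will usually also need to pass a partition and account, and a longer walltime. A sketch of that, where the queue and account names are placeholders; note that older dask-jobqueue releases call account "project":)

cluster_kwargs = {
    "jobs": 3,
    "cores": 1,
    "memory": "1G",
    "walltime": "00:30:00",   # HH:MM:SS
    "queue": "debug",         # placeholder partition; maps to #SBATCH -p
    "account": "my-account",  # placeholder account; maps to #SBATCH -A
}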
It's a little trickier from a deployment, since you can't use asyncio.run to create the cluster. I worked around this by wrapping log_flow in an async flow, then deploying that:
@flow
async def log_flow_slurm(names: list):
    # Create the Slurm-backed cluster at flow-run time (no asyncio.run needed),
    # then run log_flow as a subflow with its task runner pointed at that cluster.
    cluster = await make_cluster(**cluster_kwargs)
    return log_flow.with_options(task_runner=DaskTaskRunner(address=cluster.scheduler_address))(names)
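For what it's worth, DaskTaskRunner can also create (and tear down) the Slurm cluster for you at flow-run time via cluster_class / cluster_kwargs, with adapt_kwargs for autoscaling, which should sidestep the asyncio workaround entirely, including from deployments. A sketch, untested here, assuming it is appended to the first script above so log_task is already defined:

@flow(
    task_runner=DaskTaskRunner(
        # The task runner instantiates the cluster when the flow run starts,
        # so nothing has to be created at import time.
        cluster_class="dask_jobqueue.SLURMCluster",
        cluster_kwargs={"cores": 1, "memory": "1G", "walltime": "00:01:00"},
        adapt_kwargs={"minimum": 1, "maximum": 3},  # autoscale between 1 and 3 Slurm jobs
    )
)
def log_flow_auto_cluster(names: list):
    futures = [log_task.submit(name) for name in names]
    return [f.result() for f in futures]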

Andrew Rosen

06/29/2023, 9:16 PM
thank you so much!! I'll play around with this 🙂 I'm definitely super new to Prefect so this is a great starting point
👍 1
@Daniel: do you just run this on the login node, and it then submits the jobs with sbatch? Or is it run some other way?

Daniel

06/29/2023, 10:31 PM
Yes, the simplest way is to run it on the login node, although it can also run from another job, assuming Slurm is available on the workers. I sometimes submit a job that calls this code.
👍 1

Andrew Rosen

06/29/2023, 10:31 PM
got it!
@Daniel, have you tried running Slurm jobs with an agent, by chance?

Daniel

07/01/2023, 2:36 AM
Yes, the deployment method above works fine for me with an agent. I also tried running the agent itself as a Slurm job, but that doesn't make the flows/tasks run within that Slurm job, since they run as separate processes from the agent.
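For reference, this is roughly how such a deployment can be registered with the Prefect 2.x Python API; the module, deployment, and work-queue names below are placeholders:

from prefect.deployments import Deployment

# "my_flows" is a placeholder module containing log_flow_slurm from above.
from my_flows import log_flow_slurm

# Build a deployment for the async wrapper flow and register it with the API;
# an agent watching the same work queue will then pick up scheduled runs.
deployment = Deployment.build_from_flow(
    flow=log_flow_slurm,
    name="slurm-example",
    work_queue_name="slurm",
)
deployment.apply()

The agent itself can then be started on the login node with something like prefect agent start -q slurm.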

Andrew Rosen

07/01/2023, 2:36 AM
that makes sense. awesome!
@Daniel: I assume your compute nodes are able to access an outside network? I got everything working smoothly, but alas, there's no hope of the compute nodes on my HPC machine being able to reach Prefect Cloud (and I'm not sure how feasible it will be to spin up a server locally). Just wanted to make sure I wasn't missing something obvious!

Daniel

07/24/2023, 3:51 AM
They are, but I actually haven't tried it with Prefect Cloud anyway. I run my own server using this simple docker-compose.yml:
services:
  prefect-server:
    image: prefecthq/prefect:2.10-python3.11
    ports:
      - 4200:4200
    volumes:
      - prefect_test:/root/.prefect
    environment:
      PREFECT_API_URL: http://$HOSTNAME:4200/api
      PREFECT_SERVER_API_HOST: 0.0.0.0
    command: prefect server start
volumes:
  prefect_test:
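The login node (and anything else that runs the agent, flows, or Dask workers) then just needs PREFECT_API_URL pointed at that server; the hostname below is a placeholder:

prefect config set PREFECT_API_URL=http://<server-host>:4200/api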

Andrew Rosen

07/24/2023, 4:59 AM
cool, thanks again 😄
👍 1