Is there a way to set the max_workers and max_depl...
# ask-community
j
Is there a way to set the max_workers and max_deployed_flows for a single Agent? I can see that max_workers is hardcoded here. Would it be possible to add a max_workers argument to the base Agent class? I believe only this one line would need to change. In a number of cases, I'd like to limit the number of threads (workers) the Agent uses. As for max_deployed_flows, it looks like Agents grab all flow runs that are ready by default, shown here. It would be useful to add a GraphQL "limit" argument to this mutation so that one Agent isn't "hogging" flow runs when another (with equivalent labels) is open and ready for them. I'm not sure that's possible in a mutation, though, so it may be trickier to change.
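For illustration, here's roughly the kind of change I have in mind -- a purely hypothetical sketch, not the actual Prefect Agent internals:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of the proposed argument; names and structure are
# illustrative only and do not reflect the real Prefect Agent class.
class Agent:
    def __init__(self, labels=None, max_workers: int = 4):
        self.labels = labels or []
        # Today this pool size is hardcoded; the proposal is simply to
        # expose it so a single Agent can be limited to fewer threads.
        self.executor = ThreadPoolExecutor(max_workers=max_workers)
```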
j
Can you expand on why you want to limit the number of threads on the agent? That pool is only used for starting flow runs and making network requests; it won't limit the number of active flow runs started by a single agent.
For the second issue, the limiting is currently handled server side (I believe the limit is 25 at the moment). In general, I recommend that users not rely on Prefect to distribute work evenly among agents (we're not a resource manager) and instead deploy on a backend that handles that for you.
j
My agent architecture is pretty atypical: I have agents spread out over numerous desktops as well as supercomputer nodes (so I'm submitting Agents via Slurm). In the Slurm case, I'd like to have many agents (up to 40 at the moment) that each run only one flow at a time. Ideally, Prefect runs on one thread and all the other threads of the Slurm job are used by tasks (I submit shell commands that use mpirun).
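For context, a rough sketch of the kind of flow these agents run -- the ShellTask wrapping mpirun is illustrative, and the command and thread counts are placeholders:

```python
from prefect import Flow
from prefect.tasks.shell import ShellTask

# Illustrative only: one mpirun-based shell command per flow, sized to
# leave a single thread free for Prefect itself on a 40-thread node.
run_mpi = ShellTask(name="run_mpi")

with Flow("mpi-simulation") as flow:
    run_mpi(command="mpirun -np 39 ./simulation.x input.yaml")
```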
From what you're saying, I should write a custom executor rather than have these special Agents. Is that right?
j
I wouldn't run a Prefect agent inside a Slurm job. Rather, I'd have the Prefect agent deploy flow runs as Slurm jobs: a single agent running on an edge node in your cluster kicks off a Slurm job for each flow run, and you rely on Slurm to handle the job queuing and resource management.
Or batch your tasks into larger flows, run them with a local agent (on an edge node), and use a `DaskExecutor` with `dask-jobqueue` to distribute the tasks throughout the cluster.
We currently don't have an HPC jobqueue agent, but that's not out of scope.
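For reference, a rough sketch of what the `DaskExecutor` + `dask-jobqueue` route could look like -- this assumes a Prefect 1.x-style API, and the cluster settings below are placeholders:

```python
from prefect import Flow, task
from prefect.executors import DaskExecutor

@task
def run_case(case):
    ...  # placeholder for the per-case work

with Flow("hpc-batch") as flow:
    run_case.map(["case_a", "case_b", "case_c"])

# Spin up a dask-jobqueue SLURMCluster for each flow run; the cores,
# memory, queue, and adaptive-scaling values here are only examples.
flow.executor = DaskExecutor(
    cluster_class="dask_jobqueue.SLURMCluster",
    cluster_kwargs={"cores": 24, "memory": "64GB", "queue": "normal"},
    adapt_kwargs={"minimum": 1, "maximum": 10},
)
```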
j
That setup is actually what I'm trying to avoid haha. If I have two HPC clusters and a bunch of desktops running flows, one HPC cluster may be backed up and its Slurm jobs just sit, while the other HPC cluster is open and getting through jobs quickly --- so my Slurm jobs should be flow-run agnostic. They should instead grab the next available flow run as soon as the Slurm job starts.
j
Hmmm, I don't really have a good response for that right now. Prefect currently isn't designed to be a resource manager, so the distribution of flow runs across equivalent agents isn't guaranteed to be fair.
j
I think you're spot on with using a Dask cluster though -- the issue I have with dask-jobqueue is firewalls. On some university clusters, it's a hassle to get permission to open the required ports. If only their dask workers followed Prefect's hybrid approach 😂
j
If you're running the client on an edge node, you shouldn't have a firewall issue in my experience. You could always go the SSH tunnel route, but admins don't always like that either :/
If you're trying to distribute jobs across a large set of varied machines, you might rely on dask-ssh and Dask to do the resource management 🙂 https://docs.dask.org/en/latest/setup/ssh.html
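A minimal sketch of that idea, assuming Dask's SSHCluster and placeholder hostnames -- the first host acts as the scheduler and the rest as workers:

```python
from dask.distributed import SSHCluster
from prefect.executors import DaskExecutor

# Placeholder hostnames; in practice these would be the desktops and
# edge/login nodes you can reach over SSH.
cluster = SSHCluster(
    ["scheduler-host", "desktop-1", "desktop-2", "hpc-login-node"],
    worker_options={"nthreads": 4},
)

# Point Prefect's executor at the long-lived cluster's scheduler.
executor = DaskExecutor(address=cluster.scheduler_address)
```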
j
Yeah, the admins for each cluster follow different rules. Prefect's approach just bypasses the issue by keeping Agent communication one-directional (outbound only).
Thanks though! I've been trying to avoid Dask clusters but I'll take another stab at it.