
Tim Galvin

04/20/2023, 1:52 AM
Hi all -- I have a very strange request and I am pretty sure I already know the answer, but nevertheless... I am self-hosting a Prefect server for my workflows, and I am using a `DaskTaskRunner` on a SLURM-based cluster. When my workflow starts, a Dask scheduler is also started, which has an address I can point my browser at to see a pretty comprehensive overview of the real-time compute usage. In my situation on the SLURM cluster there are a few hurdles to reach that dashboard, ultimately amounting to some port-forwarded SSH connections to get to the listening port of the compute node that is hosting it. So, the 'pie in the sky' question - is it feasible to integrate into the Prefect UI a panel that presents the Dask scheduler dashboard servicing each flow as it runs? Essentially, extending the `DaskTaskRunner` to forward the scheduler dashboard to a designated resource on the Prefect server that can then be matched to the flow run and presented?
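For reference, my setup looks roughly like this (a minimal sketch, not the exact code; it assumes the prefect-dask and dask-jobqueue packages, and the `cluster_kwargs` values are placeholders):

```python
# Minimal sketch: DaskTaskRunner backed by a SLURMCluster, with a task that
# logs the scheduler dashboard address. cluster_kwargs are placeholder values.
from prefect import flow, task, get_run_logger
from prefect_dask import DaskTaskRunner, get_dask_client


@task
def report_dashboard():
    logger = get_run_logger()
    # get_dask_client() yields the distributed.Client in use; dashboard_link
    # points at the scheduler dashboard listening on the compute node.
    with get_dask_client() as client:
        logger.info("Dask dashboard: %s", client.dashboard_link)


@flow(
    task_runner=DaskTaskRunner(
        cluster_class="dask_jobqueue.SLURMCluster",
        cluster_kwargs={"cores": 16, "memory": "64GB", "walltime": "01:00:00"},
    )
)
def my_flow():
    report_dashboard.submit()
```

The dashboard address that gets logged is only reachable from the login/compute nodes, hence the SSH port forwarding mentioned above.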

Jonathan Langlois

04/20/2023, 1:54 AM
This would also be helpful in a K8S cluster! 👀

Ryan Peden

04/20/2023, 5:53 AM
An idea that won't quite forward the dashboard, but could potentially provide the same info: write a little function that you can call from your flow that creates and then periodically updates an Artifact. You could probably read the same stats the scheduler dashboard reads and save them as a nice Markdown or table artifact. And if you wanted to visualize the compute usage throughout your flow run's lifetime, you could create a series of artifacts instead of creating and updating a single one. If I were implementing this, I might spin up a subprocess when my flow starts and let it handle artifact creation via the Artifacts REST API. That way, artifact creation/updating shouldn't interfere with your flow at all, and you could just terminate the subprocess before exiting your flow.
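Something along these lines, as a rough sketch (assumptions: Prefect 2.10+ for `create_markdown_artifact`, prefect-dask's `get_dask_client`; it uses an in-task polling loop rather than the separate subprocess + REST API, and the artifact key and interval are arbitrary):

```python
# Rough sketch: periodically read worker stats from the Dask scheduler and
# record them as a Markdown table artifact. Key name and interval are arbitrary.
import time

from prefect import task
from prefect.artifacts import create_markdown_artifact
from prefect_dask import get_dask_client


@task
def snapshot_dask_usage(samples: int = 5, interval: float = 30.0):
    """Capture per-worker stats from the scheduler as a Markdown artifact."""
    with get_dask_client() as client:
        for _ in range(samples):
            workers = client.scheduler_info()["workers"]
            rows = [
                f"| {addr} | {w['nthreads']} | {w['metrics'].get('memory', 0) / 1e9:.2f} GB |"
                for addr, w in workers.items()
            ]
            create_markdown_artifact(
                key="dask-usage",
                markdown="| worker | threads | memory |\n| --- | --- | --- |\n"
                + "\n".join(rows),
                description="Snapshot of Dask worker usage",
            )
            # In the subprocess variant this sleep would live outside the flow's
            # task graph, so it wouldn't tie up a worker thread.
            time.sleep(interval)
```

Submitting that alongside your real tasks would give you a `dask-usage` artifact that refreshes a few times over the run; the subprocess version would just do the same polling out-of-band and POST to the Artifacts API instead.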

Tim Galvin

04/20/2023, 6:26 AM
Thanks for that idea @Ryan Peden - I can certainly try to play around with that. If I were to do that, I think the subprocess 'watcher' task is the thing to do, and it could work nicely for simple flows. Having said that though, I think it would be difficult to capture the same usability with this approach that the dask dashboard provides. The dask dashboard is aggregating information from all the dask nanny (or maybe the dask worker...) processes, and that information is incredibly rich in multi-core and multi-node situations. Getting a comparable level of information at a task level across a SLURM-managed cluster might prove difficult.