
Andrew Rosen

07/25/2023, 1:26 AM
I am currently using Prefect with `prefect-dask` and `dask-jobqueue` to launch Prefect flows to a job queuing system on an HPC cluster. However, the compute nodes don't have network connectivity. Is there any hope of using Prefect Cloud in this scenario? Just curious if I'm missing something obvious.

Nate

07/25/2023, 2:30 AM
hi @Andrew Rosen - in what capacity would you like to use Prefect Cloud here?

Andrew Rosen

07/25/2023, 2:33 AM
Ideally, I'd like to use Prefect Cloud primarily so I don't have to host and maintain my own Prefect server. The thought process here is that the HPC cluster has login nodes and compute nodes. The `prefect-dask` `DaskTaskRunner` allows me to submit flows from the login node to the compute nodes, where they run. This works perfectly fine. But because the compute nodes have no network connection, it crashes at the end because it can't report the result of the flow run back to Prefect Cloud. In principle I could run a Prefect server on the login nodes, but the HPC staff won't approve of that because it would affect other users. So I'm having trouble seeing whether there is a way for me to use Prefect in this scenario. I was able to use Prefect on other HPC machines that do have network connectivity on the compute nodes, but that is a rarity.
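A quick way to confirm that the failure really comes from the compute nodes' lack of outbound access is to run a reachability check from a job on a compute node. The sketch below uses only the Python standard library; `can_reach` is a hypothetical helper, and it assumes Prefect Cloud is reached over HTTPS at `api.prefect.cloud` on port 443. A successful TCP connection doesn't prove the API will accept requests, but a failure is a strong sign that nothing running on that node can report back.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical diagnostic helper, not part of Prefect.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Intended to be run inside a batch job on a compute node.
    print("api.prefect.cloud reachable:", can_reach("api.prefect.cloud"))
```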

Nate

07/25/2023, 2:38 AM
hmm, is it accurate to say that login nodes -> public subnet, compute nodes -> private subnet? at a high level, could a login node submit work to a compute node? oh yes, i read your message better now 🙂 it seems possible in principle to run a worker on the login node that would submit work to compute nodes while it communicates with prefect cloud, but I'm not familiar with that as an actual pattern
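A rough sketch of that idea, under the assumption (not confirmed in this thread) that only the process on the login node talks to Prefect Cloud: the flow runs on the login node and hands plain Python functions, not Prefect tasks, to an executor backed by the compute nodes, so only return values travel back to the login node. `dispatch` is a hypothetical helper written against the standard `concurrent.futures.Executor` interface so it can be tried locally; on the cluster, the executor could be the one returned by `distributed.Client.get_executor()` for a client connected to a `dask_jobqueue` cluster, with `dispatch` called from inside the flow.

```python
from concurrent.futures import Executor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def dispatch(executor: Executor, fn: Callable[[T], R], items: Iterable[T]) -> List[R]:
    """Run fn over items on the executor and collect results in the caller.

    Hypothetical helper: only plain callables and their return values cross
    the executor boundary, so the workers themselves never need to contact
    Prefect Cloud; the calling process (on the login node) is the only one
    that would.
    """
    futures = [executor.submit(fn, item) for item in items]
    # result() blocks until each piece of work finishes; order matches items.
    return [f.result() for f in futures]
```

The trade-off is that Prefect would then only see the flow run itself, not each unit of work submitted this way.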

Andrew Rosen

07/25/2023, 2:40 AM
hmmm not sure I can follow the analogy, seeing as I don't know much about subnets 😅 but the typical setup (without prefect) is that the user will SSH into the machine's login nodes, and then from the login nodes (which are a shared resource), they will submit a job to the scheduling system (which, in this case, is a Dask cluster) that will run on one or more of the compute nodes.
> seems possible in principle to run a worker on the login node that would submit work to compute nodes while it communicates with prefect cloud
yeah I haven't seen many examples either. I'll keep experimenting. I got close, but it requires a combination of `prefect`, `prefect-dask`, and `dask-jobqueue`, so there aren't many people with knowledge about it...

Nate

07/25/2023, 2:43 AM
interesting - i am likely not the best equipped to handle this question, but I can ask around internally on your behalf tomorrow!

Andrew Rosen

07/25/2023, 2:44 AM
In case it helps clarify the situation at all, here is the function I use to make a `DaskTaskRunner`. I feed this `DaskTaskRunner` as a `task_runner` argument to the `@flow`, and if I do that from a login node, it will spin up a Dask cluster on the compute nodes and submit the flow for execution. The problem is when the results need to be reported back to Prefect Cloud.
anyway, not a problem! 🙂 I know this isn't the common usage scenario. definitely let me know if you find any info!
this kind of setup is pretty standard in academic HPC environments. we haven't really gotten around to using cloud compute around here. tons of money and infrastructure is put into on-prem supercomputers
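The `DaskTaskRunner` function mentioned above isn't included in the thread. As a hypothetical sketch of what such a function can look like, assuming a SLURM scheduler (the thread doesn't name the queuing system) and `dask_jobqueue.SLURMCluster`: `make_runner`, `make_cluster_kwargs`, and all resource values are made up for illustration. `prefect_dask` is imported inside `make_runner` so the kwargs helper can be used without it installed.

```python
from typing import Any, Dict, Optional


def make_cluster_kwargs(
    cores: int = 32,
    memory: str = "64GB",
    walltime: str = "01:00:00",
    account: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for dask_jobqueue.SLURMCluster (illustrative values)."""
    kwargs: Dict[str, Any] = {"cores": cores, "memory": memory, "walltime": walltime}
    if account is not None:
        # Only pass an allocation/account when one is given.
        kwargs["account"] = account
    return kwargs


def make_runner(max_workers: int = 4, **cluster_options: Any):
    """Return a DaskTaskRunner whose Dask workers are launched as batch jobs."""
    from prefect_dask import DaskTaskRunner  # lazy import of the real library

    return DaskTaskRunner(
        # The cluster class is given by its import path and created when the flow starts.
        cluster_class="dask_jobqueue.SLURMCluster",
        cluster_kwargs=make_cluster_kwargs(**cluster_options),
        # Adaptive scaling between 1 and max_workers Dask workers.
        adapt_kwargs={"minimum": 1, "maximum": max_workers},
    )
```

As described in the thread, the result would then be passed to the flow decorator, e.g. `@flow(task_runner=make_runner())`.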

Nate

07/25/2023, 2:54 AM
I see - that makes a lot of sense. I am not sure whether the compute nodes would be able to report back out (that likely depends on the networking situation). I have more experience with k8s, which does some of that `Service`-level work for you, but I can definitely see if anyone internally has more context on a setup like this!

Andrew Rosen

07/25/2023, 2:54 AM
thanks! yup, here the compute nodes can't report back out to anything but the login node 😞