Thanks to the support here I've got Prefect running on GCP with VertexRun and GitHub Storage. I'm now trying to get a distributed DaskExecutor to run using dask_cloudprovider.gcp.GCPCluster. I'm using the same Docker image that I already had working with VertexRun, with the Dask dependencies added, and I also created a Packer image based on it.
It works if I run the Flow locally (with prefect flow run …, so Vertex is bypassed): it spins up a Dask cluster and completes successfully. But when I ran it from Prefect Cloud, via Vertex, it provisioned a scheduler that logged some errors (failed to restart crond, nscd and unscd) and then didn't do anything. Aside: after I cancelled the Flow, I had to delete this scheduler manually. The VPC is set up to allow full access within the network, so it shouldn't be anything to do with that.
Any ideas? Has anyone got this working well?
Kevin Kho
11/12/2021, 3:20 PM
Hey @Chris Arderne, I’m not too familiar with Vertex yet, but is the Flow successful for you on LocalExecutor? Or does that also fail?
Kevin Kho
11/12/2021, 3:21 PM
Could you give the traceback also?
Chris Arderne
11/12/2021, 3:23 PM
Yes, I've been using it happily with DaskExecutor() with empty params (i.e. local machine only) and it works fine. And as I said, it also works if I run the Flow locally, which means the Dask resources are provisioned from my machine.
Chris Arderne
11/12/2021, 3:25 PM
There's no traceback really… just some logs from the scheduler instance (the systemd stuff above), and the Vertex runner just hangs at "Creating a new Dask cluster with GCPCluster".
Kevin Kho
11/12/2021, 3:27 PM
This makes me wonder if it’s a matter of providing Vertex with the permissions to spin up the cluster?
Chris Arderne
11/12/2021, 3:31 PM
Ok, so I just spun up a long-lived Dask cluster and tried to connect to it directly (i.e. specified its address in DaskExecutor). Again it worked locally and failed on Vertex (timed out trying to connect to tcp://…). So there must be some network thing that I've missed… Off to investigate!
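When a direct connection works from one environment and times out from another, one way to separate a network problem from a Dask problem is to test the raw TCP handshake from inside the failing environment (here, the Vertex job). The helper below is a minimal stdlib-only sketch; the function name and the example IP are hypothetical, and it only checks that the port accepts connections, not that a healthy Dask scheduler is behind it.

```python
import socket
from urllib.parse import urlparse


def dask_scheduler_reachable(address: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to a Dask-style address can be opened.

    `address` looks like "tcp://10.128.0.5:8786" (a hypothetical internal IP).
    Only the TCP handshake is tested; the Dask protocol is not spoken.
    """
    parsed = urlparse(address)
    if parsed.scheme != "tcp" or parsed.hostname is None:
        raise ValueError(f"expected a tcp://host:port address, got {address!r}")
    port = parsed.port or 8786  # 8786 is Dask's default scheduler port
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

Calling something like dask_scheduler_reachable("tcp://10.128.0.5:8786") at the start of the flow run and logging the result would show whether the Vertex container can reach the scheduler at all, independently of Prefect and Dask.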
Chris Arderne
11/12/2021, 3:53 PM
Confirming that it was a networking issue. Clearly I need to learn how GCP VPCs work… thanks again @Kevin Kho!
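The thread doesn't say which network setting was wrong. One thing worth checking in a case like this is whether the IP the Vertex job connects from actually falls inside the source ranges of the firewall rule that opens the Dask ports. A rule that allows "full access within the network" only helps if the client is inside that network. The sketch below assumes you already know the client IP and the rule's CIDR source ranges; the function name and all addresses are hypothetical.

```python
import ipaddress
from typing import Iterable


def source_allowed(client_ip: str, source_ranges: Iterable[str]) -> bool:
    """Return True if client_ip lies inside any of the CIDR source ranges.

    Mixed IPv4/IPv6 comparisons simply evaluate to False rather than raising.
    """
    ip = ipaddress.ip_address(client_ip)
    return any(
        ip in ipaddress.ip_network(cidr, strict=False) for cidr in source_ranges
    )
```

For example, with a rule limited to a hypothetical internal range such as 10.128.0.0/9, a client at 172.16.3.4 would not be allowed through, which would match the symptom of connecting fine from one place and timing out from another.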