Alexandru Anghel

07/11/2022, 12:58 PM
Hello, I am trying to run flows on a private Kubernetes cluster using the Dask executor and GCS storage. I am coming from a GKE test cluster where I was able to run flows using this approach. The difference now is that the private cluster runs behind a corporate proxy. I've set the HTTPS_PROXY env variable inside the KubernetesRun job template, and Prefect is able to download the flow metadata from GCS. The problem is that the same pod creates the Dask cluster, and that step fails with this error:
RuntimeError(f"Cluster failed to start: {e}") from e
RuntimeError: Cluster failed to start: 503, message='Service Unavailable', url=URL('http://proxy-ip-here:proxy-port-here')
Any ideas on how to fix this? I've tried adding a NO_PROXY env variable alongside HTTPS_PROXY, but it doesn't work. I am using Prefect 1.2. Thanks!
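For reference, a minimal sketch of the HTTPS_PROXY plus NO_PROXY combination described above. It only builds the env dict and checks it against the Python standard library's NO_PROXY matching rules. The proxy URL, the API server IP, and the helper names `build_proxy_env` and `bypasses_proxy` are all hypothetical placeholders, not values from this thread. Other HTTP clients may match NO_PROXY entries differently or ignore the variable altogether, so this shows the intent rather than a guaranteed fix.

```python
from urllib.request import proxy_bypass_environment

# Placeholder values, not taken from the thread.
PROXY_URL = "http://proxy.example.internal:3128"  # hypothetical corporate proxy
K8S_API_IP = "10.96.0.1"  # hypothetical value of KUBERNETES_SERVICE_HOST in the pod


def build_proxy_env(proxy_url, bypass_hosts):
    """Return env vars that route traffic via the proxy except for bypass_hosts."""
    no_proxy = ",".join(bypass_hosts)
    # Set both spellings, since clients differ in which one they read.
    return {
        "HTTPS_PROXY": proxy_url,
        "https_proxy": proxy_url,
        "NO_PROXY": no_proxy,
        "no_proxy": no_proxy,
    }


def bypasses_proxy(host, env):
    """Check a host against NO_PROXY using the stdlib's matching rules."""
    return bool(proxy_bypass_environment(host, {"no": env["no_proxy"]}))


env = build_proxy_env(
    PROXY_URL,
    # In-cluster addresses that should not go through the proxy (assumed list).
    [K8S_API_IP, "kubernetes.default.svc", ".svc", ".cluster.local", "localhost", "127.0.0.1"],
)

print(bypasses_proxy(K8S_API_IP, env))                # in-cluster API server
print(bypasses_proxy("storage.googleapis.com", env))  # GCS should still use the proxy
```

In Prefect 1.x, a dict like this could be passed as `KubernetesRun(env=env)`. Note that the stdlib matcher used here compares names and exact IPs only, so an IP range written in CIDR form would not match under these rules.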

Kevin Kho

07/11/2022, 2:47 PM
This looks like the Dask cluster is unable to start, right? It looks like it can’t find the Kubernetes cluster to run against? Does the flow work without Dask?

Alexandru Anghel

07/11/2022, 5:02 PM
Yes, the Dask cluster is not starting. It almost certainly has to do with the HTTPS_PROXY that I had to set in order to download the flow metadata from GCS storage (it doesn't work without it). I think this prevents kubectl from creating the Dask cluster. The flow runs fine on GKE, but there I don't have to set any proxy.
I am thinking of switching to local storage and sending the flow metadata to a persistent volume inside the Kubernetes cluster. Would this work?
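As a rough illustration of the persistent volume idea, a job template that mounts a volume into the flow pod might look like the sketch below. The claim name `flows-pvc`, the mount path `/flows`, the container name `flow`, and the helper `unmatched_mounts` are assumptions for illustration only. The sketch does not cover how the flow file gets written to the volume in the first place.

```python
# Hypothetical names: adjust to the actual PVC and path in the cluster.
FLOWS_PVC = "flows-pvc"
FLOWS_MOUNT = "/flows"

job_template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [
                    {
                        "name": "flow",  # assumed name of the flow container
                        "volumeMounts": [
                            {"name": "flows", "mountPath": FLOWS_MOUNT}
                        ],
                    }
                ],
                "volumes": [
                    {
                        "name": "flows",
                        "persistentVolumeClaim": {"claimName": FLOWS_PVC},
                    }
                ],
            }
        }
    },
}


def unmatched_mounts(template):
    """Return volumeMount names that have no matching entry under volumes."""
    pod_spec = template["spec"]["template"]["spec"]
    declared = {v["name"] for v in pod_spec.get("volumes", [])}
    mounted = {
        m["name"]
        for c in pod_spec["containers"]
        for m in c.get("volumeMounts", [])
    }
    return sorted(mounted - declared)


print(unmatched_mounts(job_template))
```

In Prefect 1.x, a dict like this could be passed as `KubernetesRun(job_template=job_template)`, with the flow's local storage path pointing somewhere under the mount path. Since the pod would then read the flow from the volume rather than from GCS, the proxy setting might no longer be needed for that step, though that depends on what else the flow calls.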