# ask-community
a
Hi Team, I want to onboard my Prefect project (currently using a local agent + Dask executor) to run on a K8s cluster, and I have the following questions. (I found this tutorial while looking through the old threads on GKE + Dask in this Slack channel.)
1. Storage: to be able to access my flows from the K8s cluster, is it recommended to use remote storage?
   a. Which storage would you recommend: Docker or Bitbucket?
   b. To use Docker storage, do I need to install all the requirements of the flow in the "extra_dockerfile_commands" block?
2. Dask cluster: is using an ephemeral Dask executor recommended when using a Dask cluster?
3. K8s in the cloud: the Prefect docs mention using the K8s run config, but I want to know how to get it working on a GKE cluster.
k
Hey @Abhas P
1. Yes. Anywhere accessible to K8s. It can be Docker, GitHub, S3, Bitbucket, etc.
2. Yes, all the requirements would need to be in the container. You can install stuff like this if they are available on PyPI.
3. This is opinionated, but I think Prefect performs best with ephemeral clusters, while Dask itself prefers long-running clusters. Either way should be fine though.
4. Your agent would be configured to use GKE. Check the Authentication part here: https://docs.prefect.io/orchestration/agents/kubernetes.html#agent-configuration
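A minimal sketch of point 2, assuming Prefect 1.x Docker storage. The helper names `pip_install_commands` and `build_docker_storage` are hypothetical, and the registry and image name are placeholders to fill in. It shows one way to turn a list of requirements into lines for `extra_dockerfile_commands`; for plain PyPI packages, the `python_dependencies` argument of `Docker` is likely the simpler route.

```python
import shlex
from typing import List


def pip_install_commands(requirements: List[str]) -> List[str]:
    """Build Dockerfile lines that pip-install the given requirement specifiers.

    Hypothetical helper, not part of Prefect.
    """
    if not requirements:
        return []
    # Quote each specifier so characters such as ">" are not read as shell redirects.
    quoted = " ".join(shlex.quote(req) for req in requirements)
    # A single RUN line installs everything in one image layer.
    return ["RUN pip install --no-cache-dir " + quoted]


def build_docker_storage(registry_url: str, image_name: str, requirements: List[str]):
    """Return Docker storage whose image has the flow's dependencies installed."""
    # Imported lazily so the helper above can be used without Prefect installed.
    from prefect.storage import Docker  # Prefect 1.x import path

    return Docker(
        registry_url=registry_url,
        image_name=image_name,
        # Alternative for plain PyPI packages: python_dependencies=requirements
        extra_dockerfile_commands=pip_install_commands(requirements),
    )
```

For example, `build_docker_storage("<your-registry>", "<your-image-name>", ["dask==2021.10.0"])` would return a storage object to pass as `storage=` when defining the flow.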
a
Thanks for the quick and brief response Kevin. Much appreciated 🙂 I have a follow-up question: in the KubernetesRun run_config block, the image that we provide should also have all the flow dependencies installed, right?
run_config=KubernetesRun(
    labels=["dask"],
    image="<image-for-docker-pods>",
),
k
Yes, especially if you need to use Dask because all of the workers in the cluster need to be consistent with their packages and versions
a
So logically, should both images (in the Dask cluster config and the Kubernetes run_config) be the same base image containing all the flow dependencies?
executor=DaskExecutor(
    cluster_class=lambda: KubeCluster(make_pod_spec(image=prefect.context.image)),
    adapt_kwargs={"minimum": 2, "maximum": 3},
),
run_config=KubernetesRun(
    labels=["dask"],
    image="<image-for-docker-pods>",
)
Also, I understand how my local agent can find a local K8s cluster, but how does it find a specific GKE cluster that's already running?
k
Yes, that is right. Your Kubernetes agent will look for the configured cluster.
a
Can you help me understand how it looks for a remote cluster in GKE? In layman's terms: how does Prefect know about the existence of the specific GKE cluster?
k
When you run `kubectl get pods` in your command line, you are already connected to a cluster. So Prefect uses whatever you are already connected to.
a
@Abhas P in general the easiest way to understand it is that Kubernetes has an API, and kubectl is a command line tool that simplifies how you talk to this API. To interact with a remote GKE cluster from your terminal session, kubectl needs a context pointing to it; normally, when you create a GKE cluster via gcloud (the GCP-specific CLI tool), it automatically switches the context in your kubectl config to point to that remote cluster. To check which cluster kubectl is currently pointing to (e.g. whether you are talking to a local minikube cluster or a remote GKE cluster), you can run the command:
kubectl config current-context
and if you want to e.g. switch context to another Kubernetes cluster, you can run:
kubectl config use-context SOME_CONTEXT_NAME
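The same check can be scripted before starting an agent. This is a rough sketch, not something from the Prefect docs: the function names are hypothetical, and it assumes the context was created by gcloud, which typically names contexts `gke_<project>_<location>_<cluster>`. A context you renamed yourself will not match this pattern.

```python
import subprocess
from typing import Dict, Optional


def current_kube_context() -> str:
    """Return the context kubectl is currently pointing at."""
    result = subprocess.run(
        ["kubectl", "config", "current-context"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def parse_gke_context(context: str) -> Optional[Dict[str, str]]:
    """Split a gcloud-style context name into its parts.

    Assumes the 'gke_<project>_<location>_<cluster>' naming pattern;
    returns None for anything else (e.g. 'minikube').
    """
    parts = context.split("_")
    if len(parts) != 4 or parts[0] != "gke":
        return None
    _, project, location, cluster = parts
    return {"project": project, "location": location, "cluster": cluster}
```

Calling `parse_gke_context(current_kube_context())` and getting `None` back would suggest kubectl is pointing at something other than a gcloud-configured GKE cluster, so it is worth switching context before starting the agent.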
a
Thank you Kevin and thanks Anna for helping me understand this 🙌