# prefect-community
t
Hi all! My team is considering a switch to Google Kubernetes Engine for our Prefect infrastructure. Does anyone have recipes or examples of deploying flows with GKE clusters in Prefect 2.x? Any help would be greatly appreciated.
c
Using a pipeline or ..? The process is identical across any cloud, since you are just building a deployment and applying it. The only change will most likely be to your storage and credentials. If you use a kubernetes-job for your infrastructure block and the agent runs in GKE, it will pull down the flow runs appropriately.
t
I have an agent running in GKE and picking up flow runs when using the base Prefect image, but can't seem to get it to work with my custom image hosted in GCP. Was thinking it'd be helpful to see some examples end to end in case I did something wrong in the process
Btw, thanks for your post in Discourse! Super helpful for getting even this far 🙂
c
I have a working pipeline that looks like this:
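from os import environ

from prefect.deployments import Deployment
from prefect.infrastructure import KubernetesJob

# GCP_PROJECT_ID, APP_ENVIRONMENT, customizations, params, etc. are defined elsewhere in the pipeline
image = "gcr.io/" + GCP_PROJECT_ID + "/" + environ['PROJECT_NAME']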

k8s_job = KubernetesJob(
    image=image,
    namespace="prefect2",
    # name="healthcheck",
    name=environ['PROJECT_NAME'],
    customizations=customizations,
    env=dict(
        GCP_PROJECT_ID=GCP_PROJECT_ID,
        GCP_RESULTS_BUCKET=GCP_RESULTS_BUCKET,
        PREFECT_VERSION=PREFECT_VERSION,
        PYTHON_VERSION=PYTHON_VERSION
    ),
    labels={"environment": f'{APP_ENVIRONMENT}'.lower()},
    finished_job_ttl=600,
    job_watch_timeout_seconds=600,
    pod_watch_timeout_seconds=600
)

deployment = Deployment(
    name=f"flow-{APP_ENVIRONMENT}",
    flow_name=f"flow-{APP_ENVIRONMENT}",  # f-string, so the environment is actually interpolated
    version=1,
    work_queue_name="dev",
    infrastructure=k8s_job,
    path="/opt/prefect/flows",
    parameters=params,
    entrypoint="flow.py:main"
)
The ability to pull from your container registry would be an IAM policy permission on the GKE cluster itself, though.
But as long as you specify the right image and have the permissions, it's fairly straightforward (that is, not cloud-specific).
t
I previously used my docker image for deploying an agent to a GCP Compute Engine VM, so I have an entrypoint at the bottom of the Dockerfile. Do you think that could be causing issues?
c
the flow will override your docker entrypoint
t
I see
The thing that's tripping me up is:
• If I don't specify the image, I can run flows with standard dependencies that are included in the Prefect base image, which tells me it's properly connected to the cluster's compute
• If I do specify the image, my flows stay in Pending state indefinitely. Maybe I'm not waiting long enough, as I've only waited 10 minutes before cancelling
n
hmm I'd be curious to see agent logs, sounds like it could potentially be a permissions thing with pulling the image from your registry?
t
Seems to be picking up the flow runs fine, haven't seen any indication of errors with pulling down the image. I imagine that issue would come up prior to the flows being picked up, right?
It's failing due to a BackoffLimitExceeded error
c
How did you define the deployment? BackoffLimitExceeded suggests the pods are in fact starting and then failing. You should have logs for the failed pods indicating why they failed.
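A rough sketch of how those failed-pod logs could be pulled, assuming `kubectl` is configured against the GKE cluster. The `prefect2` namespace comes from the snippet earlier in the thread; the helper names and the job name passed in are hypothetical. It relies on the `job-name` label that Kubernetes puts on pods created by a Job.

```python
import shlex
import subprocess


def debug_commands(job_name: str, namespace: str = "prefect2", tail: int = 50) -> list[list[str]]:
    """Build kubectl commands for inspecting a failed Kubernetes Job."""
    selector = f"job-name={job_name}"  # label Kubernetes adds to pods created by a Job
    return [
        # Job events, including the BackoffLimitExceeded condition
        ["kubectl", "describe", "job", job_name, "-n", namespace],
        # Pod status: shows whether pods crashed or never pulled the image
        ["kubectl", "get", "pods", "-l", selector, "-n", namespace],
        # Last log lines from the pods that belonged to the job
        ["kubectl", "logs", "-l", selector, "-n", namespace, f"--tail={tail}"],
    ]


def run_all(job_name: str, namespace: str = "prefect2") -> None:
    """Print and run each command; requires kubectl on the PATH."""
    for cmd in debug_commands(job_name, namespace):
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=False)
```

Calling `run_all("<job name from the agent logs>")` soon after the failure is probably safest, since the job in the snippet above sets `finished_job_ttl=600` and may be cleaned up afterwards.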
t
Thanks for all the help! I think it was an issue with my Dockerfile. I managed to get it up and running by removing the entrypoint from the Dockerfile
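One possible explanation, sketched under assumptions: Kubernetes only replaces the image's ENTRYPOINT when the container spec sets `command`. If the spec sets only `args`, the ENTRYPOINT still runs and receives those args. Whether the Prefect job passes the flow-run command as `command` or `args` is not confirmed in this thread, so the generated job manifest is worth checking. The function below just models the documented Kubernetes rule; the entrypoint and flow command values are illustrative.

```python
from typing import Optional, Sequence


def effective_argv(
    image_entrypoint: Sequence[str],
    image_cmd: Sequence[str],
    k8s_command: Optional[Sequence[str]] = None,
    k8s_args: Optional[Sequence[str]] = None,
) -> list[str]:
    """Return the argv a container starts with, per the Kubernetes command/args rules."""
    # `command` replaces the image ENTRYPOINT; when it is set, the image CMD is dropped too.
    if k8s_command is not None:
        return list(k8s_command) + list(k8s_args or [])
    # Only `args` set: the image ENTRYPOINT still runs, with `args` replacing CMD.
    if k8s_args is not None:
        return list(image_entrypoint) + list(k8s_args)
    # Neither set: the image defaults apply.
    return list(image_entrypoint) + list(image_cmd)


AGENT_ENTRYPOINT = ["prefect", "agent", "start", "-q", "dev"]  # hypothetical Dockerfile ENTRYPOINT
FLOW_COMMAND = ["python", "-m", "prefect.engine"]  # illustrative flow-run command

# ENTRYPOINT kept, flow command appended as arguments to the agent
with_args = effective_argv(AGENT_ENTRYPOINT, [], k8s_args=FLOW_COMMAND)
# ENTRYPOINT replaced by the flow command
with_command = effective_argv(AGENT_ENTRYPOINT, [], k8s_command=FLOW_COMMAND)
# No ENTRYPOINT in the image: the flow command runs on its own
no_entrypoint = effective_argv([], [], k8s_args=FLOW_COMMAND)
```

In the `with_args` case the container would start the agent with the flow command tacked on as extra arguments, which could make the pod exit immediately and would be consistent with the BackoffLimitExceeded error going away once the ENTRYPOINT was removed.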