# prefect-community

Thet Naing

02/28/2023, 2:53 PM
Hi all! My team is considering a switch to Google Kubernetes Engine for our Prefect infrastructure. Does anyone have recipes or examples of deploying flows with GKE clusters in Prefect 2.x? Any help would be greatly appreciated.

Christopher Boyd

02/28/2023, 3:03 PM
Using a pipeline or ..? The process is identical across any cloud since you are just building a deployment and applying it. The only change will be to your storage and credentials more than likely. If you use a kubernetes-job for your infrastructure block, and the agent runs in GKE then it will pull down the flow runs appropriately

Thet Naing

02/28/2023, 4:10 PM
I have an agent running in GKE and picking up flow runs when using the base Prefect image, but can't seem to get it to work with my custom image hosted in GCP. Was thinking it'd be helpful to see some examples end to end in case I did something wrong in the process
Btw, thanks for your post in Discourse! Super helpful for getting even this far 🙂

Christopher Boyd

02/28/2023, 4:22 PM
I have a working pipeline that looks like this:
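from os import environ

from prefect.deployments import Deployment
from prefect.infrastructure import KubernetesJob

# GCP_PROJECT_ID, GCP_RESULTS_BUCKET, PREFECT_VERSION, PYTHON_VERSION,
# APP_ENVIRONMENT, customizations, and params are defined elsewhere in the pipeline.
image = "gcr.io/" + GCP_PROJECT_ID + "/" + environ['PROJECT_NAME']

# Infrastructure block: each flow run becomes a Kubernetes Job using the custom image.
k8s_job = KubernetesJob(
    image=image,
    namespace="prefect2",
    # name="healthcheck",
    name=environ['PROJECT_NAME'],
    customizations=customizations,
    env=dict(
        GCP_PROJECT_ID=GCP_PROJECT_ID,
        GCP_RESULTS_BUCKET=GCP_RESULTS_BUCKET,
        PREFECT_VERSION=PREFECT_VERSION,
        PYTHON_VERSION=PYTHON_VERSION
    ),
    labels={"environment": f'{APP_ENVIRONMENT}'.lower()},
    finished_job_ttl=600,
    job_watch_timeout_seconds=600,
    pod_watch_timeout_seconds=600
)

# Deployment: the flow code is expected inside the image at `path`.
deployment = Deployment(
    name=f"flow-{APP_ENVIRONMENT}",
    flow_name=f"flow-{APP_ENVIRONMENT}",  # f-prefix added so the name is interpolated
    version=1,
    work_queue_name="dev",
    infrastructure=k8s_job,
    path="/opt/prefect/flows",
    parameters=params,
    entrypoint="flow.py:main"
)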
The ability to pull from your container registry would be an IAM policy permission on the GKE cluster itself, though.
But as long as you specify the right image and have the permissions, it's fairly straightforward (that is, not cloud-specific).
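For reference, the registry permission mentioned above could be granted along these lines. This is only a sketch under assumptions the thread does not confirm: that the images live in Container Registry (gcr.io), where pull access comes from the Storage Object Viewer role, and that the cluster nodes pull images with a service account whose email you know. The helper `pull_access_command` and the example project and service account values are hypothetical. The function only assembles the gcloud arguments and runs nothing.

```python
from typing import List


def pull_access_command(project_id: str, node_service_account: str) -> List[str]:
    """Build (but do not run) a gcloud command granting image pull access.

    Assumes gcr.io, whose images are stored in Cloud Storage, so read access
    is granted through roles/storage.objectViewer.
    """
    if not project_id or not node_service_account:
        raise ValueError("project_id and node_service_account are required")
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{node_service_account}",
        "--role=roles/storage.objectViewer",
    ]


# Hypothetical values, for illustration only.
cmd = pull_access_command("my-project", "gke-nodes@my-project.iam.gserviceaccount.com")
print(" ".join(cmd))
```

If the images are in Artifact Registry instead, the role would differ, so treat the role name here as specific to the gcr.io assumption.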

Thet Naing

02/28/2023, 4:24 PM
I previously used my docker image for deploying an agent to a GCP Compute Engine VM, so I have an entrypoint at the bottom of the Dockerfile. Do you think that could be causing issues?

Christopher Boyd

02/28/2023, 4:25 PM
the flow will override your docker entrypoint

Thet Naing

02/28/2023, 4:26 PM
I see
The thing that's tripping me up is:
• If I don't specify the image, I can run flows with standard dependencies that are included in the Prefect base image, which tells me it's properly connected to the cluster's compute
• If I do specify the image, my flows stay in Pending state indefinitely. Maybe I'm not waiting long enough, as I've only waited 10 minutes before cancelling

Nate

02/28/2023, 4:36 PM
hmm I'd be curious to see agent logs, sounds like it could potentially be a permissions thing with pulling the image from your registry?

Thet Naing

02/28/2023, 4:47 PM
Seems to be picking up the flow runs fine, haven't seen any indication of errors with pulling down the image. I imagine that issue would come up prior to the flows being picked up, right?
It's failing due to a BackoffLimitExceeded error

Christopher Boyd

02/28/2023, 5:31 PM
How did you define the deployment? BackoffLimitExceeded suggests the pods are in fact starting, then failing or dying. You should have logs for the failed pods indicating why they failed.
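One way to find those failed-pod logs is with kubectl. The sketch below only builds the command lines, so it can be read or printed without cluster access. The helper `failed_job_debug_commands` and the job name are hypothetical, and the `prefect2` default namespace is borrowed from the pipeline snippet earlier in the thread, so it may not match your cluster. It relies on Kubernetes labelling a Job's pods with `job-name=<job>`.

```python
from typing import Dict, List


def failed_job_debug_commands(job_name: str, namespace: str = "prefect2") -> Dict[str, List[str]]:
    """Return kubectl invocations for inspecting a Job that hit its backoff limit."""
    selector = f"job-name={job_name}"
    return {
        # Events and conditions on the Job itself, including BackoffLimitExceeded.
        "describe_job": ["kubectl", "describe", "job", job_name, "-n", namespace],
        # Pods created by the Job, selected by the job-name label.
        "list_pods": ["kubectl", "get", "pods", "-n", namespace, "-l", selector],
        # Full logs from those pods; --tail=-1 lifts the default line limit
        # that applies when a selector is used.
        "pod_logs": ["kubectl", "logs", "-n", namespace, "-l", selector,
                     "--all-containers", "--tail=-1"],
    }


# Hypothetical job name, for illustration only.
for step, argv in failed_job_debug_commands("my-flow-run-job").items():
    print(f"{step}: {' '.join(argv)}")
```

Each list can be passed to `subprocess.run` as-is if you want to execute it from Python.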

Thet Naing

02/28/2023, 8:43 PM
Thanks for all the help! I think it was an issue with my Dockerfile. I managed to get it up and running by removing the entrypoint from the Dockerfile
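The thread doesn't say why removing the entrypoint helped. One possible explanation, offered as an assumption and not confirmed here: Kubernetes lets a pod spec set `command` (which replaces the image's ENTRYPOINT) or `args` (which replaces only CMD). If the flow-run job supplies only `args`, a Dockerfile ENTRYPOINT left over from an agent image would still run, with the flow-run command appended to it. The sketch below models the documented Kubernetes rule. `effective_argv`, the agent entrypoint, and the flow-run arguments are all illustrative values, not taken from Prefect's source.

```python
from typing import List, Optional, Sequence


def effective_argv(
    image_entrypoint: Sequence[str],
    image_cmd: Sequence[str],
    k8s_command: Optional[Sequence[str]] = None,
    k8s_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Combine Dockerfile ENTRYPOINT/CMD with a pod spec's command/args."""
    if k8s_command is not None:
        # command replaces ENTRYPOINT, and the image CMD is ignored.
        return list(k8s_command) + list(k8s_args or [])
    if k8s_args is not None:
        # args alone replaces CMD, but the image ENTRYPOINT still runs.
        return list(image_entrypoint) + list(k8s_args)
    # Nothing overridden: the image defaults apply.
    return list(image_entrypoint) + list(image_cmd)


# Hypothetical agent-style entrypoint and flow-run args.
agent_entrypoint = ["prefect", "agent", "start", "-q", "dev"]
flow_run_args = ["python", "-m", "prefect.engine"]

print(effective_argv(agent_entrypoint, [], k8s_args=flow_run_args))
# → ['prefect', 'agent', 'start', '-q', 'dev', 'python', '-m', 'prefect.engine']

print(effective_argv([], [], k8s_args=flow_run_args))
# → ['python', '-m', 'prefect.engine']
```

Under that assumption, the first case starts the agent with stray arguments instead of running the flow, which would be consistent with pods starting and then failing until the backoff limit is hit. The second case matches an image with no ENTRYPOINT.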