# prefect-cloud
c
Hello everyone, our team has just run into an issue where all of our clusters (dev and production) are no longer found by Prefect Cloud. It seems that the workers cannot start. This happened to all the environments without any changes in deployment at about 9am PST today. Does anyone have any thoughts about why this may have happened, considering we are pinned on an older version of Prefect (`2.16.4`)?
a
Is prefect_kubernetes pinned too?
c
no we have not pinned the version of that package
a
So my guess is that since it’s not pinned, it’s pulling the most up to date version of prefect-kubernetes which may be referencing a utility in a version of prefect you don’t have. Not entirely sure (on vacation at the moment but had a second to spare). @Nate is this your spidey sense too?
n
yes i think you're right @Alexander Azzam - 2.16.4 doesn't have `prefect.utilities.timeout`, but the newest kubernetes worker uses that module
c
We are actually using the provided image `prefecthq/prefect:2.16.4-python3.10`, so maybe the version is not pinned in that image? Either way, I just upgraded to the newest version and it seems to work alright now
n
> We are actually using the provided image `prefecthq/prefect:2.16.4-python3.10`, so maybe the version is not pinned in that image?

prefect 2.16.4 just doesn't have that module, so when installing `prefect-kubernetes>=0.4.0` on top (which uses that module) we get that import error. but yeah, upgrading prefect or downgrading prefect-kubernetes would resolve it - glad you got it figured out!
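The version mismatch described above can be sketched as a small compatibility check. This is only an illustration: the `compatible` helper is mine, and the assumption that `prefect.utilities.timeout` ships in the release right after 2.16.4 is inferred from this thread, not confirmed against the changelog.

```python
def parse(version: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def compatible(prefect_version: str, prefect_kubernetes_version: str) -> bool:
    """Rough rule from this thread: prefect-kubernetes >= 0.4.0 imports
    prefect.utilities.timeout, which prefect 2.16.4 does not ship
    (assumed present from 2.16.5 onward)."""
    if parse(prefect_kubernetes_version) >= (0, 4, 0):
        return parse(prefect_version) >= (2, 16, 5)
    return True


print(compatible("2.16.4", "0.3.11"))  # the older worker runs fine on 2.16.4
print(compatible("2.16.4", "0.4.0"))   # the broken combination from this thread
```

This is why either side of the pair can be moved: upgrade prefect, or pin `prefect-kubernetes<0.4.0`.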
c
Thanks everyone!
n
catjam
c
Actually, spoke too soon. The worker was reporting to Prefect Cloud, but on initialization of a Prefect run I get different errors:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 908, in _submit_run_and_capture_errors
    result = await self.run(
  File "/usr/local/lib/python3.10/site-packages/prefect_kubernetes/worker.py", line 612, in run
    async with self._get_configured_kubernetes_client(configuration) as client:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/prefect_kubernetes/worker.py", line 743, in _get_configured_kubernetes_client
    await config.load_incluster_config()
TypeError: object NoneType can't be used in 'await' expression
```
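That `TypeError` is the generic failure mode of awaiting a call that returned `None`. A minimal standalone repro of the error class (the `load_config` stand-in here is hypothetical, not the actual kubernetes client code):

```python
import asyncio


def load_config():
    # Synchronous stand-in for the client call the worker awaited;
    # it returns None, so the result is not awaitable.
    return None


async def main():
    try:
        await load_config()
    except TypeError as exc:
        # Same error class as in the worker traceback above
        print(exc)


asyncio.run(main())
```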
n
hi @Curtis White - sorry about that. that's a bug that went out; we've fixed it in the 2.x lineage but still need to release it (will be doing this asap). in the meantime, 0.3.11 should not have this bug
c
so do we need to create our own image then for our worker?
We have something like this right now
```yaml
spec:
  serviceAccountName: scheduler
  containers:
    - name: worker
      image: prefecthq/prefect:2.19.9-python3.11
      command:
        [
          "prefect",
          "worker",
          "start",
          "--pool",
          "${ENVIRONMENT_NAME_H}",
          "--type",
          "kubernetes",
          "--install-policy",
          "always",
        ]
      imagePullPolicy: "Always"
      securityContext:
        allowPrivilegeEscalation: false
```
n
there's a kubernetes flavored image you should be able to use, one sec, let me grab that
```
» docker run -it --rm prefecthq/prefect:2.19.9-python3.11-kubernetes bash

root@53647e34beb2:/opt/prefect# pip list | grep prefect
prefect                   2.19.9
prefect-kubernetes        0.3.11
```
c
For this image it seems to also download the newer version. This is with this config
n
hmm, it shouldn't - as I tried to show above, 0.3.11 is already installed on that image. if you have `EXTRA_PIP_PACKAGES` on the deployment / work pool, or are pip installing something in the `pull` section, then yeah, it would install prefect-kubernetes on top at runtime
yeah actually in the top of that screengrab you sent, you can see it says 0.3.11 is already in the site-packages. so i'd guess you have `EXTRA_PIP_PACKAGES` or it's the install-policy `always`
i might choose `if-not-present` instead of `always`
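The difference between the two policies can be sketched like this. This is an illustration of the behavior described in this thread, not Prefect's actual implementation; the policy value names come from the CLI flag discussed above.

```python
def should_install(policy: str, already_installed: bool) -> bool:
    """Whether a worker (re)installs prefect-kubernetes at startup."""
    if policy == "always":
        # reinstalls on every start, which can pull a newer release
        # (e.g. 0.4.0) on top of the version baked into the image
        return True
    if policy == "if-not-present":
        # only installs when the package is missing from the image
        return not already_installed
    raise ValueError(f"unknown policy: {policy}")


# With 0.3.11 already baked into the kubernetes-flavored image:
print(should_install("always", True))          # still installs, may upgrade
print(should_install("if-not-present", True))  # leaves the baked-in version alone
```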
c
We don’t have anything set up for `EXTRA_PIP_PACKAGES`, but I will try with this different policy
👍 1
same issue when changing this to `if-not-present`
n
as in, you see 0.4.0 being installed on top?
c
Yes, specifically it installs 0.4.0 and then uninstalls the 0.3.11 version
n
well, if you don't mind trying one more thing: you shouldn't need an install policy at all if we use the kubernetes-flavored image. can we try just removing that install policy flag entirely?
c
Ok, it's working now. It was because I changed the `imagePullPolicy: "Always"`, not the install policy in the worker start. I have a flow run executing at least
n
ah nice catch, didn't think of that