Hi, we're using prefect cloud but since yesterday 4:45 PM EST | Prefect Community #prefect-cloud

Frost Ouyang

08/04/2024, 8:46 PM

Hi, we're using prefect cloud but since yesterday 4:45 PM EST we started to encounter errors below:

23:14:54.824 | INFO    | prefect.flow_runs.worker - Completed submission of flow run '5821b7db-00cc-4633-a01e-5af5bc1eacac'
23:14:54.997 | INFO    | prefect.flow_runs.worker - Reported flow run '5821b7db-00cc-4633-a01e-5af5bc1eacac' as crashed: Flow run could not be submitted to infrastructure
23:14:56.989 | ERROR   | GlobalEventLoopThread | prefect._internal.concurrency - Service 'EventsWorker' failed with 1 pending items.

We didn't make any code changes around that time. Can anyone help take a look? Thanks.

Frost Ouyang

08/04/2024, 8:47 PM

Here is the detailed error from another flow run crash:

20:44:51.527 | INFO    | prefect.flow_runs.worker - Worker 'KubernetesWorker e6e6fbe3-fa0e-4e5b-bfe7-f005d58791ee' submitting flow run '5645f40b-fbdb-412a-9a5a-41a434e181ca'
20:44:51.528 | INFO    | prefect.flow_runs.worker - Worker 'KubernetesWorker e6e6fbe3-fa0e-4e5b-bfe7-f005d58791ee' submitting flow run '86be8073-3d5c-475c-ab6c-b43e3953c047'
20:44:52.041 | ERROR   | prefect.flow_runs.worker - Failed to submit flow run '86be8073-3d5c-475c-ab6c-b43e3953c047' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/workers/base.py", line 908, in _submit_run_and_capture_errors
    result = await self.run(
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 612, in run
    async with self._get_configured_kubernetes_client(configuration) as client:
  File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 743, in _get_configured_kubernetes_client
    await config.load_incluster_config()
TypeError: object NoneType can't be used in 'await' expression
20:44:52.042 | INFO    | prefect.flow_runs.worker - Completed submission of flow run '86be8073-3d5c-475c-ab6c-b43e3953c047'
20:44:52.048 | ERROR   | prefect.flow_runs.worker - Failed to submit flow run '5645f40b-fbdb-412a-9a5a-41a434e181ca' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/workers/base.py", line 908, in _submit_run_and_capture_errors
    result = await self.run(
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 612, in run
    async with self._get_configured_kubernetes_client(configuration) as client:
  File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 743, in _get_configured_kubernetes_client
    await config.load_incluster_config()
TypeError: object NoneType can't be used in 'await' expression
20:44:52.049 | INFO    | prefect.flow_runs.worker - Completed submission of flow run '5645f40b-fbdb-412a-9a5a-41a434e181ca'
20:44:52.203 | INFO    | prefect.flow_runs.worker - Reported flow run '86be8073-3d5c-475c-ab6c-b43e3953c047' as crashed: Flow run could not be submitted to infrastructure
20:44:52.227 | INFO    | prefect.flow_runs.worker - Reported flow run '5645f40b-fbdb-412a-9a5a-41a434e181ca' as crashed: Flow run could not be submitted to infrastructure
20:44:54.325 | ERROR   | GlobalEventLoopThread | prefect._internal.concurrency - Service 'EventsWorker' failed with 2 pending items.

Nate

08/04/2024, 9:00 PM

hi @Frost Ouyang

```
File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 743, in _get_configured_kubernetes_client
    await config.load_incluster_config()
```

this is a known bug in `prefect-kubernetes==0.4.0`, so you must have recently upgraded to this version of the worker. we're planning on releasing the fix tomorrow, but in the meantime you may want to downgrade your worker's version of `prefect-kubernetes` to `0.3.11`
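
The `TypeError` in the traceback means that the value being awaited was `None`, i.e. the call returned `None` instead of an awaitable. A minimal sketch that reproduces the same class of error, assuming a hypothetical stand-in function (this is not the `prefect-kubernetes` code itself):

```python
import asyncio


def load_incluster_config():
    """Hypothetical stand-in: a plain (non-async) function that returns None."""
    return None


async def configure_client():
    # The call runs synchronously and returns None, so this line ends up
    # awaiting None, which raises TypeError at runtime.
    await load_incluster_config()


def reproduce():
    """Run the coroutine and return the TypeError message, or None if no error."""
    try:
        asyncio.run(configure_client())
    except TypeError as exc:
        return str(exc)
    return None


message = reproduce()
print(message)
```

The exact wording of the message varies between Python versions, so the sketch only reports whatever the interpreter raises.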

Frost Ouyang

08/04/2024, 9:03 PM

oh great. Thank you.

👍 1

Frost Ouyang

08/04/2024, 9:06 PM

Hi, I'm wondering how to downgrade the worker's version? I just checked our projects and found that it's still `prefect-kubernetes==0.3.10` in our environment.

Nate

08/04/2024, 9:07 PM

how are you running the kubernetes worker? are you using the helm chart?

Frost Ouyang

08/04/2024, 9:08 PM

yes, helm chart. I checked and found the information below:

helm.sh/chart=prefect-worker-2023.08.24

Nate

08/04/2024, 9:09 PM

https://github.com/PrefectHQ/prefect-helm/blob/main/charts/prefect-worker/values.yaml#L56-L62

Nate

08/04/2024, 9:09 PM

you'd set a different image in your `values.yaml` file

Frost Ouyang

08/04/2024, 9:11 PM

I just checked and found below in our `values.yaml`:

prefectTag: 2-python3.11-kubernetes

Frost Ouyang

08/04/2024, 9:12 PM

kubectl -n prefect2 describe deployment prefect-worker             (frost-engineering-staging.k8s.local/default)
Name:                   prefect-worker
Namespace:              prefect2
CreationTimestamp:      Fri, 27 Oct 2023 21:44:30 -0400
Labels:                 app.kubernetes.io/component=worker
                        app.kubernetes.io/instance=prefect-worker
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=prefect-worker
                        helm.sh/chart=prefect-worker-2023.08.24
                        prefect-version=2.11.5

Frost Ouyang

08/04/2024, 9:13 PM

Service Account:  prefect-worker
  Containers:
   prefect-worker:
    Image:      prefecthq/prefect:2-python3.11-kubernetes
    Port:       <none>
    Host Port:  <none>

We haven't changed this config for a long time. Yesterday, after we got the error, I did a rollout restart of the deployment, but still didn't change any configs.

Nate

08/04/2024, 9:24 PM

right, so `2-python3.11-kubernetes` is a dynamic tag (whatever the latest 2.x version is). i would use `2.19.9-python3.11-kubernetes` so that you don't get new versions whenever a new 2.x version is released, and can make explicit upgrades when you're ready

» docker run -it prefecthq/prefect:2.19.9-python3.12-kubernetes bash
2.19.9-python3.12-kubernetes: Pulling from prefecthq/prefect
...
bde9cf9b3ccf: Pull complete
Digest: sha256:e387b8f0f2b61c41ff3a86dc876a9294f5ffd0278d55324813e83893deb25583
Status: Downloaded newer image for prefecthq/prefect:2.19.9-python3.12-kubernetes

root@3ccff8b468b4:/opt/prefect# pip list | grep prefect
prefect                   2.19.9
prefect-kubernetes        0.3.11
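
The same check as `pip list | grep prefect` can be made from Python inside the worker container. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is an assumption for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions relevant to this thread; prints None for packages
# that are not installed in the current environment.
for name in ("prefect", "prefect-kubernetes"):
    print(name, installed_version(name))
```

Running this where the worker actually runs (rather than in the project environment) shows which `prefect-kubernetes` version the image ships.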

Frost Ouyang

08/04/2024, 9:26 PM

So I should change that tag in my Prefect `values.yaml` and re-deploy the helm chart?

Nate

08/04/2024, 9:26 PM

yes - that should solve your error
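
To catch floating tags such as `2-python3.11-kubernetes` before a deploy, a small check could flag any tag that does not start with a full version number. This is a heuristic sketch; the name `is_pinned` and the MAJOR.MINOR.PATCH rule are assumptions for illustration, not part of Prefect or the helm chart:

```python
import re

# Heuristic: a tag counts as "pinned" when it starts with a full
# MAJOR.MINOR.PATCH version, e.g. "2.19.9-python3.11-kubernetes".
PINNED_TAG = re.compile(r"^\d+\.\d+\.\d+(-|$)")


def is_pinned(tag: str) -> bool:
    """Return True if `tag` begins with a full three-part version number."""
    return bool(PINNED_TAG.match(tag))


for tag in ("2-python3.11-kubernetes", "2.19.9-python3.11-kubernetes"):
    print(tag, "pinned" if is_pinned(tag) else "floating")
```

A version-pinned tag avoids picking up new 2.x releases on restart; referencing the image by digest would be stricter still.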

Frost Ouyang

08/04/2024, 9:27 PM

I do have another question: we have staging and production environments which use exactly the same `values.yaml`. However, since yesterday, all flow runs in the staging environment have crashed, but nothing happened in our production environment. Is there a reason for that?

Frost Ouyang

08/04/2024, 9:39 PM

I re-deployed the worker in our staging environment and a new flow run succeeded. Thanks for the support. I still need to figure out when/how our staging worker got upgraded - it shouldn't happen automatically, right?

👍 1

Nate

08/04/2024, 10:05 PM

if the worker died somehow and kubernetes restarted it, then it would pull whatever image was most recently associated with that tag. one way or another, the process must have restarted

Frost Ouyang

08/04/2024, 11:12 PM

I see. Thanks for the assistance.

Nate

08/04/2024, 11:13 PM

sure thing!
