# prefect-cloud
f
Hi, we're using Prefect Cloud, but since 4:45 PM EST yesterday we've been encountering the errors below:
```
23:14:54.824 | INFO    | prefect.flow_runs.worker - Completed submission of flow run '5821b7db-00cc-4633-a01e-5af5bc1eacac'
23:14:54.997 | INFO    | prefect.flow_runs.worker - Reported flow run '5821b7db-00cc-4633-a01e-5af5bc1eacac' as crashed: Flow run could not be submitted to infrastructure
23:14:56.989 | ERROR   | GlobalEventLoopThread | prefect._internal.concurrency - Service 'EventsWorker' failed with 1 pending items.
```
We didn't make any code changes around that time. Can anyone help take a look? Thanks.
Here is the detailed error from another flow run that crashed:
```
20:44:51.527 | INFO    | prefect.flow_runs.worker - Worker 'KubernetesWorker e6e6fbe3-fa0e-4e5b-bfe7-f005d58791ee' submitting flow run '5645f40b-fbdb-412a-9a5a-41a434e181ca'
20:44:51.528 | INFO    | prefect.flow_runs.worker - Worker 'KubernetesWorker e6e6fbe3-fa0e-4e5b-bfe7-f005d58791ee' submitting flow run '86be8073-3d5c-475c-ab6c-b43e3953c047'
20:44:52.041 | ERROR   | prefect.flow_runs.worker - Failed to submit flow run '86be8073-3d5c-475c-ab6c-b43e3953c047' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/workers/base.py", line 908, in _submit_run_and_capture_errors
    result = await self.run(
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 612, in run
    async with self._get_configured_kubernetes_client(configuration) as client:
  File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 743, in _get_configured_kubernetes_client
    await config.load_incluster_config()
TypeError: object NoneType can't be used in 'await' expression
20:44:52.042 | INFO    | prefect.flow_runs.worker - Completed submission of flow run '86be8073-3d5c-475c-ab6c-b43e3953c047'
20:44:52.048 | ERROR   | prefect.flow_runs.worker - Failed to submit flow run '5645f40b-fbdb-412a-9a5a-41a434e181ca' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/workers/base.py", line 908, in _submit_run_and_capture_errors
    result = await self.run(
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 612, in run
    async with self._get_configured_kubernetes_client(configuration) as client:
  File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 743, in _get_configured_kubernetes_client
    await config.load_incluster_config()
TypeError: object NoneType can't be used in 'await' expression
20:44:52.049 | INFO    | prefect.flow_runs.worker - Completed submission of flow run '5645f40b-fbdb-412a-9a5a-41a434e181ca'
20:44:52.203 | INFO    | prefect.flow_runs.worker - Reported flow run '86be8073-3d5c-475c-ab6c-b43e3953c047' as crashed: Flow run could not be submitted to infrastructure
20:44:52.227 | INFO    | prefect.flow_runs.worker - Reported flow run '5645f40b-fbdb-412a-9a5a-41a434e181ca' as crashed: Flow run could not be submitted to infrastructure
20:44:54.325 | ERROR   | GlobalEventLoopThread | prefect._internal.concurrency - Service 'EventsWorker' failed with 2 pending items.
```
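The `TypeError` at the bottom of each traceback is what Python raises when `await` is applied to something that is not awaitable. Here the awaited call returned `None`, which is what happens when a plain synchronous function is awaited. The following minimal sketch reproduces the same message. It uses a hypothetical stand-in (`load_incluster_config_stub`) rather than the real Kubernetes client, so it illustrates the error pattern only and is not the actual `prefect-kubernetes` code:

```python
import asyncio


def load_incluster_config_stub() -> None:
    """Hypothetical stand-in for a synchronous config loader that returns None."""
    return None


async def buggy_client_setup() -> None:
    # Awaiting the *return value* of a sync function means awaiting None,
    # which raises TypeError before any Kubernetes call is made.
    await load_incluster_config_stub()


async def fixed_client_setup() -> None:
    # Calling the sync function directly (no await) works fine.
    load_incluster_config_stub()


def reproduce() -> str:
    """Run the buggy coroutine and return the TypeError message it raises."""
    try:
        asyncio.run(buggy_client_setup())
    except TypeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(reproduce())
    asyncio.run(fixed_client_setup())
```

Because the failure happens inside the installed worker package, nothing in your flow code can work around it. Changing which package version the worker runs is the only lever.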
n
Hi @Frost Ouyang,
```
File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 743, in _get_configured_kubernetes_client
    await config.load_incluster_config()
```
This is a known bug in `prefect-kubernetes==0.4.0`, so you must have recently upgraded your worker to that version. We're planning to release the fix tomorrow, but in the meantime you may want to downgrade your worker's `prefect-kubernetes` to `0.3.11`.
f
Oh great, thank you.
๐Ÿ‘ 1
Hi, how do I downgrade the worker's version? I just checked our projects and found that our environment is still on `prefect-kubernetes==0.3.10`.
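The version that matters is the one installed inside the worker's container image, not the one in your project environment. The traceback paths above point into the worker's own `site-packages`. Below is a minimal sketch of a check you could run with the worker's Python interpreter. `is_affected` and `installed_version` are hypothetical helpers, and the only release treated as affected is `0.4.0`, the version named above:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# The release reported as buggy in this thread.
AFFECTED_VERSIONS = {"0.4.0"}


def is_affected(pkg_version: str) -> bool:
    """Return True if the given prefect-kubernetes version is the known-bad release."""
    return pkg_version.strip() in AFFECTED_VERSIONS


def installed_version(package: str = "prefect-kubernetes") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("prefect-kubernetes is not installed in this environment")
    elif is_affected(found):
        print(f"prefect-kubernetes {found} is affected; consider 0.3.11 until the fix ships")
    else:
        print(f"prefect-kubernetes {found} is not the affected release")
```

For a quick look without a script, `kubectl -n prefect2 exec deploy/prefect-worker -- pip list | grep prefect` lists the versions inside the running worker pod. This assumes the namespace and deployment name shown in the `kubectl describe` output in this thread.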
n
How are you running the Kubernetes worker? Are you using the Helm chart?
f
Yes, the Helm chart. I checked and found the information below:
```
helm.sh/chart=prefect-worker-2023.08.24
```
n
You'd set a different image in your `values.yaml` file.
f
I just checked and found the following in our `values.yaml`:
```
prefectTag: 2-python3.11-kubernetes
```
```
kubectl -n prefect2 describe deployment prefect-worker             (frost-engineering-staging.k8s.local/default)
Name:                   prefect-worker
Namespace:              prefect2
CreationTimestamp:      Fri, 27 Oct 2023 21:44:30 -0400
Labels:                 app.kubernetes.io/component=worker
                        app.kubernetes.io/instance=prefect-worker
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=prefect-worker
                        helm.sh/chart=prefect-worker-2023.08.24
                        prefect-version=2.11.5
```
```
Service Account:  prefect-worker
  Containers:
   prefect-worker:
    Image:      prefecthq/prefect:2-python3.11-kubernetes
    Port:       <none>
    Host Port:  <none>
```
We haven't changed this config in a long time. Yesterday, after we got the error, I ran a rollout restart of the deployment, but I still didn't change any configs.
n
Right, so `2-python3.11-kubernetes` is a dynamic tag (it points to whatever the latest 2.x version is). I would use `2.19.9-python3.11-kubernetes` instead. That way you won't pick up new versions whenever a new 2.x version is released, and you can make explicit upgrades when you're ready.
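The distinction between a floating tag and a pinned tag can be expressed as a small check. This is a rough heuristic, not an official rule. It assumes `prefecthq/prefect` tags start with a version segment before the first `-`, and treats only a full `MAJOR.MINOR.PATCH` segment as pinned. `is_pinned_prefect_tag` is a hypothetical helper:

```python
import re

# A full MAJOR.MINOR.PATCH version, e.g. "2.19.9".
_FULL_VERSION = re.compile(r"\d+\.\d+\.\d+")


def is_pinned_prefect_tag(tag: str) -> bool:
    """Heuristic: treat a tag as pinned only if its leading segment is MAJOR.MINOR.PATCH.

    Tags like '2-python3.11-kubernetes' or '2-latest' float to newer 2.x releases.
    """
    version_part = tag.split("-", 1)[0]
    return _FULL_VERSION.fullmatch(version_part) is not None


if __name__ == "__main__":
    for tag in ("2-python3.11-kubernetes", "2.19.9-python3.11-kubernetes"):
        print(tag, "pinned" if is_pinned_prefect_tag(tag) else "floating")
```

Even a full version tag can in principle be re-pushed by the publisher. Referencing the image by digest (`prefecthq/prefect@sha256:...`, like the digest printed by `docker run` below) is the strictest form of pinning.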
```
» docker run -it prefecthq/prefect:2.19.9-python3.12-kubernetes bash
2.19.9-python3.12-kubernetes: Pulling from prefecthq/prefect
...
bde9cf9b3ccf: Pull complete
Digest: sha256:e387b8f0f2b61c41ff3a86dc876a9294f5ffd0278d55324813e83893deb25583
Status: Downloaded newer image for prefecthq/prefect:2.19.9-python3.12-kubernetes

root@3ccff8b468b4:/opt/prefect# pip list | grep prefect
prefect                   2.19.9
prefect-kubernetes        0.3.11
```
f
So I should change that tag in my Prefect `values.yaml` and redeploy the Helm chart?
n
Yes, that should solve your error.
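Here is a sketch of the redeploy step. It makes several assumptions, so check them against your setup before running anything:
- the release is named `prefect-worker` in namespace `prefect2`, as the `kubectl describe` output suggests;
- the chart comes from a Helm repo added under the alias `prefect`;
- the tag lives at `worker.image.prefectTag` in the chart's values (check your chart version's `values.yaml`).

`helm_upgrade_args` is a hypothetical helper that only builds the argument list, so you can review it before executing. Without `--version`, `helm upgrade` installs the latest chart from the repo, so passing the currently installed chart version (`2023.08.24` per the labels above) keeps this a tag-only change:

```python
import shlex
from typing import List, Optional


def helm_upgrade_args(
    tag: str,
    release: str = "prefect-worker",          # assumption: release name
    chart: str = "prefect/prefect-worker",    # assumption: repo alias "prefect"
    namespace: str = "prefect2",              # assumption: namespace
    chart_version: Optional[str] = None,
) -> List[str]:
    """Build a `helm upgrade` command that changes only the worker image tag."""
    args = [
        "helm", "upgrade", release, chart,
        "--namespace", namespace,
        "--reuse-values",  # keep every other value from the current release
        "--set", f"worker.image.prefectTag={tag}",  # assumption: values key path
    ]
    if chart_version is not None:
        # Stay on the installed chart version instead of the repo's latest.
        args += ["--version", chart_version]
    return args


if __name__ == "__main__":
    args = helm_upgrade_args("2.19.9-python3.11-kubernetes", chart_version="2023.08.24")
    print(shlex.join(args))
    # To apply after reviewing: subprocess.run(args, check=True)
```

Editing `prefectTag` directly in `values.yaml` and running `helm upgrade ... -f values.yaml` is the other common route. That route has the advantage of keeping the pinned tag in version control.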
f
I do have another question: we have staging and production environments that use exactly the same `values.yaml`. However, since yesterday, all flow runs in the staging environment have crashed, while nothing happened in our production environment. Is there a reason for that?
I redeployed the worker in our staging environment, and a new flow run succeeded. Thanks for the support. I still need to figure out when and how our staging worker got upgraded; that shouldn't happen automatically, right?
๐Ÿ‘ 1
n
If the worker died somehow and Kubernetes restarted it, it would pull whatever image was most recently associated with that tag. Somehow the process must have restarted.
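That reasoning would explain why staging and production diverged despite identical `values.yaml` files: a floating tag is resolved when the image is pulled, not when the values are written. Below is a toy model of that idea. All class names and digests are made up for illustration. It also assumes a restart really triggers a fresh pull, which is not guaranteed: with `imagePullPolicy: IfNotPresent`, a node that already has the image cached reuses it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyRegistry:
    """Toy model of a registry: a mutable tag -> digest mapping (hypothetical)."""
    tags: Dict[str, str] = field(default_factory=dict)

    def push(self, tag: str, digest: str) -> None:
        # Re-pushing a floating tag silently repoints it at new content.
        self.tags[tag] = digest


@dataclass
class ToyWorkerPod:
    """Toy pod that resolves its image tag only when it (re)starts."""
    tag: str
    running_digest: Optional[str] = None

    def start(self, registry: ToyRegistry) -> None:
        # Assumes every start re-pulls the image (see the caveat above).
        self.running_digest = registry.tags[self.tag]


def simulate() -> Dict[str, Optional[str]]:
    registry = ToyRegistry()
    registry.push("2-python3.11-kubernetes", "sha256:old")  # placeholder digest

    # Both environments use the same values, so the same floating tag.
    staging = ToyWorkerPod("2-python3.11-kubernetes")
    production = ToyWorkerPod("2-python3.11-kubernetes")
    staging.start(registry)
    production.start(registry)

    # A new 2.x image is published under the same floating tag.
    registry.push("2-python3.11-kubernetes", "sha256:new")  # placeholder digest

    # Only the staging worker happens to restart, so only it picks up the new image.
    staging.start(registry)
    return {"staging": staging.running_digest, "production": production.running_digest}


if __name__ == "__main__":
    print(simulate())
```

With a pinned tag, the `push` of new content would go to a different tag, and both pods would keep resolving to the same image no matter when they restart.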
f
I see. Thanks for the assistance.
n
Sure thing!