# ask-community
i
Hello Community! Our Prefect Worker running in Kubernetes never reports a Job as complete; its logs stop at "Pod has status 'Running'". As a result, our Flow stays in the Pending state forever, even though it has completed its work.
03:03:38 PM | INFO    | prefect.flow_runs.worker - Worker 'KubernetesWorker 9ccb0c0e-5421-47f2-9b83-77fbb40e032e' submitting flow run '9c11f432-d7c9-4929-95bc-2b21c336b1f6'
03:03:39 PM | INFO    | prefect.flow_runs.worker - Creating Kubernetes job...
03:03:39 PM | DEBUG   | prefect.flow_runs.worker - Job 'platinum-lori-pq7qp': Monitoring job...
03:03:39 PM | DEBUG   | prefect.flow_runs.worker - Job 'platinum-lori-pq7qp': Starting watch for pod start...
03:03:39 PM | INFO    | prefect.flow_runs.worker - Completed submission of flow run '9c11f432-d7c9-4929-95bc-2b21c336b1f6'
03:03:39 PM | INFO    | prefect.flow_runs.worker - Job 'platinum-lori-pq7qp': Pod has status 'Pending'.
03:03:40 PM | INFO    | prefect.flow_runs.worker - Job 'platinum-lori-pq7qp': Pod has status 'Running'.
Executing the below:
$ kubectl describe job <job name>
returns details showing the Job as completed. There is also evidence in the other systems the Flow interacts with that it did in fact complete all of its work. The Prefect Worker in Kubernetes is running the latest version of Prefect included in the Helm chart dated December 14, 2023:
app.kubernetes.io/version: 2.14.11
helm.sh/chart: prefect-worker-2023.12.14
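For anyone wanting to script the comparison between what Kubernetes reports and what Prefect shows, here is a minimal sketch. It assumes you have the Job as JSON (for example from `kubectl get job <job name> -o json`) and checks for the standard `Complete` condition under `.status.conditions`. The `job_is_complete` helper and the trimmed sample payload are hypothetical, made up for illustration; they are not output from the cluster in this thread.

```python
import json


def job_is_complete(job: dict) -> bool:
    """Return True if a Kubernetes Job object carries a Complete=True condition."""
    # `conditions` can be missing or null while the Job is still running.
    conditions = job.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == "Complete" and c.get("status") == "True"
        for c in conditions
    )


# Hypothetical, heavily trimmed Job object (illustrative data only).
sample = json.loads(
    """
    {
      "metadata": {"name": "example-job"},
      "status": {
        "succeeded": 1,
        "conditions": [{"type": "Complete", "status": "True"}]
      }
    }
    """
)

print(job_is_complete(sample))
```

If this returns True for a Job whose flow run is still Pending in Prefect, that narrows the problem to the Worker's watch or state reporting rather than the Job itself.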
We've enabled debug logging via the Helm chart to see what happens after the Worker's last log message. Nothing is jumping out at us except exceptions like these, which appear at a regular frequency both before and after that last message:
20:03:42.813 | DEBUG   | APILogWorkerThread | prefect._internal.concurrency - Encountered exception in call get(<dropped>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 318, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/queue.py", line 179, in get
    raise Empty
_queue.Empty
Any ideas as to what to try next? Has anyone else run into this issue?
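A side note on the traceback: `_queue.Empty` is what Python's `queue.Queue.get` raises when no item shows up within the timeout. Since it is logged at DEBUG level and recurs both before and after the last message, it may just be routine polling by the log thread rather than the cause of the stuck run. That is an assumption, not something confirmed in this thread. A minimal sketch of the pattern, with a hypothetical `poll_once` helper:

```python
import queue


def poll_once(q: queue.Queue, timeout: float = 0.05):
    """Return the next item, or None if nothing arrived within `timeout` seconds."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        # Nothing arrived in time: the normal idle case for a polling consumer.
        return None


q = queue.Queue()
idle = poll_once(q)   # queue is empty, so get() times out and raises Empty
q.put("log record")
busy = poll_once(q)   # an item is waiting, so get() returns it
print(idle, busy)
```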
b
I am exploring the use of workers for the first time, and I am consistently running into this exact issue.
i
Until this gets resolved, I'm just logging the happy path, and a separate periodic Flow reports daily on the success or failure of the various "Pending" Flows based on the information in these logs. It's a bit of a hit to observability, but it works for now.
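A rough sketch of what that daily reconciliation could look like, in plain Python. Everything here is hypothetical: the `FlowRun` record, the `reconcile` function, and the sample IDs are made up for illustration, and how you fetch the runs and the "completed" IDs from your own logs is left out.

```python
from dataclasses import dataclass


@dataclass
class FlowRun:
    run_id: str
    state: str  # state as shown in Prefect, e.g. "Pending"


def reconcile(runs, completed_ids):
    """Split Pending runs into those the logs confirm finished and those they do not."""
    confirmed, unconfirmed = [], []
    for run in runs:
        if run.state != "Pending":
            continue  # only runs stuck in Pending need a second opinion
        if run.run_id in completed_ids:
            confirmed.append(run.run_id)
        else:
            unconfirmed.append(run.run_id)
    return {"confirmed_complete": confirmed, "unconfirmed": unconfirmed}


# Illustrative data only.
runs = [
    FlowRun("run-a", "Pending"),
    FlowRun("run-b", "Pending"),
    FlowRun("run-c", "Completed"),
]
completed_ids = {"run-a"}  # IDs whose happy-path log line was found

report = reconcile(runs, completed_ids)
print(report)
```

Runs that land in `unconfirmed` are the ones worth a manual look, since neither Prefect nor the happy-path logs say they finished.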