Hi everyone, I’ve been running into an issue today...
# prefect-community
j
Hi everyone, I’ve been running into an issue today
AttributeError: 'V1Job' object has no attribute 'name'
and I’m not sure what this means. Prefect cloud is reporting that
No heartbeat detected from the remote task; marking the run as failed.
This is a Kubernetes Agent. I looked at the logs and here is a stack trace:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/agent.py", line 413, in heartbeat
    self.manage_jobs()
  File "/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/agent.py", line 193, in manage_jobs
ERROR:agent:Error while managing existing k8s jobs
Traceback (most recent call last):
    f"Job {job.name!r} is for flow run {flow_run_id!r} "
AttributeError: 'V1Job' object has no attribute 'name'
  File "/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/agent.py", line 190, in manage_jobs
    flow_run_state = self.client.get_flow_run_state(flow_run_id)
  File "/usr/local/lib/python3.6/site-packages/prefect/client/client.py", line 1664, in get_flow_run_state
    raise ObjectNotFoundError(f"Flow run {flow_run_id!r} not found.")
prefect.exceptions.ObjectNotFoundError: Flow run 'af8b8a74-8ed0-4417-812a-566de859ce64' not found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/agent.py", line 413, in heartbeat
    self.manage_jobs()
  File "/usr/local/lib/python3.6/site-packages/prefect/agent/kubernetes/agent.py", line 193, in manage_jobs
    f"Job {job.name!r} is for flow run {flow_run_id!r} "
AttributeError: 'V1Job' object has no attribute 'name'
Deleting the Agent pod did not solve the issue. Any ideas?
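A note on the `AttributeError` itself: in the Kubernetes Python client, a `V1Job` does not carry its name as a top-level attribute; the name sits under `metadata` (a `V1ObjectMeta`). The sketch below reproduces the same kind of failure with small stand-in classes so it runs without a cluster. `FakeJob`, `FakeObjectMeta`, and the job name are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeObjectMeta:
    """Hypothetical stand-in for kubernetes.client.V1ObjectMeta."""
    name: str


@dataclass
class FakeJob:
    """Hypothetical stand-in for kubernetes.client.V1Job."""
    metadata: FakeObjectMeta


# Made-up job name, for illustration only.
job = FakeJob(metadata=FakeObjectMeta(name="prefect-job-example"))

# Accessing job.name fails, because the name is nested under metadata.
try:
    job.name
except AttributeError as exc:
    error_message = str(exc)

# The name is reachable through the metadata object instead.
job_name = job.metadata.name
```

Read this way, the second traceback looks like a failure inside the agent's own error handling (building a log message with `job.name`), layered on top of the original `ObjectNotFoundError`.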
k
The heartbeat message just suggests we lost communication with your task because something happened, like it ran out of memory or crashed. Since we don’t hear from the task, we mark it as failed; otherwise it would show as running forever. This looks like there is an issue with the job spinning up? Is this happening during a flow run start? Does the flow interact with Kubernetes in any way? Is this consistent or intermittent?
j
Flows are not running anymore, they stay stuck in a scheduled state. It happened suddenly and all the subsequent flow runs failed. I’m not sure what the root cause is. The flows should not affect any kubernetes processes.
k
Does a hello world flow work? I am wondering if there is something with your job definition?
j
I can’t get any flows to run anymore; I think there is something wrong with the agent.
It appears that the agent is looking for a run that does not exist?
File "/usr/local/lib/python3.6/site-packages/prefect/client/client.py", line 1664, in get_flow_run_state
raise ObjectNotFoundError(f"Flow run {flow_run_id!r} not found.")
prefect.exceptions.ObjectNotFoundError: Flow run 'ed00b008-7224-4386-bea3-707684420326' not found.
k
I think in Prefect 1 it is hard to find the pod for a given flow run, but maybe you can try deleting that pod or cancelling the flow run that it’s looking for?
a
can you send your flow and Kubernetes job template definition? it looks like an issue with Secrets. This user has a similar issue, check if the solution at the bottom can help you https://discourse.prefect.io/t/issues-using-gitlab-storage-with-kubernetesagent-and-pre[…]r-404-project-not-found-or-file-or-directory-not-found/644
j
I’m going to try deleting all the old prefect jobs. I highly doubt it has anything to do with secrets as the flows ran just fine yesterday with no change in between. I will update here!
a
awesome, keep us posted!
j
Can someone explain to me what this error means?
prefect.exceptions.ObjectNotFoundError: Flow run '0758ed94-ebda-44c4-a439-e698af7cb675' not found.
It’s being emitted by the agent on our cluster. I’m not sure if there is something wrong on the cloud side. This is the flow run id for a job that was completed about 8 days ago. Searching for this ID in prefect cloud has no results.
Why is the agent looking for jobs that have happened so long ago?
a
what Prefect version do you use? this error should be fixed with this PR https://github.com/PrefectHQ/prefect/pull/5577
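On why the agent asks about such an old run: going by the traceback above, `manage_jobs` runs on every heartbeat and looks up the flow run state for each existing Kubernetes job, so a leftover job from a run the backend no longer knows about triggers the lookup error each time. The sketch below is not the code from that PR; it is a minimal illustration, under stated assumptions, of a loop that tolerates a missing flow run. `FlowRunNotFound`, `get_flow_run_state`, `manage_jobs`, the run IDs, and the `prefect.io/flow_run_id` label key are all assumptions made for the example.

```python
from types import SimpleNamespace


class FlowRunNotFound(Exception):
    """Hypothetical stand-in for prefect.exceptions.ObjectNotFoundError."""


def get_flow_run_state(flow_run_id, known_runs):
    """Hypothetical backend lookup; raises when the run is unknown."""
    try:
        return known_runs[flow_run_id]
    except KeyError:
        raise FlowRunNotFound(f"Flow run {flow_run_id!r} not found.") from None


def manage_jobs(jobs, known_runs):
    """Return (states by job name, names of jobs whose flow run is gone)."""
    states = {}
    orphaned = []
    for job in jobs:
        # Assumed label key; the job name is read from metadata, not job.name.
        flow_run_id = job.metadata.labels.get("prefect.io/flow_run_id")
        try:
            states[job.metadata.name] = get_flow_run_state(flow_run_id, known_runs)
        except FlowRunNotFound:
            # Record the orphaned job and keep going instead of crashing.
            orphaned.append(job.metadata.name)
    return states, orphaned


def make_job(name, flow_run_id):
    """Build a minimal object shaped like a Kubernetes job (illustrative)."""
    labels = {"prefect.io/flow_run_id": flow_run_id}
    return SimpleNamespace(metadata=SimpleNamespace(name=name, labels=labels))


jobs = [make_job("job-a", "run-live"), make_job("job-b", "run-gone")]
states, orphaned = manage_jobs(jobs, known_runs={"run-live": "Running"})
```

Here `job-b` ends up in `orphaned` rather than stopping the loop, which is the behavior one would want from a heartbeat routine so that a single stale job does not block every other flow run.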
j
Ah that would explain it. We are on version 0.15.4. Thanks!