# prefect-community
h
Hello Prefect support, I need some help. One of our Prefect flows [1] has had difficulty initiating flow runs for the past 3 hours. The flow runs get held up in the `Scheduled to start` state, but there are no run logs and no GKE job is created. All runs scheduled in the past 3 hours have the same issue. If we manually start a flow run, it runs into the same issue. The project agent shows a good status. The delayed runs don't even show up in the flow runs tab. It seems to me that the flow scheduled the run but was somehow unable to start it when the time came. [1] https://cloud.prefect.io/semios/flow/dcf941bd-6365-45e8-aafd-71daae6c29f0?version=7
n
Hi @Hui Zheng - what are the labels that you see on that flow's details tile (top most left tile on the flow page)
h
n
Thanks @Hui Zheng - it looks like you've got a label mismatch, perhaps those were updated on either the agent or the flow at some point? Try adding the `20.06.0` label to your Agent 🙂
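For readers following along, the label mismatch mentioned above can be pictured with a small sketch. It assumes the usual rule that an agent only picks up a flow run when every label on the flow is also present on the agent. The function name `agent_can_pick_up` and the label `some-other-label` are hypothetical and not part of Prefect's API.

```python
def agent_can_pick_up(flow_labels, agent_labels):
    """Return True when every flow label is also present on the agent.

    Hypothetical helper for illustration only; it mirrors the assumed
    "flow labels must be a subset of agent labels" rule.
    """
    return set(flow_labels) <= set(agent_labels)


flow_labels = ["20.06.0"]

# An agent carrying the same label would be eligible to pick up the run.
print(agent_can_pick_up(flow_labels, ["20.06.0"]))

# An agent with only an unrelated (hypothetical) label would not.
print(agent_can_pick_up(flow_labels, ["some-other-label"]))
```

Under this assumption, a run whose labels no registered agent covers would simply stay scheduled, which matches the symptom described in the thread.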
h
the label is `20.06.0`
let me try
@nicholas actually we just realized that one of our agents is missing from the Prefect Cloud panel
we had an agent with the label `20.06.0`, which is still running in the GKE cluster and looks fine
```yaml
- name: PREFECT__CLOUD__AGENT__LABELS
  value: '[''20.06.0'']'
```
however, this `20.06.0` agent has disappeared from the Prefect Cloud dashboard. Right now we only see one other agent.
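As a quick sanity check on the environment variable quoted above, here is a minimal sketch of what the YAML single-quoted value decodes to. It assumes the resulting string is then read as a Python-style list literal; how the agent actually parses it is not shown in this thread. The helper `decode_yaml_single_quoted` is hypothetical.

```python
import ast


def decode_yaml_single_quoted(scalar):
    """Undo YAML single-quote escaping: drop the outer quotes, turn '' into '."""
    inner = scalar[1:-1]
    return inner.replace("''", "'")


# The value exactly as written in the deployment manifest.
raw = "'[''20.06.0'']'"

decoded = decode_yaml_single_quoted(raw)
# Assumption: the decoded string is interpreted as a list literal.
labels = ast.literal_eval(decoded)

print(decoded)
print(labels)
```

If that assumption holds, the agent is configured with the single label `20.06.0`, so the manifest itself does not look like the source of the mismatch.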
n
Interesting - I'm not sure what would cause that but my recommendation would be to halt the GKE Agent you have running right now, upgrade to the latest version of Prefect, and restart it - it looks like the agent you have running is on a very old version of Prefect.
h
we just looked into the container log of that `20.06.0` agent. It is showing a lot of errors like this:
```json
{
  "textPayload": "[2020-09-16 23:10:47,877] ERROR - agent | [{'path': ['get_runs_in_queue'], 'message': '[{\\'extensions\\': {\\'path\\': \\'$\\', \\'code\\': \\'data-exception\\'}, \\'message\\': \\'invalid input syntax for type uuid: \"b1e73cd5-8a84-4c6a-a7ac-69000bd8e827:KubernetesAgent:agent:20.06.0\"\\'}]', 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}]\n",
  "insertId": "l3nsmegqj3k5xkcd0",
  "resource": {
    "type": "k8s_container",
    "labels": {
      "namespace_name": "default",
      "location": "us-west2-a",
      "project_id": "semios-data-platform",
      "pod_name": "prefect-agent-7c8d87df48-ttjpb",
      "cluster_name": "scheduler-20-06-0",
      "container_name": "agent"
    }
  },
  "timestamp": "2020-09-16T23:10:47.878010266Z",
  "severity": "INFO",
  "labels": {
    "k8s-pod/pod-template-hash": "7c8d87df48",
    "k8s-pod/app": "prefect-agent"
  },
  "logName": "projects/semios-data-platform/logs/stdout",
  "receiveTimestamp": "2020-09-16T23:10:54.523562262Z"
}
```
it seems there is a communication error with the Prefect Cloud server?
you mean halt the GKE Agent and re-deploy it with the latest Prefect version?
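The error text in the log says the server rejected a value as `invalid input syntax for type uuid`. A minimal sketch of why that string fails, using Python's standard `uuid` module as a stand-in for the server-side check (an assumption: the server's own parser is not shown in this thread, and `is_valid_uuid` is a hypothetical helper):

```python
import uuid


def is_valid_uuid(value):
    """Return True if Python's uuid module accepts the string as a UUID."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


# The value quoted in the agent's error message.
rejected = "b1e73cd5-8a84-4c6a-a7ac-69000bd8e827:KubernetesAgent:agent:20.06.0"

# The composite string as a whole is not a UUID ...
print(is_valid_uuid(rejected))

# ... although its first colon-separated segment is.
print(is_valid_uuid(rejected.split(":")[0]))
```

This suggests the agent is sending a composite identifier where a bare UUID is expected, which reads more like a version incompatibility than a network problem.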
n
Ah ok, let me investigate that with our Cloud team but in the meantime I think upgrading your Agent will resolve the issue (and provide a lot of upcoming improvements)
Exactly @Hui Zheng
h
our Prefect flow might still use an old version of Prefect. will upgrading the agent make it incompatible with the flow?
n
Nope, it should have no impact
(We run flows internally on extremely old versions of Core that are always picked up by new agents)
h
thank you. will try that