    Hui Zheng

    2 years ago
    Hello Prefect support, I need some help. One of our Prefect flows [1] has had difficulty starting flow runs for the past 3 hours. The flow runs are held up in the "Scheduled to start" state, but there are no run logs and no GKE job was created. All runs scheduled in the past 3 hours had the same issue, and a manually started flow run runs into it too. The project agent shows a good status. Those delayed runs don’t even show up in the flow runs tab. It seems to me that the flow scheduled the runs but somehow was not able to start them when the time came.
    [1] https://cloud.prefect.io/semios/flow/dcf941bd-6365-45e8-aafd-71daae6c29f0?version=7
    nicholas

    2 years ago
    Hi @Hui Zheng, what labels do you see on that flow's details tile (the top-left tile on the flow page)?
    Hui Zheng

    2 years ago
    nicholas

    2 years ago
    Thanks @Hui Zheng, it looks like you've got a label mismatch. Perhaps the labels were updated on either the agent or the flow at some point? Try adding the 20.06.0 label to your Agent 🙂
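A label mismatch like this can be reasoned about as a simple set check. The sketch below is a hypothetical model, assuming the Prefect 0.x behavior that an agent only picks up a flow run when every label on the flow is also present on the agent; `agent_can_run` is an illustrative name, not a Prefect API.

```python
# Hypothetical model of agent/flow label matching (not Prefect's code).
# Assumption: an agent picks up a flow run only if the flow's labels are a
# subset of the agent's labels.
from typing import Iterable


def agent_can_run(flow_labels: Iterable[str], agent_labels: Iterable[str]) -> bool:
    """Return True if every flow label is also an agent label (subset check)."""
    return set(flow_labels) <= set(agent_labels)


if __name__ == "__main__":
    flow_labels = ["20.06.0"]
    print(agent_can_run(flow_labels, []))           # False: agent lacks the label
    print(agent_can_run(flow_labels, ["20.06.0"]))  # True once the label is added
```

Under that assumption, a flow labeled 20.06.0 with no running agent carrying the same label would be consistent with runs sitting in "Scheduled to start" and no GKE job being created.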
    Hui Zheng

    2 years ago
    The label is 20.06.0. Let me try.
    @nicholas actually we just realized that one of our agents is missing from the Prefect Cloud panel.
    We have an agent with the label 20.06.0, which is still running in the GKE cluster and looks fine:
    - name: PREFECT__CLOUD__AGENT__LABELS
      value: '[''20.06.0'']'
    However, this 20.06.0 agent has disappeared from the Prefect Cloud dashboard. Right now we only see one other agent.
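For reference, the label value in the deployment snippet above is a YAML single-quoted string, in which a doubled quote ('') stands for one literal quote, so the container receives the text ['20.06.0']. The sketch below decodes it by hand with the standard library only. The helper name `decode_yaml_single_quoted` is hypothetical, and the idea that the agent turns this text into a Python list via a literal parse is an assumption, not something confirmed here.

```python
import ast


def decode_yaml_single_quoted(scalar: str) -> str:
    """Decode a one-line YAML single-quoted scalar: strip the outer quotes and
    turn each doubled quote ('') back into a single quote (')."""
    if len(scalar) < 2 or scalar[0] != "'" or scalar[-1] != "'":
        raise ValueError("not a single-quoted scalar")
    return scalar[1:-1].replace("''", "'")


raw = "'[''20.06.0'']'"                  # value as written in the manifest
env_value = decode_yaml_single_quoted(raw)  # text the container actually sees
# Assumption: the agent parses the env var text as a Python literal.
labels = ast.literal_eval(env_value)
print(env_value)  # ['20.06.0']
print(labels)     # ['20.06.0']
```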
    nicholas

    2 years ago
    Interesting, I'm not sure what would cause that, but my recommendation would be to halt the GKE Agent you have running right now, upgrade it to the latest version of Prefect, and restart it. It looks like the agent you have running is on a very old version of Prefect.
    Hui Zheng

    2 years ago
    We just looked into the container logs of that 20.06.0 agent. It is logging a lot of errors like this:
    {
      "textPayload": "[2020-09-16 23:10:47,877] ERROR - agent | [{'path': ['get_runs_in_queue'], 'message': '[{\\'extensions\\': {\\'path\\': \\'$\\', \\'code\\': \\'data-exception\\'}, \\'message\\': \\'invalid input syntax for type uuid: \"b1e73cd5-8a84-4c6a-a7ac-69000bd8e827:KubernetesAgent:agent:20.06.0\"\\'}]', 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}]\n",
      "insertId": "l3nsmegqj3k5xkcd0",
      "resource": {
        "type": "k8s_container",
        "labels": {
          "namespace_name": "default",
          "location": "us-west2-a",
          "project_id": "semios-data-platform",
          "pod_name": "prefect-agent-7c8d87df48-ttjpb",
          "cluster_name": "scheduler-20-06-0",
          "container_name": "agent"
        }
      },
      "timestamp": "2020-09-16T23:10:47.878010266Z",
      "severity": "INFO",
      "labels": {
        "k8s-pod/pod-template-hash": "7c8d87df48",
        "k8s-pod/app": "prefect-agent"
      },
      "logName": "projects/semios-data-platform/logs/stdout",
      "receiveTimestamp": "2020-09-16T23:10:54.523562262Z"
    }
    It seems there is a communication error with the Prefect Cloud server?
    Do you mean we should halt the GKE Agent and redeploy it with the latest Prefect version?
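The error text itself shows what the server objected to: it reports "invalid input syntax for type uuid" for a value made of a UUID followed by ":KubernetesAgent:agent:20.06.0". The minimal sketch below, using only the standard library and a hypothetical `is_uuid` helper, pulls that value out of an excerpt of the log's textPayload and shows that only the part before the first colon parses as a UUID. It is a reading of the log message, not a description of how Prefect Cloud validates agent IDs.

```python
import re
import uuid

# Excerpt of the textPayload from the agent log above.
payload = (
    'invalid input syntax for type uuid: '
    '"b1e73cd5-8a84-4c6a-a7ac-69000bd8e827:KubernetesAgent:agent:20.06.0"'
)

# Grab the quoted value the server tried to read as a UUID.
match = re.search(r'type uuid: "([^"]+)"', payload)
value = match.group(1)


def is_uuid(text: str) -> bool:
    """Return True if text parses as a UUID (hypothetical helper)."""
    try:
        uuid.UUID(text)
        return True
    except ValueError:
        return False


prefix = value.split(":", 1)[0]  # segment before the first ':'
print(is_uuid(value))   # False: the whole string is not a UUID
print(is_uuid(prefix))  # True: only the leading segment is one
```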
    nicholas

    2 years ago
    Ah ok, let me investigate that with our Cloud team, but in the meantime I think upgrading your Agent will resolve the issue (and provide a lot of upcoming improvements).
    Exactly @Hui Zheng
    Hui Zheng

    2 years ago
    Our Prefect flow might still use an old version of Prefect. Will upgrading the agent make it incompatible with the flow?
    nicholas

    2 years ago
    Nope, it should have no impact
    (We run flows internally on extremely old versions of Core that are always picked up by new agents)
    Hui Zheng

    2 years ago
    Thank you, will try that.