# prefect-community
I'm running into an issue where "ghost" tasks/flows are taking up my concurrency limits. I'm using Prefect Cloud 1.0 with either Kubernetes or local agents for my flows. Some of the issues I'm running into:
• Some of my task concurrency limits have been used up for many weeks in a row, even when I have no running flows at all, and certainly none with that task tag. I have tried the following, but the slots still seem to be used up:
  ◦ I have stopped all runs in progress using the UI.
  ◦ For my flows deployed on k8s, I have removed all jobs that had been running for more than x days (these were "ghost" jobs too, as no flow had been running that long).
  ◦ I have restarted my agent.
• Some cancelled flows are still showing up in the Running tab. They're greyed out with "Cancelling..." written below them, and have stayed that way for several days.
My question:
• How can I identify why my task concurrency slots are being used, and how can I clean them up?
These issues affect both local flows and Kubernetes flows. They also affect both flow concurrency limits and task concurrency limits.
Am I missing something? I can see how many slots are used, but not which flows are using them.
I see what you mean. I'm not aware of any way to check that other than querying the GraphQL API, though I'm not sure exactly how. FWIW, this is much easier to troubleshoot in 2.0 thanks to work queues.
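One way to attempt that GraphQL check is to list task runs that are still `Running` and group them by task tag, so each concurrency limit can be matched to the runs that appear to hold it. This is a minimal sketch under several assumptions: that a slot is held by each `Running` task run carrying the limit's tag (not confirmed in this thread), that the Cloud 1.0 schema exposes `task_run -> task { name tags }` and `task_run -> flow_run { id name state }`, and that `prefect.Client().graphql(...)` from Prefect 1.x is authenticated against Cloud. The constant and function names are hypothetical.

```python
from collections import defaultdict

# Assumed Cloud 1.0 schema: task_run has `task` and `flow_run` relationships.
RUNNING_TASK_RUNS_QUERY = """
query running_task_runs {
  task_run(where: {state: {_eq: "Running"}}) {
    id
    state
    task { name tags }
    flow_run { id name state }
  }
}
"""


def group_running_task_runs_by_tag(task_runs):
    """Map each task tag to the running task runs that carry it.

    `task_runs` is the list found at data.task_run in the query response.
    """
    by_tag = defaultdict(list)
    for run in task_runs:
        task = run.get("task") or {}
        flow_run = run.get("flow_run") or {}  # tolerate a missing parent flow run
        for tag in task.get("tags") or []:
            by_tag[tag].append(
                {
                    "task_run_id": run.get("id"),
                    "task_name": task.get("name"),
                    "flow_run_id": flow_run.get("id"),
                    "flow_run_state": flow_run.get("state"),
                }
            )
    return dict(by_tag)


def fetch_running_task_runs():
    """Send the query with the Prefect 1.x Client (hypothetical helper)."""
    from prefect import Client  # lazy import so the grouping helper works without Prefect

    result = Client().graphql(RUNNING_TASK_RUNS_QUERY)
    return result["data"]["task_run"]
```

Calling `group_running_task_runs_by_tag(fetch_running_task_runs())` would show, per tag, which task runs look like they occupy slots. Entries whose `flow_run_state` is something other than `Running` (for example `Cancelled`), or whose flow run is missing entirely, may be worth investigating as the "ghost" holders.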
Has a similar issue with "ghost" flows been seen before?
I don't see ghosts just yet 😄
Hello @Anna Geller, thank you for your help so far. I just wanted to follow up on my investigation. I ran this query hoping to identify why my task concurrency limits were being used up while the Cloud UI reported that no flows were running:
```graphql
query get_running_flows {
  flow_run(where: {state: {_eq: "Running"}}) {
    state
    created
    flow {
      id
    }
  }
}
```
I received this error:
```json
{
  "errors": [
    {
      "path": [
        "flow_run",
        0,
        "created"
      ],
      "message": "Cannot return null for non-nullable field flow_run.created.",
      "extensions": {
        "code": "INTERNAL_SERVER_ERROR"
      }
    }
  ],
  "data": null
}
```
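The `path` in this error pinpoints the failure: `flow_run`, `0`, `created` means the first row returned by the `flow_run` query has a null `created`, and because that field is non-nullable the null propagated all the way up, which is why `data` is `null`. A minimal sketch of a helper (hypothetical name) that turns such paths into readable locations, relying only on the `errors` / `path` / `message` layout defined by the GraphQL specification:

```python
def describe_error_paths(response):
    """Turn each GraphQL error path into a readable 'field[index].field' string.

    Per the GraphQL spec, a path lists field names (strings) and 0-based list
    indices (integers) from the root of the response down to the failing field.
    """
    described = []
    for error in response.get("errors") or []:
        parts = []
        for segment in error.get("path") or []:
            if isinstance(segment, int):
                parts.append(f"[{segment}]")  # list index
            else:
                parts.append(("." if parts else "") + str(segment))  # field name
        described.append(("".join(parts), error.get("message", "")))
    return described
```

Applied to the response above, it yields `flow_run[0].created`. Re-running the query with only `id` and `state` selected might then reveal which flow run IDs are affected, assuming those fields are populated for the broken rows.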
So it looks like the associated flow is no longer available. I tried deleting an older flow run using:
```graphql
mutation delete_flow_run {
  deleteFlowRun(input: {flowRunId: "87b8fa4f-a206-438f-95b2-364ea4187338"}) {
    success
    error
  }
}
```
but I got this response:
```json
{
  "data": {
    "deleteFlowRun": {
      "success": false,
      "error": null
    }
  }
}
```
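Because this response reports `success: false` with `error: null`, the failure is silent unless the caller checks `success` explicitly. A minimal sketch that builds the same `deleteFlowRun` mutation for a validated flow run ID and raises when the payload does not report success. The function and exception names are hypothetical, and `Client().graphql(..., raise_on_error=False)` assumes the Prefect 1.x client; the sketch does not explain *why* the deletion fails.

```python
import uuid


class DeleteFlowRunFailed(RuntimeError):
    """Raised when deleteFlowRun does not report success."""


def build_delete_mutation(flow_run_id):
    """Build the deleteFlowRun mutation from the thread for one flow run ID."""
    run_id = str(uuid.UUID(flow_run_id))  # validate before inlining into the query
    return (
        "mutation delete_flow_run {\n"
        f'  deleteFlowRun(input: {{flowRunId: "{run_id}"}}) {{\n'
        "    success\n"
        "    error\n"
        "  }\n"
        "}\n"
    )


def check_delete_payload(response, flow_run_id):
    """Raise unless the response explicitly reports success: true."""
    if response.get("errors"):
        raise DeleteFlowRunFailed(f"{flow_run_id}: GraphQL errors: {response['errors']}")
    payload = (response.get("data") or {}).get("deleteFlowRun") or {}
    if payload.get("success") is not True:
        reason = payload.get("error") or "no error message returned"
        raise DeleteFlowRunFailed(
            f"{flow_run_id}: deleteFlowRun returned success={payload.get('success')!r} ({reason})"
        )
    return True


def delete_flow_run(flow_run_id):
    """Send the mutation with the Prefect 1.x Client (hypothetical helper)."""
    from prefect import Client  # lazy import so the helpers above work without Prefect

    response = Client().graphql(build_delete_mutation(flow_run_id), raise_on_error=False)
    return check_delete_payload(response, flow_run_id)
```

With the response shown above, `check_delete_payload` raises with "no error message returned", which at least makes the silent failure visible in scripts that loop over many stale runs.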