Hello, I have problems with prefect cloud 2.0 (Prefect Community #ask-community)

Emil Østergaard

07/12/2022, 10:01 AM

Hello, I have problems with Prefect Cloud 2.0. We use the Kubernetes flow runner and a Dask task runner. On Friday (8/7-2022), I had a flow run which I wanted to abort. I attempted to use the "delete" functionality in the UI, thinking it would delete all resources related to the flow_run, including the Kubernetes job etc. It did not remove the Kubernetes job, so I removed this manually. The issue is concurrency limits: the tasks launched by this flow have a tag with a concurrency limit. It appears the task data associated with the deleted flow run was not removed from Prefect storage. For instance, if I try:

prefect concurrency-limit inspect my-tag

It shows a bunch of active task ids, even though nothing is running in k8s. This causes an unfortunate issue where new flow runs for this flow will never start tasks, because Prefect thinks the concurrency limit is hit due to these zombie tasks. However, I cannot find a way to manually clean up these task ids, which means this flow is dead. Any help is appreciated!
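The blocking behaviour described above can be illustrated with a toy model. This is only a sketch, not Prefect's implementation: `TagLimit`, `acquire`, and `release` are hypothetical names, and the model assumes only that a slot stays occupied until the task run that took it reports a terminal state.

```python
from dataclasses import dataclass, field


@dataclass
class TagLimit:
    """Toy model of a tag-based concurrency limit (hypothetical, not Prefect's code)."""

    tag: str
    limit: int
    active_slots: set = field(default_factory=set)

    def acquire(self, task_run_id: str) -> bool:
        """Take a slot if one is free; return False when the limit is reached."""
        if task_run_id in self.active_slots:
            return True
        if len(self.active_slots) >= self.limit:
            return False
        self.active_slots.add(task_run_id)
        return True

    def release(self, task_run_id: str) -> None:
        """Free a slot; in this model it is only called on a terminal state."""
        self.active_slots.discard(task_run_id)


limit = TagLimit(tag="my-tag", limit=2)

# Two task runs from the first flow run take both slots.
assert limit.acquire("task-a")
assert limit.acquire("task-b")

# The flow run is deleted and the Kubernetes job is removed by hand.
# Nothing calls release(), so both slots stay occupied by zombie ids.

# A task run from a new flow run is refused a slot.
blocked = not limit.acquire("task-c")
print(blocked)
```

In this model the only way out is for something to call `release` on the stale ids, which is the manual cleanup step the message is asking for.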

✅ 1

Anna Geller

07/12/2022, 12:01 PM

Deleting a flow run will delete only the flow run; it will not terminate any external resources. Due to the hybrid model, Prefect doesn't have direct access to your infra, which is why terminating resources this way is difficult. Let me open an issue to investigate the best approach for such zombie tasks. @Marvin open "Investigate the right approach for cleaning up zombie task runs caused by an infrastructure crash to free up concurrency limit slots"

👍 1

Marvin

07/12/2022, 12:02 PM

https://github.com/PrefectHQ/prefect/issues/5995

Emil Østergaard

07/12/2022, 12:10 PM

Thank you Anna. I think it would be sensible for "delete flow-run" to delete all related resources on the prefect-storage side, such as any task runs associated with the flow run, assuming this is not the case at the moment. Regarding external resources: we have our agent deployed in a k8s cluster, and the agent has access to the k8s API. Would it not be possible to forward information from the agent to the prefect-storage, and thus have it reflected in the UI? We often have problems with the information in the UI being out of sync with the actual state in k8s, such as flows which look like they "run-forever" even if the k8s pod is long gone.
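The reconciliation idea in this message can be pictured as a set comparison between what the backend believes is running and what the agent can actually see in the cluster. A minimal sketch, assuming both sets have already been fetched somehow; `find_orphaned_runs` and the example ids are hypothetical, and no Prefect or Kubernetes API is called here.

```python
from typing import Set


def find_orphaned_runs(running_in_backend: Set[str], jobs_in_cluster: Set[str]) -> Set[str]:
    """Return run ids the backend shows as running that have no job in the cluster."""
    return running_in_backend - jobs_in_cluster


# Hypothetical snapshot: the backend lists three running flow runs,
# but the agent only finds jobs for two of them in k8s.
running_in_backend = {"flow-run-1", "flow-run-2", "flow-run-3"}
jobs_in_cluster = {"flow-run-1", "flow-run-3"}

orphaned = find_orphaned_runs(running_in_backend, jobs_in_cluster)
print(sorted(orphaned))
```

Runs that end up in `orphaned` are the ones that would look like they "run-forever" in the UI; what state to move them to is a separate decision that this sketch leaves open.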

Anna Geller

07/12/2022, 12:13 PM

Would you want to open a separate GitHub issue for that and explain there what exactly is out of sync between Kubernetes and Prefect? This is a different issue from cleaning up zombie task runs, even if the two are related.

Emil Østergaard

07/12/2022, 12:14 PM

Yes sure, I will do that

Anna Geller

07/12/2022, 12:15 PM

> flows which look like they "run-forever" even if the k8s pod is long gone

As mentioned before, handling infrastructure crashes is a hard problem in a hybrid model and this is already on our radar. But if you mean something else, then creating a separate issue might be useful.
