We've been having intermittent communication issues between Azure Kubernetes Service (AKS) and Prefect Cloud. They show up as one of these 3 symptoms:
* Prefect Cloud responds with a 500 error.
  * How do we get more details to help solve this?
* The AKS worker stops picking up Prefect Cloud scheduled flows.
  * The only solution that has worked is to manually restart the worker.
* Prefect Cloud never recognizes that a pod/flow has completed, and the run keeps showing as Running until it is manually cancelled.
  * It seems like there is a `timeout_seconds` variable for flows that I could try in this case.
Any help would be appreciated.
Brett
11/02/2023, 2:03 PM
The AKS worker not picking up Prefect Cloud scheduled flows is represented by this issue: https://github.com/PrefectHQ/prefect/issues/7442
TL;DR: The resolution is to set this environment variable to false