Is anyone else seeing jobs fail to start this morning using the hybrid hosting model? Our jobs are not getting picked up, but we have not seen any errors in the logs on our Agents running in our k8s cluster, and we did not make any changes to our k8s environment or our Agents.
Hi Matt, we received your inquiry in the service console as well. Would you mind giving us an example of a flow ID that is not being picked up?
A flow run ID would be sufficient as well.
Hi Bianca. Something seems to have cleared it up now, I think, but we were in a bad state for at least 12 hours. Here is one we saw not get picked up this morning: 0d1e49b0-f2b8-42e9-88ef-c318bc7328b8
Hi Matt, it appears that the flow run was cancelled at
, which interrupted/cancelled all running tasks
, and then resulted in a loss of heartbeat, which prompted the zombiekiller service to fail the remaining task runs.
Outside of that, I don't see anything else unusual. To my knowledge, we haven't experienced any recent outages on our end either. We'll take note of this to see if any other factors could have caused this. Please do reach out if it happens again.
Thank you for the triage and follow up. I will let you know if the issue reoccurs.
Here's a reference that may be helpful to rule out a service outage:
