# prefect-community
c
Hello everyone! The Cloud API issue should now be resolved, and we took a few extra steps that should prevent it from recurring. I don't have a full post-mortem yet, but in short this appears to have been caused by an incredibly large burst of logs entering the system, which had cascading effects. We'll be sure to apply what we learned here to prevent this from occurring again, and my sincere apologies for the disruption in service! I hope everyone continues to have a good morning, afternoon, or evening, depending on where you are.
🎉 2
🙏 11
m
Thank you for being super prompt about resolving this issue. Unfortunately, while our impacted flows correctly show a “heartbeat failure detected” in the logs, they still got stuck in a “Running” state … Not sure if much can be done about this, but I thought I would flag it as well.
I believe it has to do with the fact that our failed task runs were mapped child tasks, i.e. there were other mapped tasks that were in a Pending state and got stuck there, preventing the flow run from failing.
c
I appreciate the understanding, Marwan! Do you happen to know if you disabled Lazarus? I would have expected Lazarus to pick this up after a while and reschedule your flow to clean up.
m
Yes, unfortunately we have Lazarus disabled due to a bug we encountered using it … where Lazarus would reschedule successful flow runs on very rare occasions.