# ask-community
s
@Kevin Kho another different question from me - but is this a known bug in Prefect? My dashboard commonly shows inaccurate numbers: 12 failed flows in the past 24 hours, but also 50.3% of 559 failed, which is >>12. The confusing part is that we have Slack notifications turned on for all flows, and we've received only 2 notifications in the last 24 hours, so I'm not sure which to trust: 2 failures, 12 failures, or 281 failures.
j
Hi @Samuel Hinton - Thanks for the question. For the dashboard: the failed flows tile shows flows that have failed. Each flow may well have many runs that failed. So the dashboard shows you have 12 failed flows and 281 failed flow runs. (You can click into the failed flows to see which and how many runs failed if needed.) Getting only 2 Slack notifications suggests your notifications are missing some failed flows - how do you have those set up? (Cloud hook? Slack notifier? SlackTask?)
s
Hi Jenny. To clarify, if I have a flow "FlowA" that's scheduled to run every hour and it fails for the entire day, I would have 24 flow run failures on the left and 1 failed flow on the right? Or is it only counting the reruns/retries? Re the notifications, we have this set up for all flows:
handler = slack_notifier(only_states=[Failed])
Of those, we only received notifications for two flow runs of our "M7 Polling" flow. (I note that prior days have notifications for all our different flows.) Maybe it's related to the memory issues I was discussing with Kevin, and the notifications failed silently.
j
Thanks @Samuel Hinton - Yes, "FlowA" would show in your failed flows tile and the 24 run failures would be counted in your flow run total. The notifier issue sounds like something didn't work - let me check your conversation with Kevin...
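To make the two dashboard numbers concrete, here is a minimal sketch of the counting described above. The run records are made-up illustration data, not anything pulled from Prefect: one hourly flow that fails all day contributes 24 to the failed-run total but only 1 to the failed-flows tile.

```python
from collections import Counter

# Hypothetical data: "FlowA" is scheduled hourly and every run fails for a day.
failed_runs = [
    {"flow": "FlowA", "run": hour, "state": "Failed"} for hour in range(24)
]

# Group the failed runs by the flow they belong to.
failed_runs_per_flow = Counter(run["flow"] for run in failed_runs)

# Failed-flows tile: number of distinct flows with at least one failed run.
failed_flow_count = len(failed_runs_per_flow)

# Flow-run total: every individual failed run is counted.
failed_run_count = sum(failed_runs_per_flow.values())

print(failed_flow_count, failed_run_count)  # prints: 1 24
```

By the same logic, 12 failed flows and 281 failed runs can both be correct at once.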
s
Ah okay, that's good to know, apologies for misunderstanding the dashboard. The TL;DR is that both my local-agent and dask-workers were running out of memory due to accumulating results that I wasn't aware of. Until I get a bucket set up, I've just turned off checkpointing and will monitor those services.
j
Ah - do you know if those failed flow runs actually ran? If they failed before they could get into a running state the state handler wouldn't have been fired.
s
Yeah, that might be it. I can see something about a Lazarus process, so that makes sense - if it never kicked off properly then the notifier wouldn't have triggered.
j
A Cloud Hook (or automation if using Cloud) would catch those for you if you need alerts for those in the future.
s
Ah, Cloud Hooks look good. We have quite a number of flows. Is it possible to configure the Cloud Hooks the same way as the state handler (by defining something attached to the flow using the Python API)?
j
Good question! I haven't done this but I think you could set one using the create_cloud_hook mutation and possibly setting the version_group_id for your flow or querying for it. There's an old issue here with comments from one of our users about how they set this up: https://github.com/PrefectHQ/prefect/issues/2457
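As a rough sketch of what that might look like from Python: the helper below builds the GraphQL payload, and a second function sends it through the Prefect client. Only the create_cloud_hook mutation name and version_group_id come from the message above; the input type name (create_cloud_hook_input), the other input fields (type, name, states, config), the SLACK_WEBHOOK value, and both helper names are assumptions to check against the Cloud Hooks docs and the linked issue before relying on them.

```python
# Assumed mutation shape; verify the input type name against the API schema.
CREATE_CLOUD_HOOK = """
mutation($input: create_cloud_hook_input!) {
  create_cloud_hook(input: $input) {
    id
  }
}
"""


def cloud_hook_variables(version_group_id, webhook_url, name="Failed run alert"):
    """Build the variables for the mutation (hypothetical helper).

    All field names except version_group_id are assumptions.
    """
    return {
        "input": {
            "type": "SLACK_WEBHOOK",
            "name": name,
            "version_group_id": version_group_id,
            "states": ["Failed"],
            "config": {"url": webhook_url},
        }
    }


def create_failed_hook(version_group_id, webhook_url):
    """Send the mutation via the Prefect client (hypothetical helper)."""
    # Imported lazily so the payload helper can be used without Prefect installed.
    from prefect import Client

    return Client().graphql(
        CREATE_CLOUD_HOOK,
        variables=cloud_hook_variables(version_group_id, webhook_url),
    )
```

With many flows, one option would be to loop over the version group IDs and call create_failed_hook once per flow, rather than creating each hook by hand in the UI.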
s
Hi Jenny, I agree that register isn't a good place for this, as per the discussion, but it's a shame there isn't another Python API. From the comments it looks like we'd need at least GraphQL queries to set this up, and I have no experience with those. Is it possible to add a request for example queries to be added to the documentation? For now I'll go through and make them all manually, as that'll be faster than trying to figure out the GraphQL structure 🙂
j
Great suggestion on updating the docs there. I think there's one example but it's pretty basic: https://docs.prefect.io/orchestration/concepts/api.html#getting-started @Marvin open "Add more python client query examples to the docs"