Prefect Community #ask-community


Samuel Hinton

08/09/2021, 2:30 PM

@Kevin Kho another different question from me: is this a known bug in Prefect? My dashboard commonly shows inaccurate numbers: 12 failed flows in the past 24 hours, but also 50.3% of 559 failed, which is far more than 12. The confusing part is that we have Slack notifications turned on for all flows, and we've received only 2 notifications in the last 24 hours, so I'm not sure which to trust: 2 failures, 12 failures, or 281 failures.

Jenny

08/09/2021, 3:18 PM

Hi @Samuel Hinton, thanks for the question. For the dashboard: the failed flows tile shows flows that have failed, and each flow may well have many failed runs. So the dashboard shows you have 12 failed flows and 281 failed flow runs. (You can click into the failed flows to see which runs failed, and how many, if needed.) The 2 Slack notifications suggest you may be missing some failed flows in your notifications. How do you have those set up? (Cloud hook? Slack notifier? SlackTask?)

Samuel Hinton

08/09/2021, 3:30 PM

Hi Jenny. To clarify: if I have a flow "FlowA" that's scheduled to run every hour and it fails all day, I would have 24 flow run failures on the left and 1 failed flow on the right? Or is it only counting the reruns/retries? Regarding the notifications, we have this set up for all flows:

handler = slack_notifier(only_states=[Failed])

Of those, notifications were received for only two flow runs, both of our M7 Polling flow. (I note that prior days have notifications for all our different flows.) Maybe it's related to the memory issues I was discussing with Kevin, and the notifications failed silently.
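To picture which transitions a handler like the one above reports, here is a simplified model of an `only_states` filter. The state classes and the `make_notifier` helper are hypothetical stand-ins, not Prefect's code; the assumption is that Prefect 1.x's `slack_notifier` filters with an `isinstance` check, so subclasses of `Failed` also match. Like a Prefect 1.x state handler, the returned function takes `(obj, old_state, new_state)` and hands back the new state.

```python
# Hypothetical stand-ins for Prefect's state classes (not imported from Prefect).
class State: ...
class Running(State): ...
class Success(State): ...
class Failed(State): ...
class TimedOut(Failed): ...  # subclass of Failed, so it also matches only_states=[Failed]


def make_notifier(only_states, sent):
    """Return a handler(obj, old_state, new_state) that records a message in
    `sent` only when new_state is an instance of one of `only_states`."""
    def handler(obj, old_state, new_state):
        if any(isinstance(new_state, s) for s in only_states):
            sent.append(f"{obj}: {type(new_state).__name__}")
        return new_state  # state handlers return the (possibly modified) new state
    return handler


sent = []
handler = make_notifier([Failed], sent)
handler("M7 Polling", Running(), Success())   # filtered out
handler("M7 Polling", Running(), TimedOut())  # matches via the Failed subclass
print(sent)  # ['M7 Polling: TimedOut']
```

The filter only decides what to report once the handler is actually called; it says nothing about runs whose handlers never run at all.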

Jenny

08/09/2021, 3:33 PM

Thanks @Samuel Hinton - Yes "Flow A" would show in your failed flow tile and the 24 run failures would be counted in your flow run total. The notifier issue sounds like something didn't work - let me check your conversation with Kevin...
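The difference between the two tiles can be sketched with hypothetical run records (this is an illustration, not Prefect's dashboard query): the failed-flows figure counts distinct flows with at least one failed run, while the failed-runs figure counts every run. The 50.3% of 559 from the dashboard converts back to about 281 runs, which is why it can't be compared with the 12 failed flows.

```python
from collections import namedtuple

# Hypothetical record of one flow run: the flow it belongs to and its final state.
Run = namedtuple("Run", ["flow_name", "state"])


def failure_summary(runs):
    """Return (failed_flows, failed_runs, failed_pct) for a list of Run records.

    failed_flows counts distinct flows with at least one failed run;
    failed_runs counts every failed run, so one flow can contribute many.
    """
    failed = [r for r in runs if r.state == "Failed"]
    failed_flows = len({r.flow_name for r in failed})
    failed_pct = 100 * len(failed) / len(runs) if runs else 0.0
    return failed_flows, len(failed), failed_pct


# FlowA runs hourly and fails all day; a second hourly flow always succeeds.
runs = [Run("FlowA", "Failed") for _ in range(24)] + [Run("FlowB", "Success") for _ in range(24)]
print(failure_summary(runs))  # (1, 24, 50.0)

# The dashboard's percentage tile converted back to a run count.
print(round(0.503 * 559))  # 281
```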

Samuel Hinton

08/09/2021, 3:35 PM

Ah okay, that's good to know; apologies for misunderstanding the dashboard. The TL;DR is that both my local agent and Dask workers were running out of memory because they were accumulating results I wasn't aware of. Until I get a bucket set up, I've just turned off checkpointing and will monitor those services.

Jenny

08/09/2021, 3:42 PM

Ah - do you know if those failed flow runs actually ran? If they failed before they could get into a running state the state handler wouldn't have been fired.

Samuel Hinton

08/09/2021, 3:44 PM

Yeah, that might be it. I can see something about a Lazarus process, so that makes sense: if the run never kicked off properly, the notifier wouldn't have triggered.
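Jenny's explanation can be pictured with a toy model. `FlowRunModel` is a hypothetical class, not Prefect's engine: state handlers are called by the in-process flow runner on each transition it makes, whereas a state written from outside that process (for example server-side, after the run never started properly) bypasses them. That is why a run can show as Failed on the dashboard without a Slack message.

```python
class FlowRunModel:
    """Toy model of one flow run: handlers fire only for transitions made by
    the in-process runner, not for states written by the backend."""

    def __init__(self, handlers):
        self.state = "Scheduled"
        self.handlers = handlers

    def runner_set_state(self, new_state):
        # The runner calls every state handler on each transition it performs.
        old_state = self.state
        for handler in self.handlers:
            handler(old_state, new_state)
        self.state = new_state

    def backend_set_state(self, new_state):
        # A state set from outside the run's process skips the handlers here.
        self.state = new_state


notifications = []


def notify_on_failed(old_state, new_state):
    if new_state == "Failed":
        notifications.append(new_state)


ok = FlowRunModel([notify_on_failed])
ok.runner_set_state("Running")
ok.runner_set_state("Failed")         # handler fires: notification recorded

crashed = FlowRunModel([notify_on_failed])
crashed.backend_set_state("Failed")   # process never started: no notification

print(notifications)  # ['Failed']
print(crashed.state)  # Failed
```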

Jenny

08/09/2021, 4:26 PM

A Cloud Hook (or automation if using Cloud) would catch those for you if you need alerts for those in the future.

Samuel Hinton

08/10/2021, 8:02 AM

Ah, Cloud Hooks look good. We have quite a number of flows; is it possible to configure Cloud Hooks the same way as the state handler, by defining something attached to the flow using the Python API?

Jenny

08/10/2021, 1:00 PM

Good question! I haven't done this, but I think you could set one up using the create_cloud_hook mutation, possibly setting the version_group_id for your flow or querying for it. There's an old issue here with comments from one of our users about how they set this up: https://github.com/PrefectHQ/prefect/issues/2457
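A rough sketch of scripting this for many flows: look up each flow's `version_group_id` by name, then call `create_cloud_hook` once per flow. The lookup uses the Hasura-style `where` filter of Prefect 1.x's GraphQL API. The mutation's input type name and fields (`type`, `name`, `version_group_id`, `states`, `config`), the `SLACK_WEBHOOK` value, and the `register_hooks`/`hook_input` helpers are assumptions for illustration; check the real schema in the UI's Interactive API tab before using it.

```python
# Lookup of a flow's version_group_id by name (Hasura-style filter).
FLOW_LOOKUP = """
query($name: String) {
  flow(where: {name: {_eq: $name}}, limit: 1) {
    version_group_id
  }
}
"""

# ASSUMPTION: input type name and fields are unverified against the schema.
CREATE_HOOK = """
mutation($input: create_cloud_hook_input!) {
  create_cloud_hook(input: $input) {
    id
  }
}
"""


def hook_input(version_group_id, webhook_url, states=("Failed",)):
    """Build the (assumed) variables for one Slack cloud hook."""
    return {
        "input": {
            "type": "SLACK_WEBHOOK",
            "name": "failure-alerts",  # hypothetical hook name
            "version_group_id": version_group_id,
            "states": list(states),
            "config": {"url": webhook_url},
        }
    }


def register_hooks(client, flow_names, webhook_url):
    """Create one hook per named flow; returns {flow_name: hook_id}.

    `client` is expected to behave like Prefect 1.x's prefect.Client, whose
    graphql(query, variables=...) returns a dict-like response."""
    hook_ids = {}
    for name in flow_names:
        found = client.graphql(FLOW_LOOKUP, variables={"name": name})
        flows = found["data"]["flow"]
        if not flows:
            continue  # no registered flow with that name
        vgid = flows[0]["version_group_id"]
        created = client.graphql(CREATE_HOOK, variables=hook_input(vgid, webhook_url))
        hook_ids[name] = created["data"]["create_cloud_hook"]["id"]
    return hook_ids
```

In practice `client` would be a `prefect.Client()` authenticated against Cloud. The lookup takes the first match, so flows that share a name across projects would need an extra project filter.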

Samuel Hinton

08/10/2021, 1:03 PM

Hi Jenny, I agree that register isn't a good place for it, as per the discussion, but it's a shame there isn't another Python API. From the comments it looks like we'd need at least GraphQL queries to set this up, and I have no experience with those. Is it possible to add a request for example queries to the documentation? For now I'll go through and create them all manually, as that'll be faster than trying to figure out the GraphQL structure 🙂

Jenny

08/10/2021, 2:26 PM

Great suggestion on updating the docs there. I think there's one example but it's pretty basic: https://docs.prefect.io/orchestration/concepts/api.html#getting-started @Marvin open "Add more python client query examples to the docs"


Marvin

08/10/2021, 2:26 PM

https://github.com/PrefectHQ/prefect/issues/4853
