Has anyone noticed any issues with automations tha...
# ask-community
j
Has anyone noticed any issues with automations that use the "flow run state stays in" trigger not triggering? We have one that's supposed to cancel flows that stay in "in progress" for eight hours, and I just found a flow run that went way past the limit and didn't get cancelled. Also several months ago we tried switching to using "stays in" for detecting errors so we wouldn't get one error per failed retry, but just one for when the whole process totally fails, but this also would sometimes just not trigger for certain flows, so we had to revert and now I'm nervous about trying it again. But the error message spam is getting annoying, especially for flows that ultimately succeed.
w
hey james - this should be rock solid reliable, sorry it hasn't been in your experience. if you hit me with the url of the automation in question and the flow run you think should have tripped the wire, I'm happy to dig in on our side.
j
Here is the flow run. I should've left it running but I cancelled it just a bit ago, which is what triggered the crash messages in the logs: flow run. Here is the automation: automation
w
Thanks James - I’m looking into this. It seems to work reliably when I have just one state selected, i.e. Running, but I agree something is messed up when multiple states are selected.
Hey James - I see what's going on. So the automation you have configured says: after any of these events:
    "prefect.flow-run.Retrying",
    "prefect.flow-run.Paused",
    "prefect.flow-run.Running",
    "prefect.flow-run.Pending"
We expect to see another flow run state event, and if we don't, then fire. What happens is that all of your flow runs fire a Pending event and then a Running event, so the automation sees the follow-up event it was waiting for and doesn't fire. Sorry, this is kind of unintuitive. What will work solidly is creating one automation for each state: one that monitors for stays in Running, one for stays in Pending, etc. If you're running up against an automations count limit, just let me know and I'll up your count.
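For anyone reading along, the explanation above can be sketched as a toy model. This is not Prefect's implementation, just a simplification of the behavior described in this thread. The class `StaysInTrigger`, its `fires` method, and the rule that a follow-up event disarms the trigger without re-arming it are all assumptions made for illustration.

```python
from dataclasses import dataclass

EIGHT_HOURS = 8 * 60 * 60  # seconds


@dataclass(frozen=True)
class StaysInTrigger:
    """Toy model of a "stays in" trigger (hypothetical, not Prefect code)."""

    after: frozenset  # event names that start the timer
    within: float     # seconds to wait for a follow-up state event

    def fires(self, events, now):
        """Return True if the trigger would fire for one flow run.

        events: list of (timestamp_seconds, event_name), sorted by time.
        now: current time in seconds, used for the final open window.
        """
        armed_at = None
        for ts, name in events:
            if armed_at is not None:
                if ts - armed_at >= self.within:
                    return True  # window elapsed before any follow-up event
                # Assumption: a follow-up state event satisfies the
                # expectation and is consumed, so it does not re-arm.
                armed_at = None
                continue
            if name in self.after:
                armed_at = ts
        return armed_at is not None and now - armed_at >= self.within


# A run that goes Pending, then Running, then hangs for ten hours.
events = [(0, "prefect.flow-run.Pending"), (5, "prefect.flow-run.Running")]
now = 10 * 60 * 60

combined = StaysInTrigger(
    after=frozenset({
        "prefect.flow-run.Retrying",
        "prefect.flow-run.Paused",
        "prefect.flow-run.Running",
        "prefect.flow-run.Pending",
    }),
    within=EIGHT_HOURS,
)

# The suggested workaround: one trigger per state.
per_state = [StaysInTrigger(frozenset({name}), EIGHT_HOURS)
             for name in sorted(combined.after)]

combined_fires = combined.fires(events, now)
per_state_fires = any(t.fires(events, now) for t in per_state)
```

In this model the combined trigger is armed by Pending and disarmed by Running, so it stays silent, while the Running-only trigger ignores Pending, arms on Running, and fires once eight hours pass with no new state event.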
j
Oh awesome, thanks so much! Yes this is definitely unintuitive...
w
Yeah hope that works. I’m going to rethink how to make this a bit more intuitive on our end, thanks for the ping.
j
I think we do need some more automation space. Can you please increase our limit up to 20 or something? Thanks so much!
w
that should be done
actually, your account manager is going to do this! thanks @Aimee McManus
a
hey @James Ashby! i'll connect with you about this rn separately
j
@Will Raphaelson Okay I tried setting this up and I must have done something wrong but I can't figure it out. We had several flow runs over the weekend that I would expect to trigger this automation: https://app.prefect.cloud/account/86eb4016-3bda-40df-a926-a5da355f8393/workspace/5c4b[…]/automations/automation/0c12794d-bb9a-4bcd-9739-212f3527515a an example flow run is: https://app.prefect.cloud/account/86eb4016-3bda-40df-a926-a5da355f8393/workspace/5c4b[…]d9e2/flow-runs/flow-run/08398583-c84b-4554-939a-714ae5c27455
w
Thanks for that James - this may be easier to hop on and debug sync if you wouldn't mind. @Aimee McManus do y'all have a sync coming up? else let's DM and find a quick time to huddle on this.
actually maybe one quick piece of low hanging fruit. it would seem that automation is filtering for flow runs with a "main" tag, but I don't see that flow run as having run with any tags. lmk if I'm wrong there, I'm pretty deep in a few different tables.
j
Well the deployment is tagged "main." I assumed that would tag all flow runs for that deployment. At least that's how it always seemed to work.
w
Can you confirm on your end that the deployment is tagged main and that the flow runs thereof have the tag main? that particular flow run doesn't look like it has it, at least from the events it produced (which is all i can see on my side)
j
Yep it does.
a
Hey @James Ashby! We want to get this resolved for you. Do you mind sending us an email with these details, via support@prefect.io? This will connect you with your dedicated CS Engineer and will open a ticket for this issue, which will help us track this and ensure we fully solve this for you. If we're not able to solve this via a couple emails, we're happy to also hop on a sync to make quicker progress. Thank you and please let me know if you have any questions!
j
okay thanks!
a
of course!