# ask-community
s
Hello. We are running an on-prem solution using Prefect 3 (`3.0.1`) with some automations set up. These are basic automations that trigger a deployment after a flow run has completed. A week ago these automations suddenly stopped working, with no errors or anything. We do get the `prefect.automation.triggered` event but are missing both the `prefect.automation.action.triggered` and `prefect.automation.action.executed` events, which we previously got. Here is the trigger for one of the automations (created in the UI):
{
  "type": "event",
  "match": {
    "prefect.resource.id": "prefect.flow-run.*"
  },
  "match_related": {
    "prefect.resource.id": [
      "prefect.flow.6597b113-5b41-4c19-9f7f-6f7765195208"
    ],
    "prefect.resource.role": "flow"
  },
  "after": [],
  "expect": [
    "prefect.flow-run.Completed"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Reactive",
  "threshold": 1,
  "within": 0
}
Anyone got an idea on how to get this working again? Thanks in advance.
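To make the trigger fields above easier to reason about, here is a rough Python sketch of how `expect`, `match`, and `match_related` could be checked against a single event. It is a simplified illustration, not Prefect's actual matching code: the helper names `_matches` and `trigger_would_fire` are hypothetical, the flow-run ID in the sample event is made up, and the sketch ignores `after`, `for_each`, `posture`, `threshold`, and `within`.

```python
from fnmatch import fnmatchcase


def _matches(labels, pattern):
    """True if every key in `pattern` exists in `labels` and matches one candidate."""
    for key, expected in pattern.items():
        candidates = expected if isinstance(expected, list) else [expected]
        value = labels.get(key)
        if value is None or not any(fnmatchcase(value, c) for c in candidates):
            return False
    return True


def trigger_would_fire(trigger, event):
    """Simplified check of expect / match / match_related for one event."""
    if trigger["expect"] and event["event"] not in trigger["expect"]:
        return False
    if not _matches(event["resource"], trigger["match"]):
        return False
    related_pattern = trigger.get("match_related") or {}
    if not related_pattern:
        return True
    # At least one related resource must satisfy the whole related pattern.
    return any(_matches(r, related_pattern) for r in event.get("related", []))


trigger = {
    "match": {"prefect.resource.id": "prefect.flow-run.*"},
    "match_related": {
        "prefect.resource.id": ["prefect.flow.6597b113-5b41-4c19-9f7f-6f7765195208"],
        "prefect.resource.role": "flow",
    },
    "expect": ["prefect.flow-run.Completed"],
}

# Hypothetical event; the flow-run ID is invented for illustration.
event = {
    "event": "prefect.flow-run.Completed",
    "resource": {"prefect.resource.id": "prefect.flow-run.00000000-0000-0000-0000-000000000001"},
    "related": [
        {
            "prefect.resource.id": "prefect.flow.6597b113-5b41-4c19-9f7f-6f7765195208",
            "prefect.resource.role": "flow",
        }
    ],
}

print(trigger_would_fire(trigger, event))
```

Since `prefect.automation.triggered` is still being emitted, this matching step appears to succeed, which suggests the failure sits somewhere between the trigger firing and the action running.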
b
Hi Sondre! That trigger looks good to me. For reference, here's one from my Prefect account:
{
  "type": "event",
  "match": {
    "prefect.resource.id": "prefect.flow-run.*"
  },
  "match_related": {
    "prefect.resource.role": "flow",
    "prefect.resource.id": [
      "prefect.flow.185028eb-ddc7-44b7-a2d1-5f212727e4c6"
    ]
  },
  "after": [],
  "expect": [
    "prefect.flow-run.Completed"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Reactive",
  "threshold": 1,
  "within": 0
}
For your server, did you switch to version `3.0.1` around the time when the automation stopped working? Have you attempted to re-create the automation to see if the behavior persists?
s
Hey Bianca! The server has been running flawlessly with version `3.0.1` for months, with no changes recently. I have re-created the automation for one of the pipelines, but the issue still persists. We do use the "basic" `serve` deployment method, so no work pools or workers (yet). Today we have 21 deployments running, all scheduled in different time slots (so they should not run in parallel). Could this be a resource issue? Apparently something happened yesterday as well, where our Slack integration (in an automation) didn't notify us of an error. We got the `prefect.automation.triggered` event today as well, but we are still missing the other two events mentioned previously, so nothing triggers the expected deployment runs.
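As a small aid for this kind of diagnosis, the following Python sketch lists which events from the chain described in this thread are absent from a set of observed event names. The helper name `missing_events` and the `observed` sample are hypothetical; how the event names are collected (for example, copied from the event feed in the UI) is left open.

```python
# Event chain as described in this thread, in the order it was previously seen.
EXPECTED_CHAIN = [
    "prefect.automation.triggered",
    "prefect.automation.action.triggered",
    "prefect.automation.action.executed",
]


def missing_events(observed):
    """Return the expected automation events that do not appear in `observed`."""
    seen = set(observed)
    return [name for name in EXPECTED_CHAIN if name not in seen]


# Sample matching the situation described above.
observed = ["prefect.flow-run.Completed", "prefect.automation.triggered"]
print(missing_events(observed))
```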
b
> The server has been running flawlessly with version `3.0.1` for months, with no changes recently.
That is puzzling. At the very least, you should be seeing an automation action failed event if something goes wrong when trying to run the deployment. It's hard to say if this is a resource issue. If it was, I'd imagine you'd see other problems cropping up in your execution environment.
Happy to troubleshoot this with you live if you'd like: https://calendly.com/prefect-experts/bianca
s
True, I was expecting an error of some sort. Thanks for the offer to troubleshoot, but I'll have a talk with the infrastructure team first, as it seems to run fine in our QA environment.
b
Sounds good, Sondre, and thanks for sharing your note about the QA environment! Keep us posted on what you find.
s
Hi Bianca. We seem to have solved the problem now. The cause was an nginx proxy that was supposed to be used for custom authentication for the UI (not implemented yet). It seems to have suddenly started causing issues with the server, as the other environments also began to fail. We removed everything related to passing traffic through nginx, which seems to have fixed the issue. It looks stable for now 👍 Thanks for helping with the troubleshooting!