Ikkyu Choi

11/22/2022, 11:55 PM
Hi, I'm trying to send an alarm to Slack when my flow fails. In the normal case it works well, but when a flow fails because of a lack of AWS resources (i.e., GPU), I don't get the alarm. I'm using the ECS agent with Prefect 0.15.3. Could anyone help?

Mason Menges

11/23/2022, 8:25 PM
Hey @Ikkyu Choi, infrastructure crashes can be difficult for us to catch due to the nature of the hybrid model. In Prefect 1 we make a best attempt to monitor the flow through heartbeats, but this doesn't always catch everything. It's generally best to set up some additional logging on the AWS side to catch infrastructure failures outside of the flow and set up reporting there. This may be a good place to start for ECS
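
To make the "reporting on the AWS side" idea a bit more concrete, here is a minimal sketch, not an official Prefect or AWS recipe. It assumes an EventBridge rule that forwards "ECS Task State Change" events to a Lambda function, and a Slack incoming webhook URL stored in a `SLACK_WEBHOOK_URL` environment variable. The names `build_alert`, `handler`, and `SLACK_WEBHOOK_URL` are hypothetical choices for this example. Note that it only sees tasks that were actually created and then stopped, so a launch that never produced a task would need separate handling.

```python
import json
import os
import urllib.request


def build_alert(event):
    """Return alert text for a stopped ECS task that looks like a failure, else None."""
    if event.get("detail-type") != "ECS Task State Change":
        return None
    detail = event.get("detail", {})
    if detail.get("lastStatus") != "STOPPED":
        return None
    exit_codes = [c.get("exitCode") for c in detail.get("containers", [])]
    # Treat the task as a clean run only if every container reported exit code 0.
    # A missing exit code (container never started) is treated as a failure.
    if exit_codes and all(code == 0 for code in exit_codes):
        return None
    return (
        f"ECS task stopped abnormally: {detail.get('taskArn', 'unknown task')}\n"
        f"stopCode: {detail.get('stopCode', 'n/a')}\n"
        f"stoppedReason: {detail.get('stoppedReason', 'n/a')}"
    )


def handler(event, context):
    """Lambda entry point (assumed wiring): post to Slack only for failed tasks."""
    text = build_alert(event)
    if text is None:
        return {"alerted": False}
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],  # assumption: webhook URL set on the Lambda
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
    return {"alerted": True}
```

Keeping the filtering in `build_alert` separate from the HTTP call makes it easy to try the logic locally with a sample event before wiring anything up in AWS.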