Hi friends,
We are running on Prefect Cloud + AWS Workers. Yesterday our workers stopped working 😰. We are not sure how to determine what happened. We were able to fix it by:
• Recreating the work pool
• Re-deploying all flows and containers
This ended up being about a 1.5-hour outage for us, so we would like to fix whatever underlying issue we might have.
Are there specific logs/events we should be looking at?
I am attaching:
• AWS Graph
• Flow Runs... many scheduled
• Queue "Not Ready"
We updated Prefect to 2.19.8 two days ago.
AWS -> 8 CPU + 16 GB memory on ECS, with auto scaling.
• We initially added more servers, but this did not resolve the issue