Henri
09/27/2024, 8:48 PM
Our work pool stops running, shows as unhealthy, and no flows can run. We have two environments, `us-east-1` and `us-west-2`, provisioned identically using Terraform, but only `us-west-2` experiences this issue.
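To narrow down whether the `us-west-2` pool goes unhealthy because its worker simply stops heartbeating, here is a minimal sketch of the readiness rule as we understand it. It assumes, without confirming it against Prefect's source, that a worker counts as offline after roughly three missed heartbeat intervals (30 s each by default), and that a work pool is "not ready" when none of its workers are online. The function names and the multiplier are hypothetical, chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumption (not verified against Prefect): a worker is treated as offline
# once its last heartbeat is older than heartbeat_interval * MISSED_HEARTBEATS.
MISSED_HEARTBEATS = 3


def worker_is_online(last_heartbeat: datetime, now: datetime,
                     heartbeat_interval_s: float = 30.0) -> bool:
    """Hypothetical check: True if the last heartbeat is recent enough."""
    deadline = timedelta(seconds=heartbeat_interval_s * MISSED_HEARTBEATS)
    return now - last_heartbeat <= deadline


def pool_is_ready(last_heartbeats: list[datetime], now: datetime,
                  heartbeat_interval_s: float = 30.0) -> bool:
    """Hypothetical check: a pool is ready if at least one worker is online."""
    return any(worker_is_online(hb, now, heartbeat_interval_s)
               for hb in last_heartbeats)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Made-up data: one worker heartbeated 20 s ago, one went silent 10 min ago.
    heartbeats = [now - timedelta(seconds=20), now - timedelta(minutes=10)]
    print(pool_is_ready(heartbeats, now))
    print(pool_is_ready(heartbeats[1:], now))
```

In practice the timestamps would be the last heartbeat times Prefect shows for each worker in the pool; if the `us-west-2` worker's heartbeat stops while the ECS service still reports the task as running, that points at the worker process rather than the flows.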
When this happens, no flows run, and we have to kill the affected work pool, redeploy the AWS ECS service, wait for the work pool to register, and redeploy all Prefect flows.
We refactored the infrastructure following this [guide](https://docs.prefect.io/integrations/prefect-aws/ecs_guide) so that a new AWS ECS task is spawned whenever a flow is triggered, instead of running all flows within a single ECS task. This didn't solve the issue.
This causes downstream problems because we rely on Prefect to process customer data.
We don't know how to fix this. Has anyone experienced a similar issue?