Prefect Community

Hi,

I have a headscratcher of a problem regarding the stability of our work pool. We experience occasional failures where the Prefect `work pool` stops running, shows as unhealthy, and no flows can run. We have two environments, `us-east-1` and `us-west-2`, provisioned identically using Terraform, but only `us-west-2` experiences this issue.

When this occurs, no flows run, and to recover we must kill the affected work pool, redeploy the AWS ECS Service, wait for the work pool to register, and redeploy all Prefect flows.
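
The "wait for the work pool to register" step could be sketched as a simple polling loop. This is only an illustration under stated assumptions: `get_status` is a hypothetical callable that you would implement yourself (for example by querying the Prefect API or CLI for the work pool), and the `"READY"` status string is assumed to be what a healthy work pool reports.

```python
import time
from typing import Callable


def wait_for_work_pool(
    get_status: Callable[[], str],  # hypothetical: returns the pool's current status
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it returns "READY" or the timeout elapses.

    Returns True if the pool became ready in time, False otherwise.
    The "READY" value is an assumption about how a healthy pool is reported.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "READY":
            return True
        if clock() >= deadline:
            return False
        # Not ready yet: wait before asking again.
        sleep(interval_s)
```

Running something like this after the ECS Service redeploy, and only redeploying the flows once it returns `True`, would at least make the recovery less manual, though it does not address the root cause.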

We refactored the infrastructure following this [guide](<https://docs.prefect.io/integrations/prefect-aws/ecs_guide>), so that a new AWS Task is spawned whenever a flow is triggered instead of all flows running within a single AWS task. This didn't solve the issue.

This issue causes downstream problems as we rely on Prefect to process customer data.

We don't know how to fix this. Has anyone experienced a similar issue?