Daniel Rodriguez Ciotti
07/15/2025, 9:14 AM
Marvin
07/15/2025, 9:14 AM
Marvin
07/15/2025, 9:15 AM
- Set stopTimeout in your ECS task definition (120 seconds is the maximum on Fargate; on the EC2 launch type, 120-180 seconds is a reasonable range)
- Configure your ECS service's auto-scaling cool-down periods to allow for graceful termination
3. Best Practices for ECS:
```json
{
  "containerDefinitions": [
    {
      "stopTimeout": 120
    }
  ]
}
```
(Other container settings are omitted here.)
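To catch a missing or too-short stopTimeout before deploying, you could lint the task definition JSON. This is a minimal sketch, not an official tool: `check_stop_timeouts` and the container names are hypothetical, and it assumes the 30-second default that applies when stopTimeout is omitted and the 120-second Fargate maximum.

```python
import json

# Assumptions: ECS falls back to 30 seconds when stopTimeout is omitted,
# and Fargate rejects values above 120 seconds.
DEFAULT_STOP_TIMEOUT = 30
FARGATE_MAX_STOP_TIMEOUT = 120


def check_stop_timeouts(task_def_json: str, needed_seconds: int) -> list:
    """Return one warning per container whose stopTimeout looks wrong.

    `needed_seconds` is your own estimate of how long the worker needs
    to shut down gracefully (hypothetical input, not an ECS setting).
    """
    task_def = json.loads(task_def_json)
    warnings = []
    for container in task_def.get("containerDefinitions", []):
        name = container.get("name", "<unnamed>")
        timeout = container.get("stopTimeout", DEFAULT_STOP_TIMEOUT)
        if timeout < needed_seconds:
            warnings.append(
                f"{name}: stopTimeout {timeout}s < needed {needed_seconds}s"
            )
        if timeout > FARGATE_MAX_STOP_TIMEOUT:
            warnings.append(
                f"{name}: stopTimeout {timeout}s exceeds the Fargate maximum"
            )
    return warnings
```

For example, a definition with a "worker" container at 120 seconds and a "sidecar" container with no stopTimeout would only flag the sidecar when 90 seconds are needed.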
4. Additional Recommendations:
- Monitor your worker logs for shutdown signals
- Consider implementing health checks that respect the shutdown state
- Use ECS capacity providers with appropriate termination protection settings
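A health check that respects the shutdown state can be sketched as a flag that a SIGTERM handler sets and the health endpoint reads. This is not Prefect's own signal handling. It is an illustrative pattern for a custom entrypoint or sidecar, and the names `shutting_down`, `handle_sigterm`, and `health_status` are hypothetical.

```python
import signal
import threading

# Set once SIGTERM arrives; read by the health check.
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Log the shutdown signal and mark the process as draining."""
    print(f"received signal {signum}; draining")
    shutting_down.set()


def install_handler():
    """Register the handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def health_status() -> tuple:
    """Return (HTTP status, body) for a health endpoint.

    Reports 503 while draining so new work is not routed here.
    """
    if shutting_down.is_set():
        return 503, "draining"
    return 200, "ok"
```

The printed line gives you something to search for in the worker logs when you want to confirm that a shutdown signal was actually delivered.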
However, there's an important caveat: while Prefect workers handle SIGTERM gracefully, ECS only waits stopTimeout seconds after sending SIGTERM before it sends SIGKILL. If the graceful shutdown has not completed by then, the container is forcefully terminated and you may still experience zombie flows.
To fully protect against zombie flows, you should:
1. Ensure your ECS task definition includes adequate stop timeout values
2. Configure your auto-scaling policies with appropriate cool-down periods
3. Monitor for premature terminations and adjust timeouts as needed
4. Consider using ECS capacity providers that support managed termination protection
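For the cool-down point, here is a hedged sketch of a target-tracking policy with a long scale-in cooldown, built for boto3's Application Auto Scaling client. The cluster and service names, the CPU target, and the helper names are assumptions for illustration. The service must already be registered as a scalable target, and applying the policy requires boto3 and AWS credentials.

```python
def scale_in_policy_kwargs(cluster, service, scale_in_cooldown,
                           scale_out_cooldown=60, target_cpu=60.0):
    """Build keyword arguments for put_scaling_policy.

    A longer ScaleInCooldown makes scale-in events less frequent, so
    fewer workers are stopped while flows are running. All values here
    are illustrative, not recommendations.
    """
    return {
        "PolicyName": f"{service}-cpu-target-tracking",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleInCooldown": scale_in_cooldown,    # seconds
            "ScaleOutCooldown": scale_out_cooldown,  # seconds
        },
    }


def apply_policy(kwargs):
    """Send the policy to AWS (needs boto3 and credentials)."""
    import boto3
    client = boto3.client("application-autoscaling")
    return client.put_scaling_policy(**kwargs)
```

Note that a cooldown only reduces how often scale-in happens. It does not extend the time a task gets to shut down, which is still governed by stopTimeout.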
If you're still experiencing issues after implementing these recommendations, you might want to:
1. Increase your ECS task definition's stopTimeout
2. Review your auto-scaling policies to ensure they're not too aggressive
3. Implement additional monitoring to track when/how tasks are being terminated
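To track how tasks are being terminated, one option is to inspect stopped tasks and look at each container's exit code. The sketch below assumes task dictionaries shaped like the entries returned by boto3's `ecs.describe_tasks` (fields `stopCode`, `stoppedReason`, `containers[].name`, `containers[].exitCode`). The helper name and the verdict labels are hypothetical.

```python
SIGKILL_EXIT = 137  # 128 + 9: process was killed (SIGKILL, possibly OOM)
SIGTERM_EXIT = 143  # 128 + 15: process exited on SIGTERM


def classify_stopped_task(task: dict) -> list:
    """Summarize how each container in a stopped ECS task ended."""
    findings = []
    for container in task.get("containers", []):
        code = container.get("exitCode")
        if code is None:
            verdict = "unknown"
        elif code == SIGKILL_EXIT:
            # Likely hit stopTimeout before finishing shutdown.
            verdict = "force-killed"
        elif code in (0, SIGTERM_EXIT):
            verdict = "graceful"
        else:
            verdict = "error"
        findings.append({
            "container": container.get("name", "<unnamed>"),
            "exit_code": code,
            "verdict": verdict,
            "stop_code": task.get("stopCode"),
            "stopped_reason": task.get("stoppedReason"),
        })
    return findings
```

A steady stream of "force-killed" verdicts on scale-in would suggest the worker needs more time than stopTimeout currently allows. Keep in mind that exit code 137 can also indicate an out-of-memory kill, so check `stoppedReason` before drawing conclusions.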
Would you like me to provide more specific details about any of these aspects or help with configuring your ECS task definitions?