Hi all, I am running my Prefect flows in AWS ECS based on Prefect’s Docker image (prefecthq/prefect:2.13.2-python3.10). Since Saturday (23rd Sept) at 1pm (UTC+8), all our flow runs have been crashing. All prior flow runs were successful. We have verified internally that no settings were changed in our org’s AWS account, and no changes were made to any of our flows immediately before the crashes started. The ECS logs for a single flow run are the following:
03:40:03.187 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'slick-heron': Registering task definition...
03:40:03.600 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'slick-heron': Creating task run...
03:40:04.104 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'slick-heron': Waiting for task run to start...
03:40:04.139 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'slick-heron': Status is PROVISIONING.
03:40:14.196 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'slick-heron': Status is PENDING.
03:40:39.345 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'slick-heron': Status is RUNNING.
03:40:44.349 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'slick-heron': Running command 'python -m prefect.engine' in container 'prefect' (prefecthq/prefect:2.13.2-python3.10)...
03:40:44.632 | INFO    | prefect.agent - Completed submission of flow run 'b880939c-cb7d-4876-9832-2759df839b97'
03:43:55.558 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'slick-heron': Status is DEPROVISIONING.
03:44:10.742 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'slick-heron': Status is STOPPED.
03:44:10.777 | WARNING | prefect.infrastructure.ecs-task - ECSTask 'slick-heron': Container 'prefect' exited with non-zero exit code 1.
03:44:11.070 | INFO    | prefect.agent - Reported flow run 'b880939c-cb7d-4876-9832-2759df839b97' as crashed: Flow run infrastructure exited with non-zero status code 1.
No logs are recorded in the Prefect console, and the ECS logs give no other stack trace information. What could be the problem?
Also, I'm not sure how to go about this, but is it possible to make the ECS logs more verbose to help debug the issue? Right now it seems that I can only influence the logs emitted during the flow run, not those from the ECS task provisioning stage. In this case the flow never gets to run, so none of my logs are captured.
Things I’ve tried with no success:
• Redeploying the ECS cluster with no changes to the Terraform definition
• Redeploying all my deployments without any changes to the flows or deployment args
• Checking with Marvin, but its suggestions don’t seem relevant
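
On the question of getting more detail than the agent log provides: one option might be to ask ECS directly why the task stopped. Below is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names `summarize_stopped_task` and `describe_stopped_task` are hypothetical, and the cluster name and task ARN have to be supplied by the caller. It only reads fields that the ECS `describe_tasks` response is documented to return (`lastStatus`, `stopCode`, `stoppedReason`, and each container's `name`, `exitCode` and `reason`).

```python
from typing import Any


def summarize_stopped_task(task: dict[str, Any]) -> dict[str, Any]:
    """Extract the fields that explain why an ECS task stopped.

    `task` is one entry of the "tasks" list in a describe_tasks response.
    Missing fields are returned as None rather than raising.
    """
    return {
        "last_status": task.get("lastStatus"),
        "stop_code": task.get("stopCode"),
        "stopped_reason": task.get("stoppedReason"),
        "containers": [
            {
                "name": container.get("name"),
                "exit_code": container.get("exitCode"),
                "reason": container.get("reason"),
            }
            for container in task.get("containers", [])
        ],
    }


def describe_stopped_task(cluster: str, task_arn: str) -> dict[str, Any]:
    """Fetch one task from ECS and summarize why it stopped.

    Hypothetical helper: `cluster` and `task_arn` are placeholders the
    caller must fill in with real values.
    """
    # Imported lazily so summarize_stopped_task works without boto3 or AWS access.
    import boto3

    ecs = boto3.client("ecs")
    response = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])
    tasks = response.get("tasks", [])
    if not tasks:
        # ECS reports tasks it could not describe under "failures".
        raise LookupError(f"Task not found: {response.get('failures')}")
    return summarize_stopped_task(tasks[0])
```

This would only show the stop reason and exit codes that ECS itself recorded. The container's own stdout/stderr, which is where a Python traceback from `python -m prefect.engine` would normally end up, is captured only if the task definition configures a log driver such as `awslogs`.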