< Marvin> Working with ECS pool my flow starts and then is s Prefect Community #ask-marvin

Baker

02/02/2024, 11:19 PM

@Marvin: Working with ECS pool, my flow starts and then is stuck at "Completed submission of flow run". It remains "Pending". How can I figure out what is going wrong?

Marvin

02/02/2024, 11:20 PM

It appears that if your flow run is stuck in a "Pending" state after completing submission, there may be a couple of issues to check:

1. Resource Allocation: Make sure that the Prefect agent on AWS ECS Fargate has been allocated sufficient CPU and memory for your workloads. Inadequate resources could prevent the agent from provisioning infrastructure for your flow runs and monitoring their execution properly.
2. Task Definition and ECS Configuration: Review your ECS task definition and configuration to ensure that it matches the requirements of your flow runs. There might be issues with how the ECS service is set up, such as task roles, security groups, or subnet configurations, that can hinder task placement.
3. Suspension: Double-check if the flow run is inadvertently suspended, as suspended flow runs will not proceed until someone resumes them. However, this is less likely if all runs get stuck, which would instead indicate a broader issue with resource allocation or ECS configuration.

If adjusting the resources allocated to the ECS agent doesn't solve the problem, you might want to consult the AWS CloudWatch logs for the ECS tasks to identify any service-level issues that could be causing the runs to remain pending. Would you like to search for any known issues on GitHub that might be related to yours?

Nate

02/03/2024, 2:27 AM

this is often an indication of something wrong with the worker, i.e. the worker can’t pick it up

Nate

02/03/2024, 2:27 AM

do you have logs for your ecs worker process?

Baker

02/03/2024, 9:55 PM

Thanks for the hint. I added logs, and I can see the log stream created in CloudWatch, but the logs are empty. I am a bit stuck.

Baker

02/03/2024, 10:21 PM

I could see an httpx.ConnectTimeout after a couple of minutes, which I guess is an automatic action from the worker. So the logging works, but it doesn't log anything showing why the flow doesn't run.

Baker

02/03/2024, 11:34 PM

I finally identified a critical mistake on my side: my tasks were not running in the same VPC as the Prefect server. That's why they couldn't communicate, and that's why I was only logging an httpx timeout.
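
For anyone hitting the same symptom, a quick way to confirm this kind of network split is to test, from inside the ECS task or worker container, whether the Prefect API host accepts a TCP connection at all. The snippet below is only a minimal sketch using the Python standard library: the helper name `can_reach` is made up for illustration, and it assumes the API address is available in the `PREFECT_API_URL` environment variable (falling back to a local server on port 4200). It checks raw reachability only, not authentication or API health.

```python
import os
import socket
from urllib.parse import urlparse


def can_reach(api_url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; it is not part of Prefect.
    """
    parsed = urlparse(api_url)
    host = parsed.hostname
    if host is None:
        raise ValueError(f"no host found in URL: {api_url!r}")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False


if __name__ == "__main__":
    # Assumption: the task receives the API address via PREFECT_API_URL.
    url = os.environ.get("PREFECT_API_URL", "http://127.0.0.1:4200/api")
    print(url, "reachable" if can_reach(url) else "NOT reachable")
```

If this reports the API as not reachable from the task but reachable from a machine inside the server's network, the VPC, subnet, or security group setup is the likely culprit rather than Prefect itself.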
