<@ULVA73B9P>: Working with ECS pool, my flow start...
# ask-marvin
b
@Marvin: Working with an ECS pool, my flow starts and then gets stuck at "Completed submission of flow run". It remains "Pending". How can I figure out what is going wrong?
m
If your flow run is stuck in a "Pending" state after completing submission, there are a few things to check:

1. Resource Allocation: Make sure that the Prefect agent on AWS ECS Fargate has been allocated sufficient CPU and memory for your workloads. Inadequate resources could prevent the agent from provisioning infrastructure for your flow runs and monitoring their execution properly.
2. Task Definition and ECS Configuration: Review your ECS task definition and configuration to ensure that they match the requirements of your flow runs. There might be issues with how the ECS service is set up, such as task roles, security groups, or subnet configurations, that can hinder task placement.
3. Suspension: Double-check whether the flow run is inadvertently suspended, as suspended flow runs will not proceed until someone resumes them. However, this is less likely if all runs get stuck, which would instead point to a broader issue with resource allocation or ECS configuration.

If adjusting the resources allocated to the ECS agent doesn't solve the problem, you might want to consult the AWS CloudWatch logs for the ECS tasks to identify any service-level issues that could be causing the runs to remain pending.

Would you like to search for any known issues on GitHub that might be related to yours?
n
this is often an indication of something wrong with the worker, i.e. the worker can't pick it up
do you have logs for your ecs worker process?
b
Thanks for the hint. I added logs, and I can see the log stream created in CloudWatch, but the logs are empty. I am a bit stuck.
I could see an httpx.ConnectTimeout after a couple of minutes, which I guess is an automatic action from the worker. So the logging works, but it doesn't log anything showing why the flow doesn't run.
I finally identified a critical mistake on my side: my tasks were not running in the same VPC as the Prefect server. That's why they couldn't communicate, and that's why the only thing logged was an httpx timeout.
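For anyone hitting the same symptom, a quick way to confirm a network problem like this is to test plain TCP reachability of the Prefect API from inside the ECS task (or from a host in the same subnet). The snippet below is only a minimal sketch using the Python standard library: the helper name `can_reach` is made up for illustration, and the fallback URL `http://127.0.0.1:4200/api` is an assumption, so substitute whatever `PREFECT_API_URL` your tasks are actually configured with.

```python
import os
import socket
from urllib.parse import urlparse


def can_reach(api_url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; it only checks that the network
    path is open, not that the API itself is healthy.
    """
    parsed = urlparse(api_url)
    host = parsed.hostname
    if host is None:
        raise ValueError(f"could not parse a host from {api_url!r}")
    # Fall back to the scheme's default port when none is given in the URL.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Assumed fallback URL; replace with your own server address.
    url = os.environ.get("PREFECT_API_URL", "http://127.0.0.1:4200/api")
    status = "reachable" if can_reach(url) else "NOT reachable"
    print(f"{url} is {status}")
```

If this reports the API as not reachable from the task but reachable from the machine running the server, the cause is likely networking (VPC, subnets, security groups) rather than the flow or the worker configuration.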