James Gatter

02/17/2023, 3:28 PM
My ECSTasks are timing out after they reach PENDING status, even after I tried granting full ECS access. I set things up by following the Prefect 2 Fargate guide on Medium (using Prefect 2.7.12). What might be the cause of this?
15:18:59.006 | INFO    | prefect.agent - Submitting flow run 'f2d5d977-88e2-4a90-af08-b07e224728f1'
15:18:59.309 | INFO    | prefect.infrastructure.process - Opening process 'tested-butterfly'...
15:18:59.368 | INFO    | prefect.agent - Completed submission of flow run 'f2d5d977-88e2-4a90-af08-b07e224728f1'
/usr/local/lib/python3.9/runpy.py:127: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
15:19:01.792 | INFO    | Flow run 'tested-butterfly' - Downloading flow code from storage at ''
15:19:02.755 | INFO    | Flow run 'tested-butterfly' - Network: ip-10-0-30-30.ec2.internal. Instance: Linux-5.10.162-141.675.amzn2.x86_64-x86_64-with-glibc2.31. Agent is healthy ✅️
15:19:02.757 | INFO    | Flow run 'tested-butterfly' - Python = 3.9.16. API: 0.8.4. Prefect = 2.7.12 🚀
15:19:02.869 | INFO    | Flow run 'tested-butterfly' - Finished in state Completed()
15:19:03.624 | INFO    | prefect.infrastructure.process - Process 'tested-butterfly' exited cleanly.
15:19:46.465 | INFO    | prefect.agent - Submitting flow run '25532c7f-c80b-423c-99b3-d716de85712f'
15:19:46.823 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'statuesque-grasshopper': Retrieving task definition 'prefect__quilt-s3__cyflows'...
15:19:46.864 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'statuesque-grasshopper': Registering task definition...
15:19:47.279 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'statuesque-grasshopper': Creating task run...
15:19:47.903 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'statuesque-grasshopper': Waiting for task run to start...
15:19:47.924 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'statuesque-grasshopper': Status is PROVISIONING.
15:19:57.970 | INFO    | prefect.infrastructure.ecs-task - ECSTask 'statuesque-grasshopper': Status is PENDING.
15:21:48.479 | ERROR   | prefect.agent - Failed to submit flow run '25532c7f-c80b-423c-99b3-d716de85712f' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/prefect/agent.py", line 476, in _submit_run_and_capture_errors
    result = await infrastructure.run(task_status=task_status)
  File "/usr/local/lib/python3.9/site-packages/prefect_aws/ecs.py", line 610, in run
    ) = await run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.9/site-packages/prefect_aws/ecs.py", line 805, in _create_task_and_wait_for_start
    self._wait_for_task_start(
  File "/usr/local/lib/python3.9/site-packages/prefect_aws/ecs.py", line 1048, in _wait_for_task_start
    for task in self._watch_task_run(
  File "/usr/local/lib/python3.9/site-packages/prefect_aws/ecs.py", line 1033, in _watch_task_run
    raise RuntimeError(
RuntimeError: Timed out after 120.57538270950317s while watching task for status {until_status or 'STOPPED'}
15:21:48.481 | INFO    | prefect.agent - Completed submission of flow run '25532c7f-c80b-423c-99b3-d716de85712f'
I've tried three different flow deployments (pushed using the GitHub action in that workflow), and no dice.
All my flows use a custom image, built on the official Prefect image, that is about 4 GB. It takes about 4 minutes to pull. Could this be the explanation for the timeout?
My agent on ECS has Task CPU 4 vCPU and Task memory 8 GB, so it's probably not a resource constraint, but possibly a time constraint?

Ryan Peden

02/17/2023, 8:37 PM
The 4 minute pull time sounds like the cause. The ECSTask block waits for 2 minutes by default, but the timeout is adjustable. If you open the block in the UI you'll see a Task Watch Timeout Seconds field you can edit.
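To illustrate why a 4 minute image pull trips a 2 minute wait, here is a minimal, stdlib-only sketch of a "watch until status or time out" loop. It is not the prefect-aws implementation: `wait_for_status`, `simulated_task`, and `starts_within` are hypothetical names, time is simulated rather than slept, and the 10 second PROVISIONING phase and 5 second poll interval are assumptions loosely based on the log above.

```python
from typing import Callable


def wait_for_status(
    get_status: Callable[[float], str],
    until_status: str,
    timeout_seconds: float,
    poll_interval: float = 5.0,
) -> float:
    """Poll a simulated task until it reports `until_status`.

    Time is simulated: `get_status(elapsed)` returns the task status after
    `elapsed` seconds. Returns the elapsed time at which the status was
    reached, or raises RuntimeError once the timeout has been exceeded.
    """
    elapsed = 0.0
    while True:
        if get_status(elapsed) == until_status:
            return elapsed
        if elapsed > timeout_seconds:
            raise RuntimeError(
                f"Timed out after {elapsed}s while watching task "
                f"for status {until_status}"
            )
        elapsed += poll_interval


def simulated_task(image_pull_seconds: float) -> Callable[[float], str]:
    """Hypothetical task: PROVISIONING for 10s, PENDING while the image
    is pulled, then RUNNING."""

    def status(elapsed: float) -> str:
        if elapsed < 10:
            return "PROVISIONING"
        if elapsed < 10 + image_pull_seconds:
            return "PENDING"
        return "RUNNING"

    return status


def starts_within(timeout_seconds: float, image_pull_seconds: float = 240) -> bool:
    """Return True if the simulated task reaches RUNNING before the timeout."""
    try:
        wait_for_status(
            simulated_task(image_pull_seconds), "RUNNING", timeout_seconds
        )
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    print(starts_within(120))  # 2 minute wait vs. a 4 minute pull
    print(starts_within(600))  # a more generous 10 minute wait
```

In this sketch the task sits in PENDING for the whole pull, so any timeout shorter than the pull time plus provisioning fails no matter how much CPU or memory the agent has. On the real block, the setting Ryan describes appears to correspond to the `task_start_timeout_seconds` field of `ECSTask`; treat that exact name as an assumption and check it against your installed prefect-aws version.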

James Gatter

02/17/2023, 8:40 PM
Surprised I never saw that option, thanks! Giving it a go now
It worked!! Thanks so much!

Ryan Peden

02/17/2023, 8:49 PM
You're welcome! I'm happy to hear it worked.