# prefect-community
b
Hey prefect - this is a strange one - I got two errors from flows overnight with
State message: Submission failed. IndexError: list index out of range
Unfortunately there are no logs in the agent with any failures either 🤷 This did lead me to see some other errors in the agent 🧵
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/prefect/agent.py", line 265, in _submit_run_and_capture_errors
    result = await infrastructure.run(task_status=task_status)
  File "/usr/local/lib/python3.9/site-packages/prefect_aws/ecs.py", line 523, in run
    status_code = await run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/site-packages/prefect/utilities/asyncutils.py", line 68, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(call, cancellable=True)
  File "/usr/local/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.9/site-packages/prefect_aws/ecs.py", line 633, in _watch_task_and_get_exit_code
    task = self._wait_for_task_finish(
  File "/usr/local/lib/python3.9/site-packages/prefect_aws/ecs.py", line 880, in _wait_for_task_finish
    last_log_timestamp = self._stream_available_logs(
  File "/usr/local/lib/python3.9/site-packages/prefect_aws/ecs.py", line 924, in _stream_available_logs
    response = logs_client.get_log_events(**request)
  File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ResourceNotFoundException: An error occurred (ResourceNotFoundException) when calling the GetLogEvents operation: The specified log stream does not exist.
but that is probably an issue for another thread. So any ideas how to debug that Flow failure?
z
The agent should dump logs when that happens — weird that it didn’t include a traceback.
I guess we also need to guard against intermittently missing log streams during log streaming 😢 come on AWS
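For anyone hitting the same `ResourceNotFoundException`, here is a minimal sketch of what such a guard could look like. This is not the actual prefect-aws code: the helper name `get_log_events_if_stream_exists` and its parameters are made up for illustration, and it assumes a boto3 CloudWatch Logs client, which exposes the error class as `client.exceptions.ResourceNotFoundException`.

```python
from typing import Any, Dict, List


def get_log_events_if_stream_exists(
    logs_client: Any, log_group: str, log_stream: str, **extra: Any
) -> List[Dict[str, Any]]:
    """Hypothetical helper: return log events, or [] if the stream is missing."""
    try:
        response = logs_client.get_log_events(
            logGroupName=log_group, logStreamName=log_stream, **extra
        )
    except logs_client.exceptions.ResourceNotFoundException:
        # The stream may not exist yet (or may already be gone). Treat that as
        # "no logs available right now" so the caller can retry on its next
        # poll instead of crashing while watching the task.
        return []
    return response.get("events", [])
```

With a real client this would be called as `get_log_events_if_stream_exists(boto3.client("logs"), group, stream)` inside the polling loop. Returning an empty list keeps the watcher alive when the stream is only temporarily missing.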
b
AWS needs to pick up its game!!!
wasn't this PR part of the 0.1.8 release?
z
Yeah, is your agent on that?
b
yeah it is
z
😢
Unfortunately I can’t know more without a traceback
b
yeah I totally understand.... It is the first time it happened, and it did happen twice in 1 minute. I will keep an eye out for it and report back with more. btw - noticed 2.7.0 is out, how long does it take for the new base image to be available on Dockerhub?
ahh I can just use this image 2-python3.11
I just usually specify the prefect version in case something breaks...
z
I’d recommend using a specific version but 🤷 the images take ~5 minutes to publish https://github.com/PrefectHQ/prefect/actions/runs/3596998566/jobs/6058371962
b
oh yeah they just came out - too excited ! haha
s
Hey was this issue ever resolved? I am getting the log stream can’t be found thing now, as well. using python 3.9 and prefect 2.7.1
b
I think that's a different issue @Sam Werbalowsky. I get those errors too.
z
Please open a ticket in prefect-aws.
t
Hey, we got a similar issue because the agent was trying to start the ECSTask with too much memory and CPU. AWS allows by default only 6 vCPU and we were using 8. Maybe this helps.
z
That’s good to know! It was that error? Can you open an issue with the traceback so we can add handling?