one of the recurring issues i have with prefect is that things often seem to fail for no reason, and frequently with absolutely no context. i restarted this flow a minute later with no changes at all, and it ran successfully. are there any recommendations about handling these situations?
Kevin Kho
11/30/2021, 5:54 PM
Hey @Martim Lobao this sounds like ECS is just failing to start. Do you have CloudWatch logs enabled for this?
Martim Lobao
11/30/2021, 6:24 PM
unfortunately our cloudwatch logs don’t offer any more context either
Martim Lobao
11/30/2021, 6:24 PM
that warning is unrelated, it’s also present on successful runs
Kevin Kho
11/30/2021, 6:26 PM
What is your launch type and do you use Dask? Do you use spot instances?
Martim Lobao
11/30/2021, 6:39 PM
yes, we’re using LocalDaskExecutor and fargate
Kevin Kho
11/30/2021, 6:48 PM
LocalDaskExecutor should not be a problem. I asked because spinning up a Dask cluster sometimes fails. I think the issue here is Fargate. Could you see if these debugging methods help or give any insight? I think you are looking at task logs, but maybe we can look at the container logs
Martim Lobao
11/30/2021, 6:56 PM
unfortunately the agent’s logs provide even less detail
Kevin Kho
11/30/2021, 7:03 PM
Do you have anything under Details for the Flow container?
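For reference, the stop details for a Fargate task can also be read programmatically rather than from the console. The sketch below is not from this thread: it assumes boto3 is installed with AWS credentials configured, and the helper names `summarize_stopped_tasks` and `fetch_stop_reasons` are hypothetical. It reads the `stopCode`, `stoppedReason`, and per-container `reason` fields of the ECS `DescribeTasks` response, which should roughly correspond to what the console shows for a stopped task and its containers. Note that ECS only keeps stopped tasks available for a limited time after they stop.

```python
from typing import Any, Dict, List


def summarize_stopped_tasks(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract stop reasons from an ECS DescribeTasks response dict."""
    summaries = []
    for task in response.get("tasks", []):
        summaries.append({
            "task_arn": task.get("taskArn"),
            # Task-level fields: why ECS stopped (or never started) the task.
            "stop_code": task.get("stopCode"),
            "stopped_reason": task.get("stoppedReason"),
            # Container-level fields: exit code and reason, when present.
            "containers": [
                {
                    "name": c.get("name"),
                    "exit_code": c.get("exitCode"),
                    "reason": c.get("reason"),
                }
                for c in task.get("containers", [])
            ],
        })
    return summaries


def fetch_stop_reasons(cluster: str, task_arns: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: call DescribeTasks and summarize the result."""
    import boto3  # imported lazily so the summarizer works without AWS access

    ecs = boto3.client("ecs")
    response = ecs.describe_tasks(cluster=cluster, tasks=task_arns)
    return summarize_stopped_tasks(response)
```

Calling `fetch_stop_reasons` with the cluster name and the ARN of the failed flow run's task returns one summary per task. An empty `stopped_reason` together with missing container reasons would suggest the failure happened before ECS recorded anything useful.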