#prefect-server

jack

05/14/2022, 2:45 PM
Seeing a new pattern today running tasks on ECS. The prefect logs show
Flow run SUCCESS: all reference tasks succeeded
and then the next log line says
No heartbeat detected from the remote task; marking the run as failed.
Here is a screenshot of the logs
The flow-run-id is
2bb7aa77-8f9f-4b56-933c-a0c1a53636a7
Using prefect cloud. The ECS agent says "core version 1.2.1"
The ECS page shows a normal exit

Anna Geller

05/14/2022, 3:36 PM
Are you on Prefect Cloud or Server? You posted on Server, and in that case the flow run ID wouldn't help. This page explains the issue and shows some things you may do.

jack

05/14/2022, 3:37 PM
On prefect cloud. Which is the best channel to post in?

Anna Geller

05/14/2022, 3:38 PM
#prefect-community for the next time 🙂 thx, I'll check the logs then. Have just 10 min then I need to run
do you know which task caused that issue?
do you have any long running jobs?

jack

05/14/2022, 3:39 PM
This task usually runs in 5 or 10 minutes
The flow only has one task

Anna Geller

05/14/2022, 3:41 PM
yup, you do just some dataframe manipulation, really weird - did rerunning the flow help?
maybe some timeout or network connectivity issue on ECS side? hard to determine any issue from logs - it looks like all the work has been done successfully, only not marked as success, correct?

jack

05/14/2022, 3:43 PM
We kick off about 30 flow runs (all the same flow). We kicked off the batch three times, and on the third time all flow runs succeeded.
Yes, it appears to be a success but then gets marked as a failure. Makes me suspect a race condition.

Anna Geller

05/14/2022, 3:44 PM
good thinking, could be

jack

05/14/2022, 3:44 PM
ECS shows the task as exited normally.

Anna Geller

05/14/2022, 3:45 PM
in that case, perhaps you could add an Automation on this flow? you could say: if the flow run doesn't finish within X time or if it doesn't finish successfully, trigger a new flow run
you could set it up from the UI - might be useful at least as a temporary solution over the weekend 😅
well, the container didn't throw any non-zero exit code so from ECS perspective it's all fine as long as the container doesn't crash
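The Automation rule suggested above (rerun if the flow run doesn't finish within X time or doesn't finish successfully) can be sketched roughly as follows. This is plain Python for illustration only, not the Prefect Automations API: `FlowRunStatus`, `should_trigger_rerun`, and the state names are hypothetical, and the real rule would be configured from the UI.

```python
from dataclasses import dataclass

# Hypothetical state names, assumed for this sketch only.
FINISHED_STATES = {"Success", "Failed"}


@dataclass
class FlowRunStatus:
    """Hypothetical snapshot of a flow run; not a Prefect class."""
    state: str              # e.g. "Running", "Success", "Failed"
    elapsed_seconds: float  # time since the flow run started


def should_trigger_rerun(status: FlowRunStatus, max_seconds: float) -> bool:
    """Return True if the run overran its time budget or finished unsuccessfully."""
    if status.state not in FINISHED_STATES:
        # Still running: only rerun once the time budget is exceeded.
        return status.elapsed_seconds > max_seconds
    # Finished: rerun unless it ended in success.
    return status.state != "Success"
```

Under these assumptions, a run that did its work but was then marked failed for a missing heartbeat would be picked up by the second branch and rerun.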