Hello folks, i am getting an error during a flow r...
# ask-community
a
Hello folks, I am getting an error during a flow run for one of the long-running tasks. The error is:
No heartbeat detected from the remote task; marking the run as failed.
Other details:
• It's running on ECS (the agent is ECS)
• Flow storage is S3
• In the same flow, other long-running tasks ran successfully
What can I do to prevent these timeouts, and how do I fix them? As this doc suggests, Lazarus will restart the failed task, but I need a way to prevent the timeout if possible.
One reason I can guess is the instance type the agent is running on (CPU, etc.). Let me check that as well.
k
Prefect has heartbeats that check whether your flow is alive. If Prefect didn't have heartbeats, flows that lost communication and died would be shown as Running in the UI permanently. 95% of the time, we have seen "no heartbeat detected" as a result of running out of memory, so check the memory available to the task first. If you are confident the task will succeed, you can separate it out into its own subflow and then turn off heartbeats for that subflow. We also rolled out a recent change you can try, where you can configure heartbeats to run as threads instead of processes. The documentation for that is here.
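For reference, here is a minimal sketch of how the heartbeat mode could be set per flow. It assumes Prefect 1.x, where (as far as I know) the mode is read from the `PREFECT__CLOUD__HEARTBEAT_MODE` environment variable with the values `process`, `thread`, or `off`. The helper `heartbeat_env` is a hypothetical name made up for this example, not part of Prefect, so please check the variable name and values against the docs for your version.

```python
# Hypothetical helper (not part of Prefect): builds the env mapping
# that would be handed to a run config such as ECSRun.
# Assumption: Prefect 1.x reads PREFECT__CLOUD__HEARTBEAT_MODE and
# accepts "process", "thread", or "off".
ALLOWED_MODES = {"process", "thread", "off"}


def heartbeat_env(mode: str) -> dict:
    """Return the environment variables that select a heartbeat mode."""
    if mode not in ALLOWED_MODES:
        raise ValueError(
            f"unknown heartbeat mode {mode!r}; expected one of {sorted(ALLOWED_MODES)}"
        )
    return {"PREFECT__CLOUD__HEARTBEAT_MODE": mode}


# Heartbeats as threads, for the main flow.
thread_env = heartbeat_env("thread")

# Heartbeats off, intended only for the subflow holding the long task.
off_env = heartbeat_env("off")

# Usage with Prefect 1.x (left commented out so the sketch runs
# without Prefect installed):
#
#   from prefect.run_configs import ECSRun
#   flow.run_config = ECSRun(env=thread_env)
#   long_task_subflow.run_config = ECSRun(env=off_env)
```

Note that with heartbeats off, a run that really does die will stay in Running until something else marks it failed, so it is probably worth limiting `off` to the one subflow you trust. If the root cause is memory, raising the memory on the ECS task definition is likely the more durable fix.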