# ask-community
j
Hey there. We have been seeing this error during some flow runs: "No heartbeat detected from the remote task; marking the run as failed." Any info on what is possibly causing this?
For more context: we are hitting an external API and getting results, then at some point the task seems to just stall out.
k
Hi @Jacolon Walker, I think this answer is a good starting point. You can also see this part of the docs for the configuration
z
this has been happening to us for a while too, no fix has worked yet
k
What version are you on? I think 0.15.4 propagates the underlying error better, which should give us a clearer idea of the cause.
Have you tried the heartbeat config? I tried replicating this by spinning up an API with a very long sleep call and couldn’t. Are you querying an API too?
z
yeah we tried the heartbeat config, no luck. we are on 0.15.4 on both the agent and the flows. not an api, a long running db query. confirmed it's not a memory issue too.
ours is related to k8s autoscaler tho
k
Gotcha. Will bring it up when I chat with the team
z
Agh. Are you getting any of our heartbeat failure logs?
z
i think what was happening is we had this in our job templates
restartPolicy: OnFailure
we are switching this to Never so that Prefect handles the retries for us, and I think this will resolve our issue
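For anyone who wants to apply the same change programmatically, here is a rough sketch of it in plain Python. It assumes the job template is a standard Kubernetes Job manifest loaded into a dict; the helper name `with_restart_policy_never` and the sample manifest are hypothetical and not part of Prefect or Kubernetes.

```python
import copy


def with_restart_policy_never(job_template: dict) -> dict:
    """Return a copy of a Job manifest whose pod spec uses restartPolicy: Never.

    Hypothetical helper for illustration; the input dict is left unmodified.
    """
    patched = copy.deepcopy(job_template)
    # In a Job manifest the pod-level settings live under spec.template.spec.
    pod_spec = (
        patched.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    pod_spec["restartPolicy"] = "Never"
    return patched


# Assumed minimal job template, mirroring the setting quoted above.
job_template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "spec": {
                "containers": [{"name": "flow"}],
                "restartPolicy": "OnFailure",
            }
        }
    },
}

patched = with_restart_policy_never(job_template)
print(patched["spec"]["template"]["spec"]["restartPolicy"])
```

With `Never`, Kubernetes does not restart a failed container in place, so retries are left to whatever the flow itself is configured to do.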
z
Glad to hear. I think that we should still handle multiple pods attempting to run the same flow better so if you have any heartbeat error logs that'd be helpful.
z
i don't have any interesting logs
i can share what was logged, one sec
j
We are on version 0.15.5 so should be good there
Re-running a flow now to see if it does the same thing