
Pedro Machado

03/10/2022, 4:26 PM
Hi there. I have a flow that is orchestrated by Prefect Cloud. It has 1k+ mapped tasks, and it failed after processing 250 of them. There are 20 tasks in a Retrying state showing this message:
No heartbeat detected from the remote task; retrying the run.This will be retry 1 of 2.
However, the last message was written 12 hours ago, and it does not look like the flow was retried at all. Could someone help me figure out what may have happened? I am using ECS to run the flows. The flow run ID is fed29ba9-66a6-4f0b-aa69-cf9db908bd58.
I went ahead and canceled it, but it does not offer me the option to restart it. How can I restart it and preserve the same flow run ID so that the tasks that succeeded are not executed again? Ideally, I'd rely on the task states, but if the tasks need to run again, I am hoping that caching will avoid redoing the work. The cache key is based on the flow_run_id. Thanks!

Kevin Kho

03/10/2022, 4:29 PM
Have you seen this part about the heartbeats? These are mostly memory issues, so it might help to bump the memory of the ECS task, or you can use threaded heartbeats. What happens when you go to the UI? Is there no restart button? I think you can use targets to cache each mapped task individually. A cache key doesn't work well for mapping because all the mapped tasks share the same cache key.
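The shared-cache-key problem can be seen in a toy model. This is plain Python, not Prefect's caching code: a dict stands in for the result store, and `run_mapped`, `run_only_key`, and `per_child_key` are hypothetical helpers. If every mapped child looks up the same key (here, just the flow run ID), the first child's result is reused for all of them; adding the map index gives each child its own entry, which is what a `{flow_run_id}_{map_index}` target does.

```python
from typing import Callable, Dict, List


def run_mapped(
    items: List[int],
    work: Callable[[int], int],
    key_for: Callable[[str, int], str],
    flow_run_id: str,
) -> List[int]:
    """Toy model (hypothetical): each mapped child checks one shared cache first."""
    cache: Dict[str, int] = {}
    results = []
    for map_index, item in enumerate(items):
        key = key_for(flow_run_id, map_index)
        if key not in cache:  # miss: compute and store
            cache[key] = work(item)
        results.append(cache[key])  # hit, or the value just stored
    return results


def square(x: int) -> int:
    return x * x


def run_only_key(run_id: str, map_index: int) -> str:
    return run_id  # every child gets the same key


def per_child_key(run_id: str, map_index: int) -> str:
    return f"{run_id}_{map_index}"  # mirrors {flow_run_id}_{map_index}


items = [1, 2, 3]
shared = run_mapped(items, square, run_only_key, "run-1")
per_index = run_mapped(items, square, per_child_key, "run-1")

print(shared)     # [1, 1, 1]  (children 1 and 2 wrongly reuse child 0's result)
print(per_index)  # [1, 4, 9]
```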

Pedro Machado

03/10/2022, 4:47 PM
Hi Kevin. I had not seen that document. I will try threaded heartbeats. The restart option does not show because I canceled the flow run (it was still running, but stuck). I configured targets based on the flow_run_id and the map index. The only issue is that the target also uses today, and this ran last night:
target="prefect-results/employee_profiles/"
    "{refresh-mode}/{today}/{flow_run_id}_{map_index}.json",
Assuming that I work around the date issue, could I just set the flow run to Scheduled? Would it take advantage of caching?
Actually, since the date is UTC, if I run this now, it should use the same date.
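To check the date concern, the sketch below renders the target template above for a few map indices. It assumes, as the message does, that `today` is the UTC calendar date when the target is rendered and that placeholders are filled with `str.format`-style substitution; it is not Prefect's own rendering code. The `refresh-mode` value `"full"`, the UTC-5 local zone, the timestamps, and `target_for` are all hypothetical. Because `refresh-mode` contains a hyphen, it cannot be a Python keyword argument, so the values are passed through a dict.

```python
from datetime import datetime, timedelta, timezone

# The template from the message; the two adjacent string literals are joined into one.
TARGET = (
    "prefect-results/employee_profiles/"
    "{refresh-mode}/{today}/{flow_run_id}_{map_index}.json"
)
RUN_ID = "fed29ba9-66a6-4f0b-aa69-cf9db908bd58"
LOCAL = timezone(timedelta(hours=-5))  # hypothetical local zone (UTC-5)


def utc_today(when: datetime) -> str:
    """YYYY-MM-DD of the UTC calendar day that contains `when`."""
    return when.astimezone(timezone.utc).strftime("%Y-%m-%d")


def target_for(when: datetime, map_index: int, refresh_mode: str = "full") -> str:
    """Render the target for one mapped child (hypothetical helper)."""
    context = {
        "refresh-mode": refresh_mode,  # hyphenated key, so it goes in a dict
        "today": utc_today(when),
        "flow_run_id": RUN_ID,
        "map_index": map_index,
    }
    return TARGET.format(**context)


last_night = datetime(2022, 3, 9, 21, 30, tzinfo=LOCAL)  # 02:30 UTC on Mar 10
restart = datetime(2022, 3, 10, 17, 0, tzinfo=LOCAL)     # 22:00 UTC on Mar 10
next_day = datetime(2022, 3, 10, 20, 0, tzinfo=LOCAL)    # 01:00 UTC on Mar 11

print(target_for(last_night, 7))
# prefect-results/employee_profiles/full/2022-03-10/fed29ba9-66a6-4f0b-aa69-cf9db908bd58_7.json
```

The first run and the restart render identical paths even though their local dates differ, because only the UTC date enters the template. Under this sketch's assumption, a restart after 00:00 UTC would land in a new `{today}` folder and miss every existing target.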

Kevin Kho

03/10/2022, 4:58 PM
You could try moving it to Scheduled, or marking it as Failed and then restarting it.
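Restarting keeps the flow run ID, so if the mapped children do run again, any `{flow_run_id}_{map_index}` target written before the failure is found and the work is skipped; a brand-new flow run would get a new ID and miss all of them. The toy model below (plain Python, not Prefect's task runner) makes that concrete. `run_child`, `run_flow`, the temporary directory standing in for the result store, and the simulated crash are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def run_child(target: Path, item: int, work: Callable[[int], int]) -> int:
    """Reuse the stored result if the target exists; otherwise compute and store it."""
    if target.exists():
        return json.loads(target.read_text())
    result = work(item)
    target.write_text(json.dumps(result))
    return result


def run_flow(
    root: Path,
    run_id: str,
    items: List[int],
    work: Callable[[int], int],
    crash_at: Optional[int] = None,
) -> List[int]:
    """Run one 'mapped' child per item, optionally crashing at a given map index."""
    results = []
    for map_index, item in enumerate(items):
        if map_index == crash_at:
            raise RuntimeError(f"simulated crash at map_index={map_index}")
        target = root / f"{run_id}_{map_index}.json"  # mirrors {flow_run_id}_{map_index}
        results.append(run_child(target, item, work))
    return results


calls: List[int] = []  # records every item that was actually computed


def work(item: int) -> int:
    calls.append(item)
    return item * 10


items = [1, 2, 3, 4, 5]
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    try:
        run_flow(root, "run-1", items, work, crash_at=3)  # first attempt dies
    except RuntimeError:
        pass
    first_calls = list(calls)                       # [1, 2, 3]
    results = run_flow(root, "run-1", items, work)  # restart: same run ID
    after_restart = list(calls)                     # [1, 2, 3, 4, 5]
    run_flow(root, "run-2", items, work)            # new run ID: every target misses
    new_run_calls = calls[len(after_restart):]      # [1, 2, 3, 4, 5]

print(results)  # [10, 20, 30, 40, 50]
```

On the restart only items 4 and 5 are computed; the first three come back from their targets. The same files are useless to `run-2`, which is why preserving the flow run ID matters when the target is keyed on it.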

Pedro Machado

03/10/2022, 5:09 PM
Interesting, I now see restart after refreshing the page. It's running now. Thanks, Kevin!