# ask-community
d
Hi all! I had a very strange behaviour with a task being executed twice, with a few seconds interval:
16:46:32 Task 'x': Starting task run...
16:46:33 Task 'x': Finished task run for task with final state: 'TriggerFailed'
01:39:32 Task 'x': Starting task run...
01:39:46 Task 'x': Starting task run...
Don't know if it could be linked, but I had to retry the parent task that failed (hence the "TriggerFailed" that you can see above). I'm running using a k8s executor, with core version 0.14.20. Any idea what could have happened?
k
Do you have `flow.run()` left in your Flow by chance? Is it all tasks or just one task?
d
Hi Kevin, no, there is no `flow.run()` within the flow. Actually, the next task has been started twice as well, and it's still ongoing.
Could it be something like two k8s pods both executing the same flow?
k
Ah did Lazarus trigger another Flow run?
d
From the dashboard, I only see this flow run 🤔 Apparently, one of our DevOps killed some pods yesterday, so this really looks like 2 pods running the same flow.
k
I mean from the logs, do you see Lazarus events re-submitting the Flow Run?
Seconds apart is pretty weird though, because Lazarus only kicks in after something like 10 minutes
d
Nope, nothing else when looking at the full logs... I'll check with our DevOps to see the status of the pods.
Upon further investigation, it looks like the parent task timed out not because of our own code but because no heartbeat was detected ("No heartbeat detected from the remote task; marking the run as failed." (prefect-server.ZombieKiller.TaskRun)), while the worker actually continued in the background. At the k8s level, there were indeed 2 pods deployed instead of one, so I'm guessing that when I retried, the job was picked up by the 2nd pod while the 1st one was still running the flow.
k
I see, maybe threaded heartbeats can help with that?
d
I'll check it out, thank you very much Kevin!
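For anyone landing here later, a minimal sketch of what "threaded heartbeats" could look like in practice. It assumes a Prefect Core release that supports the `PREFECT__CLOUD__HEARTBEAT_MODE` setting (to my knowledge 0.15.4 or later, so the 0.14.20 mentioned above would need an upgrade) and a flow that uses a `KubernetesRun` run config. The helper `heartbeat_env` is a hypothetical name used only for illustration and is not part of Prefect.

```python
# Sketch only: build the environment override that switches Prefect heartbeats
# from a subprocess to a thread. Assumes Prefect Core >= 0.15.4.

HEARTBEAT_ENV_VAR = "PREFECT__CLOUD__HEARTBEAT_MODE"
VALID_MODES = ("process", "thread", "off")


def heartbeat_env(mode="thread"):
    """Return the env dict for the requested heartbeat mode (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {HEARTBEAT_ENV_VAR: mode}


env = heartbeat_env("thread")

# With Prefect installed, the dict would be attached to the flow's run config,
# for example (not executed here):
#   from prefect.run_configs import KubernetesRun
#   flow.run_config = KubernetesRun(env=env)
```

This only changes how heartbeats are sent. It would not by itself stop a second pod from picking up a retried run, so the pod count is still worth checking separately.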