Prefect Community #ask-community

Didier Marin

03/24/2022, 1:33 PM

Hi all! I had a very strange behaviour with a task being executed twice, with a few seconds interval:

16:46:32 Task 'x': Starting task run...
16:46:33 Task 'x': Finished task run for task with final state: 'TriggerFailed'
01:39:32 Task 'x': Starting task run...
01:39:46 Task 'x': Starting task run...

I'm not sure whether it's related, but I had to retry the parent task after it failed (hence the "TriggerFailed" that you can see above). I'm running on a k8s executor, with Prefect Core version 0.14.20. Any idea what could have happened?
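For anyone wanting to spot this pattern in their own logs, here is a minimal sketch that flags a task being started again before its previous run reported a finish. It assumes log lines shaped exactly like the excerpt above; the function name find_overlapping_starts and the regex are made up for illustration and are not part of Prefect.

```python
import re
from collections import defaultdict

# Matches lines shaped like the excerpt above (an assumption about the log format).
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) Task '([^']+)': (Starting task run|Finished task run)"
)


def find_overlapping_starts(lines):
    """Return {task: [timestamps]} for starts seen while an earlier
    run of the same task had not yet logged a finish."""
    running = defaultdict(bool)   # task name -> is a run currently open?
    overlaps = defaultdict(list)  # task name -> timestamps of suspicious starts
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, task, event = match.groups()
        if event == "Starting task run":
            if running[task]:
                overlaps[task].append(timestamp)
            running[task] = True
        else:
            running[task] = False
    return dict(overlaps)


log = [
    "16:46:32 Task 'x': Starting task run...",
    "16:46:33 Task 'x': Finished task run for task with final state: 'TriggerFailed'",
    "01:39:32 Task 'x': Starting task run...",
    "01:39:46 Task 'x': Starting task run...",
]
print(find_overlapping_starts(log))  # {'x': ['01:39:46']}
```

The check relies only on line order, not on the timestamps, so it is unaffected by the logs crossing midnight.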

Kevin Kho

03/24/2022, 1:42 PM

Do you have flow.run() left in your Flow by chance? Is it all tasks or just one task?

Didier Marin

03/24/2022, 1:47 PM

Hi Kevin, no, there is no flow.run() within the flow. Actually, the next task has been started twice as well, and it's still ongoing.

Didier Marin

03/24/2022, 1:48 PM

Could it be something like two k8s pods both executing the same flow?

Kevin Kho

03/24/2022, 1:50 PM

Ah did Lazarus trigger another Flow run?

Didier Marin

03/24/2022, 1:54 PM

From the dashboard, I only see this flow run 🤔 Apparently, one of our DevOps killed some pods yesterday, so this really looks like 2 pods running the same flow.

Kevin Kho

03/24/2022, 1:55 PM

I mean from the logs, do you see Lazarus events re-submitting the Flow Run?

Kevin Kho

03/24/2022, 1:56 PM

Seconds apart is pretty weird though, because Lazarus works on something like a 10-minute interval

Didier Marin

03/24/2022, 2:02 PM

Nope, nothing else when looking at the full logs... I'll check with our DevOps to see the status of the pods.

Didier Marin

03/24/2022, 3:34 PM

Upon further investigation, it looks like the parent task failed not because of a timeout in our own code but because no heartbeat was detected ("No heartbeat detected from the remote task; marking the run as failed." (prefect-server.ZombieKiller.TaskRun)), even though the worker actually kept running in the background. At the k8s level, there were indeed 2 pods deployed instead of one, so I'm guessing that, upon retrying, the job was picked up by the 2nd pod while the 1st one was still running the flow.
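To make that failure mode concrete, here is a toy model of a heartbeat sweep. This is not Prefect's ZombieKiller code: the TaskRun class, the zombie_sweep function, the 2-minute timeout and the dates are all assumptions for illustration. It shows how a run whose worker is still busy can be flipped to Failed once heartbeats stop arriving, which then makes it eligible for a retry elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed timeout, for illustration only; the real value depends on the backend.
HEARTBEAT_TIMEOUT = timedelta(minutes=2)


@dataclass
class TaskRun:
    state: str
    last_heartbeat: datetime


def zombie_sweep(run, now, timeout=HEARTBEAT_TIMEOUT):
    """Mark a Running task run as Failed if its last heartbeat is too old.

    The sweep only sees heartbeats; it cannot tell whether the worker
    process is actually still executing the task.
    """
    if run.state == "Running" and now - run.last_heartbeat > timeout:
        run.state = "Failed"
    return run.state


start = datetime(2022, 1, 1, 1, 0, 0)  # arbitrary example time
run = TaskRun(state="Running", last_heartbeat=start)

# The worker keeps working, but no further heartbeats reach the server.
state_after_1_min = zombie_sweep(run, start + timedelta(minutes=1))
state_after_5_min = zombie_sweep(run, start + timedelta(minutes=5))
print(state_after_1_min, state_after_5_min)  # Running Failed
```

Once the run is marked Failed, retrying it can hand the work to a second pod while the first one is still going, which matches the double "Starting task run" lines.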

Kevin Kho

03/24/2022, 3:36 PM

I see, maybe threaded heartbeats can help with that?
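For context, a rough sketch of what a thread-based heartbeat means in general: a background thread keeps reporting liveness while the task itself runs in the main thread. This is a generic stdlib illustration, not Prefect's implementation, and run_with_heartbeat, slow_task and the beat callback are hypothetical names. In Prefect 1.x the switch is, as far as I know, the heartbeat mode setting (PREFECT__CLOUD__HEARTBEAT_MODE set to thread), which I believe arrived in a release later than 0.14.20, so it's worth checking the docs for your version.

```python
import threading
import time


def run_with_heartbeat(task, beat, interval=0.05):
    """Run task() while a background thread calls beat() every `interval` seconds."""
    stop = threading.Event()

    def heartbeat_loop():
        # Event.wait returns False on timeout and True once stop is set,
        # so this beats once per interval until the task is done.
        while not stop.wait(interval):
            beat()

    thread = threading.Thread(target=heartbeat_loop, daemon=True)
    thread.start()
    try:
        return task()
    finally:
        # Always stop the heartbeat, even if the task raises.
        stop.set()
        thread.join()


def slow_task():
    time.sleep(0.3)  # stand-in for real work
    return "done"


beats = []
result = run_with_heartbeat(slow_task, lambda: beats.append(time.monotonic()))
print(result, len(beats) > 0)
```

Because the heartbeat lives in the same process as the task, it stops when the process dies, but it can still be starved if the task blocks the interpreter for long stretches.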

Didier Marin

03/24/2022, 3:37 PM

I'll check it out, thank you very much Kevin!
