# ask-community
m
hi, i just encountered a bug where prefect seems to have been running the same flow run twice in parallel — not the same flow in two parallel runs, but the same flow twice in the same run. a few of this flow’s tasks spin up an emr job, and all 3 jobs got triggered twice in the same run. this is the first time i’ve encountered this issue, but i suspect it might be related to the flow having been triggered through a flow of flows. the flow had failed midway (during the GCP outage this week), and so i had to restart the child flow and the parent flow of flows. it seems that this caused the flow run to restart twice in parallel. i’m not sure sharing logs will be of any value, but i’d be happy to share the flow run ids to maybe help get some insight into what happened.
k
Hey @Martim Lobao, did you get two different flow ids? or did the tasks just run twice in the same flow run?
m
same flow run id: https://cloud.prefect.io/pdl/flow-run/e8d5b1e7-42c9-447b-8d7e-cb34369f40cd?logs= the logs show these 3 tasks got started twice each:
20:09:47 INFO CloudTaskRunner Task 'pull_metrics': Starting task run...
20:09:47 INFO CloudTaskRunner Task 'index_pdl_name_records': Starting task run...
20:09:48 INFO CloudTaskRunner Task 'index_random_sample': Starting task run...
20:09:48 INFO CloudTaskRunner Task 'index_pdl_name_records': Starting task run...
20:09:48 INFO CloudTaskRunner Task 'index_random_sample': Starting task run...
20:09:48 INFO CloudTaskRunner Task 'pull_metrics': Starting task run...
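One way to confirm the duplication from an exported log is to count the "Starting task run" entries per task. This is a minimal sketch, assuming each log entry sits on one line in the form `HH:MM:SS LEVEL LOGGER message`; `count_task_starts` and `duplicated_tasks` are hypothetical helpers, not part of Prefect.

```python
import re
from collections import Counter

# Log entries copied from the flow run, one entry per line
# (assumed format: "HH:MM:SS LEVEL LOGGER message").
LOGS = """\
20:09:47 INFO CloudTaskRunner Task 'pull_metrics': Starting task run...
20:09:47 INFO CloudTaskRunner Task 'index_pdl_name_records': Starting task run...
20:09:48 INFO CloudTaskRunner Task 'index_random_sample': Starting task run...
20:09:48 INFO CloudTaskRunner Task 'index_pdl_name_records': Starting task run...
20:09:48 INFO CloudTaskRunner Task 'index_random_sample': Starting task run...
20:09:48 INFO CloudTaskRunner Task 'pull_metrics': Starting task run...
"""

# Captures the task name from a "Starting task run" message.
START_RE = re.compile(r"Task '([^']+)': Starting task run")


def count_task_starts(log_text: str) -> Counter:
    """Count how many times each task logged a 'Starting task run' line."""
    return Counter(START_RE.findall(log_text))


def duplicated_tasks(log_text: str) -> list:
    """Return the sorted names of tasks that were started more than once."""
    counts = count_task_starts(log_text)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    for name, n in count_task_starts(LOGS).items():
        print(f"{name}: started {n} time(s)")
```

Running the same count over a full log export would show whether only these three tasks were affected or the whole run was picked up twice.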
a
Could it be that you have `flow.run()` in your flow definition?
m
don’t think so, each task is just a wrapper around a `GET` request to AWS’s execute API, and each task is only called once in the flow definition (using the `with Flow(…) as flow` syntax)
we’ve also been running this flow every week and this has never happened before
the only difference i can think of is that this time the flow was kicked off as a child in a flow of flows
k
This happens when there is some failover mechanism that kicks in and retries the tasks. This is most common with Dask, where a worker that holds tasks can die, and then Dask will spin up a new worker that will re-run the tasks. To avoid the re-running, Prefect has Version Locking, which I think will also help you. Maybe you can try turning it on for the Flow.
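The general idea behind version locking can be sketched as an optimistic lock: a state change is accepted only if the caller's version matches the stored one, so a second runner that picks up the same task run is rejected. This is a conceptual sketch only, not Prefect's implementation or API; `TaskRunRecord`, `try_set_running`, and the version counter are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TaskRunRecord:
    """Hypothetical stand-in for a task run stored by the backend."""
    state: str = "Pending"
    version: int = 0


def try_set_running(record: TaskRunRecord, expected_version: int) -> bool:
    """Move the record to Running only if nobody else changed it first."""
    if record.version != expected_version:
        # Stale version: another runner already updated this task run.
        return False
    record.state = "Running"
    record.version += 1
    return True


if __name__ == "__main__":
    record = TaskRunRecord()
    seen = record.version  # both runners read the same version
    print(try_set_running(record, seen))  # first runner
    print(try_set_running(record, seen))  # second runner, now stale
```

With a check like this, the second of two near-simultaneous submissions would be refused instead of starting the task again.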
m
thanks @Kevin Kho — what are the side-effects of enabling version locking? will we still be able to restart failed tasks? also i’m still not sure i understand why this happened: all 3 tasks got submitted twice within a single second. it’s weird that a task would die in that short amount of time
k
Yes you will be able to restart failed tasks. I’m confused too because the retry from the parent flow should not trigger retries in child flows.