Andrew Lawlor

04/21/2022, 6:36 PM
I just had a task that ran twice. I only wanted it to run once, and I got duplicate data as a result. I don't have retries enabled. Is this something other people see sometimes? Is there a way to avoid it?

Kevin Kho

04/21/2022, 6:40 PM
Are you using Dask, and are you on Prefect Cloud? This is usually something like two jobs firing for the flow, or Dask dying and re-submitting tasks while recovering.

Andrew Lawlor

04/21/2022, 6:40 PM
I am on Prefect Cloud and I'm using a LocalDaskExecutor.
The flow itself only ran once, but one task within it ran twice (only that one task).

Kevin Kho

04/21/2022, 6:41 PM
Version locking should block the second execution. Did it actually run twice?
Is Kubernetes spinning up two jobs?

Andrew Lawlor

04/21/2022, 6:42 PM
I believe so. I got duplicate data from it. How do I see if Kubernetes spun up two jobs?

Kevin Kho

04/21/2022, 6:42 PM
Check whether there are two pods, I think?

Andrew Lawlor

04/21/2022, 6:42 PM
Do I have to turn version locking on?

Kevin Kho

04/21/2022, 6:43 PM
It should be on by default on Prefect Cloud. It is configurable per flow, and you can find it in the flow settings.

Andrew Lawlor

04/21/2022, 6:43 PM
The job completed, so the pods aren't up anymore.
Version locking is disabled for me.
Is there a way to turn it on programmatically? I haven't used the GraphQL API at all.

Kevin Kho

04/21/2022, 6:45 PM
Oh, turning that on may help, because it puts a lock on the task execution. Are you mapping? Yeah, just do:
from prefect.client import Client

# Enable version locking for a flow via Prefect Cloud's GraphQL API.
# Replace "your-flow-id-here" with your flow's ID.
query = """
mutation {
  enable_flow_version_lock(input: { flow_id: "your-flow-id-here" }) {
    success
  }
}
"""

client = Client()
client.graphql(query)
I am wondering if your issue is related to this. I'll read through it to dig more.

Andrew Lawlor

04/21/2022, 6:47 PM
I am mapping (edit: I'm not mapping at the task level).

Kevin Kho

04/21/2022, 6:47 PM
Let me read into that issue. I haven't gone through it fully yet, but I think you are running into it.

Andrew Lawlor

04/21/2022, 6:49 PM
I am running GKE Autopilot, so I also think I am. I'm also mapping tasks in other flows, so even if I'm not seeing it here, it is probably relevant.
This does seem pretty similar to what I'm seeing, but I am using a LocalDaskExecutor, not the full DaskExecutor that the others in that thread were using.
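(Editor's note: independent of version locking, a general mitigation for "task ran twice, so I got duplicate data" is to make the task's writes idempotent, so a second execution is a no-op. Below is a hypothetical sketch using SQLite; the `results` table, column names, and the idempotency-key format are all made up for illustration, not part of any Prefect API.)

```python
import sqlite3

def write_row(conn, task_run_key, payload):
    # Guard the side effect with an idempotency key (primary key).
    # INSERT OR IGNORE makes a repeated write with the same key a no-op,
    # so a duplicated task execution cannot produce duplicate rows.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (task_run_key TEXT PRIMARY KEY, payload TEXT)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO results (task_run_key, payload) VALUES (?, ?)",
        (task_run_key, payload),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
write_row(conn, "flow-123/task-abc/map-0", "some data")
write_row(conn, "flow-123/task-abc/map-0", "some data")  # duplicate execution
count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(count)  # only one row survives
```

The same idea applies to real warehouses via upserts (e.g. `INSERT ... ON CONFLICT DO NOTHING`), keyed on something stable across re-executions of the same task run.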

Daniel Ross

05/03/2022, 6:44 PM
Any update to this thread? I have a similar problem (though I am using a DaskExecutor on ECS) and have been unable to resolve it so far. The linked GitHub thread shed some light on the problem, but the only proposed solution seems to be dropping autoscaling, which would mean retooling our CI, which sets the cluster config when the flow is registered. Not ideal.

Kevin Kho

05/03/2022, 6:50 PM
Hey @Daniel Ross, no update to this thread. I think the GitHub thread is the best place to follow; that issue is being worked on. You can chime in there if you have additional details (and to follow it).
👍 1

Daniel Ross

05/03/2022, 6:59 PM
Thanks Kevin! I'll keep an eye on it.

Andrew Lawlor

05/03/2022, 7:18 PM
I ended up moving away from Autopilot to a standard cluster, and I don't think I'm seeing the same issue anymore.
👍 1

Daniel Ross

05/03/2022, 7:22 PM
Thanks for the update @Andrew Lawlor!