Andrew Lawlor

04/21/2022, 6:36 PM
I just had a task that ran twice. I only wanted it to run once, and I got duplicate data as a result. I don't have retries enabled. Is this something other people see sometimes? Is there a way to avoid it?

Kevin Kho

04/21/2022, 6:40 PM
Are you using Dask, and are you on Prefect Cloud? This is usually something like two jobs firing for the flow, or Dask dying and re-submitting tasks while recovering.

Andrew Lawlor

04/21/2022, 6:40 PM
I am on Prefect Cloud and I'm using a LocalDaskExecutor.
The flow itself only ran once, but one task within it ran twice (only that one task).

Kevin Kho

04/21/2022, 6:41 PM
Version locking should block the second execution. Did it actually run twice?
Is Kubernetes spinning up two jobs?

Andrew Lawlor

04/21/2022, 6:42 PM
I believe so. I got duplicate data from it. How do I see if Kubernetes spun up two jobs?

Kevin Kho

04/21/2022, 6:42 PM
Check whether there are two pods, I think?

Andrew Lawlor

04/21/2022, 6:42 PM
Do I have to turn version locking on?

Kevin Kho

04/21/2022, 6:43 PM
It should be on by default on Prefect Cloud. It is configurable per flow, and you can find it in the flow settings.

Andrew Lawlor

04/21/2022, 6:43 PM
The job completed, so the pods aren't up anymore.
Version locking is disabled for me.
Is there a way to turn it on programmatically? I haven't used the GraphQL API at all.

Kevin Kho

04/21/2022, 6:45 PM
Oh, turning that on may help, because it puts a lock on the task execution. Are you mapping? Yeah, just do:
from prefect.client import Client

# Enable version locking for a flow via Prefect Cloud's GraphQL API.
# Replace "your-flow-id-here" with your flow's ID.
query = """
mutation {
  enable_flow_version_lock(input: { flow_id: "your-flow-id-here" }) {
    success
  }
}
"""

client = Client()
client.graphql(query)
I am wondering if your issue is related to this. I'll read through it to dig more.

Andrew Lawlor

04/21/2022, 6:47 PM
I am mapping (edit: I'm not mapping at the task level).

Kevin Kho

04/21/2022, 6:47 PM
Let me read into that issue. I haven't gone through it fully yet, but I think you are running into it.

Andrew Lawlor

04/21/2022, 6:49 PM
I am running GKE Autopilot, so I also think I am. I'm also mapping tasks in other flows, so even if I'm not seeing it here, it is probably relevant.
This does seem pretty similar to what I'm seeing, but I am using a LocalDaskExecutor, not the full DaskExecutor that the others in that thread were using.
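(Editor's note: independent of version locking, a general mitigation for "task ran twice, so I got duplicate data" is to make the task's writes idempotent, so a second execution is a no-op. Below is a hypothetical sketch using SQLite; the `results` table, column names, and the idempotency-key format are all made up for illustration, not part of any Prefect API.)

```python
import sqlite3

def write_row(conn, task_run_key, payload):
    # Guard the side effect with an idempotency key (primary key).
    # INSERT OR IGNORE makes a repeated write with the same key a no-op,
    # so a duplicated task execution cannot produce duplicate rows.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (task_run_key TEXT PRIMARY KEY, payload TEXT)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO results (task_run_key, payload) VALUES (?, ?)",
        (task_run_key, payload),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
write_row(conn, "flow-123/task-abc/map-0", "some data")
write_row(conn, "flow-123/task-abc/map-0", "some data")  # duplicate execution
count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(count)  # only one row survives
```

The same idea applies to real warehouses via upserts (e.g. `INSERT ... ON CONFLICT DO NOTHING`), keyed on something stable across re-executions of the same task run.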

Daniel Ross

05/03/2022, 6:44 PM
Any update to this thread? I have a similar problem (though I am using a DaskExecutor on ECS) and have been unable to resolve it so far. The linked GitHub thread shed some light on the problem, but the only proposed solution seems to be dropping autoscaling, which would mean retooling our CI, which sets the cluster config when the flow is registered. Not ideal.

Kevin Kho

05/03/2022, 6:50 PM
Hey @Daniel Ross, no update to this thread. I think the GitHub thread is the best place to follow; that issue is being worked on. You can chime in there if you have additional details (and to follow it).
👍 1

Daniel Ross

05/03/2022, 6:59 PM
Thanks Kevin! I'll keep an eye on it.

Andrew Lawlor

05/03/2022, 7:18 PM
I ended up moving away from Autopilot to a standard cluster, and I don't think I'm seeing the same issue anymore.
👍 1

Daniel Ross

05/03/2022, 7:22 PM
Thanks for the update @Andrew Lawlor!