
Sven Teresniak

09/08/2020, 7:37 AM
What's the most elegant (or preferred) way to synchronize flow runs? Should I use tags as locks and poll Prefect for runs with or without those tags to decide whether to start or delay a flow run? I have more than one trigger for a certain flow, and I need every run to be "atomic": running it several times is okay, but never in parallel.

Jenny

09/08/2020, 2:06 PM
Hi @Sven Teresniak, just checking I understand this: you want a way to limit how many runs of a flow execute concurrently? Are you using Server or Cloud?

Sven Teresniak

09/08/2020, 4:05 PM
v0.13.5, Server mode. Yes, I want to limit how many runs of a specific flow execute in parallel. This flow can be triggered by a schedule or by a re-aggregation task, but the flow is a combination of checks and side effects, so it runs into a race condition when two runs are started in parallel.
We build most of our flows to be idempotent. In this case we can change (as in "repair") input data at any time and then simply restart the complete chain of processing.
Thus, every task by itself can be stateless and rely only on input data and very few parameters.
But this needs synchronisation.

Jenny

09/08/2020, 6:06 PM
Hi @Sven Teresniak, for Server we don't have a "most elegant" way of doing this. You could add a task at the start of the flow that checks whether other runs of the flow are active (via the Prefect API) and, if so, raises an `ENDRUN` signal with the appropriate state (which might be Success or Failure, or even a reschedule to try again after some time). Alternatively, you could have the check task retry repeatedly until it's the only active run of the flow before continuing. If you ever switch to Cloud, we have a new flow concurrency feature that would do this in a much more elegant way. (Updates to the UI and docs for this should be added very soon.)
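A minimal sketch of the "retry until it's the only run" check, written as plain Python so it runs without Prefect. `count_other_active_runs` is a hypothetical callable: in a real flow it would ask the Prefect Server API how many *other* runs of this flow are currently running, which is not shown here. `AnotherRunActive` and `wait_until_sole_run` are likewise illustrative names, not Prefect APIs; where the sketch raises, a Prefect task would instead end the run with the state of your choice.

```python
import time
from typing import Callable


class AnotherRunActive(Exception):
    """Hypothetical: raised when other runs are still active after waiting."""


def wait_until_sole_run(
    count_other_active_runs: Callable[[], int],
    poll_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until no other run of this flow is active, or give up.

    `count_other_active_runs` is an assumed callable (e.g. a wrapper around
    a query against the Prefect Server API); `sleep` and `clock` are
    injectable so the logic can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        others = count_other_active_runs()
        if others == 0:
            return  # this is the only active run, safe to continue
        if clock() >= deadline:
            # A Prefect task would end the run here (fail or reschedule).
            raise AnotherRunActive(f"{others} other run(s) still active")
        sleep(poll_interval)
```

Note that this is check-then-act: two runs that start at the same instant can both see zero other runs, so it narrows the race window rather than closing it.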

Sven Teresniak

09/09/2020, 5:29 AM
It's a policy in our infrastructure not to rely on external services. We are using EC2, but apart from the hardware we are completely cloud-agnostic and free of external dependencies, from K8S up to Presto, Spark, etc. That said, it is nearly impossible for me to switch to Prefect Cloud. Regarding the concurrency: I will try to use tags as lock-like synchronisation, with a task querying all other flow runs for existing tags, or something like that. Alternatively, I'll have to bite the bullet and use some kind of lockfile or S3 object. But this will be ugly as hell 😕
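For the lockfile option, a minimal sketch under the assumption that every flow run can see the same filesystem (for example a shared volume). Creating the file with `os.O_CREAT | os.O_EXCL` is atomic on local filesystems: exactly one caller succeeds and every other caller gets `FileExistsError`. `flow_lock`, `LockHeld`, and the lock path are hypothetical names, not part of Prefect.

```python
import os
from contextlib import contextmanager
from typing import Iterator


class LockHeld(Exception):
    """Hypothetical: raised when another flow run already holds the lock."""


@contextmanager
def flow_lock(path: str) -> Iterator[None]:
    """Exclusive lock backed by a lockfile at `path` (assumed shared storage).

    O_CREAT | O_EXCL makes creation atomic, so only one holder can exist.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeld(path) from None
    try:
        # Record the holder's PID to help diagnose stale locks.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield
    finally:
        os.remove(path)  # release the lock even if the body raised
```

Unlike the polling check, acquisition itself cannot race. The trade-off is a stale lock: a run that is killed hard (e.g. its pod is evicted) never reaches the `finally`, and the file has to be removed by hand or expired by some extra logic.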