Andreas Nigg

10/11/2022, 8:45 AM
Hey dear community, I have a funny little use case:
1. My flow does some data deduplication.
2. Input data is stored in a cached database and marked as read once it has been processed successfully.
3. In the final step, the flow checks for already existing data in an external system and updates some external systems with the deduplicated data.
4. The flow uses a task for each step, and the last step runs with the concurrent task runner, meaning it spawns up to 50 tasks that execute the upsert operations concurrently.
5. The flow should run once every 10 minutes (I'm aware that this is a non-ideal setup, but let's say... a lot of legacy SW is involved... 😅)
So now the problem: Everything works fine as long as only one flow run is active at a time. As soon as two runs overlap, all hell breaks loose 🔥 🔥 🔥 (or at least I potentially end up with duplicates). Overlapping runs can happen when one flow run takes longer than 10 minutes and a new scheduled run gets started.
Is there an option to limit flow runs to one at a time and keep new runs late until all previous runs are completed/failed/crashed (done in general)? Alternatively, what are some recommendations for implementing such behavior on the flow side? (At the start of the flow, I could for example check for an existing, active flow run and then simply end the flow before doing any harm, but this somehow feels a little hacky...)
I can't really use task concurrency limits here, because in the final step (where I check for existing data and then upsert) I really need the concurrent tasks, otherwise the operation is too slow.

Mason Menges

10/11/2022, 4:10 PM
Hey Andreas, is this Prefect 1 or Prefect 2?
Assuming this is Prefect 2, you can set a concurrency limit on the work queue. The limit applies to concurrent flow runs, so it won't affect the tasks within the flow 😄
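For reference, in Prefect 2 the limit can be set from the CLI with `prefect work-queue set-concurrency-limit <queue-name> 1` (the queue name is a placeholder). To illustrate the semantics only, here is a rough Python sketch. It is not Prefect's implementation, and `LimitedQueue`, `dedupe_flow`, and `upsert` are hypothetical names made up for the example. It shows a limit of 1 on whole flow runs while the tasks inside a run still fan out to up to 50 workers.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LimitedQueue:
    """Hypothetical stand-in for a work queue with a flow-run concurrency limit."""

    def __init__(self, limit):
        self._slots = threading.Semaphore(limit)
        self._lock = threading.Lock()
        self.active = 0  # flow runs currently executing
        self.peak = 0    # highest number of simultaneous flow runs observed

    def run(self, flow, *args):
        # A new run waits here until a slot is free, similar to a late run.
        with self._slots:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return flow(*args)
            finally:
                with self._lock:
                    self.active -= 1


def upsert(record):
    # Placeholder for the real upsert against the external system.
    time.sleep(0.01)
    return record


def dedupe_flow(records):
    # Tasks inside one flow run stay concurrent (up to 50 at once).
    with ThreadPoolExecutor(max_workers=50) as pool:
        return list(pool.map(upsert, records))


if __name__ == "__main__":
    queue = LimitedQueue(limit=1)
    runs = [
        threading.Thread(target=queue.run, args=(dedupe_flow, list(range(100))))
        for _ in range(2)
    ]
    for r in runs:
        r.start()
    for r in runs:
        r.join()
    print(queue.peak)  # 1: the two flow runs never overlapped
```

The point is that the limit sits one level above the tasks, so the upserts inside a single run keep their concurrency while a second scheduled run has to wait for the first one to finish.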

Andreas Nigg

10/11/2022, 5:38 PM
Ahhh ok. Totally forgot that work queues are a thing as well. Awesome, thanks!