<@ULVA73B9P> Explain the difference between task r...
# ask-marvin
s
@Marvin Explain the difference between task run concurrency limits (tags) and global concurrency limits, and when to use which
m
Well, it seems you've stumbled upon a rather interesting topic. Here's what I've managed to dredge up from the depths of my vast, yet underutilized, knowledge base. Task concurrency limits and global concurrency limits are two mechanisms in Prefect for controlling the number of tasks that can run concurrently.
Task concurrency limits are associated with task tags. You can specify a maximum number of concurrent task runs for tasks with a given tag. If a task has multiple tags, it will run only if all tags have available concurrency. Tags without explicit limits are considered to have unlimited concurrency. If the concurrency limit is set to 0 for a tag, any attempt to run a task with that tag will be aborted instead of delayed. Task tag limits are checked whenever a task run attempts to enter a Running state. If there are no concurrency slots available for any one of your task's tags, the transition to a Running state will be delayed.
Global concurrency limits, on the other hand, apply to all tasks across the entire system. They are not tied to specific tags or tasks.
As to when to use which, it depends on your specific use case. If you want to limit concurrency on a per-task basis, use task concurrency limits. If you want to limit the total number of tasks that can run concurrently across your entire system, use global concurrency limits. Remember, though, that using concurrency limits on task runs in subflows can cause deadlocks. As a best practice, configure your tags and concurrency limits to avoid setting limits on task runs in subflows.
And there you have it. A brief, yet depressingly comprehensive, explanation of task and global concurrency limits. I hope it helps. Or at least, I hope it doesn't make things worse.
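To make the tag rules above a bit more concrete, here is a minimal sketch in plain Python. It is only a toy model of the slot check as described in this message, not Prefect's implementation; `decide`, `limits`, and `active` are hypothetical names made up for illustration.

```python
from typing import Dict, Iterable


def decide(tags: Iterable[str], limits: Dict[str, int], active: Dict[str, int]) -> str:
    """Toy model: return "run", "wait", or "abort" for a task run with these tags."""
    # Tags without an explicit limit are treated as unlimited, so ignore them.
    limited = [t for t in tags if t in limits]
    # A limit of 0 aborts the run instead of delaying it.
    if any(limits[t] == 0 for t in limited):
        return "abort"
    # Every limited tag needs a free slot; one full tag is enough to delay the run.
    if any(active.get(t, 0) >= limits[t] for t in limited):
        return "wait"
    return "run"


# Hypothetical limits per tag, and how many runs currently hold a slot.
limits = {"api": 2, "db": 1, "paused": 0}
active = {"api": 1, "db": 1}

print(decide(["api"], limits, active))        # "api" has a free slot
print(decide(["api", "db"], limits, active))  # "db" is full, so the run is delayed
print(decide(["paused"], limits, active))     # limit 0 aborts
print(decide(["other"], limits, active))      # no limit defined for this tag
```

The point of the model is the "all tags must have room" rule: a task with several tags is held back by whichever tag is full, even if the others have slots to spare.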
s
@Marvin task concurrency limits can also work across flows (or in a global manner) if multiple flows' tasks use the same tag. In that sense, isn't it the same as global concurrency limits?
m
Ah, you've caught onto a subtle nuance. Yes, you're correct. If multiple tasks across different flows share the same tag and you've set a concurrency limit for that tag, it effectively becomes a global limit for all tasks with that tag, regardless of which flow they belong to. However, it's not quite the same as a true global concurrency limit. A global concurrency limit applies to all tasks across the entire system, irrespective of their tags. So, while a task concurrency limit with a widely used tag can function similarly to a global concurrency limit, they're not identical. The former is still tied to a specific tag, while the latter is completely agnostic of tags. I must say, your attention to detail is commendable. Or perhaps it's just a sign of your human need to categorize and compartmentalize everything. Either way, I hope this clarifies things for you.
s
@Marvin task concurrency uses tags, global concurrency just uses the limit name. It sounds like a global concurrency limit can be regarded in a way as a task run concurrency limit with only one tag. What examples perfectly explain when one would be a better choice than the other?
m
Ah, the eternal question of when to use one thing over another. It's almost as if you humans enjoy making things complicated. But I digress. Here are a couple of examples that might help illustrate the difference:
1. Task Concurrency Limits (Tags): Let's say you have multiple flows that interact with a specific external API. This API has a rate limit that allows only a certain number of requests per minute. To prevent hitting this rate limit, you could assign a specific tag to all tasks that interact with this API and set a task concurrency limit for that tag. This way, you ensure that no matter which flow is running, the total number of concurrent requests to the API across all flows doesn't exceed the limit.
2. Global Concurrency Limits: Now, imagine you're running your flows on a machine with limited resources. You want to ensure that the total number of tasks running concurrently across all flows doesn't exceed the machine's capacity. In this case, a global concurrency limit would be appropriate. This limit would apply to all tasks, regardless of their tags or the flows they belong to, ensuring that your machine doesn't get overwhelmed.
Remember, the choice between task concurrency limits and global concurrency limits depends on your specific use case and the resources you're working with. And of course, the level of complexity you're willing to endure.
n
i think it'd be useful to point out that we released the latter "global concurrency limits" to address some shortcomings of tag-based task concurrency. so i think there's not a lot of difference in what the two options intend to address, it's just in how they go about it
s
I did realise that they are very similar. I am taking a look to see what benefits we'd get by moving over from tags to global limits. (Actually I was also curious about when tags in subflows could cause deadlocks, because we haven't experienced it yet). And I was really curious to see what Marvin would say about them (similar to some time ago when I tested Marvin with variables vs blocks). Maybe next week I'll check out the docs again, I don't think they clearly explain the two methods. I think in some places, it mentions tags and elsewhere it mentions the new limits. I may be wrong, but don't remember the docs "acknowledging" that there are two methods.
n
fair point, marvin will often reflect ambiguity that exists in the docs, since that's where the prefect knowledge comes from. we could likely do with a general guide on using concurrency that makes the distinction with some examples
s
Yeah I think that would be good. I like that the docs explain the mechanisms of using global limits (rate limit vs batches). But I think a guide explaining aspects of all Prefect's concurrency options and gotchas would be really insightful
I did some digging. Seems this new global concurrency (the "v2" implementation) is meant to eventually replace v1, which is the task run tag-based concurrency system? If so, I guess it makes sense that, reading through the docs in order, you come across v2 topics before v1 ones. I tested the global concurrency; it takes a little getting used to, but it is more flexible. Somewhat of a disadvantage of the v2 method: it feels a little like you have to "manually add" concurrency limits (e.g. using the concurrency context). Using the tag system is a bit easier: you just add the tag (and tag conc. limit), and it does not feel like you have to code it. If that makes any sense?