<@ULVA73B9P> Is it good practice to use prefect ta...
# ask-marvin
g
@Marvin Is it good practice to use prefect tasks to orchestrate several thousands of API calls?
m
Yes, Prefect is designed to handle orchestration of tasks at scale, including thousands of API calls. It provides robust error handling and retry mechanisms, and its architecture allows for distributed execution of tasks, which means you can scale your workflows out across multiple machines. This is particularly useful when dealing with a large number of tasks.

However, it's important to consider the limitations and requirements of the APIs you're interacting with. For example, some APIs have rate limits that restrict the number of calls you can make in a certain time period. You'll need to ensure your Prefect tasks respect these limits.

Also, if your tasks are I/O bound (such as making network requests), you might want to consider using Prefect's asynchronous tasks feature. This allows Prefect to manage other tasks while waiting for the network request to complete, improving the overall efficiency of your workflow.

Remember to handle exceptions and retries appropriately in your tasks to account for any potential API failures or rate limiting issues.
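To make the I/O-bound point concrete, here is a minimal plain-Python sketch of throttling thousands of concurrent calls with an `asyncio.Semaphore`. The `fetch` function, the item count, and the concurrency limit of 10 are all illustrative stand-ins; in a real workflow each bounded call could be wrapped in an async Prefect task.

```python
import asyncio

# Hypothetical stand-in for a real API call; assumes an I/O-bound request.
async def fetch(i: int) -> int:
    await asyncio.sleep(0)  # placeholder for network latency
    return i * 2

async def fetch_all(n: int, max_concurrent: int = 10) -> list[int]:
    # The semaphore caps in-flight requests so API rate limits are respected,
    # while still letting many requests overlap their waiting time.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(i: int) -> int:
        async with sem:
            return await fetch(i)

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(i) for i in range(n)))

results = asyncio.run(fetch_all(1000))
```

The same shape works whether the bounded unit is a bare coroutine or a Prefect task; the semaphore is what keeps thousands of submissions from hitting the API all at once.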
n
i think this answer deserves a big asterisk: there are also our rate limits here if you're on Prefect Cloud. Also, if you go too far wrapping everything in a task, it just becomes overhead at some point. Happy to give more specific feedback if you have a concrete use case.
☝️ 1
g
yes @Nate please! I am at that crossroads right now: should I just stick with my own non-task-based parallel approach, since it is faster than using tasks and the concurrent task runner with the rate limit I have?
n
it depends on the granularity that needs task functionality (an extreme case being that every request needs independent retries / caching). With tons of tasks, you can usually "checkpoint": write functions that manage many subunits (regular python, run_deployment calls, etc.) and can only succeed if all the contained subunits also succeed, then wrap those checkpoint functions with a task / subflow to create the shape of your workflow. Does that make sense?
that's just some high-level advice though; i know design always needs to pin to the use case. Feel free to give an MRE that represents your use case to me or marvin.
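The checkpoint idea above can be sketched in plain Python. Everything here is hypothetical (the `process` work unit, the checkpoint name, the retry count in the comment); the point is that the checkpoint only succeeds if every contained unit succeeds, so you wrap the checkpoint in a task rather than each unit.

```python
# Hypothetical work unit: in practice this might be an API call or a
# run_deployment invocation rather than arithmetic.
def process(item: int) -> int:
    if item < 0:
        raise ValueError(f"bad item: {item}")
    return item * 2

def checkpoint(items: list[int]) -> list[int]:
    """Succeeds only if every contained unit succeeds.

    Any failure propagates, so the whole checkpoint can be retried
    (or cached) as a single Prefect task instead of thousands of tiny ones.
    """
    results = []
    for item in items:
        results.append(process(item))  # plain python, no per-item task
    return results

# In a real flow you might wrap the checkpoint, not each unit, e.g.:
#   @task(retries=2)
#   def checkpoint_task(items):
#       return checkpoint(items)
```

This keeps task-level overhead proportional to the number of checkpoints, not the number of API calls.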
> I am at that crossroads right now if I should just stick to my own non-task based parallel approach as it is faster than using tasks and concurrent task runner with the rate limit I have.
if what you have works for you, then there's likely no inherent benefit in adding tasks. One way you might expand observability of these non-task processes is by using `emit_event`
🙏 1
g
> depending on the granularity that need task functionality (e.g. extreme case being, every request needs independent retries / caching) with tons of tasks, usually you can sort of "checkpoint" functions that manage many subprocesses and can only succeed if all the contained subprocesses (e.g. regular python, run_deployment calls etc) also succeed and then you wrap those checkpoints with a task / subflow etc to create the shape of your workflow - does that make sense?
I suppose not every API request needs the retries or error handling that tasks provide. We are refactoring a bunch of pipelines to leverage Prefect as much as possible (we used to have just one big flow), so we need to evaluate what makes sense as a task and the tradeoffs involved. Right now it seems tasks may not be the best fit for this relatively straightforward use case.
And I have not heard of `emit_event` before, so I will definitely look it up. We want to introduce more observability wherever possible, so this would be extremely helpful!
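As a sketch of adding observability to non-task code with `emit_event` (which lives in `prefect.events`): the wrapper name, event name, and resource id below are all hypothetical, and the import guard is just to keep the sketch runnable even where Prefect isn't installed or configured.

```python
def emit_safely(event: str, resource: dict) -> bool:
    """Emit a Prefect event, but never let observability break the pipeline.

    Returns True if the event was handed off to Prefect, False otherwise
    (e.g. Prefect not installed, or no API configured).
    """
    try:
        from prefect.events import emit_event
        emit_event(event=event, resource=resource)
        return True
    except Exception:
        return False

# Illustrative usage after a batch of non-task API calls finishes:
emit_safely(
    "pipeline.api-batch.completed",           # hypothetical event name
    {"prefect.resource.id": "api-batch.42"},  # a resource id is required
)
```

Events emitted this way show up in Prefect Cloud's event feed and can drive automations, without wrapping the underlying work in tasks.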
Would you say that hosting our own Prefect server would let us manage and scale thousands of tasks better than Prefect Cloud would?
n
biased obviously, but personally i would use Prefect Cloud (the Events API is not in open source), and there are just a lot of nice things in there you'd have to build yourself in OSS. You can start there free. The events system is really powerful; here's a short example that uses events to chain decoupled workflows (deployment triggers can be used with any deployment, not just served deployments) ... one sec, fetching example
as far as scale, Prefect Cloud manages most teams' scale pretty well; if you grow beyond that, we sometimes work with people more closely on custom infra
g
I see, thank you
Does each call to `emit_event` contribute to one of the 400 requests per minute?