<@ULVA73B9P> Is it good practice to use prefect ta...
# ask-marvin
g
@Marvin Is it good practice to use prefect tasks to orchestrate several thousands of API calls?
m
Yes, Prefect is designed to handle orchestration of tasks at scale, including thousands of API calls. It provides robust error handling and retry mechanisms, and its architecture allows for distributed execution of tasks, which means you can scale your workflows out across multiple machines. This is particularly useful when dealing with a large number of tasks.

However, it's important to consider the limitations and requirements of the APIs you're interacting with. For example, some APIs have rate limits that restrict the number of calls you can make in a certain time period. You'll need to ensure your Prefect tasks respect these limits.

Also, if your tasks are I/O bound (such as making network requests), you might want to consider using Prefect's asynchronous tasks feature. This allows Prefect to manage other tasks while waiting for the network request to complete, improving the overall efficiency of your workflow.

Remember to handle exceptions and retries appropriately in your tasks to account for any potential API failures or rate limiting issues.
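To make the I/O-bound point concrete, here is a minimal plain-Python sketch of throttling thousands of concurrent calls with an `asyncio.Semaphore`. The `fetch` function, the item count, and the concurrency limit of 10 are all illustrative stand-ins; in a real workflow each bounded call could be wrapped in an async Prefect task.

```python
import asyncio

# Hypothetical stand-in for a real API call; assumes an I/O-bound request.
async def fetch(i: int) -> int:
    await asyncio.sleep(0)  # placeholder for network latency
    return i * 2

async def fetch_all(n: int, max_concurrent: int = 10) -> list[int]:
    # The semaphore caps in-flight requests so API rate limits are respected,
    # while still letting many requests overlap their waiting time.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(i: int) -> int:
        async with sem:
            return await fetch(i)

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(i) for i in range(n)))

results = asyncio.run(fetch_all(1000))
```

The same shape works whether the bounded unit is a bare coroutine or a Prefect task; the semaphore is what keeps thousands of submissions from hitting the API all at once.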
n
i think this answer deserves a big asterisk: there are also our rate limits here if you're on Prefect Cloud. Also, if you go too far wrapping everything in a task, it just becomes overhead at some point. Happy to give more specific feedback if you have a concrete use case.
☝️ 1
g
yes @Nate please! I am at that crossroads right now: should I just stick with my own non-task-based parallel approach, since it is faster than using tasks and the concurrent task runner with the rate limit I have?
n
it depends on the granularity that needs task functionality (an extreme case being that every request needs independent retries / caching). With tons of tasks, you can usually "checkpoint": write functions that manage many subunits (regular python, run_deployment calls, etc.) and can only succeed if all the contained subunits also succeed, then wrap those checkpoint functions with a task / subflow to create the shape of your workflow. Does that make sense?
that's just some high-level advice though; i know design always needs to pin to the use case. Feel free to give an MRE that represents your use case to me or marvin.
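The checkpoint idea above can be sketched in plain Python. Everything here is hypothetical (the `process` work unit, the checkpoint name, the retry count in the comment); the point is that the checkpoint only succeeds if every contained unit succeeds, so you wrap the checkpoint in a task rather than each unit.

```python
# Hypothetical work unit: in practice this might be an API call or a
# run_deployment invocation rather than arithmetic.
def process(item: int) -> int:
    if item < 0:
        raise ValueError(f"bad item: {item}")
    return item * 2

def checkpoint(items: list[int]) -> list[int]:
    """Succeeds only if every contained unit succeeds.

    Any failure propagates, so the whole checkpoint can be retried
    (or cached) as a single Prefect task instead of thousands of tiny ones.
    """
    results = []
    for item in items:
        results.append(process(item))  # plain python, no per-item task
    return results

# In a real flow you might wrap the checkpoint, not each unit, e.g.:
#   @task(retries=2)
#   def checkpoint_task(items):
#       return checkpoint(items)
```

This keeps task-level overhead proportional to the number of checkpoints, not the number of API calls.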
> I am at that crossroads right now if I should just stick to my own non-task based parallel approach as it is faster than using tasks and concurrent task runner with the rate limit I have.
if what you have works for you, then there's likely no inherent benefit in adding tasks. One way you might expand observability of these non-task processes is by using `emit_event`
🙏 1
g
> depending on the granularity that need task functionality (e.g. extreme case being, every request needs independent retries / caching) with tons of tasks, usually you can sort of "checkpoint" functions that manage many subprocesses and can only succeed if all the contained subprocesses (e.g. regular python, run_deployment calls etc) also succeed and then you wrap those checkpoints with a task / subflow etc to create the shape of your workflow - does that make sense?
I suppose not every API request needs the retries or error handling that tasks provide. We are refactoring a bunch of pipelines to leverage Prefect as much as possible (we used to have just one big flow), so we need to evaluate what makes sense as a task and the tradeoffs involved. Right now it seems tasks may not be the best fit for this relatively straightforward use case.
And I have not heard of `emit_event` before, so I will definitely look it up. We want to introduce more observability wherever possible, so this would be extremely helpful!
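As a sketch of adding observability to non-task code with `emit_event` (which lives in `prefect.events`): the wrapper name, event name, and resource id below are all hypothetical, and the import guard is just to keep the sketch runnable even where Prefect isn't installed or configured.

```python
def emit_safely(event: str, resource: dict) -> bool:
    """Emit a Prefect event, but never let observability break the pipeline.

    Returns True if the event was handed off to Prefect, False otherwise
    (e.g. Prefect not installed, or no API configured).
    """
    try:
        from prefect.events import emit_event
        emit_event(event=event, resource=resource)
        return True
    except Exception:
        return False

# Illustrative usage after a batch of non-task API calls finishes:
emit_safely(
    "pipeline.api-batch.completed",           # hypothetical event name
    {"prefect.resource.id": "api-batch.42"},  # a resource id is required
)
```

Events emitted this way show up in Prefect Cloud's event feed and can drive automations, without wrapping the underlying work in tasks.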
Would you say that hosting our own Prefect server would let us manage and scale thousands of tasks better than Prefect Cloud would?
n
biased obviously, but personally i would use Prefect Cloud (the Events API is not in open source), and there are just a lot of nice things in there you'd have to build yourself in OSS. You can start there free. The events system is really powerful; here's a short example that uses events to chain decoupled workflows (deployment triggers can be used with any deployment, not just served deployments) ... one sec, fetching example
as far as scale, Prefect Cloud manages most teams' scale pretty well; if you grow beyond that, we sometimes work with people more closely on custom infra
g
I see, thank you
Does each call to `emit_event` contribute to one of the 400 requests per minute?