Prefect Community

Hi, can anyone enlighten me about scaling Prefect?

The scenario: I have set up my inferencing pipeline, where a user uploads an image, it goes through the pipeline, and the results are returned.

The problem is that 1000-2000 inferencing requests can come in at once, so my questions are:

1. Can Prefect handle it?
2. If yes, then how? (my Prefect is set up on cloud)
Do I have to set up something like auto-scaling?

Could anyone please help me with this?

Hi <@U04HZKYRZMZ> - I'd guess there are a lot of different ways of going about this, but here are my thoughts

> 1. Can Prefect handle it?
I think this is largely a matter of how you choose to set up the execution infrastructure that runs your pipeline. For example, if you set up a runtime environment that can run many instances of your pipeline in parallel (Google Cloud Run, GKE, EKS, Fargate, etc.), then I'm sure Prefect could handle the orchestration / observability of these pipelines. But the real heavy work would be performed by your runtime infrastructure, so you'd have to make sure that you allocate resources appropriately.

> 2. If yes, then how? (my Prefect is set up on cloud)
It depends on how you decide to set up your runtime infrastructure! For example, if you used Google Cloud Run as alluded to above, you could define two Prefect flows (and then create <https://docs.prefect.io/concepts/deployments/|deployments> from them):
• orchestrator
• worker
A worker could be defined to run a batch of inferencing requests through the pipeline as a single Google Cloud Run job, so you'd just have to <https://medium.com/the-prefect-blog/serverless-prefect-flows-with-google-cloud-run-jobs-23edbf371175|create a deployment for the worker flow with a Google Cloud Run infrastructure block>.

The orchestrator could be triggered whenever a user uploads an image (by making an API call to run this deployment), and would be responsible for distributing work across as many worker flow runs as necessary to handle all the inferencing requests.
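The orchestrator / worker split described above could be sketched roughly like this. It's a minimal, plain-Python sketch, not a definitive implementation: `make_batches`, `orchestrate`, the `launch_worker` callable, and the image URIs are all hypothetical names made up for illustration. In a real setup `orchestrate` would presumably be a Prefect flow, and `launch_worker` would be whatever call starts one run of the worker deployment.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split the pending requests into fixed-size batches (the last may be smaller)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def orchestrate(
    image_uris: Sequence[str],
    batch_size: int,
    launch_worker: Callable[[List[str]], T],
) -> List[T]:
    """Launch one worker run per batch and collect whatever each launch returns.

    `launch_worker` is a hypothetical hook: here it is any callable, but in
    practice it would trigger one run of the worker deployment for the batch.
    """
    return [launch_worker(batch) for batch in make_batches(image_uris, batch_size)]


# Stand-in launcher so the sketch runs without any infrastructure:
# it just reports how many images each worker run would receive.
pending = [f"img-{i}.png" for i in range(2000)]  # made-up request identifiers
batch_sizes = orchestrate(pending, batch_size=50, launch_worker=len)
print(len(batch_sizes))  # 2000 requests / 50 per batch = 40 worker runs
```

The batch size is the knob that trades off the number of concurrent worker runs against how long each one takes, so it would need to be tuned against whatever limits your runtime infrastructure has.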

so tl;dr in terms of Prefect Cloud:
• choose / set up a runtime environment (I recommend Google Cloud Run)
• create 2 deployments in your Prefect Cloud workspace: *orchestrator*, *worker*
• trigger the *orchestrator* deployment whenever a user submits something
• run as many *worker* deployments as you need to handle all the requests
hope this is helpful - if you are interested in dedicated infrastructure / Prefect support for this project, you could reach out to <mailto:cs@prefect.io|cs@prefect.io>

Hi Nate, thanks a lot for the detailed response.
I just have one question: how can we run as many *worker* deployments as needed to handle all the requests?
Also, let's say I have 2 flows. For each flow, do I have to create two deployments (*orchestrator*, *worker*)?
Is there any resource or link that you can provide?

Hi <@U04HZKYRZMZ> - I'm not aware of any resources that address your exact use case, but there is a recipe I created a while ago that may be useful based on what you've shared (<https://github.com/PrefectHQ/prefect-recipes/blob/main/flows-advanced/parent-orchestrator/orchestrating-worker-flows.py|here>)

worth noting that there is now a `run_deployment` utility (`from prefect.deployments import run_deployment`) that lets you kick off another deployment programmatically, instead of using the client directly like I did
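For example, a worker run could be kicked off with `run_deployment` along these lines. This is a rough sketch under assumptions, not the recipe's code: the deployment name `worker/cloud-run` and the `image_uris` parameter are made up for illustration, and you should check the `run_deployment` arguments against the Prefect version you have installed.

```python
from typing import Dict, List, Sequence


def build_worker_parameters(batch: Sequence[str]) -> Dict[str, List[str]]:
    """Build the parameters for one worker run.

    `image_uris` is a hypothetical parameter name; it has to match whatever
    arguments your worker flow actually accepts.
    """
    return {"image_uris": list(batch)}


def trigger_worker(batch: Sequence[str], deployment_name: str = "worker/cloud-run"):
    """Start one run of the worker deployment for a batch of images.

    `deployment_name` is a placeholder in the "<flow name>/<deployment name>"
    form. Prefect is imported lazily so this module still loads without it.
    """
    from prefect.deployments import run_deployment

    # timeout=0 is meant to return the flow run metadata right away instead of
    # waiting for the worker to finish, so many runs can be started in a row.
    return run_deployment(
        name=deployment_name,
        parameters=build_worker_parameters(batch),
        timeout=0,
    )
```

Calling `trigger_worker(batch)` once per batch from the orchestrator flow would then give you one worker flow run per batch.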

I'll reiterate that you can receive more in-depth support by reaching out to <mailto:cs@prefect.io|cs@prefect.io>, as it's not feasible for us to address every complex use case for everyone in the community

hope that's helpful! feel free to reach out with specific questions if you come up with them