Hey everyone Can anyone tell me if the Orion API and or Sche Prefect Community #ask-community

Tom Manterfield

05/19/2022, 10:50 AM

Hey everyone! Can anyone tell me if the Orion API and/or Scheduler and/or UI support running multiple replicas? Is anyone running them in an HA config already?

✅ 1

Anna Geller

05/19/2022, 11:25 AM

Great question! One engineer on our team thinks about this problem a lot, and we may address it in the second half of the year, but due to the complexity of doing it well, it's not on the immediate roadmap. In general, the OSS Prefect 2.0 orchestration layer will be easier to scale than e.g. Prefect 1.0 due to its simpler architecture: you should be able to deploy multiple API servers talking to the same Postgres backend and load balance the incoming requests. We definitely welcome your input if you are willing to contribute and share your solution with the community.

Tom Manterfield

05/19/2022, 11:27 AM

You should be able to deploy multiple API servers talking to the same Postgres backend and load balance the incoming requests

To clarify, is this saying you can do that right now and it’s just the scheduler that needs work, or are you describing the target state for later development?

Tom Manterfield

05/19/2022, 11:28 AM

Because if the API is already good then it solves my need. As for the scheduler, I feel like I can get by with vertical scaling for quite a while.

Anna Geller

05/19/2022, 11:37 AM

I don't want to give you a wrong answer, but generally speaking, the scheduler service is part of the Orion API server, and you could in theory (to achieve HA) run multiple uvicorn servers and load balance the requests. What I meant by complexity is e.g. ensuring that those multiple replicas work together. I know that we use idempotency keys to avoid duplicate work (the same run being picked up twice), but I don't know whether this would be enough to prevent duplicate work when running multiple Orion servers. This is definitely a hard problem (as is anything distributed/HA), and my main point is that I don't want to give you the sense that you can just deploy multiple replicas, add a load balancer on top, and you're done. There is lurking complexity here in ensuring it all works well together.
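
To make the idempotency-key idea concrete, here is a minimal sketch of how a shared store can reject a second attempt to create the same scheduled run. This is not Prefect's implementation: `RunStore`, `schedule_key`, and the key format are hypothetical names invented for illustration, and an in-memory dict with a lock stands in for a unique constraint in the shared Postgres database.

```python
import threading
from typing import Dict, Tuple


class RunStore:
    """Hypothetical stand-in for the shared table of scheduled runs."""

    def __init__(self) -> None:
        self._runs: Dict[str, dict] = {}
        self._lock = threading.Lock()  # plays the role of a DB unique constraint

    def create_run(self, idempotency_key: str, payload: dict) -> Tuple[dict, bool]:
        """Insert a run unless one with the same key already exists.

        Returns (run, created); created is False for a duplicate request.
        """
        with self._lock:
            existing = self._runs.get(idempotency_key)
            if existing is not None:
                return existing, False
            run = {"key": idempotency_key, **payload}
            self._runs[idempotency_key] = run
            return run, True

    def count(self) -> int:
        with self._lock:
            return len(self._runs)


def schedule_key(deployment_id: str, scheduled_time: str) -> str:
    """Derive the same key on every replica for the same scheduled slot (assumed format)."""
    return f"scheduled {deployment_id} {scheduled_time}"


# Two scheduler replicas try to create the run for the same slot.
store = RunStore()
key = schedule_key("deployment-a", "2022-05-19T12:00:00Z")
_, created_by_replica_1 = store.create_run(key, {"created_by": "replica-1"})
run, created_by_replica_2 = store.create_run(key, {"created_by": "replica-2"})
```

The second call returns the run the first replica created instead of inserting a duplicate. Whether this alone is enough for a real multi-replica deployment is exactly the open question raised above.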

Anna Geller

05/19/2022, 11:40 AM

I feel like I can get by with vertical scaling for quite a while.

You're right that vertical scaling seems to be the best solution right now, especially ensuring that your database scales and performs well, because it is the only stateful component.

Tom Manterfield

05/19/2022, 11:53 AM

Okay, from the sounds of things you’ve at least baked in a concept of idempotency. I do appreciate the caution, definitely better than overconfidence in this area. I might just give it a shot in dev for a bit and see what happens. I was under the impression I could run the scheduler and the API separately though. I saw the --no-scheduler and --scheduler flags and assumed I could run Orion as the API only and then one with the scheduler only.

Anna Geller

05/19/2022, 11:59 AM

Indeed, it only shows that you already know more about it than I do 👏 It seems you can run a separate Orion server without the UI, scheduler, mark-late-runs, and analytics services.

Anna Geller

05/19/2022, 12:03 PM

If you give it a try, could you share your work somehow, e.g. via a public repo or even a public Gist with a README? It would be a great way to push the topic forward and continue iterating on it together. I'm sure others in the community would benefit from it too and could use it to contribute as well.

Tom Manterfield

05/19/2022, 12:52 PM

Anything I make that’s shareable I definitely will. Once I’ve got my current setup working nicely and proven in a prod-like environment, I’m planning to wrap most of it in a Kubernetes operator and open source it. I don’t want to do that until I know it actually works though, as I don’t want to push out rubbish and have some poor unsuspecting dev pick it up.

👍 2

Zanie

05/19/2022, 2:51 PM

All of the services can be run separately from the API to scale them individually

🙌 1

Zanie

05/19/2022, 2:52 PM

I think your biggest issue in a setup with replicas will be Postgres database connection limits

🙏 1

Tom Manterfield

05/19/2022, 2:54 PM

Good info, thanks. I guess I can pool them with pgbouncer to ease that. I’m more comfortable having that problem than I am having something I can’t allow to scale.
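
As a rough way to reason about the connection-limit concern, the sketch below estimates the worst-case number of Postgres connections a set of replicas could open and compares it to the server limit. The pool numbers are illustrative assumptions, not values confirmed for Orion; the defaults of 100 for `max_connections` and 3 for `superuser_reserved_connections` are PostgreSQL's stock settings and may differ on your server.

```python
def worst_case_connections(replicas: int, pool_size: int, max_overflow: int = 0) -> int:
    """Upper bound on connections if every replica fills its pool and overflow."""
    return replicas * (pool_size + max_overflow)


def fits_connection_limit(
    replicas: int,
    pool_size: int,
    max_overflow: int = 0,
    max_connections: int = 100,  # PostgreSQL default
    reserved: int = 3,           # PostgreSQL default superuser_reserved_connections
) -> bool:
    """True if the worst case stays within the connections available to clients."""
    needed = worst_case_connections(replicas, pool_size, max_overflow)
    return needed <= max_connections - reserved


# Illustrative numbers only: each process holds up to 5 pooled + 10 overflow connections.
small = fits_connection_limit(replicas=4, pool_size=5, max_overflow=10)  # 60 <= 97
large = fits_connection_limit(replicas=8, pool_size=5, max_overflow=10)  # 120 > 97
```

When the worst case does not fit, a pooler such as pgbouncer in front of Postgres lets many client connections share a smaller number of server connections, which is the mitigation suggested above.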

Tom Manterfield

05/19/2022, 2:56 PM

@Zanie Is it possible to run the scheduler without the API? I can’t actually see how that one would be done. I can run the API without anything else, but I can’t see a way to do an equivalent of

prefect orion start --scheduler-only

Tom Manterfield

05/19/2022, 2:57 PM

Seems like I’d have to run one API standalone and then one with the scheduler, and just scale the standalone one for API traffic and accept the scheduler always comes with an API attached. I can live with that if so, but thought I’d check if I missed something.

Zanie

05/19/2022, 3:05 PM

You can call

python -m "prefect.orion.services.scheduler"

Tom Manterfield

05/19/2022, 3:05 PM

Oh cool! Thanks!

Zanie

05/19/2022, 3:06 PM

And similarly for any of the other services. I don’t think logging will be set up correctly yet though as we’ve not polished this.
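
Putting the thread together, the layout discussed here is several API-only servers behind a load balancer plus one standalone scheduler process. The sketch below only builds the command lines for such a plan and does not start anything. It uses the `--no-scheduler` flag and the `python -m prefect.orion.services.scheduler` module path mentioned in this thread; the helper names and the `plan` function are hypothetical, and other options (host, port, disabling the remaining services, pointing every process at the same database) are deliberately left out.

```python
import shlex
from typing import List


def api_only_command() -> List[str]:
    """Command for an Orion server that does not run the scheduler (flag from the thread)."""
    return ["prefect", "orion", "start", "--no-scheduler"]


def scheduler_command(python: str = "python") -> List[str]:
    """Command for a standalone scheduler service (module path from the thread)."""
    return [python, "-m", "prefect.orion.services.scheduler"]


def plan(api_replicas: int) -> List[List[str]]:
    """Hypothetical process plan: N API-only servers plus a single scheduler."""
    if api_replicas < 1:
        raise ValueError("at least one API replica is required")
    return [api_only_command() for _ in range(api_replicas)] + [scheduler_command()]


if __name__ == "__main__":
    for argv in plan(api_replicas=3):
        print(shlex.join(argv))
```

Keeping the scheduler as a single process avoids relying on idempotency keys alone to prevent duplicate scheduling, while the stateless API servers can be scaled behind the load balancer. As noted above, logging for standalone services may not be set up correctly yet.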
