# prefect-server
d
is the major scaling problem for concurrent flows that the Prefect server needs to be available to receive checkpointing data and store it?
c
Hi DJ - big question! Let me try to answer it concisely without getting too in the weeds. Both Server and Cloud provide an API that drives the UI as well as all workflow operations (setting states, sending logs, updating configuration settings, releasing work at the right time, etc.). There is additionally a lot that happens behind the API, both when it's in use and when it's not. For example, in both Server and Cloud there is a "Zombie Killer" service that is constantly monitoring for Running tasks / flows that have stopped talking to the API. Cloud of course has more of these services and hooks than Server, but the idea is the same - providing a monitoring / insurance platform for your workflows.

Prefect Server specifically has a scaling limit because every request that hits the API requires a database query. This means that every time you open the UI, every time you send a log or a state, etc., you are talking directly to the database. Cloud, on the other hand, has much more caching + horizontal and vertical scaling built in, so it can essentially scale to infinity.
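The "Zombie Killer" idea described above can be sketched roughly as follows. This is a hedged illustration only, not Prefect's actual internals: the run dicts, field names, and `HEARTBEAT_TIMEOUT` threshold are all made up for the example. The point is simply that any Running run whose last heartbeat is older than some threshold gets failed.

```python
from datetime import datetime, timedelta

# Illustrative threshold - not a real Prefect setting
HEARTBEAT_TIMEOUT = timedelta(minutes=2)

def reap_zombies(runs, now):
    """Mark Running runs with stale heartbeats as Failed; return their ids."""
    zombies = []
    for run in runs:
        if run["state"] == "Running" and now - run["last_heartbeat"] > HEARTBEAT_TIMEOUT:
            run["state"] = "Failed"  # no heartbeat -> presumed dead
            zombies.append(run["id"])
    return zombies

now = datetime(2021, 1, 1, 12, 0)
runs = [
    {"id": "flow-run-1", "state": "Running", "last_heartbeat": now - timedelta(seconds=30)},
    {"id": "flow-run-2", "state": "Running", "last_heartbeat": now - timedelta(minutes=10)},
]
print(reap_zombies(runs, now))  # ['flow-run-2']
```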
d
Awesome, thanks for the quick and detailed reply. The daemon processes running on the server, as far as I can tell:
1. Scheduling clocks
2. Zombie Killer -> assuming the number of concurrent flows doesn't adversely affect performance for this guy significantly

And then the non-daemon requests that scale with the number of flows:
1. Receive states from running flows
2. Receive logs
and the scheduler is what is “releasing work”?
For Prefect Server, are logs stored in the Postgres server, or can we push those to S3?
c
Yea, there is also a Lazarus process that is not really affected by concurrent flow runs. Prefect Agents work on a polling model: they make an API request for work, and the response is received + logic is run that determines which flow runs should be run by the particular agent making the request. And logs are stored in Postgres for Server; you can move them to S3 for sure, but you'll have work to do if you want to see them in the UI (because the UI is querying the database for those logs). Also note that the API is not a passive receiver of states - whenever a state is set, other logic kicks into action (e.g., "if the flow run is finished, don't let task runs enter Running states").
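The polling model mentioned above can be sketched like this. Everything here is a stand-in, not real Prefect API: the `queue` list, the `labels` matching rule, and the function names are invented for illustration. The takeaway is that the agent pulls work it can satisfy rather than the server pushing work to it.

```python
def fetch_ready_runs(queue, agent_labels):
    """An agent only picks up runs whose labels it can satisfy (illustrative rule)."""
    return [r for r in queue if set(r["labels"]) <= set(agent_labels)]

def poll_once(queue, agent_labels):
    """One polling cycle: ask for work, claim it, return the claimed run ids."""
    ready = fetch_ready_runs(queue, agent_labels)
    for run in ready:
        queue.remove(run)  # in a real agent, hand off to the execution environment here
    return [r["id"] for r in ready]

queue = [
    {"id": "run-a", "labels": ["gpu"]},
    {"id": "run-b", "labels": []},
]
print(poll_once(queue, agent_labels=["docker"]))  # ['run-b']
```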
d
lazarus -> retry mechanism?
ok interesting to know that the Prefect agents are actually polling for work
so the server itself has a GraphQL interface that then interacts with Hasura to make modifications to the DB?
sorry, I'm trying to crash-acquaint myself with the project 😞
c
Lazarus identifies flow runs that haven't completed for some reason and reschedules them; yea, the whole system was designed so that metadata / orchestration happens 100% separately from the execution of the workflows. This is largely a security feature but also allows for diverse execution environments. (see https://medium.com/the-prefect-blog/the-prefect-hybrid-model-1b70c7fd296)
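The Lazarus behavior described above ("identify stuck flow runs and reschedule them") can be sketched as follows. This is an assumption-laden illustration, not Prefect source: the states checked, the `last_activity` field, and the `STUCK_AFTER` threshold are all invented for the example.

```python
from datetime import datetime, timedelta

# Illustrative threshold - not a real Prefect setting
STUCK_AFTER = timedelta(minutes=10)

def lazarus_pass(flow_runs, now):
    """Reschedule flow runs that claim to be in-flight but show no recent activity."""
    resurrected = []
    for run in flow_runs:
        stuck = (
            run["state"] in ("Submitted", "Running")
            and now - run["last_activity"] > STUCK_AFTER
        )
        if stuck:
            run["state"] = "Scheduled"  # back on the queue for an agent to pick up
            resurrected.append(run["id"])
    return resurrected

now = datetime(2021, 1, 1, 9, 0)
flow_runs = [
    {"id": "healthy", "state": "Running", "last_activity": now - timedelta(minutes=1)},
    {"id": "stuck", "state": "Submitted", "last_activity": now - timedelta(hours=1)},
]
print(lazarus_pass(flow_runs, now))  # ['stuck']
```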
d
do you guys have an architecture diagram or flow diagram for how these processes interact with each other? Understand if you don't, but it would definitely help me wrap my head around things
c
We do, but that's more on the sales side of the house, so I recommend emailing us at hello@prefect.io for a deeper dive
d
👍 thx will do