Hey guys so continuing to learn from the architecture I was Prefect Community #ask-community

wiretrack

07/07/2021, 8:40 PM

Hey guys, so continuing to learn from the architecture, I was wondering about scalability outside the cloud version. I’ve been running a few tests and the database grows really quickly, since it’s persisting task_run_state, flow_run_state and other data that tends to grow really fast when you have a relatively large number of flows (+500 / +1000). I wouldn’t think execution would be a problem though: if I use the Kubernetes Agent and flows are running as k8s jobs, I’m guessing this would result in almost infinite scale in the execution sense. Still, the scheduler would have to query the database, and even though it should query a small table (`flow`), I was wondering if the large number of rows in other tables will start to get in the way of the frontend’s performance (and hasura’s, and apollo’s). Putting `state` in mongodb or something should completely solve the challenge (not really sure if it’s really a challenge), but it seems that this would be a huge change, since the code is really well tied together. I was wondering how you guys see scalability on the server, and I’m curious what approaches the cloud version uses to overcome potential scalability issues in the long term.

Kevin Kho

07/07/2021, 8:47 PM

Hey @wiretrack, when users start to have flows beyond the scale that Server can reasonably handle, we recommend moving to Cloud, as Cloud has optimizations meant for scaling. For Server though, you can `TRUNCATE` the logs table regularly. I don’t think the API performance is affected if you’re not querying the larger tables. A lot of memory issues also come up because of persistence of results. You can disable checkpointing with `@task(checkpoint=False)`, or you can name the result files explicitly so that they are overwritten with every Flow run.

wiretrack

07/07/2021, 8:50 PM

Fair enough, I thought of making a background job to truncate logs, and even older flow/task states. Didn’t know about checkpointing though, good tip. Besides logs and states, any other obvious potential bottlenecks considering the k8s agent and executing flows as jobs?
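
A minimal sketch of the kind of background pruning job mentioned above, not Prefect's own tooling. It uses an in-memory SQLite database purely as a stand-in for the Server's Postgres database so the example is self-contained. The function name `prune_older_than`, the `PRUNABLE_TABLES` allow-list, the `timestamp` column name, and the 30-day retention window are all assumptions made for illustration; check the actual schema before running anything like this against a real database.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical allow-list of tables that are safe to prune. The names mirror
# the tables mentioned in the thread, but verify them against the real schema.
PRUNABLE_TABLES = {"log", "task_run_state", "flow_run_state"}


def prune_older_than(conn, table, max_age, now=None):
    """Delete rows in `table` whose `timestamp` is older than `max_age`.

    Returns the number of rows deleted. Assumes a `timestamp` column that
    holds ISO-8601 strings in UTC (an assumption made for this sketch).
    """
    if table not in PRUNABLE_TABLES:
        # Table names cannot be bound as SQL parameters, so validate them.
        raise ValueError(f"refusing to prune unknown table: {table!r}")
    now = now or datetime.now(timezone.utc)
    cutoff = (now - max_age).isoformat()
    cur = conn.execute(f"DELETE FROM {table} WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def demo():
    """Build a toy `log` table, prune it, and return (deleted, remaining)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log (id INTEGER PRIMARY KEY, timestamp TEXT, message TEXT)"
    )
    now = datetime(2021, 7, 7, 20, 40, tzinfo=timezone.utc)
    rows = [
        ((now - timedelta(days=d)).isoformat(), f"log from {d} days ago")
        for d in (0, 5, 40, 90)
    ]
    conn.executemany("INSERT INTO log (timestamp, message) VALUES (?, ?)", rows)
    deleted = prune_older_than(conn, "log", timedelta(days=30), now=now)
    remaining = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

Unlike a plain `TRUNCATE`, which empties the whole table, a cutoff-based `DELETE` keeps recent rows available in the UI. On large tables it would likely need to run in batches and be followed by routine vacuuming, which this sketch leaves out.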

Kevin Kho

07/07/2021, 8:56 PM

I can’t think of anything immediate right now that is specific to Server, but with Kubernetes in general, just that Flows won’t run if they can’t get enough resources.

wiretrack

07/07/2021, 9:07 PM

Cool, having control over the cluster makes me less worried about the resources for the job, I was just thinking about the structural bottlenecks. Thanks @Kevin Kho!

👍 1
