# ask-community
c
Hey all, I’m in the process of migrating our Prefect setup to GCP, and I’m using the move to reconsider our current setup and its possible inefficiencies. Right now, I’ve got a single VM that hosts Prefect and runs all the flows, so all the computation happens on that one machine. Some flows process 1-2 million rows of text data and can take up to 30 minutes to complete (on the long end). Should I redesign this so that one VM hosts Prefect while the flow runs themselves execute elsewhere? And/or should the specific heavy tasks within a flow run elsewhere? Any guidance or advice would be helpful, and feel free to let me know if I’m missing any crucial details!
d
I’d definitely run the server and the worker separately. We host ours on the same VM, but the resources are managed by Kubernetes. Maybe GKE with Google Cloud Run?
c
Ohh, I see. So with the same-VM approach, you use Kubernetes to run flows or tasks on specific infrastructure? @Deceivious
d
Nah, we run 3 VMs in different geographic areas for outage protection, but yes, you can do that.
c
Ohh, I see. Kubernetes would probably be overkill on my end, but I’m not really sure how to judge that. Maybe just one VM for the server and one VM for the worker. How would you typically handle several computationally expensive flows running around the same time on that single worker VM? Or, ideally, would you spin up separate infra for each flow run?
@Deceivious
d
Yes, I’d just run a small server and maybe put the heavier workload on a container instance
i.e. a managed container runtime environment
Google Cloud Run
c
Ahh, okay.
I’ll look into Google Cloud Run and GKE, thanks.
d
GKE is just Google’s managed Kubernetes.
c
Yeah.
What’s the typical size/spec of a VM that only runs the server? @Deceivious I’m assuming it doesn’t need to be very big or powerful?
d
Depends on how many workers are connecting and how many API calls are being sent. Start small and scale up as you go.
Since you’re only planning to host the server on it, I guess not much. What’s the plan for the database?
I know Azure, not GCP, but I’d use the following for your use case (look up the equivalent GCP services):
1. Azure managed Postgres for the Prefect database
2. Azure App Service for running the Prefect server
3. Azure Container Instances for the workers (might be expensive depending on uptime/parallelism); if uptime is high, I’d run an Azure VM instead to cut costs
I mean, it really depends 😄 Cost / security / reliability
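A rough way to reason about the "depends on uptime" point: pay-per-use containers bill roughly for the hours flows actually run, while an always-on VM bills for every hour of the month. The sketch below finds the break-even point; the hourly rates are made-up placeholders, not real Azure or GCP prices.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month

def monthly_cost_container(busy_hours: float, rate_per_hour: float) -> float:
    """Pay-per-use: billed only while flow runs are executing."""
    return busy_hours * rate_per_hour

def monthly_cost_vm(rate_per_hour: float) -> float:
    """Always-on VM: billed for every hour of the month, busy or idle."""
    return HOURS_PER_MONTH * rate_per_hour

def break_even_hours(container_rate: float, vm_rate: float) -> float:
    """Busy hours per month above which the always-on VM becomes cheaper."""
    return monthly_cost_vm(vm_rate) / container_rate

# Hypothetical rates: containers cost more per hour than a comparable VM.
CONTAINER_RATE = 0.20
VM_RATE = 0.05

if __name__ == "__main__":
    threshold = break_even_hours(CONTAINER_RATE, VM_RATE)
    print(f"VM is cheaper above ~{threshold:.1f} busy hours/month")
    # e.g. one 30-minute heavy flow per day is about 15 busy hours/month
    print(f"Container cost for 15 busy hours: {monthly_cost_container(15, CONTAINER_RATE):.2f}")
```

With real prices plugged in, the same comparison shows whether the workload is bursty enough for containers or steady enough for a VM.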
c
Oh, hmmm… @Deceivious I actually don’t remember setting up a database for my last Prefect setup. Is the database used to store metadata?
And yeah, I’m going to start small. Probably 2 vCPUs and 8 GB RAM, and go from there.
d
Prefect creates a SQLite database automatically when none is specified. That’s not recommended for a production environment.
c
Gotcha, it must have been the SQLite DB then.
d
8 GB RAM is overkill IMO.
Wait, do you mean for the worker? That’s up to you. But for the server, that’s overkill.
c
Ahh, I see. Yeah, I wasn’t too sure on that front. I’ll size that down. Yup, it’s for the server.
d
Yes, just scale things up later.
c
In terms of the DB, we’ve got existing PostgreSQL databases, so I’d probably just use those.
d
Yes, use the existing infra, but create a new database for Prefect.
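Prefect reads its database location from the `PREFECT_API_DATABASE_CONNECTION_URL` setting, and the Prefect docs show Postgres URLs in the `postgresql+asyncpg://user:password@host:port/database` form (check the docs for your Prefect version). A minimal sketch that assembles that URL for a new, dedicated database on an existing Postgres server; the user, password, host and database name below are hypothetical placeholders.

```python
from urllib.parse import quote_plus

def prefect_db_url(user: str, password: str, host: str,
                   port: int = 5432, database: str = "prefect") -> str:
    """Build a PostgreSQL connection URL using the asyncpg driver."""
    # Percent-encode credentials so characters like '@' or '/' don't break the URL.
    return (
        f"postgresql+asyncpg://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )

if __name__ == "__main__":
    # Hypothetical values: substitute your own user, password and host.
    url = prefect_db_url("prefect_user", "s3cr@t/pw", "10.0.0.5")
    print(f'prefect config set PREFECT_API_DATABASE_CONNECTION_URL="{url}"')
```

Keeping Prefect in its own database (rather than sharing an application database) lets you back it up, migrate it, or drop it independently.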
c
Yup.
What’s the thought process behind deciding on execution environments (work pools and workers)? Does it make sense to have different execution environments depending on what needs to be executed (based on either the flow or the task)?
d
I think it’s mostly about concurrency and system load.
c
Ahh, okay. Do processes that benefit from parallelism (using Dask or something) factor in?
d
Not in that way. I mean more that you don’t want two heavy processes to start on your infra at the same time, because it might not be big enough to handle both, so you might want to limit concurrency.
c
Gotcha.