Prefect Community #ask-community

Choenden Kyirong

09/20/2023, 3:20 PM

Hey all, I’m in the process of migrating our Prefect setup to GCP. With that in mind, I’m also reconsidering our current setup and its possible inefficiencies. Currently, I’ve got only 1 VM responsible for hosting Prefect and running all the flows, so all the computation happens on that one VM. Some flows need to process 1-2 million rows of text data and can take up to 30 minutes to complete (on the long end). Should I redesign this so that 1 VM is responsible for hosting Prefect but the actual flow runs happen elsewhere? And/or should the specific heavy tasks of a flow be run elsewhere? Any guidance or advice would be helpful, and feel free to let me know if I’m missing any crucial details for a question like this!

Deceivious

09/21/2023, 2:42 PM

Would def assign a separate server and worker. We host ours in the same VM, but the resources are managed by Kubernetes. Maybe GKE with Google Cloud Run?

Choenden Kyirong

09/21/2023, 6:40 PM

Ohh I see. So with the same-VM approach, you use Kubernetes to run either flows or tasks on specific infrastructure? @Deceivious

Deceivious

09/21/2023, 6:45 PM

Nah, we run 3 VMs in different geographic areas for outage protection, but yes, you can do that.

Choenden Kyirong

09/21/2023, 6:50 PM

Ohh I see. Kubernetes would probably be overkill on my end, but I’m not sure how to actually judge that. Maybe just 1 VM for the server and 1 VM for the worker. How would you typically handle multiple computationally expensive flows running around the same time on that 1 worker VM? Or ideally, would you spin up separate infra for each flow run?

Choenden Kyirong

09/21/2023, 6:50 PM

@Deceivious

Deceivious

09/21/2023, 6:51 PM

Yes, I'd just use a small server and maybe run the heavier workloads on a container instance

Deceivious

09/21/2023, 6:51 PM

i.e. a managed container runtime environment

Deceivious

09/21/2023, 6:52 PM

Google Cloud Run

Choenden Kyirong

09/21/2023, 6:59 PM

ahh okay.

Choenden Kyirong

09/21/2023, 7:01 PM

I’ll look into Google Cloud Run and GKE, thanks.

Deceivious

09/21/2023, 7:02 PM

GKE is just Google's Kubernetes

Choenden Kyirong

09/21/2023, 7:05 PM

yeah.

Choenden Kyirong

09/21/2023, 9:28 PM

What’s the typical size/spec of the VM that’s just responsible for the server? @Deceivious I’m assuming it doesn’t need to be very big or powerful?

Deceivious

09/22/2023, 8:14 AM

Depends on how many workers are connecting and how many API calls are being sent. Try small and scale up as you go.

Deceivious

09/22/2023, 8:15 AM

Since you're planning to host only the server, I guess not much. What is the plan with the database?

Deceivious

09/22/2023, 8:18 AM

I know Azure, not GCP. I'd use the following for your use case [search for the equivalent service in GCP]:
1. Azure managed Postgres for the Prefect database
2. Azure App Service for running the Prefect server
3. Azure Container Instances for workers [might be expensive depending on uptime/parallelism]; if uptime is high, I'd run an Azure VM to cut costs
I mean, it really depends 😄 Cost / security / reliability

Choenden Kyirong

09/22/2023, 6:19 PM

Oh, hmmm… @Deceivious I actually don’t remember setting up a database for my last Prefect setup. Is the database used to store metadata?

Choenden Kyirong

09/22/2023, 6:19 PM

And yeah, I’m going to start small. Probably 2 vCPU and 8 GB RAM, and go from there.

Deceivious

09/22/2023, 6:20 PM

Prefect creates a SQLite database automatically when no database is configured. Not recommended for production environments.

Choenden Kyirong

09/22/2023, 6:20 PM

Gotcha, it must’ve been the SQLite DB then.

Deceivious

09/22/2023, 6:20 PM

8 GB RAM is overkill imo

Deceivious

09/22/2023, 6:21 PM

Wait, you mean for the worker? That's up to you. But for the server that's overkill.

Choenden Kyirong

09/22/2023, 6:21 PM

Ahh I see. Yeah, I wasn’t too sure on that front. I’ll size that down. Yup, it’s for the server.

Deceivious

09/22/2023, 6:22 PM

Yes, just scale things up afterwards

Choenden Kyirong

09/22/2023, 6:22 PM

In terms of the DB, we’ve got PostgreSQL databases that we use, so I’d probably just use that.

Deceivious

09/22/2023, 6:22 PM

Yes, create a new database and use the existing infra
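
[Editor's sketch] Pointing the Prefect server at an existing PostgreSQL instance comes down to one setting. In Prefect 2.x (current when this thread was written) that setting is `PREFECT_API_DATABASE_CONNECTION_URL`, and it expects an asyncpg-style SQLAlchemy URL. The helper below only builds that URL and prints the matching `prefect config set` command. The function name, host, user, password, and database name are all hypothetical placeholders, so treat it as a starting point and check the setting name against the docs for your Prefect version.

```python
from urllib.parse import quote

# Prefect 2.x setting that selects the server's database.
SETTING = "PREFECT_API_DATABASE_CONNECTION_URL"


def prefect_db_url(user: str, password: str, host: str,
                   database: str, port: int = 5432) -> str:
    """Build an asyncpg-style SQLAlchemy URL for a dedicated Prefect database."""
    # Percent-encode credentials so characters like '@' or '/' cannot break the URL.
    user_q = quote(user, safe="")
    password_q = quote(password, safe="")
    return f"postgresql+asyncpg://{user_q}:{password_q}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical values: replace with your own host, database, and credentials.
    url = prefect_db_url("prefect", "s3cret@pw", "10.0.0.5", "prefect")
    print(f'prefect config set {SETTING}="{url}"')
```

The database named in the URL has to exist before the server starts, which is the "create a new database" step mentioned above.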

Choenden Kyirong

09/22/2023, 6:25 PM

yup.

Choenden Kyirong

09/22/2023, 6:30 PM

What’s the thought process behind deciding on execution environments (work pools and workers)? Does it make sense to have different execution environments depending on what needs to be executed (based on either the flow or a task)?

Deceivious

09/22/2023, 6:41 PM

I think mostly concurrency and system load

Choenden Kyirong

09/22/2023, 6:48 PM

ahh okay. Do processes that benefit from parallelism (using Dask or something) play a factor?

Deceivious

09/22/2023, 7:39 PM

Not in that way. I mean more like you don't want 2 heavy processes to start on your infra at once because it might not be beefy enough, so you might want to limit concurrency
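
[Editor's sketch] The idea of capping how many heavy runs start at once can be illustrated with a plain asyncio semaphore. This is not Prefect's API: Prefect has its own concurrency limits on work pools and workers, which are what you would configure in practice. The names `run_heavy`, `run_all`, and `MAX_HEAVY_JOBS`, and the short sleep standing in for the real workload, are all assumptions made for the illustration.

```python
import asyncio

# Hypothetical limit: at most one heavy job at a time on a small worker VM.
MAX_HEAVY_JOBS = 1


async def run_heavy(name: str, gate: asyncio.Semaphore, state: dict) -> None:
    """Run one simulated heavy job while holding a slot from the gate."""
    async with gate:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)  # stand-in for the real long-running workload
        state["running"] -= 1
        state["done"].append(name)


async def run_all(names: list[str], limit: int = MAX_HEAVY_JOBS) -> dict:
    """Start every job at once; the semaphore makes the extras wait their turn."""
    gate = asyncio.Semaphore(limit)
    state = {"running": 0, "peak": 0, "done": []}
    await asyncio.gather(*(run_heavy(n, gate, state) for n in names))
    return state


if __name__ == "__main__":
    result = asyncio.run(run_all(["flow-a", "flow-b", "flow-c"]))
    print("peak concurrent jobs:", result["peak"])
    print("completed:", result["done"])
```

All three jobs are submitted together, but the number running at any moment never exceeds the limit, which is the behaviour you want from a worker that is too small to run two heavy flows side by side.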

Choenden Kyirong

09/22/2023, 7:40 PM

gotchya.
