# ask-community
**Kostas Chalikias:**
Some questions about moving off a completely local agent & execution model on Heroku (along with the Cloud scheduler) to a more scalable GCP-based execution model. Here is what we would like to achieve:

- Flow descriptions are read directly from GitHub.
- We build a single Docker container from our Python monorepo which can be used both to resolve flow/task structure and to run the tasks.
- We don't really need parallelisation inside a specific flow, although if it's free it wouldn't hurt.
- However, we would like to be able to scale up execution of our flows in general (on GCP), ideally without just increasing the size of a single VM: either scale up to as many VMs as there are flows running at the same time, or use a fixed pool of them.
- We don't want to re-register/restart everything on every release, as we do today, just to be safe that changes are picked up.

We've been through the docs a few times and have a few questions:

- If we declare flows in our codebase using GitHub storage, what causes the definition to be re-read or a new flow to be discovered?
- What is the right `run_config` for our flows so they just run as containers with controllable CPU/memory requirements and reasonable defaults?
- What is the right type of agent? Presumably it depends on the choice of `run_config`.
- What is the recommended way to get secrets/config env vars to the flows & running tasks?
**Kevin Kho:**
Hey @Kostas Chalikias. If you use GitHub storage, Prefect will always go to the GitHub repo to get the flow, so it will always be re-read. A new flow will not be discovered if you don't register it. If your registered version of the flow does not match the version pulled from GitHub, an error will be raised: small changes are fine, but changes that alter the DAG structure will throw errors. For GCP, look into the Vertex agent that was released in the last version; it lets you choose the machine type per flow in the Vertex run config. For secrets and env vars, you can store them in Prefect Cloud and then call them in your flow. You can also attach them to the agent by adding the `--env` flag to set an environment variable.
**Kostas Chalikias:**
Thanks @Kevin Kho, do you know if the GCP VMs are automatically started/stopped when using the Vertex agent? Also, if my flow definition file has external code dependencies, how do I use a container to make sure they are available when it's being parsed?
**Kevin Kho:**
Yes they are, and you would provide the Vertex agent with an image that it is able to pull.
**Kostas Chalikias:**
Great, thanks!