What's the best practice for dealing with Storage? We have our flows in GitHub, and for testing we want them to run locally in a checked-out repo using `Local` storage, but for production we want the same flow to run on our ECS agent using `GitHub` storage pointing at the same repo. With the `UniversalRun` config it can run on either agent, but how do we make the storage conditional so it uses the correct storage depending on which agent it's targeted to?
Kevin Kho
06/28/2022, 9:49 PM
This can’t be done in one shot, because storage needs to be defined at registration time, so this will take two separate registrations. This is part of the reason storage was decoupled from deployment.
rectalogic
06/28/2022, 9:52 PM
So how is this typically handled? Define two `Flow`s for each set of tasks, one with `Local` and one with `GitHub` storage? Or edit the source code when testing with a local agent vs. the ECS agent?
Kevin Kho
06/28/2022, 9:53 PM
I think something like that. You can define the flow in one script with no storage, then import it from a second script, attach the storage and run config there, and call register. That decouples the registration script from the flow definition and makes things a bit better.
Kevin Kho
06/28/2022, 9:54 PM
So your registration script can take parameters through the CLI that push the flow to dev or prod.
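A minimal sketch of that two-script pattern, assuming Prefect 1.x. The module path `flows/etl.py`, the repo `myorg/myrepo`, the project name `my-project`, and the label names are all placeholders, not anything from this thread:

```python
# register.py -- attach storage and run config at registration time (Prefect 1.x).
# Placeholder assumptions: flows/etl.py defines `flow` with no storage attached,
# and "myorg/myrepo" / "my-project" are made-up names.
import argparse

def storage_for(env):
    """Pure helper: map an environment name to a storage class name and kwargs."""
    if env == "local":
        return "Local", {}
    if env == "prod":
        return "GitHub", {"repo": "myorg/myrepo", "path": "flows/etl.py"}
    raise ValueError(f"unknown env: {env}")

def main():
    # Prefect imports live inside main() so storage_for() stays importable anywhere.
    from prefect.storage import Local, GitHub
    from prefect.run_configs import UniversalRun
    from flows.etl import flow  # the flow file itself carries no storage

    parser = argparse.ArgumentParser()
    parser.add_argument("env", choices=["local", "prod"])
    env = parser.parse_args().env

    name, kwargs = storage_for(env)
    flow.storage = Local(**kwargs) if name == "Local" else GitHub(**kwargs)
    flow.run_config = UniversalRun(labels=[env])  # route to the matching agent
    flow.register(project_name="my-project")

# invoke as e.g.:  python register.py local   or   python register.py prod
```

Each environment gets its own `python register.py <env>` invocation, which is the "two separate registrations" mentioned above.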
rectalogic
06/28/2022, 10:03 PM
To attach storage to a flow later, should I assign the attribute directly (`myflow.storage = GitHub()`), or add the flow to the storage (`GitHub().add_flow(myflow)`)?
Kevin Kho
06/28/2022, 10:05 PM
Exactly, just the first one
rectalogic
06/28/2022, 10:05 PM
thanks
Kevin Kho
06/28/2022, 10:06 PM
Just note that the executor specifically needs to be defined in the flow itself, because it's not serialized along with the `Flow`. `RunConfig` and `Storage` are fine to attach this way.
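To illustrate the distinction, a sketch assuming Prefect 1.x (the flow, task, and file names are placeholders): the agent re-imports the flow's file from storage at run time and uses whatever executor is set on the loaded flow object, so the executor assignment has to live in the flow file itself.

```python
# flows/etl.py -- the flow's own file (Prefect 1.x sketch; names are placeholders).
# The executor is NOT serialized at registration: agents re-import this file from
# storage and use the executor set on the loaded flow object, so it must be
# assigned here. Storage and run_config can be attached later at registration.

def build_flow():
    # imports kept inside the function so this sketch stays importable standalone
    from prefect import Flow, task
    from prefect.executors import LocalDaskExecutor

    @task
    def hello():
        print("hello")

    with Flow("etl") as flow:
        hello()

    flow.executor = LocalDaskExecutor()  # must be set in the flow file
    return flow
```

A separate registration script can then import the built flow and attach `storage` and `run_config` before calling `register`, without touching the executor.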