What's the best practice for dealing with Storage? We have our flows in GitHub, and for testing we want them to run locally in a checked-out repo using `Local` storage, but for production we want the same flow to run on our ECS agent using `GitHub` storage pointing at the same repo. With the `UniversalRun` config it can run on either agent, but how do we make the storage conditional so it uses the correct storage depending on which agent it's targeted to?
Kevin Kho
06/28/2022, 9:49 PM
This can’t be done in one shot, because storage needs to be defined at registration time, so this will take two separate registrations. This is part of the reason storage was decoupled from deployment.
rectalogic
06/28/2022, 9:52 PM
So how is this typically handled? Define two `Flow`s for each set of tasks, one with `Local` and one with `GitHub` storage? Or edit the source code when testing with a local agent vs. the ECS agent?
Kevin Kho
06/28/2022, 9:53 PM
I think something like that. You can define the flow in one script with no storage, then import it from a second script, attach the storage and run config there, and call register. That decouples the registration script from the flow definition and makes things a bit better.
Kevin Kho
06/28/2022, 9:54 PM
So your registration script can take parameters through the CLI that push the flow to dev or prod.
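A minimal sketch of that two-script pattern, assuming Prefect 1.x. The module path `flows/etl.py`, the repo `myorg/myrepo`, the project name `my-project`, and the label names are all placeholders, not anything from this thread:

```python
# register.py -- attach storage and run config at registration time (Prefect 1.x).
# Placeholder assumptions: flows/etl.py defines `flow` with no storage attached,
# and "myorg/myrepo" / "my-project" are made-up names.
import argparse

def storage_for(env):
    """Pure helper: map an environment name to a storage class name and kwargs."""
    if env == "local":
        return "Local", {}
    if env == "prod":
        return "GitHub", {"repo": "myorg/myrepo", "path": "flows/etl.py"}
    raise ValueError(f"unknown env: {env}")

def main():
    # Prefect imports live inside main() so storage_for() stays importable anywhere.
    from prefect.storage import Local, GitHub
    from prefect.run_configs import UniversalRun
    from flows.etl import flow  # the flow file itself carries no storage

    parser = argparse.ArgumentParser()
    parser.add_argument("env", choices=["local", "prod"])
    env = parser.parse_args().env

    name, kwargs = storage_for(env)
    flow.storage = Local(**kwargs) if name == "Local" else GitHub(**kwargs)
    flow.run_config = UniversalRun(labels=[env])  # route to the matching agent
    flow.register(project_name="my-project")

# invoke as e.g.:  python register.py local   or   python register.py prod
```

Each environment gets its own `python register.py <env>` invocation, which is the "two separate registrations" mentioned above.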
rectalogic
06/28/2022, 10:03 PM
To attach storage to a flow later, should I assign the attribute directly (`myflow.storage = GitHub()`), or add the flow to the storage (`GitHub().add_flow(myflow)`)?
Kevin Kho
06/28/2022, 10:05 PM
Exactly, just the first one
rectalogic
06/28/2022, 10:05 PM
thanks
Kevin Kho
06/28/2022, 10:06 PM
Just note that the executor specifically needs to be defined in the flow itself, because it's not serialized along with the `Flow`. `RunConfig` and `Storage` are fine to attach this way.
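To illustrate the distinction, a sketch assuming Prefect 1.x (the flow, task, and file names are placeholders): the agent re-imports the flow's file from storage at run time and uses whatever executor is set on the loaded flow object, so the executor assignment has to live in the flow file itself.

```python
# flows/etl.py -- the flow's own file (Prefect 1.x sketch; names are placeholders).
# The executor is NOT serialized at registration: agents re-import this file from
# storage and use the executor set on the loaded flow object, so it must be
# assigned here. Storage and run_config can be attached later at registration.

def build_flow():
    # imports kept inside the function so this sketch stays importable standalone
    from prefect import Flow, task
    from prefect.executors import LocalDaskExecutor

    @task
    def hello():
        print("hello")

    with Flow("etl") as flow:
        hello()

    flow.executor = LocalDaskExecutor()  # must be set in the flow file
    return flow
```

A separate registration script can then import the built flow and attach `storage` and `run_config` before calling `register`, without touching the executor.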