Hi y'all. Working through deploying our first flow to production. We have several flows in a repository, plus some shared modules imported into each flow that contain tasks and other utility code. From the documentation, it seems like the only way to ensure the shared modules are available at flow execution time is to package them into a Docker image and use a Docker/ECR/K8s agent.
That feels a little heavy - is there any way to package up dependencies like that during pickling?
Folder structure is below. The flow needs access to stuff in src/tasks, and the tasks need access to stuff in src/utils (rough import sketch after the tree).
• src
  ◦ flows
    ▪︎ my_flow.py
    ▪︎ my_other_flow.py
  ◦ tasks
    ▪︎ shared_task_1.py
    ▪︎ shared_task_2.py
  ◦ utils
    ▪︎ shared_lib_1.py
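For reference, the imports look roughly like this (task and helper names are just placeholders, and this assumes the repo root is on PYTHONPATH):

```python
# --- src/flows/my_flow.py ---
from prefect import Flow

from src.tasks.shared_task_1 import shared_task_1  # placeholder task name

with Flow("my-flow") as flow:
    shared_task_1()


# --- src/tasks/shared_task_1.py ---
from prefect import task

from src.utils.shared_lib_1 import some_helper  # placeholder helper name

@task
def shared_task_1():
    return some_helper()
```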
Thanks!
Kevin Kho
12/23/2021, 4:11 PM
Hi @Jason Raede, there is none, because cloudpickle just very recently added support for deep copying of modules. I think we're unsure if it'll work at the moment, but it might be possible eventually.
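For reference, I believe the cloudpickle feature in question is register_pickle_by_value (added in cloudpickle 2.0), which embeds a module's code in the pickle instead of referencing it by import path. Roughly like this, and untested with Prefect's pickle-based storage:

```python
import cloudpickle

import src.tasks.shared_task_1 as shared_task_1  # shared module from the tree above

# Pickle the module "by value": its code is embedded in the payload,
# so the unpickling environment does not need src/ installed.
cloudpickle.register_pickle_by_value(shared_task_1)

payload = cloudpickle.dumps(shared_task_1.shared_task_1)
restored = cloudpickle.loads(payload)
```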
Jason Raede
12/23/2021, 4:12 PM
OK, so the recommendation for now is DockerStorage + one of the Docker agents?
Jason Raede
12/23/2021, 4:12 PM
Or can Docker storage work with a local agent?
Kevin Kho
12/23/2021, 4:13 PM
Docker storage needs one of the Docker agents, I think. But I want to mention that you can also use DockerRun + GitHub Storage/S3 Storage: if your container has all the dependencies and they don't really change, you can just specify your image and run your flow on top of it.
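Roughly like this; the repo, image, and project names are placeholders, and it assumes the image already has src/ installed or on PYTHONPATH:

```python
from prefect import Flow
from prefect.run_configs import DockerRun
from prefect.storage import GitHub

from src.tasks.shared_task_1 import shared_task_1  # placeholder task name

with Flow("my-flow") as flow:
    shared_task_1()

# Flow code is pulled from the repo at run time; the shared modules
# come from the image, which already has them baked in.
flow.storage = GitHub(
    repo="my-org/my-repo",          # placeholder repo
    path="src/flows/my_flow.py",    # path to this file within the repo
    ref="main",
)
flow.run_config = DockerRun(
    image="my-registry/my-flows:latest",  # placeholder image with src/ installed
)

flow.register(project_name="my-project")  # placeholder project name
```

With that setup the image only needs rebuilding when the shared modules change; changes to the flow file itself just need a re-register, since the flow code is pulled from GitHub at run time.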
Jason Raede
12/23/2021, 4:15 PM
Got it. Ok, this is helpful, thank you!
Kirk Quinbar
01/18/2022, 2:55 PM
@Jason Raede I have pretty much this same setup and am trying to figure out the best way to deal with the dependent Python files. What did you end up doing to solve your issue?