Hi y'all. Working through deploying our first flow to production. We have several flows in a repository, plus some shared modules imported into each flow that contain tasks and other utility code. From the documentation, it seems like the only way to ensure the shared modules are available at flow execution time is to package them into a Docker image and use a Docker/ECR/K8s agent.
That feels a little heavy - is there any way to package up dependencies like that during pickling?
Folder structure is below. The flow needs access to stuff in src/tasks, and the tasks need access to stuff in src/utils (rough import sketch after the tree).
• src
  ◦ flows
    ▪︎ my_flow.py
    ▪︎ my_other_flow.py
  ◦ tasks
    ▪︎ shared_task_1.py
    ▪︎ shared_task_2.py
  ◦ utils
    ▪︎ shared_lib_1.py
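For reference, the imports look roughly like this (task and helper names are just placeholders, and this assumes the repo root is on PYTHONPATH):

```python
# --- src/flows/my_flow.py ---
from prefect import Flow

from src.tasks.shared_task_1 import shared_task_1  # placeholder task name

with Flow("my-flow") as flow:
    shared_task_1()


# --- src/tasks/shared_task_1.py ---
from prefect import task

from src.utils.shared_lib_1 import some_helper  # placeholder helper name

@task
def shared_task_1():
    return some_helper()
```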
Thanks!
Kevin Kho
12/23/2021, 4:11 PM
Hi @Jason Raede, there is none, because cloudpickle just very recently added support for deep copying of modules. I think we're unsure if it'll work at the moment, but it might be possible eventually.
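For reference, I believe the cloudpickle feature in question is register_pickle_by_value (added in cloudpickle 2.0), which embeds a module's code in the pickle instead of referencing it by import path. Roughly like this, and untested with Prefect's pickle-based storage:

```python
import cloudpickle

import src.tasks.shared_task_1 as shared_task_1  # shared module from the tree above

# Pickle the module "by value": its code is embedded in the payload,
# so the unpickling environment does not need src/ installed.
cloudpickle.register_pickle_by_value(shared_task_1)

payload = cloudpickle.dumps(shared_task_1.shared_task_1)
restored = cloudpickle.loads(payload)
```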
Jason Raede
12/23/2021, 4:12 PM
OK, so the recommendation for now is DockerStorage + one of the Docker agents?
Jason Raede
12/23/2021, 4:12 PM
Or can Docker storage work with a local agent?
Kevin Kho
12/23/2021, 4:13 PM
Docker storage needs one of the Docker agents, I think. But I want to mention that you can also use DockerRun + GitHub Storage/S3 Storage: if your container has all the dependencies and they don't really change, you can just specify your image and run your flow on top of it.
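Roughly like this; the repo, image, and project names are placeholders, and it assumes the image already has src/ installed or on PYTHONPATH:

```python
from prefect import Flow
from prefect.run_configs import DockerRun
from prefect.storage import GitHub

from src.tasks.shared_task_1 import shared_task_1  # placeholder task name

with Flow("my-flow") as flow:
    shared_task_1()

# Flow code is pulled from the repo at run time; the shared modules
# come from the image, which already has them baked in.
flow.storage = GitHub(
    repo="my-org/my-repo",          # placeholder repo
    path="src/flows/my_flow.py",    # path to this file within the repo
    ref="main",
)
flow.run_config = DockerRun(
    image="my-registry/my-flows:latest",  # placeholder image with src/ installed
)

flow.register(project_name="my-project")  # placeholder project name
```

With that setup the image only needs rebuilding when the shared modules change; changes to the flow file itself just need a re-register, since the flow code is pulled from GitHub at run time.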
Jason Raede
12/23/2021, 4:15 PM
Got it. Ok, this is helpful, thank you!
Kirk Quinbar
01/18/2022, 2:55 PM
@Jason Raede I have pretty much this same setup and am trying to figure out the best way to deal with the dependent Python files. What did you end up doing to solve your issue?