Is there a good example of having classes defined in a separate file and using that class in a task (on Kubernetes and Azure) - I keep getting the message module not found
a
Anna Geller
10/26/2021, 9:25 AM
@Jai Deo last week I’ve built an example you’re looking for! 🙂
Some explanation:
• the flow runs on Kubernetes - thus we need
KubernetesRun
• since you have custom module dependencies, those need to be packaged into a Docker image that gets built and pushed to Azure Container Registry - README of this repo explains how to do that and Dockerfile shows how the image can be built
• the custom package is shown as
Hey @Jai Deo, Anna is right here. All flow storages except for Docker and Module only serialize the flow code, not the dependent modules so you would need to package them in Docker to run on Kubernetes
j
Jai Deo
10/26/2021, 2:15 PM
Hi Kevin, I don't understand - that sounds like I can't store the flow in azure ( if I have separate packages in separate files ?)
k
Kevin Kho
10/26/2021, 2:17 PM
Yeah for Azure Blob storage you can’t because of how
cloudpickle
works where it doesn’t deepcopy the modules. With
cloudpickle 2.0
recently released, there is support for deep copying of modules which might make this possible.
j
Jai Deo
10/26/2021, 2:19 PM
In Anna's example she uses github as storage and specifies the path as an option - azure does'nt have this option does it ?
Jai Deo
10/26/2021, 2:21 PM
Is it possible to use setup.py to create a package and store that ?
k
Kevin Kho
10/26/2021, 2:24 PM
Azure has a path but for both Azure and Github, only the flow file is loaded.
setup.py
in Docker will work. We have a more generic
Git
storage that clones the whole repo and we talked about installing the repo as a module, but it felt like reinventing Python packaging ourselves. If you have the
setup.py
, you can install your dependencies in a Docker container to be used for the flow.
upvote 1
a
Anna Geller
10/26/2021, 2:24 PM
@Jai Deo
• for Python dependencies, you need a Docker image.
• For flow code, you can use e.g. Azure blob storage or Github.
• Alternatively, you can bake your flow code into the image and use it with Docker or Local storage
Bring your towel and join one of the fastest growing data communities. Welcome to our second-generation open source orchestration platform, a completely rethought approach to dataflow automation.