I have a private Python package bundled up and stored in Google Artifact Registry (as opposed to an open-source package on PyPI). I'd like to include it in my flow, which uses a Docker storage instance in production. I know that I can use the python_dependencies kwarg on the Docker storage class to include open-source packages. But how do I get a private package from Google Artifact Registry included as well? Is there an established pattern for this?
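(For reference, the baseline being described looks roughly like the sketch below. It assumes Prefect 1.x's Docker storage class, prefect.storage.Docker, and uses placeholder registry and image names.)
from prefect import Flow
from prefect.storage import Docker

with Flow("my-flow") as flow:
    ...  # tasks go here

# Public packages are declared via python_dependencies; the open question is
# how to also pull in a private package hosted in Google Artifact Registry.
flow.storage = Docker(
    registry_url="us-docker.pkg.dev/my-project/my-repo",  # placeholder
    image_name="my-flow",
    python_dependencies=["pandas", "requests"],  # open-source deps from PyPI
)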
Kevin Kho
02/14/2022, 8:43 PM
So this is happening because you are using the DockerStorage as the interface to build the image. If it’s too limiting, you can supply your own Dockerfile or image and just handle it yourself.
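As a rough sketch of that route (this assumes Prefect 1.x's Docker storage accepts a dockerfile argument pointing at a file you control; the registry and image names are placeholders):
from prefect.storage import Docker

# Hand Prefect a Dockerfile you maintain yourself, so the private-package
# install (and any auth it needs) happens on your terms during the build.
storage = Docker(
    registry_url="us-docker.pkg.dev/my-project/my-repo",  # placeholder
    image_name="my-flow",
    dockerfile="Dockerfile",  # your own Dockerfile with the pip install steps
)
As I understand it, Prefect still builds and pushes the image at registration time; it just starts from your Dockerfile rather than generating one from scratch.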
Kevin Kho
02/14/2022, 8:44 PM
But for this, I think there is a chance we can get it to work by pointing your pip to the Artifact Registry. Each of the dependencies is just pip installed here by adding commands to the container build.
Kevin Kho
02/14/2022, 8:44 PM
So this will work if just doing pip install some_library magically worked. I’ll look into it a bit.
Kevin Kho
02/14/2022, 8:44 PM
You can also add extra commands to the Docker build if we can’t pip install it directly. Something like this should do the trick - you can add it to your Dockerfile like so:
RUN pip install --index-url https://LOCATION-python.pkg.dev/PROJECT/REPOSITORY/simple/ PACKAGE
But when you register your flow and build the image, your terminal must be authenticated with Artifact Registry Reader permissions.
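If you'd rather keep the generated image, here is a sketch of wiring that same RUN line in from Python. It assumes Prefect 1.x's Docker storage exposes an extra_dockerfile_commands argument, and LOCATION/PROJECT/REPOSITORY/PACKAGE are placeholders as above:
from prefect.storage import Docker

# Keep python_dependencies for public packages and append the Artifact
# Registry install as an extra command in the generated Dockerfile.
storage = Docker(
    registry_url="us-docker.pkg.dev/my-project/my-repo",  # placeholder
    image_name="my-flow",
    python_dependencies=["pandas"],  # public PyPI deps as before
    extra_dockerfile_commands=[
        "RUN pip install --index-url "
        "https://LOCATION-python.pkg.dev/PROJECT/REPOSITORY/simple/ PACKAGE",
    ],
)
The same caveat applies: the image is built wherever you register the flow, so that machine needs read access to the Artifact Registry repository.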
In my last job we were also using a private Artifactory, and you could also just add the index URL at the top of your requirements.txt.