Ben Ayers-Glassey
05/05/2022, 3:40 AM
The `Docker` class takes an `env_vars` dict, but no `build_args` dict, which would be nice.
Here is the place in the `Docker` class where ENVs are generated:
https://github.com/PrefectHQ/prefect/blob/8e04ccad071d1127afc7ca3580f1fe6f4e884f27/src/prefect/storage/docker.py#L437-L442
...the ARGs could go right above there.
In any case, ultimately what I'm trying to do is get the `Docker` class to install `python_dependencies` from our private PyPI server. Is there a good way to do that?
Kevin Kho
Ben Ayers-Glassey
05/05/2022, 4:02 AM
`build_kwargs` and `buildargs`, and even tried them, but they didn't work... because `buildargs` only works for ARGs in the Dockerfile. You can't set arbitrary env vars during `docker build`, only ones which were specified with ENV or ARG.
(The difference being that ENV ones get baked into the image, whereas ARG ones don't -- and are therefore what I want to use to store sensitive PyPI credentials.)
Kevin Kho
`buildargs`
Ben Ayers-Glassey
05/05/2022, 4:19 AM
`build_kwargs` and `buildargs`.
`build_kwargs` and `buildargs`, which is great. There's just no way to add ARG lines to the Dockerfile so that `buildargs` can be useful.
The `Docker` class currently has an `env_vars` kwarg, and I think we would just need to add a `build_args` kwarg. Like `env_vars`, it would be a dict which is stored onto `self`; and then in `create_dockerfile_object`, we would just need to copy-paste-modify the 6 lines which generate ENV lines from `self.env_vars`, so that we also generate ARG lines from `self.build_args`.
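A rough sketch of what the proposed ARG generation could look like, written as a standalone helper rather than as a patch to Prefect's actual `create_dockerfile_object` (whose exact code isn't reproduced here). The function name `render_env_and_arg_lines` and the "`None` means no default" convention are assumptions made for illustration.

```python
def render_env_and_arg_lines(env_vars=None, build_args=None):
    """Render Dockerfile ARG and ENV lines from two dicts.

    Hypothetical helper illustrating the proposed `build_args` kwarg;
    it is not part of Prefect's Docker storage class. Values are inserted
    verbatim (no quoting or escaping).
    """
    lines = []
    # ARG lines come first so that later ENV lines (or RUN steps) can use $NAME.
    for name, default in (build_args or {}).items():
        # A bare `ARG NAME` means the value must be supplied via --build-arg.
        lines.append(f"ARG {name}" if default is None else f"ARG {name}={default}")
    for name, value in (env_vars or {}).items():
        lines.append(f"ENV {name}={value}")
    return "\n".join(lines)


print(render_env_and_arg_lines(
    env_vars={"GCP_BQ_DATASET": "staging_data"},
    build_args={"PIP_EXTRA_INDEX_URL": None},
))
# ARG PIP_EXTRA_INDEX_URL
# ENV GCP_BQ_DATASET=staging_data
```

Emitting the ARG lines before the ENV lines is what lets an ENV value reference a build arg as `$NAME`, the same ARG-then-ENV pattern that shows up in Mateo's Dockerfile later in the thread.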
Something like that 🙂
Kevin Kho
Ben Ayers-Glassey
05/05/2022, 4:36 AM
Kevin Kho
Ben Ayers-Glassey
05/05/2022, 4:53 AM
`Docker(dockerfile="Dockerfile")` instead of relying on it to create the Dockerfile for you.
So I guess adding `build_args` or whatever isn't a huge deal, because there's a workaround.
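A minimal sketch of that workaround, assuming a hand-written Dockerfile that declares the credentials as an ARG. It relies on pip reading `--extra-index-url` from a `PIP_EXTRA_INDEX_URL` environment variable, which ARG values become for RUN steps during the build. The Dockerfile contents, tag and helper name `docker_build_argv` are hypothetical; with Prefect's `Docker` storage, the same name/value dict would presumably go into `build_kwargs={"buildargs": {...}}`, as discussed earlier in the thread. One caveat: Docker's own documentation advises against build args for secrets, since values used in RUN steps can be visible via `docker history`; BuildKit's `RUN --mount=type=secret` is the documented alternative if that matters here.

```python
import shlex

# Hypothetical Dockerfile for the "bring your own Dockerfile" workaround.
# ARG exposes PIP_EXTRA_INDEX_URL to RUN steps during the build only,
# and pip treats PIP_EXTRA_INDEX_URL like --extra-index-url.
DOCKERFILE = """\
FROM prefecthq/prefect:latest-python3.8
ARG PIP_EXTRA_INDEX_URL
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt
"""


def docker_build_argv(tag, build_args, dockerfile="Dockerfile", context="."):
    """Build the argv for `docker build`, one --build-arg per entry.

    A value of None emits the bare `--build-arg NAME` form, which makes
    Docker copy NAME from the caller's environment, so the secret itself
    never appears on the command line.
    """
    argv = ["docker", "build", "-f", dockerfile, "-t", tag]
    for name, value in build_args.items():
        argv += ["--build-arg", name if value is None else f"{name}={value}"]
    argv.append(context)
    return argv


# Pass the list to subprocess.run(...) directly: no shell, so no extra quoting.
print(shlex.join(docker_build_argv("my-flow:latest", {"PIP_EXTRA_INDEX_URL": None})))
```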
I'll add that to the ticket!
Kevin Kho
Mateo Merlo
06/08/2022, 10:30 AM
```
load_dotenv()
GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")
```
Locally I have a `.env` file, but I want to add these env variables to the custom Docker image I'm building with Cloud Build. Should I use the `--build-arg` option in the build step I have in my `cloudbuild.yml`?
Kevin Kho
Mateo Merlo
06/08/2022, 3:57 PM
```
steps:
# [START cloudbuild_python_image_yaml]
# Docker Build
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t',
         'europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest',
         '--build-arg', 'GCP_BQ_DATASET="${_GCP_BQ_DATASET}"',
         '--cache-from', 'europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest', '.']
# [END cloudbuild_python_image_yaml]
# [START cloudbuild_python_push_yaml]
# Docker push to Google Artifact Registry
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest']
# [END cloudbuild_python_push_yaml]
- name: prefecthq/prefect:latest-python3.8
  entrypoint: "bash"
  args:
    - "-c"
    - |
      pip install -e .
      prefect backend cloud
      prefect auth login --key "${_PREFECT__CLOUD__API_KEY}"
      python register_flows.py
images:
- europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest
# [END cloudbuild_python_yaml]
```
This line, `'--build-arg', 'GCP_BQ_DATASET="${_GCP_BQ_DATASET}"'`, is something that I added (with a secret in Cloud Build called `_GCP_BQ_DATASET`) to try to solve this.
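One detail worth checking in that line, assuming Cloud Build hands the `args` list straight to the builder's entrypoint without a shell (which is how its list form works, as far as I know): nothing strips the double quotes inside `'GCP_BQ_DATASET="${_GCP_BQ_DATASET}"'`, so they end up as part of the value. The Docker CLI splits each `--build-arg` on its first `=` only; this small sketch (hypothetical helper `parse_build_arg`) mimics that split to show the effect. Note that this alone would give a quoted value, not a missing variable.

```python
def parse_build_arg(arg):
    """Split a --build-arg value the way the Docker CLI does: on the first '='.

    Returns (name, value); value is None for the bare `NAME` form, which tells
    Docker to take the value from the caller's environment instead.
    """
    name, sep, value = arg.partition("=")
    return (name, value) if sep else (name, None)


# Without a shell in between, the quotes survive into the value.
print(parse_build_arg('GCP_BQ_DATASET="my_dataset"'))
# → ('GCP_BQ_DATASET', '"my_dataset"')
print(parse_build_arg("GCP_BQ_DATASET=my_dataset"))
# → ('GCP_BQ_DATASET', 'my_dataset')
```

So in the Cloud Build `args` list, `'GCP_BQ_DATASET=${_GCP_BQ_DATASET}'` (no inner quotes) is probably what was intended.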
I'm using Google Cloud Storage and Vertex to run the custom image:
```
from prefect.run_configs import VertexRun  # Prefect 1.x

flow.run_config = VertexRun(
    image="europe-west1-docker.pkg.dev/bi-staging/etl/etl-automations:latest",
    labels=["vertex"],
    env={"GCP_BQ_DATASET": "dataset_name"},
)
```
This `env={"GCP_BQ_DATASET": "dataset_name"}` is something else I've tried, but it's not working.
Ben Ayers-Glassey
06/08/2022, 6:26 PM
`--build-arg` is that it sets an environment variable only during the build process.
Once the image is built, the build args are not stored in it (unlike env vars set with ENV).
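To make the ARG/ENV distinction concrete, here is a toy Python model of which variables end up in the image's config after a build. It is not Docker's real parser: it only understands one `NAME=value` per ARG/ENV line plus `$NAME`/`${NAME}` substitution, and the name `simulate_env` is hypothetical. It also shows that copying an ARG into an ENV still lets `--build-arg` choose the baked-in value.

```python
import re


def simulate_env(dockerfile, build_args=None):
    """Toy model: return the variables that survive a build as image ENV.

    Only ARG and ENV lines of the form `ARG NAME[=value]` / `ENV NAME=value`
    are interpreted; every other instruction is ignored.
    """
    build_args = build_args or {}
    args, env = {}, {}
    for line in dockerfile.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0] not in ("ARG", "ENV"):
            continue
        instr, rest = parts
        name, _, raw = rest.partition("=")
        raw = raw.strip('"')
        if instr == "ARG":
            # A --build-arg overrides the ARG default, but ARGs are never persisted.
            args[name] = build_args.get(name, raw)
        else:
            # ENV wins over an ARG of the same name when substituting $NAME.
            scope = {**args, **env}
            env[name] = re.sub(r"\$\{?(\w+)\}?", lambda m: scope.get(m.group(1), ""), raw)
    return env  # only ENV values end up in the image config
```

For the Dockerfile further down (`ARG GCP_BQ_DATASET="staging_data"` then `ENV GCP_BQ_DATASET=$GCP_BQ_DATASET`), the model returns `{"GCP_BQ_DATASET": "staging_data"}`, or whatever value `--build-arg` supplied instead.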
So that may be the issue -- that later on, when you're doing `python register_flows.py`, you're expecting that env var to be set, but it's not anymore. It was only set while the image was being built.
Mateo Merlo
06/08/2022, 6:35 PM
```
FROM prefecthq/prefect:latest-python3.8
COPY requirements.txt /app/
# Absolute path, so this step doesn't depend on the base image's WORKDIR
RUN pip install --upgrade pip && \
    pip install --no-cache-dir -r /app/requirements.txt
COPY . /app/
ARG GCP_BQ_DATASET="staging_data"
ENV GCP_BQ_DATASET=$GCP_BQ_DATASET
WORKDIR /app
RUN pip install -e .
```
Ben Ayers-Glassey
06/08/2022, 7:13 PM
`--build-arg` instead of `--env` or `-e` or whatever leads me to think that it still won't be saved
`ENV GCP_BQ_DATASET=$GCP_BQ_DATASET`
`docker inspect IMAGE_NAME` dumps a bunch of JSON which I believe includes env vars
Mateo Merlo
06/08/2022, 7:19 PM
Ben Ayers-Glassey
06/08/2022, 8:22 PM
> the ENV var GCP_BQ_DATASET is not there. Do you think I should use `--env` instead of `--build-arg`?
Not sure... I would have thought that `ENV some_env=default_value` would work, and `--env` would just override the default value.
Let's experiment locally...
```
$ cat Dockerfile
FROM ubuntu
ENV x=default_x
ARG y_arg=default_y
ENV y_env=$y_arg
$ docker build -t test .
Sending build context to Docker daemon 2.048kB
Step 1/4 : FROM ubuntu
---> d2e4e1f51132
Step 2/4 : ENV x=default_x
---> Using cache
---> bc492bd30a6e
Step 3/4 : ARG y_arg=default_y
---> Using cache
---> 2cc1cf87a404
Step 4/4 : ENV y_env=$y_arg
---> Using cache
---> ab69c58bc43e
Successfully built ab69c58bc43e
Successfully tagged test:latest
$ docker inspect test | jq '.[].Config.Env'
[
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "x=default_x",
  "y_env=default_y"
]
```
...so the env vars showed up in there for me.
`ARG y_arg` and `ENV y_env`, but let's try using `y` for both...
```
$ cat Dockerfile
FROM ubuntu
ENV x=default_x
ARG y=default_y
ENV y=$y
$ docker build -t test .
Sending build context to Docker daemon 2.048kB
Step 1/4 : FROM ubuntu
---> d2e4e1f51132
Step 2/4 : ENV x=default_x
---> Using cache
---> bc492bd30a6e
Step 3/4 : ARG y=default_y
---> Running in 721a24376962
Removing intermediate container 721a24376962
---> 38e614bac6bf
Step 4/4 : ENV y=$y
---> Running in 79d9192fb4f4
Removing intermediate container 79d9192fb4f4
---> bfc5622b7c51
Successfully built bfc5622b7c51
Successfully tagged test:latest
$ docker inspect test | jq '.[].Config.Env'
[
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "x=default_x",
  "y=default_y"
]
```
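The `jq '.[].Config.Env'` check can also be scripted, e.g. as a CI step that fails if `GCP_BQ_DATASET` is missing from the pushed image. A minimal sketch with hypothetical helper names; only `inspect_image` needs the `docker` CLI on `PATH`.

```python
import json
import subprocess


def image_env(inspect_json):
    """Return Config.Env from `docker inspect` output as a dict.

    Mirrors `docker inspect IMAGE | jq '.[].Config.Env'` for the first object
    in the array, splitting each "NAME=value" entry on its first '='.
    """
    config = json.loads(inspect_json)[0]["Config"]
    return dict(entry.split("=", 1) for entry in config.get("Env") or [])


def inspect_image(image):
    """Run `docker inspect` on a local image and return its env vars."""
    out = subprocess.run(
        ["docker", "inspect", image],
        check=True, capture_output=True, text=True,
    ).stdout
    return image_env(out)
```

For example, `inspect_image("test")["y"]` should give `"default_y"` for the second experiment. As with `jq`, this only sees the locally pulled copy of an image, so a registry image needs a fresh `docker pull` first.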
> the ENV var GCP_BQ_DATASET is not there. Do you think I should use `--env` instead of `--build-arg`?
Maybe you can try running `docker build` locally? If you really don't see the env var show up in the image, I think it makes sense to focus on that problem. And testing that locally instead of having to go through Google Repo is probably easier? 🤷
> Do you think I should use `--env` instead of `--build-arg`?
> But I see that option doesn't exist in the `docker build` command
Yeah, `--env` is for `docker run`, and `--build-arg` is for `docker build`:
```
$ docker build --help | grep -- --build-arg
--build-arg list Set build-time variables
$ docker run --help | grep -- --env
-e, --env list Set environment variables
--env-file list Read in a file of environment variables
```
Mateo Merlo
06/08/2022, 10:50 PM
`docker inspect f14d2a683023 | jq '.[].Config.Env'`
Ben Ayers-Glassey
06/08/2022, 10:52 PM
Mateo Merlo
06/08/2022, 10:52 PM
`docker inspect europe-west1-docker.pkg.dev/bi-staging/etl/etl-automations:latest | jq '.[].Config.Env'`
The ENV var is not there
Ben Ayers-Glassey
06/08/2022, 10:53 PM
`docker inspect` is looking at the version of the image generated by Google Repo?
`docker pull` again to get the updated version of the image
You could test by adding an `ENV whatever=something` to the Dockerfile, running a Google Repo build, and then using `docker inspect` locally to make sure it shows up
Mateo Merlo
06/08/2022, 11:07 PM
```
load_dotenv()
GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")
```
That is correct, right?
> You could test by adding an `ENV whatever=something` to the Dockerfile, running a Google Repo build, and then using `docker inspect` locally to make sure it shows up.
You mean build locally and then inspect? Or build and push from my CI, and then pull the image and inspect?
Ben Ayers-Glassey
06/08/2022, 11:15 PM
Mateo Merlo
06/08/2022, 11:17 PM
Ben Ayers-Glassey
06/08/2022, 11:17 PM
Mateo Merlo
06/08/2022, 11:20 PM
Ben Ayers-Glassey
06/08/2022, 11:22 PM
> And I'm not sure about the `--build-arg` param in the `docker build` command? I suppose that if it's only used during build time, it's not needed in the command.
Yeah, if you're copying the ARG into an ENV, it seems like there's no point in using ARG. It's basically for cases where you want to have an env var while building, then "erase" it from the resulting image -- e.g. if it's a secret which was only needed to make some API calls while building.
Mateo Merlo
06/08/2022, 11:22 PM
Ben Ayers-Glassey
06/08/2022, 11:23 PM
> I'm trying to understand if the env in the flow has to be read in that way. Using:
```
load_dotenv()
GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")
```
> That is correct, right?
Not sure about this question. That code looks like it'll do what you want, if it's run somewhere with a `.env` file. 🤔 But you are using ARG and ENV to set env vars, not a `.env` file, from what I can see...
`GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")` should be enough to get it, without `load_dotenv()`!
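A minimal sketch of reading the setting so the same flow code works both in the image (where ENV or the run config's `env` sets it) and locally (where a `.env` file does). It treats `python-dotenv` as optional; by default `load_dotenv()` does not override variables that are already set, so calling it inside the container is harmless. The function name `get_bq_dataset` is hypothetical.

```python
import os


def get_bq_dataset(default=None):
    """Read GCP_BQ_DATASET from the process environment.

    In the container the value comes from the image's ENV (or the run
    config's `env`); locally it can come from a .env file instead.
    """
    try:
        from dotenv import load_dotenv  # optional dependency: python-dotenv
    except ImportError:
        pass
    else:
        # override=False by default: variables already set are left alone.
        load_dotenv()
    return os.getenv("GCP_BQ_DATASET", default)


GCP_BQ_DATASET = get_bq_dataset()
```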