# prefect-community
b
In old-school Prefect (not Orion), is there a workaround for adding build args (i.e. Dockerfile's `ARG` instead of `ENV`)? Currently Prefect's `Docker` class takes an `env_vars` dict, but no `build_args` dict, which would be nice. Here is the place in the `Docker` class where ENVs are generated: https://github.com/PrefectHQ/prefect/blob/8e04ccad071d1127afc7ca3580f1fe6f4e884f27/src/prefect/storage/docker.py#L437-L442 ...the ARGs could go right above there. In any case, ultimately what I'm trying to do is get the `Docker` class to install `python_dependencies` from our private PyPI server. Is there a good way to do that?
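For the end goal (installing `python_dependencies` from a private index), pip reads its options from `PIP_<OPTION>` environment variables, and ARG values are visible as environment variables to `RUN` steps, so an ARG named `PIP_EXTRA_INDEX_URL` declared before `RUN pip install ...` would be picked up. A minimal sketch, assuming a basic-auth index at the hypothetical host `pypi.example.com` and a hypothetical helper name, of building that URL so special characters in a token don't break it:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def credentialed_index_url(index_url: str, username: str, token: str) -> str:
    """Return index_url with basic-auth credentials embedded in it.

    Username and token are percent-encoded so characters such as '@', ':'
    or '/' in a token don't break URL parsing. Any credentials already in
    index_url are replaced.
    """
    parts = urlsplit(index_url)
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical private index and CI credentials.
url = credentialed_index_url("https://pypi.example.com/simple/", "ci-bot", "s3cr:t@123")
build_args = {"PIP_EXTRA_INDEX_URL": url}
print(url)  # https://ci-bot:s3cr%3At%40123@pypi.example.com/simple/
```

The `build_args` dict is what would be handed to `docker build` as build args; the Dockerfile still needs an `ARG PIP_EXTRA_INDEX_URL` line for the value to be accepted.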
k
Hey @Ben Ayers-Glassey, I believe you can achieve this like this
b
Yeah, I found `build_kwargs` and `buildargs`, and even tried them, but they didn't work... because `buildargs` only works for ARGs in the Dockerfile. You can't set arbitrary env vars during `docker build`, only ones which were specified with ENV or ARG. (The difference being that ENV ones get baked into the image, whereas ARG ones don't, and are therefore what I want to use to store sensitive PyPI credentials.)
k
I think I understand what you are saying, but I don’t see a way to work around it, because we use the docker-py build under the hood and I don’t see any other place that could help except `buildargs`.
You’re suggesting we just add it ourselves in the Dockerfile we create under the hood, right?
b
Yeah, and then one can use `build_kwargs` and `buildargs`.
So, it's already possible to specify `build_kwargs` and `buildargs`, which is great. There's just no way to add ARG lines to the Dockerfile so that `buildargs` can be useful.
So basically, the `Docker` class currently has an `env_vars` kwarg, and I think we would just need to add a `build_args` kwarg. Like `env_vars`, it would be a dict which is stored onto `self`; and then in `create_dockerfile_object`, we would just need to copy-paste-modify the 6 lines which generate ENV lines from `self.env_vars`, so that we also generate ARG lines from `self.build_args`. Something like that 🙂
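A rough sketch of what the ARG-generating counterpart might look like, written as a standalone function rather than Prefect's actual `create_dockerfile_object` code (the name `render_arg_lines` and the convention that `None` means "no default" are assumptions). The ARG lines would need to land before any `RUN pip install` that uses them. For credentials it's probably best to declare the ARG without a default; note also that Docker's documentation warns build-arg values can be visible via `docker history`, so BuildKit secret mounts are the safer option for real secrets.

```python
import re
from typing import Mapping, Optional

_ARG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_arg_lines(build_args: Mapping[str, Optional[str]]) -> str:
    """Render Dockerfile ARG instructions from a dict (hypothetical helper).

    A value of None declares the ARG without a default, so the value has to
    come from `docker build --build-arg` (or docker-py's `buildargs`).
    """
    lines = []
    for name, default in build_args.items():
        if not _ARG_NAME.fullmatch(name):
            raise ValueError(f"invalid ARG name: {name!r}")
        if default is None:
            # Declared only: the value must be supplied at build time.
            lines.append(f"ARG {name}")
        else:
            # Keep quoting simple: reject values that would need escaping.
            if '"' in default or "\n" in default:
                raise ValueError(f"unsupported characters in default for {name!r}")
            lines.append(f'ARG {name}="{default}"')
    return "\n".join(lines)


print(render_arg_lines({"PIP_EXTRA_INDEX_URL": None, "PY_VERSION": "3.8"}))
# ARG PIP_EXTRA_INDEX_URL
# ARG PY_VERSION="3.8"
```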
k
I understand. I’d invite you to open an issue and see what the core team says about it. I can write one tomorrow too.
b
Sounds good, thank you! Here's an issue for it: https://github.com/PrefectHQ/prefect/issues/5753
k
Thank you for the well written issue!
b
Ah, but I see from the issue you linked to above (https://github.com/PrefectHQ/prefect/issues/5630) that there is a workaround: supplying your own Dockerfile (e.g. `Docker(dockerfile="Dockerfile")`) instead of relying on it to create the Dockerfile for you. So I guess adding `build_args` or whatever isn't a huge deal, because there's a workaround. I'll add that to the ticket!
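With the custom-Dockerfile workaround, the Dockerfile would declare something like `ARG PIP_EXTRA_INDEX_URL`, and the value would be passed through `build_kwargs`, whose `buildargs` key is forwarded to docker-py's build call. A minimal sketch of assembling that dict from the CI environment (the helper name and the choice to fail loudly on missing variables are assumptions):

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def make_build_kwargs(
    arg_names: Iterable[str], environ: Optional[Mapping[str, str]] = None
) -> Dict[str, Dict[str, str]]:
    """Collect build-arg values from the environment into a `buildargs` dict.

    Raises if any name is missing, so a build doesn't silently run without
    the credentials it needs.
    """
    env = os.environ if environ is None else environ
    names = list(arg_names)
    missing = [name for name in names if name not in env]
    if missing:
        raise RuntimeError(f"missing build args in environment: {', '.join(missing)}")
    return {"buildargs": {name: env[name] for name in names}}
```

The result would then go to something like `Docker(dockerfile="Dockerfile", build_kwargs=make_build_kwargs(["PIP_EXTRA_INDEX_URL"]))`. How Prefect combines its own generated lines with a user-supplied Dockerfile is worth checking before relying on the ARG being in scope for the dependency install.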
k
Yeah you could just do that
m
Hello! I'm coming to this a bit late, but I have a similar issue trying to build the image with Google Cloud Build. I need to use some env variables in a flow, which I read with this:
```python
import os

from dotenv import load_dotenv

load_dotenv()
GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")
```
Locally I have a `.env` file, but I want to add these env variables to the custom Docker image I'm building with Cloud Build. Should I use the `--build-arg` option in the build step I have in my `cloudbuild.yml`?
k
Check the first example here. How about something like that, where you just add the dotenv file?
m
I understand the idea, but I'm not uploading the `.env` file to the repo. Every time I push the code to the Google repo I run this:
```yaml
steps:

  # [START cloudbuild_python_image_yaml]
  # Docker Build
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t',
           'europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest',
           '--build-arg', 'GCP_BQ_DATASET="${_GCP_BQ_DATASET}"',
           '--cache-from', 'europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest', '.']
  # [END cloudbuild_python_image_yaml]

  # [START cloudbuild_python_push_yaml]
  # Docker push to Google Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest']
  # [END cloudbuild_python_push_yaml]

  - name: prefecthq/prefect:latest-python3.8
    entrypoint: "bash"
    args:
      - "-c"
      - |
        pip install -e .
        prefect backend cloud
        prefect auth login --key "${_PREFECT__CLOUD__API_KEY}"
        python register_flows.py

images:
  - europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest
# [END cloudbuild_python_yaml]
```
This line, `'--build-arg', 'GCP_BQ_DATASET="${_GCP_BQ_DATASET}"'`, is something that I added (with a SECRET in Cloud Build called `_GCP_BQ_DATASET`) to try to solve this. I'm using Google Cloud Storage and Vertex to run the custom image:
```python
from prefect.run_configs import VertexRun

flow.run_config = VertexRun(
    image="europe-west1-docker.pkg.dev/bi-staging/etl/etl-automations:latest",
    labels=["vertex"],
    env={"GCP_BQ_DATASET": "dataset_name"},
)
```
This `env={"GCP_BQ_DATASET": "dataset_name"}` is something else I've tried, but it's not working.
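One thing that may be worth checking in the `cloudbuild.yml` above, separate from the missing-variable question: as far as I know, Cloud Build hands each entry of a step's `args` to the builder as-is, without a shell, so the double quotes in `'GCP_BQ_DATASET="${_GCP_BQ_DATASET}"'` would end up inside the value. A small Python illustration of the difference between shell parsing and an exec-style argument list (the value `staging_data` and the helper name are just examples):

```python
import shlex

# In a shell, the double quotes are syntax and get stripped:
shell_argv = shlex.split('docker build --build-arg GCP_BQ_DATASET="staging_data" .')

# In an exec-style argument list (like the YAML `args:` array), nothing strips
# them, so the quotes become part of the value:
exec_argv = ["docker", "build", "--build-arg", 'GCP_BQ_DATASET="staging_data"', "."]


def build_arg_value(argv, name):
    """Return the value passed as `--build-arg NAME=...`, or None if absent."""
    for flag, arg in zip(argv, argv[1:]):
        if flag == "--build-arg" and arg.startswith(name + "="):
            return arg[len(name) + 1:]
    return None


print(build_arg_value(shell_argv, "GCP_BQ_DATASET"))  # staging_data
print(build_arg_value(exec_argv, "GCP_BQ_DATASET"))   # "staging_data"
```

If that's what's happening, writing the entry as `'GCP_BQ_DATASET=${_GCP_BQ_DATASET}'` would pass the plain value.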
b
^ My understanding of `--build-arg` is that it sets an environment variable only during the build process. Once the image is built, the build args are not stored in it (unlike env vars set with ENV). So that may be the issue: later on, when you're doing `python register_flows.py`, you're expecting that env var to be set, but it's not anymore. It was only set while the image was being built.
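To make the ARG/ENV distinction concrete, here is a toy Python model (an illustration only, not how Docker actually parses Dockerfiles; `simulate_build` is a hypothetical name). ARG values live in a build-time scope that is thrown away, and only ENV assignments end up in the image's `Config.Env`, which is why copying an ARG into an ENV makes the value survive:

```python
import re
from typing import Dict, List, Optional

# Matches $NAME or ${NAME}.
_VAR = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def simulate_build(dockerfile: str, build_args: Optional[Dict[str, str]] = None) -> List[str]:
    """Toy model of which ARG/ENV values end up in an image's Config.Env.

    Only understands single-variable `ARG name[=default]` and `ENV name=value`
    lines; every other instruction is ignored. Not Docker's real parser.
    """
    build_args = build_args or {}
    args: Dict[str, str] = {}  # build-time scope, discarded after the build
    env: Dict[str, str] = {}   # persisted into the image config

    def expand(value: str) -> str:
        scope = {**args, **env}  # ENV wins over an ARG with the same name
        return _VAR.sub(lambda m: scope.get(m.group(1) or m.group(2), ""), value)

    for raw in dockerfile.splitlines():
        line = raw.strip()
        if line.startswith("ARG "):
            name, has_default, default = line[4:].partition("=")
            name = name.strip()
            if name in build_args:  # --build-arg overrides the default
                args[name] = build_args[name]
            elif has_default:
                args[name] = default.strip().strip('"')
        elif line.startswith("ENV "):
            name, _, value = line[4:].partition("=")
            env[name.strip()] = expand(value.strip().strip('"'))

    return [f"{name}={value}" for name, value in env.items()]


dockerfile = """FROM ubuntu
ENV x=default_x
ARG y_arg=default_y
ENV y_env=$y_arg
"""
print(simulate_build(dockerfile))                         # ['x=default_x', 'y_env=default_y']
print(simulate_build(dockerfile, {"y_arg": "from_cli"}))  # ['x=default_x', 'y_env=from_cli']
```

A real image would also carry variables inherited from the base image (e.g. `PATH`), which this model leaves out.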
m
Yes, but I set my ENV var in the `run_config`, and I have it in my Dockerfile as well:
```dockerfile
FROM prefecthq/prefect:latest-python3.8

COPY requirements.txt /app/

RUN pip install --upgrade pip && \
    pip install --no-cache-dir -r /app/requirements.txt

COPY . /app/

ARG GCP_BQ_DATASET="staging_data"
ENV GCP_BQ_DATASET=$GCP_BQ_DATASET

WORKDIR /app

RUN pip install -e .
```
So I thought that if the ENV var is in the Dockerfile, the execution environment that runs the flow with this image should have this ENV var.
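For plain `docker run`, the two mechanisms combine predictably: ENV values baked into the image act as defaults, and variables passed at run time (`-e NAME=value`) override them for the same name. Whether `VertexRun(env=...)` reaches the container in exactly the same way is an assumption here, not something checked against Prefect's source. A minimal sketch of that merge (`effective_env` is a hypothetical helper):

```python
from typing import Dict, Iterable, Mapping


def effective_env(image_env: Iterable[str], run_env: Mapping[str, str]) -> Dict[str, str]:
    """Combine an image's Config.Env entries with run-time variables.

    Mirrors `docker run -e NAME=value`: a run-time value replaces the image's
    ENV value for the same name; everything else is inherited from the image.
    """
    env: Dict[str, str] = {}
    for entry in image_env:
        # Split on the first '=' only; values may themselves contain '='.
        name, _, value = entry.partition("=")
        env[name] = value
    env.update(run_env)
    return env


print(effective_env(["PATH=/usr/bin", "GCP_BQ_DATASET=staging_data"], {"GCP_BQ_DATASET": "dataset_name"}))
# {'PATH': '/usr/bin', 'GCP_BQ_DATASET': 'dataset_name'}
```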
b
Hmmm, I dunno what happens when you specify something as both an ARG and an ENV. 🤔 But in any case, the fact that you're passing `--build-arg` instead of `--env` or `-e` or whatever leads me to think that it still won't be saved.
Ah never mind, I see you're copying the ARG to the ENV with this:
```
ENV GCP_BQ_DATASET=$GCP_BQ_DATASET
```
🤷 Can you inspect the built image and verify that the ENV is populated as expected? `docker inspect IMAGE_NAME` dumps a bunch of JSON which I believe includes env vars.
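`docker inspect` prints a JSON array with one object per inspected object, and for an image the environment sits under `Config.Env` as `NAME=value` strings (the same path the `jq '.[].Config.Env'` commands in this thread use). A minimal Python sketch for checking it from a script, with hypothetical helper names; only `inspect_image_env` needs the docker CLI installed:

```python
import json
import subprocess
from typing import Dict


def env_from_inspect(inspect_json: str) -> Dict[str, str]:
    """Extract Config.Env from `docker inspect` output (first object) as a dict."""
    images = json.loads(inspect_json)
    entries = images[0]["Config"].get("Env") or []
    env: Dict[str, str] = {}
    for entry in entries:
        name, _, value = entry.partition("=")
        env[name] = value
    return env


def inspect_image_env(image: str) -> Dict[str, str]:
    """Run `docker inspect IMAGE` and parse the environment it reports."""
    out = subprocess.run(
        ["docker", "inspect", image], check=True, capture_output=True, text=True
    ).stdout
    return env_from_inspect(out)
```

Usage would be along the lines of `"GCP_BQ_DATASET" in inspect_image_env("europe-west1-docker.pkg.dev/bi-staging/etl/etl-automations:latest")`.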
m
The ENV var `GCP_BQ_DATASET` is not there. Do you think I should use `--env` instead of `--build-arg`?
But I see that option doesn't exist in the `docker build` command.
The line where I create the ENV is there.
b
Hmm. It's weird that the env var doesn't show up in the resulting image.
> The ENV var `GCP_BQ_DATASET` is not there. Do you think I should use `--env` instead of `--build-arg`?
Not sure... I would have thought that `ENV some_env=default_value` would work, and `--env` would just override the default value. Let's experiment locally...
```
$ cat Dockerfile
FROM ubuntu

ENV x=default_x

ARG y_arg=default_y
ENV y_env=$y_arg

$ docker build -t test .
Sending build context to Docker daemon  2.048kB
Step 1/4 : FROM ubuntu
 ---> d2e4e1f51132
Step 2/4 : ENV x=default_x
 ---> Using cache
 ---> bc492bd30a6e
Step 3/4 : ARG y_arg=default_y
 ---> Using cache
 ---> 2cc1cf87a404
Step 4/4 : ENV y_env=$y_arg
 ---> Using cache
 ---> ab69c58bc43e
Successfully built ab69c58bc43e
Successfully tagged test:latest

$ docker inspect test | jq '.[].Config.Env'
[
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "x=default_x",
  "y_env=default_y"
]
```
...so the env vars showed up in there for me, including the one which was passed to ENV from an ARG.
In your case, ENV and ARG used the same env var name; maybe that's the issue?..
So I was using `ARG y_arg` and `ENV y_env`, but let's try using `y` for both...
```
$ cat Dockerfile
FROM ubuntu

ENV x=default_x

ARG y=default_y
ENV y=$y

$ docker build -t test .
Sending build context to Docker daemon  2.048kB
Step 1/4 : FROM ubuntu
 ---> d2e4e1f51132
Step 2/4 : ENV x=default_x
 ---> Using cache
 ---> bc492bd30a6e
Step 3/4 : ARG y=default_y
 ---> Running in 721a24376962
Removing intermediate container 721a24376962
 ---> 38e614bac6bf
Step 4/4 : ENV y=$y
 ---> Running in 79d9192fb4f4
Removing intermediate container 79d9192fb4f4
 ---> bfc5622b7c51
Successfully built bfc5622b7c51
Successfully tagged test:latest

$ docker inspect test | jq '.[].Config.Env'
[
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "x=default_x",
  "y=default_y"
]
```
^ Nope, that still seems to work as expected
> The ENV var `GCP_BQ_DATASET` is not there. Do you think I should use `--env` instead of `--build-arg`?
Maybe you can try running `docker build` locally?.. If you really don't see the env var show up in the image, I think it makes sense to focus on that problem. And testing that locally instead of having to go through Google Repo is probably easier? 🤷
> Do you think I should use `--env` instead of `--build-arg`?
> But I see that option doesn't exist in the `docker build` command.
Yeah, `--env` is for `docker run`, and `--build-arg` is for `docker build`:
```
$ docker build --help | grep -- --build-arg
      --build-arg list          Set build-time variables

$ docker run --help | grep -- --env
  -e, --env list                       Set environment variables
      --env-file list                  Read in a file of environment variables
```
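If the builds end up scripted from Python, the split above translates directly into two different argument lists. A minimal sketch (the function names are hypothetical) that keeps each flag on the command that actually accepts it, and builds the arguments as a list so no shell quoting ends up inside the values:

```python
from typing import List, Mapping


def docker_build_cmd(tag: str, build_args: Mapping[str, str], context: str = ".") -> List[str]:
    """`docker build` takes --build-arg (build-time variables), not --env."""
    cmd = ["docker", "build", "-t", tag]
    for name, value in build_args.items():
        cmd += ["--build-arg", f"{name}={value}"]
    return cmd + [context]


def docker_run_cmd(image: str, env: Mapping[str, str]) -> List[str]:
    """`docker run` takes -e/--env (container environment), not --build-arg."""
    cmd = ["docker", "run", "--rm"]
    for name, value in env.items():
        cmd += ["-e", f"{name}={value}"]
    return cmd + [image]


print(docker_build_cmd("test", {"GCP_BQ_DATASET": "staging_data"}))
# ['docker', 'build', '-t', 'test', '--build-arg', 'GCP_BQ_DATASET=staging_data', '.']
```

Either list can be passed straight to `subprocess.run(cmd, check=True)`.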
m
Thanks! You are right. Locally:
`docker inspect f14d2a683023 | jq '.[].Config.Env'`
The ENV var `GCP_BQ_DATASET` is there.
b
Hooray!
m
Google repo:
`docker inspect europe-west1-docker.pkg.dev/bi-staging/etl/etl-automations:latest | jq '.[].Config.Env'`
The ENV var is not there.
b
Weird... maybe Google Repo is using a different Docker version? I think ENV was there since the beginning, but ARG was only added later 🤔
Also, how confident are you that your `docker inspect` is looking at the version of the image generated by Google Repo?
E.g. maybe you need to `docker pull` again to get the updated version of the image.
Or maybe Google Repo is failing to push once it's done the build.
You could test by adding an `ENV whatever=something` to the Dockerfile, running a Google Repo build, and then using `docker inspect` locally to make sure it shows up.
m
Yes, the image is updated. I'm trying to understand if the env var in the flow has to be read that way. Using:
```python
load_dotenv()
GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")
```
That is correct, right?
And I'm not sure about the `--build-arg` param in the `docker build` command. I suppose that if it's only used at build time, it's not needed in the command.
> You could test by adding an `ENV whatever=something` to the Dockerfile, running a Google Repo build, and then using `docker inspect` locally to make sure it shows up.
You mean build locally and then inspect? Or build and push from my CI, and then pull the image and inspect?
b
^ by "run a Google Repo build", I guess I probably meant "build and push from CI". I don't actually know anything about Google Build. 😉
m
🙂
b
The idea being: you tested locally that you could run `docker build`, then use `docker inspect`, and see the env var in the image. But you still have the issue that, when building/pushing the image via CI, the env var doesn't show up. So I'm wondering if:
• something is different about Docker on CI vs your local machine, e.g. a different Docker version
• the image which CI is building isn't actually the one you're then inspecting
m
I will try to continue with this tomorrow (it's 1 am here) and hope to find a solution (some rest often helps with software 😄). I will check what you mentioned and try to do the same with a new test Dockerfile. I don't want to end up using Prefect Secrets, because the idea is to understand how env variables work and build this solution correctly using them.
Thanks so much for your help Ben! It was really helpful! 🙌
b
> And I'm not sure about the `--build-arg` param in the `docker build` command. I suppose that if it's only used at build time, it's not needed in the command.
Yeah, if you're copying the ARG into an ENV, it seems like there's no point in using ARG. It's basically for cases where you want to have an env var while building, then "erase" it from the resulting image, e.g. if it's a secret which was only needed to make some API calls while building.
m
I will share here my advances 😃
b
> I'm trying to understand if the env var in the flow has to be read that way. Using:
> ```
> load_dotenv()
> GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")
> ```
> That is correct, right?
Not sure about this question. That code looks like it'll do what you want, if it's run somewhere with a `.env` file. 🤔 But you are using ARG and ENV to set env vars, not a `.env` file, from what I can see...
But anyway, if the env var is set by ENV, then `GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")` should be enough to get it, without `load_dotenv()`!
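A minimal sketch of reading the variable straight from the process environment, with an explicit error if it's missing (the helper name `get_bq_dataset` is hypothetical). Inside the container the value comes from the image's ENV or a run-time override, so no `.env` file is involved:

```python
import os
from typing import Optional


def get_bq_dataset(default: Optional[str] = None) -> str:
    """Read GCP_BQ_DATASET from the process environment.

    Falls back to `default` if given; otherwise fails loudly instead of
    letting the flow continue with None.
    """
    value = os.getenv("GCP_BQ_DATASET", default)
    if not value:
        raise RuntimeError("GCP_BQ_DATASET is not set")
    return value
```

Keeping `load_dotenv()` around for local runs should be harmless: by default python-dotenv does not override variables that are already set (`override=False`), so a value coming from the image's ENV wins over anything in a `.env` file.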