In old school Prefect not Orion is there a workaround for ad Prefect Community #ask-community

Ben Ayers-Glassey

05/05/2022, 3:40 AM

In old-school Prefect (not Orion), is there a workaround for adding build args (i.e. Dockerfile's ARG instead of ENV)? Currently Prefect's `Docker` class takes an `env_vars` dict, but no `build_args` dict, which would be nice. Here is the place in the `Docker` class where ENVs are generated: https://github.com/PrefectHQ/prefect/blob/8e04ccad071d1127afc7ca3580f1fe6f4e884f27/src/prefect/storage/docker.py#L437-L442 ...the ARGs could go right above there. In any case, ultimately what I'm trying to do is get the `Docker` class to install `python_dependencies` from our private PyPI server. Is there a good way to do that?

Kevin Kho

05/05/2022, 3:45 AM

Hey @Ben Ayers-Glassey, I believe you can achieve this like this

Ben Ayers-Glassey

05/05/2022, 4:02 AM

Yeah, I found `build_kwargs` and `buildargs`, and even tried them, but they didn't work... because `buildargs` only works for ARGs in the Dockerfile. You can't set arbitrary env vars during `docker build`, only ones which were specified with ENV or ARG. (The difference being that ENV ones get baked into the image, whereas ARG ones don't -- and are therefore what I want to use to store sensitive PyPI credentials.)

Kevin Kho

05/05/2022, 4:11 AM

I think I understand what you are saying, but I don’t see a way to work around it, because we use the docker-py build under the hood and I don’t see any other place that could help except `buildargs`.

Kevin Kho

05/05/2022, 4:14 AM

You’re suggesting we just add it ourselves in that Dockerfile we create under the hood right?

Ben Ayers-Glassey

05/05/2022, 4:19 AM

Yeah, and then one can use `build_kwargs` and `buildargs`.

Ben Ayers-Glassey

05/05/2022, 4:19 AM

So, it's already possible to specify `build_kwargs` and `buildargs`, which is great. There's just no way to add ARG lines to the Dockerfile so that `buildargs` can be useful.

Ben Ayers-Glassey

05/05/2022, 4:21 AM

So basically, the `Docker` class currently has an `env_vars` kwarg, and I think we would just need to add a `build_args` kwarg. Like `env_vars`, it would be a dict which is stored onto `self`; and then in `create_dockerfile_object`, we would just need to copy-paste-modify the 6 lines which generate ENV lines from `self.env_vars`, so that we also generate ARG lines from `self.build_args`. Something like that 🙂
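A minimal sketch of the proposal above, written as a standalone function rather than as a patch to Prefect. The function name `render_arg_lines`, the `PYPI_*` names, and their values are hypothetical; at the commit linked earlier the `Docker` class has no `build_args` kwarg. Only the ARG names are written out here, on the assumption that the values would be supplied at build time through `buildargs`.

```python
from typing import Dict, List


def render_arg_lines(build_args: Dict[str, str]) -> List[str]:
    """Return one Dockerfile ARG declaration per build-arg name.

    Only the names go into the Dockerfile text; the values are assumed
    to be passed separately at build time (e.g. via `buildargs`).
    """
    return [f"ARG {name}" for name in sorted(build_args)]


# Hypothetical example values, not taken from the thread.
build_args = {"PYPI_USER": "ci-bot", "PYPI_PASSWORD": "s3cret"}
arg_lines = render_arg_lines(build_args)
print("\n".join(arg_lines))
```

One caveat worth keeping in mind: Docker's documentation warns that build-time variable values can be visible through `docker history`, so ARG is not a complete answer for secrets.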

Kevin Kho

05/05/2022, 4:22 AM

I understand. I’d invite an issue and see what the core team says about it. I can write one tomorrow too.

👍 1

Ben Ayers-Glassey

05/05/2022, 4:36 AM

Sounds good, thank you! Here's an issue for it: https://github.com/PrefectHQ/prefect/issues/5753

Kevin Kho

05/05/2022, 4:37 AM

Thank you for the well written issue!

🙌 1

Ben Ayers-Glassey

05/05/2022, 4:53 AM

Ah, but I see from the issue you linked to above (https://github.com/PrefectHQ/prefect/issues/5630) that there is a workaround, of supplying your own Dockerfile (e.g. `Docker(dockerfile="Dockerfile")`) instead of relying on it to create the Dockerfile for you. So I guess adding `build_args` or whatever isn't a huge deal, because there's a workaround. I'll add that to the ticket!
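A rough sketch of how that workaround might be combined with `build_kwargs` and `buildargs`. The `dockerfile` and `build_kwargs` kwargs are the ones named in this thread; the image name and the `PYPI_*` variable names are placeholders, and the hand-written Dockerfile is assumed to declare matching `ARG` lines. The kwargs are built as a plain dict so the snippet runs without Prefect installed.

```python
import os

# Keyword arguments intended for Prefect 1.x's Docker storage class.
# The image name and PYPI_* names are placeholders for illustration.
storage_kwargs = {
    "image_name": "my-flow",
    # Hand-written Dockerfile, assumed to contain `ARG PYPI_USER` and
    # `ARG PYPI_PASSWORD` before the pip install step that needs them.
    "dockerfile": "Dockerfile",
    "build_kwargs": {
        # Forwarded to the underlying docker build call as build args.
        "buildargs": {
            "PYPI_USER": os.environ.get("PYPI_USER", ""),
            "PYPI_PASSWORD": os.environ.get("PYPI_PASSWORD", ""),
        }
    },
}

# With Prefect 1.x installed, this would be used roughly as:
#   from prefect.storage import Docker
#   flow.storage = Docker(**storage_kwargs)
```

Reading the credentials from the local environment keeps them out of the flow's source code.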

Kevin Kho

05/05/2022, 4:53 AM

Yeah you could just do that

👍 1

Mateo Merlo

06/08/2022, 10:30 AM

Hello! I'm coming through this a bit late but I have a similar issue trying to build the image with Google Cloud Build. I need to use some env variables in a flow that I read with this:

load_dotenv()
GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")

locally I have a .env file but I want to add these env variables into the custom docker image I'm building from Cloud Build. Should I use the --build-arg option in the build step I have in my cloudbuild.yml?

Kevin Kho

06/08/2022, 2:37 PM

Check the first example here. How about something like that where you just add the dotenv file?

Mateo Merlo

06/08/2022, 3:57 PM

I understand the idea but I'm not uploading the .env file into the repo. Every time I push the code in the Google Repo I run this:

steps:

  # [START cloudbuild_python_image_yaml]
  # Docker Build
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t',
           'europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest',
           '--build-arg', 'GCP_BQ_DATASET="${_GCP_BQ_DATASET}"',
           '--cache-from', 'europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest', '.']
  # [END cloudbuild_python_image_yaml]

  # [START cloudbuild_python_push_yaml]
  # Docker push to Google Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push',  'europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest']
  # [END cloudbuild_python_push_yaml]

  - name: prefecthq/prefect:latest-python3.8
    entrypoint: "bash"
    args:
      - "-c"
      - |
        pip install -e .
        prefect backend cloud
        prefect auth login --key "${_PREFECT__CLOUD__API_KEY}"
        python register_flows.py
        

images:
  - europe-west1-docker.pkg.dev/${PROJECT_ID}/etl/etl-automations:latest
# [END cloudbuild_python_yaml]

This line `'--build-arg', 'GCP_BQ_DATASET="${_GCP_BQ_DATASET}"'` is something that I added (with a SECRET in the Cloud Build called `_GCP_BQ_DATASET`) to try to solve this. I'm using Google Cloud Storage and Vertex to run the custom image:

flow.run_config = VertexRun(
            image="europe-west1-docker.pkg.dev/bi-staging/etl/etl-automations:latest",
            labels=["vertex"],
            env={"GCP_BQ_DATASET": "dataset_name"}
)

This `env={"GCP_BQ_DATASET": "dataset_name"}` is something else I've tried, but it's not working.

Ben Ayers-Glassey

06/08/2022, 6:26 PM

^ My understanding of `--build-arg` is that it sets an environment variable only during the build process. Once the image is built, the build args are not stored in it (unlike env vars set with ENV). So that may be the issue -- that later on, when you're doing `python register_flows.py`, you're expecting that env var to be set, but it's not anymore. It was only set while the image was being built.

Mateo Merlo

06/08/2022, 6:35 PM

Yes, but I set my ENV var in the run_config and I have it in my Dockerfile as well:

FROM prefecthq/prefect:latest-python3.8

COPY requirements.txt /app/


RUN pip install --upgrade pip && \
    pip install --no-cache-dir -r app/requirements.txt

COPY . /app/

ARG GCP_BQ_DATASET="staging_data"
ENV GCP_BQ_DATASET=$GCP_BQ_DATASET

WORKDIR /app

RUN pip install -e .

Mateo Merlo

06/08/2022, 6:38 PM

So I thought that if the ENV var is in the Dockerfile, the execution environment that executes the flow with this image should have this ENV var.

Ben Ayers-Glassey

06/08/2022, 7:13 PM

Hmmm Idunno what happens when you specify something as both an ARG and an ENV. 🤔 But in any case, the fact that you're passing `--build-arg` instead of `--env` / `-e` or whatever leads me to think that it still won't be saved

Ben Ayers-Glassey

06/08/2022, 7:13 PM

Ah never mind, I see you're copying the ARG to the ENV with this:

ENV GCP_BQ_DATASET=$GCP_BQ_DATASET

Ben Ayers-Glassey

06/08/2022, 7:13 PM

🤷 Can you inspect the built image and verify that the ENV is populated as expected?

Ben Ayers-Glassey

06/08/2022, 7:14 PM

`docker inspect IMAGE_NAME` dumps a bunch of JSON which I believe includes env vars
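A small sketch of reading the env vars out of that JSON, assuming the output is a list of objects that each carry a `Config.Env` list of `NAME=value` strings. The helper name `env_from_inspect` and the sample JSON are made up for illustration; a real run would feed in the output of `docker inspect IMAGE_NAME`.

```python
import json
from typing import Dict


def env_from_inspect(inspect_json: str) -> Dict[str, str]:
    """Turn `docker inspect` output into a {name: value} dict of env vars."""
    env: Dict[str, str] = {}
    for entry in json.loads(inspect_json):
        # Config.Env may be missing or null for some images.
        for item in entry.get("Config", {}).get("Env") or []:
            name, _, value = item.partition("=")
            env[name] = value
    return env


# Trimmed, made-up sample in the shape `docker inspect` prints.
sample = '[{"Config": {"Env": ["PATH=/usr/bin:/bin", "GCP_BQ_DATASET=staging_data"]}}]'
image_env = env_from_inspect(sample)
print(image_env.get("GCP_BQ_DATASET"))
```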

Mateo Merlo

06/08/2022, 7:19 PM

the ENV var GCP_BQ_DATASET is not there. Do you think I should use --env instead of --build-arg?

Mateo Merlo

06/08/2022, 7:20 PM

But I see that option doesn't exist in the docker build command

Mateo Merlo

06/08/2022, 7:22 PM

The line where I create the ENV is there.

Ben Ayers-Glassey

06/08/2022, 8:22 PM

Hmm. It's weird that the env var doesn't show up in the resulting image.

the ENV var GCP_BQ_DATASET is not there. Do you think I should use --env instead of --build-arg?

Not sure... I would have thought that `ENV some_env=default_value` would work, and `--env` would just override the default value. Let's experiment locally...

Ben Ayers-Glassey

06/08/2022, 8:25 PM

$ cat Dockerfile 
FROM ubuntu

ENV x=default_x

ARG y_arg=default_y
ENV y_env=$y_arg

$ docker build -t test .
Sending build context to Docker daemon  2.048kB
Step 1/4 : FROM ubuntu
 ---> d2e4e1f51132
Step 2/4 : ENV x=default_x
 ---> Using cache
 ---> bc492bd30a6e
Step 3/4 : ARG y_arg=default_y
 ---> Using cache
 ---> 2cc1cf87a404
Step 4/4 : ENV y_env=$y_arg
 ---> Using cache
 ---> ab69c58bc43e
Successfully built ab69c58bc43e
Successfully tagged test:latest

$ docker inspect test | jq '.[].Config.Env'
[
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "x=default_x",
  "y_env=default_y"
]

...so the env vars showed up in there for me.

Ben Ayers-Glassey

06/08/2022, 8:25 PM

Including the one which was passed to ENV from an ARG.

Ben Ayers-Glassey

06/08/2022, 8:25 PM

In your case, ENV and ARG used the same env var name; maybe that's the issue?..

Ben Ayers-Glassey

06/08/2022, 8:27 PM

So I was using `ARG y_arg` and `ENV y_env`, but let's try using `y` for both...

$ cat Dockerfile 
FROM ubuntu

ENV x=default_x

ARG y=default_y
ENV y=$y

$ docker build -t test .
Sending build context to Docker daemon  2.048kB
Step 1/4 : FROM ubuntu
 ---> d2e4e1f51132
Step 2/4 : ENV x=default_x
 ---> Using cache
 ---> bc492bd30a6e
Step 3/4 : ARG y=default_y
 ---> Running in 721a24376962
Removing intermediate container 721a24376962
 ---> 38e614bac6bf
Step 4/4 : ENV y=$y
 ---> Running in 79d9192fb4f4
Removing intermediate container 79d9192fb4f4
 ---> bfc5622b7c51
Successfully built bfc5622b7c51
Successfully tagged test:latest

$ docker inspect test | jq '.[].Config.Env'
[
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "x=default_x",
  "y=default_y"
]

Ben Ayers-Glassey

06/08/2022, 8:27 PM

^ Nope, that still seems to work as expected

Ben Ayers-Glassey

06/08/2022, 8:28 PM

the ENV var GCP_BQ_DATASET is not there. Do you think I should use --env instead of --build-arg?

Maybe you can try running docker build locally?.. if you really don't see the env var show up in the image, I think it makes sense to focus on that problem. And testing that locally instead of having to go through Google Repo is probably easier? 🤷

Ben Ayers-Glassey

06/08/2022, 8:29 PM

Do you think I should use --env instead of --build-arg?

But I see that option it doesn't exists in the docker build command

Yeah, `--env` is for `docker run`, and `--build-arg` is for `docker build`:

$ docker build --help | grep -- --build-arg
      --build-arg list          Set build-time variables

$ docker run --help | grep -- --env
  -e, --env list                       Set environment variables
      --env-file list                  Read in a file of environment variables
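To make the split concrete, here is a sketch that assembles the two command lines as argument lists. The helper names and the `custom_x` / `custom_y` values are hypothetical; only the `--build-arg` and `--env` flags come from the help output above.

```python
from typing import Dict, List


def docker_build_argv(tag: str, build_args: Dict[str, str], context: str = ".") -> List[str]:
    """argv for `docker build`, passing each variable with --build-arg."""
    argv = ["docker", "build", "-t", tag]
    for name, value in build_args.items():
        # argv is a list (no shell involved), so the value is not wrapped in
        # extra quotes; any quotes added here would become part of the value.
        argv += ["--build-arg", f"{name}={value}"]
    return argv + [context]


def docker_run_argv(image: str, env: Dict[str, str]) -> List[str]:
    """argv for `docker run`, passing each variable with --env."""
    argv = ["docker", "run"]
    for name, value in env.items():
        argv += ["--env", f"{name}={value}"]
    return argv + [image]


build_argv = docker_build_argv("test", {"y": "custom_y"})
run_argv = docker_run_argv("test", {"x": "custom_x"})
print(" ".join(build_argv))
print(" ".join(run_argv))
```

The lists could be handed to `subprocess.run` as-is; they are only printed here so the snippet works without Docker installed.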

Mateo Merlo

06/08/2022, 10:50 PM

Thanks! You are right. Locally:

docker inspect f14d2a683023 | jq '.[].Config.Env'

Mateo Merlo

06/08/2022, 10:51 PM

The ENV var GCP_BQ_DATASET is there.

Ben Ayers-Glassey

06/08/2022, 10:52 PM

Hooray!

Mateo Merlo

06/08/2022, 10:52 PM

Google repo:

docker inspect europe-west1-docker.pkg.dev/bi-staging/etl/etl-automations:latest | jq '.[].Config.Env'

The ENV var is not there

Mateo Merlo

06/08/2022, 10:52 PM

Ben Ayers-Glassey

06/08/2022, 10:53 PM

Weird... maybe Google Repo is using a different Docker version? I think ENV was there since the beginning, but ARG was only added later 🤔

Ben Ayers-Glassey

06/08/2022, 10:53 PM

Also, how confident are you that your `docker inspect` is looking at the version of the image generated by Google Repo?

Ben Ayers-Glassey

06/08/2022, 10:53 PM

E.g. maybe you need to `docker pull` again to get the updated version of the image

Ben Ayers-Glassey

06/08/2022, 10:54 PM

Or maybe Google Repo is failing to push once it's done the build

Ben Ayers-Glassey

06/08/2022, 10:54 PM

You could test by adding an `ENV whatever=something` to the Dockerfile, running a Google Repo build, and then using `docker inspect` locally to make sure it shows up

Mateo Merlo

06/08/2022, 11:07 PM

Yes, the image is updated. I'm trying to understand if the env in the flow has to be read in that way. Using:

load_dotenv()
GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")

That is correct, right?

Mateo Merlo

06/08/2022, 11:09 PM

And I'm not sure about the --build-arg param in the docker build command. I suppose that if it's only used during build time, it's not needed in the command.

Mateo Merlo

06/08/2022, 11:13 PM

You could test by adding an
ENV whatever=something
to the Dockerfile, running a Google Repo build, and then using
docker inspect
locally to make sure it shows up.

You mean build locally and then inspect? or build and push from my CI, and then pull the image and inspect?

Ben Ayers-Glassey

06/08/2022, 11:15 PM

^ by "run a Google Repo build", I guess I probably meant "build and push from CI". I don't actually know anything about Google Build. 😉

Mateo Merlo

06/08/2022, 11:17 PM

🙂

Ben Ayers-Glassey

06/08/2022, 11:17 PM

The idea being, you tested locally that you could run docker build, and then use docker inspect, and see the env var in the image. But you still have the issue that, when building/pushing the image via CI, the env var doesn't show up. So I'm wondering if:
• something is different about docker on CI vs your local machine, e.g. different docker version
• the image which CI is building isn't actually the one you're then inspecting

Mateo Merlo

06/08/2022, 11:20 PM

I will try to continue tomorrow with this (it's 1 am here) and hope to find a solution (some rest often helps with software 😄). I will check what you mentioned and try to do the same with a new test Dockerfile. I don't want to end up using Prefect Secrets because the idea is to understand how env variables work and build this solution correctly using them.

🙌 1

Mateo Merlo

06/08/2022, 11:21 PM

Thanks so much for your help Ben! It was really helpful! 🙌

Ben Ayers-Glassey

06/08/2022, 11:22 PM

And I'm not sure about the --build-arg param in the docker build command? I suppose that if it's only used during build time, is not needed in the command.

Yeah, if you're copying the ARG into an ENV, it seems like there's no point in using ARG. It's basically for cases where you want to have an env var while building, then "erase" it from the resulting image -- e.g. if it's a secret which was only needed to make some API calls while building.

✅ 1

Mateo Merlo

06/08/2022, 11:22 PM

I will share my progress here 😃

🙂 1

Ben Ayers-Glassey

06/08/2022, 11:23 PM

I'm trying to understand if the env in the flow has to be read in that way. Using:

```
load_dotenv()
GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")
```

That is correct, right?

Not sure about this question. That code looks like it'll do what you want, if it's run somewhere with an .env file. 🤔 But you are using ARG and ENV to set env vars, not an .env file, from what I can see...

Ben Ayers-Glassey

06/08/2022, 11:23 PM

But anyway, if the env var is set by ENV, then `GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")` should be enough to get it, without `load_dotenv()`.
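A quick way to see this outside a container. The assignment to `os.environ` below only stands in for what `ENV GCP_BQ_DATASET=...` would do inside the image; `SOME_UNSET_VARIABLE_FOR_DEMO` is a made-up name used to show the unset case.

```python
import os

# Stand-in for `ENV GCP_BQ_DATASET=staging_data` in the Dockerfile: inside the
# container the variable is already in the process environment at startup.
os.environ["GCP_BQ_DATASET"] = "staging_data"

# No .env file and no load_dotenv() call are involved here.
GCP_BQ_DATASET = os.getenv("GCP_BQ_DATASET")
print(GCP_BQ_DATASET)

# For an unset variable, os.getenv returns None unless a default is given.
os.environ.pop("SOME_UNSET_VARIABLE_FOR_DEMO", None)
missing = os.getenv("SOME_UNSET_VARIABLE_FOR_DEMO")
fallback = os.getenv("SOME_UNSET_VARIABLE_FOR_DEMO", "dataset_name")
```

If the flow sees `None` at run time, the variable is not in the environment of the process running the flow, whatever the Dockerfile says.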
