Guys, how can I reduce docker image size? it’s ast...
# ask-community
s
Guys, how can I reduce docker image size? it’s astronomical, almost 2GB
c
Hi Simonas, that's surprising for sure. What architecture are you on? Dockerhub shows that the image is ~215MB compressed, and when I pull it, it looks like it's ~671MB uncompressed:
$ docker image ls prefecthq/prefect:2.19*
REPOSITORY          TAG                 IMAGE ID       CREATED      SIZE
prefecthq/prefect   2.19.2-python3.12   b8d1dcb5eb14   5 days ago   671MB
I'm on an amd64 architecture here, how about you?
s
mac m1
hey 🙂
My k8s job start time was ~2 minutes, so I started to debug why
c
Do you mind sharing the output of `docker image inspect prefecthq/prefect:2.19.2-python3.12` from your system?
s
docker image inspect prefecthq/prefect:2.19.2-python3.12
[
    {
        "Id": "sha256:320e98a341628635b59f3a15533d49ba7eb1f95995a4f04051b3e39a5e0f0ebd",
        "RepoTags": [
            "prefecthq/prefect:2.19.2-python3.12"
        ],
        "RepoDigests": [
            "prefecthq/prefect@sha256:320e98a341628635b59f3a15533d49ba7eb1f95995a4f04051b3e39a5e0f0ebd"
        ],
        "Parent": "",
        "Comment": "buildkit.dockerfile.v0",
        "Created": "2024-05-23T20:42:43.418121443Z",
        "ContainerConfig": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": null,
            "Cmd": null,
            "Image": "",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "DockerVersion": "26.1.1",
        "Author": "",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "LANG=C.UTF-8",
                "GPG_KEY=7169605F62C751356D054A26A821E680E5FA6305",
                "PYTHON_VERSION=3.12.3",
                "PYTHON_PIP_VERSION=24.0",
                "PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/dbf0c85f76fb6e1ab42aa672ffca6f0a675d9ee4/public/get-pip.py",
                "PYTHON_GET_PIP_SHA256=dfe9fd5c28dc98b5ac17979a953ea550cec37ae1b47a5116007395bfacff2ab9",
                "LC_ALL=C.UTF-8"
            ],
            "Cmd": null,
            "ArgsEscaped": true,
            "Image": "",
            "Volumes": null,
            "WorkingDir": "/opt/prefect",
            "Entrypoint": [
                "/usr/bin/tini",
                "-g",
                "--",
                "/opt/prefect/entrypoint.sh"
            ],
            "OnBuild": null,
            "Labels": {
                "io.prefect.python-version": "3.12.3",
                "maintainer": "help@prefect.io",
                "org.label-schema.name": "prefect",
                "org.label-schema.schema-version": "= 1.0",
                "org.label-schema.url": "https://www.prefect.io/"
            }
        },
        "Architecture": "arm64",
        "Os": "linux",
        "Size": 217847620,
        "GraphDriver": {
            "Data": null,
            "Name": "stargz"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:2bd1a2222589b50b52ff960c3d004829633df61532e7a670a91618cd775f2d47",
                "sha256:e8a6046370e74bc223a160c93d52211667439f18cc90cdfb0243d4a632db90b9",
                "sha256:291b21563fd4c6487c270c885bbad3515be769561455e67ed0b60d96743ecf25",
                "sha256:9e14e2775382594c54bb6e3583819618f40825b2c43d6d1722981d1513170801",
                "sha256:ab71a4ac32c1efb2b57a9b3e636ed0b97253f0ff4a86df74dca409d9d8edc779",
                "sha256:73ade58fd89b151b8a1360023d6e11d514a7b58fc736ab758efc16bceb61f70a",
                "sha256:de3f65ae0c8789df3f7ea17c4fb0a21a9f1e46b8e302c9f7a138360a1661354c",
                "sha256:27af10708daedcc3b0c1e80dfb31b782f499b86ca744c744bbf98a8092b4c624",
                "sha256:3dff399ad379f60e361cf66534532a14ce598f2a8264625948b4bdef92c2ee2e",
                "sha256:80b301eda11b72c7f8c6d072be5afac76a8ac2828d8995b326839f5f7a4da46a",
                "sha256:b1f72ce46c6550de5ae3f915850a287479fce5edd28cc1dc62229353b32683da",
                "sha256:5afb03f61dc7ac1a10d8790c70aab9f6f4b57c9652915395b43f1941e6033ee9",
                "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef",
                "sha256:31b2e017b279e85f52506ea19e8dd3d59d61a48b3c977f09b0bb8dcc431e3f20",
                "sha256:8cea6e634d67cf684036292c2c5b66e403f8ec634cab5324f6373e34c8332d8c"
            ]
        },
        "Metadata": {
            "LastTagTime": "2024-05-29T09:39:58.851620918Z"
        },
        "Container": ""
    }
]
c
ah okay, you've got the arm64 image, pulling that now for comparison
Even with the arm64 image, I'm still seeing just 678MB:
$ docker image ls prefecthq/prefect
REPOSITORY          TAG                            IMAGE ID       CREATED         SIZE
prefecthq/prefect   2.19.2-python3.12              5b0d9b3e4d7b   5 days ago      678MB
prefecthq/prefect   <none>                         b8d1dcb5eb14   5 days ago      671MB
That's very surprising. I'll consult with some colleagues on Mac m1s to see what they're getting
s
thank you!
c
This is very interesting: in your output above, the size is listed as ~217MB
"Size": 217847620,
But your `docker image ls` output shows 2GB? That's odd
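For comparing the two numbers: `Size` in the `docker image inspect` output is a byte count, while `docker image ls` prints a rounded, human-readable value. A minimal sketch of the conversion, assuming decimal units (1 MB = 1,000,000 bytes); the helper name `bytes_to_mb` is made up for illustration:

```python
def bytes_to_mb(size_bytes: int) -> float:
    """Convert a byte count to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return size_bytes / 1_000_000


# "Size" value copied from the docker image inspect output in this thread
inspect_size = 217847620
print(f"{bytes_to_mb(inspect_size):.1f} MB")  # prints "217.8 MB"
```

That puts the inspected size close to the ~215MB compressed figure mentioned earlier, and far from 2GB.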
Simonas, can you share which version of docker you're running locally?
s
docker --version
Docker version 26.1.1, build 4cf5afa
c
Okay that's the same as me
Sorry, Simonas, I'm at a loss here. When I pull the 2.19.2 images from Dockerhub, I see them in the ~670MB range for both `amd64` and `arm64`. My colleague on an M1 is also seeing the same results. Are you pulling from a different registry than the public Docker index?
I wonder if your client is mis-reporting the image size in `docker image ls` for some reason? What it's showing there doesn't agree with the `Size` from your `docker image inspect` output either 🤔
s
same 🙂
c
That is so strange. Let's try this with the base Python image for comparison. Let's do:
$ docker pull python:3.12 && docker image ls python:3.12
Here's what I get:
$ docker pull python:3.12 && docker image ls python:3.12
3.12: Pulling from library/python
c6cf28de8a06: Pull complete 
891494355808: Pull complete 
6582c62583ef: Pull complete 
bf2c3e352f3d: Pull complete 
a99509a32390: Pull complete 
d46a03def8d9: Pull complete 
4429b810e09e: Pull complete 
2a4ca5af09fa: Pull complete 
Digest: sha256:3966b81808d864099f802080d897cef36c01550472ab3955fdd716d1c665acd6
Status: Downloaded newer image for python:3.12
docker.io/library/python:3.12
REPOSITORY   TAG       IMAGE ID       CREATED       SIZE
python       3.12      12e5ab9d51c8   7 weeks ago   1.02GB
Prefect's image is actually based on the `-slim` variant:
$ docker pull python:3.12-slim && docker image ls python:3.12-slim
3.12-slim: Pulling from library/python
09f376ebb190: Already exists 
276709cbedc1: Already exists 
2e133733af76: Already exists 
ded8879d9a79: Already exists 
3cf9507408dc: Already exists 
Digest: sha256:afc139a0a640942491ec481ad8dda10f2c5b753f5c969393b12480155fe15a63
Status: Downloaded newer image for python:3.12-slim
docker.io/library/python:3.12-slim
REPOSITORY   TAG         IMAGE ID       CREATED       SIZE
python       3.12-slim   cf001c2f8af7   7 weeks ago   130MB
Can you try the `-slim` variation too?
s
docker pull python:3.12-slim && docker image ls python:3.12-slim
3.12-slim: Pulling from library/python
Digest: sha256:afc139a0a640942491ec481ad8dda10f2c5b753f5c969393b12480155fe15a63
Status: Downloaded newer image for python:3.12-slim
docker.io/library/python:3.12-slim

What's Next?
  View a summary of image vulnerabilities and recommendations → docker scout quickview python:3.12-slim
REPOSITORY   TAG         IMAGE ID       CREATED       SIZE
python       3.12-slim   afc139a0a640   7 weeks ago   221MB
c
Okay this is fascinating, you're seeing an almost 2x larger image size there (130MB vs 221MB)
s
🙈
c
Sorry, Simonas, this doesn't seem to be specific to the Prefect image. Perhaps there's something unusual about your local macOS environment, so I'm wondering about your deployed environment: do you have enough access to your k8s cluster to inspect the images on it? I'm thinking of something like this: https://kubernetes.io/docs/tasks/access-application-cluster/list-all-running-container-images/
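One way to do that kind of inspection from a script is sketched below. It assumes you have saved the output of `kubectl get pods --all-namespaces -o json` and only collects the image references from it; the function name and the sample data are hypothetical.

```python
def images_from_pod_list(pod_list: dict) -> set[str]:
    """Collect every container image referenced by pods in a `kubectl get pods -o json` document."""
    images = set()
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        # Regular containers and init containers both carry an "image" field
        for container in spec.get("containers", []) + spec.get("initContainers", []):
            images.add(container["image"])
    return images


# Hypothetical sample shaped like `kubectl get pods --all-namespaces -o json` output
sample = {
    "items": [
        {"spec": {"containers": [{"name": "flow", "image": "prefecthq/prefect:2.19.2-python3.12"}]}},
        {"spec": {"containers": [{"name": "app", "image": "python:3.12-slim"}],
                  "initContainers": [{"name": "init", "image": "python:3.12-slim"}]}},
    ]
}
print(sorted(images_from_pod_list(sample)))
```

With real data you would load the saved JSON via `json.load` and compare the listed image references against what you expect the job to pull.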
s
Off topic: any ideas? I’m using an alpine base image and installing prefect with poetry
c
Ah that looks like you might need to add a timezone info package from the alpine repos, one sec...
s
thanks! So I have this prefect.yaml:
# Generic metadata about this project
name: super-research
prefect-version: 2.19.2

# pull section allows you to provide instructions for cloning this project in remote locations
pull:
- prefect.deployments.steps.set_working_directory:
    directory: /app

# the definitions section allows you to define reusable components for your deployments
definitions:
  tags: &common_tags
  - eks
  - '{{ get-commit-hash.stdout }}'
  work_pool: &common_work_pool
    name: test-worker-pool
    job_variables:
      image: '{{ $IMAGE_URI }}'


# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: run_agent
  tags: *common_tags
  entrypoint: src/research/services/agents/base.py:run_agent
  work_pool: *common_work_pool
  version:
  description:
  parameters: {}
  schedules: []
and the `run_agent` function with the `@flow` decorator should trigger the k8s job? However, I saw that it was not triggered on the k8s worker. Is it possible that it ran on Prefect Cloud, but not on the k8s worker?
When I manually trigger the deployment it runs on k8s and crashes because of the error above, but when the application calls the decorated function, it does not run in the work pool.
c
If your application called `run_agent` as a regular Python function, then that flow will just run locally within that same process:
from prefect import flow

@flow
def my_flow():
    print('hello')

def my_other_application():
    # Calling the flow like a regular function runs it in this process
    my_flow()
If this is what's happening, then the flow will run in the process of your application. If you want to trigger the deployment, try this:
from prefect.deployments import run_deployment

def my_other_application():
    run_deployment(name="my_flow/my_deployment")
s
thanks! will try
OK, so if I want my flow to run on the k8s worker, I have to do the above
c
That's right! If you want a flow to run on a specific infrastructure, that's determined by one or more Deployments for that flow
s
can I pass a payload to the deployment?
run_deployment(name="my_flow_name/my_deployment_name")
api call -> payload -> worker -> processing
c
Yes for sure, you can pass parameters like:
run_deployment(name="...", parameters={"hello": "world"})
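As a rough sketch of the "api call -> payload -> worker -> processing" path: the handler below turns an incoming payload into `run_deployment` arguments. The helper names, the `my_flow/my_deployment` name, and the choice of `timeout=0` (which, to my understanding, makes `run_deployment` return right after the run is created instead of polling for completion) are assumptions for illustration, not something confirmed in this thread.

```python
from typing import Any


def build_run_kwargs(flow_name: str, deployment_name: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Build keyword arguments for prefect.deployments.run_deployment."""
    return {
        "name": f"{flow_name}/{deployment_name}",
        # Keys must match the parameter names of the flow function
        "parameters": dict(payload),
        # Assumption: 0 means "do not wait for the flow run to finish"
        "timeout": 0,
    }


def handle_api_call(payload: dict[str, Any]):
    # Imported lazily so this module can be loaded without Prefect installed
    from prefect.deployments import run_deployment

    return run_deployment(**build_run_kwargs("my_flow", "my_deployment", payload))
```

Keeping the argument building separate from the Prefect call makes it easy to unit test the payload mapping without a running API.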
s
that’s great!
c
async def run_deployment(
    name: Union[str, UUID],
    client: Optional["PrefectClient"] = None,
    parameters: Optional[dict] = None,
    scheduled_time: Optional[datetime] = None,
    flow_run_name: Optional[str] = None,
    timeout: Optional[float] = None,
    poll_interval: Optional[float] = 5,
    tags: Optional[Iterable[str]] = None,
    idempotency_key: Optional[str] = None,
    work_queue_name: Optional[str] = None,
    as_subflow: Optional[bool] = True,
    infra_overrides: Optional[dict] = None,
    job_variables: Optional[dict] = None,
) -> "FlowRun":
s
thanks @Chris Guidry, it solved the docker issue. How can I pass env variables and secrets to the k8s job? Like:
envFrom:
  - secretRef:
      name: app-secrets
  - configMapRef:
      name: app-envs
Ideally I'd have a job template that I can edit with Kustomize together with the application templates
c
Excellent! So that's generally something you define on your worker and you can do that when you configure your work pool: https://docs.prefect.io/latest/guides/prefect-deploy/ If you need to change environment variables on a per-flow-run basis, you can do that with something called `job_variables`: https://docs.prefect.io/latest/guides/deployment/overriding-job-variables/
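A per-run override might look roughly like the sketch below. It assumes the work pool's base job template exposes an `env` variable (a mapping of plain name/value pairs), and the helper names are hypothetical. Note that this passes literal values; it does not reference Kubernetes Secrets or ConfigMaps the way `envFrom` does.

```python
def env_job_variables(extra_env: dict) -> dict:
    """Wrap environment variables in the shape assumed for the `env` job variable."""
    # Environment variable names and values are always strings
    return {"env": {str(key): str(value) for key, value in extra_env.items()}}


def run_with_env(deployment: str, extra_env: dict):
    # Imported lazily so this module can be loaded without Prefect installed
    from prefect.deployments import run_deployment

    return run_deployment(name=deployment, job_variables=env_job_variables(extra_env))
```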
s
so if I edit `prefect-worker/templates/deployment.yaml` with new env and secrets it should propagate them to the job, right?
I did a helm dry run to get the templates
c
That would let you adjust the variables/secrets for your worker but the worker doesn't usually run the flow runs, it creates K8s pods that do. All of the configuration of those K8s pods is coming from your work pool settings
s
yes
how can I edit the job template?
is it possible to have a Kubernetes manifest?
or should I pull from the cluster, patch it and load it?
c
You can modify the work pool's template via the Prefect UI
s
yes, I understand this. My question is whether I can directly edit the job manifest: I want to set up patches in CI with Kustomize, since I share the same image for the application and for the worker
c
Under the `Advanced` tab in the UI, you can modify the entire job manifest for the work pool (this will apply to every job that's created to run a flow), and you can also do this via the API if you want to modify this during a CI process
You may also want to check out `job_variables` which are per-flow-run adjustments you can make to the job template as well
s
will check this, thank you, Chris!
this is what I was looking for:
prefect work-pool update --base-job-template base-job-template.json my-work-pool
so I can keep `base-job-template.json` in the git repo
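With the template in the repo, the `envFrom` entries asked about earlier could be patched in by a small CI script before running the update command. This is only a sketch: it assumes the template keeps its pod spec under `job_configuration.job_manifest.spec.template.spec`, which is how I understand the default Kubernetes template to be laid out, and the trimmed template below is hypothetical. Check the path against your own file.

```python
import copy


def add_env_from(template: dict, secret_name: str, config_map_name: str) -> dict:
    """Return a copy of a base job template whose containers load env from a Secret and a ConfigMap."""
    patched = copy.deepcopy(template)
    # Assumed location of the pod spec inside the base job template
    pod_spec = patched["job_configuration"]["job_manifest"]["spec"]["template"]["spec"]
    for container in pod_spec["containers"]:
        env_from = container.setdefault("envFrom", [])
        env_from.append({"secretRef": {"name": secret_name}})
        env_from.append({"configMapRef": {"name": config_map_name}})
    return patched


# Trimmed, hypothetical template; a real one has many more fields
template = {
    "job_configuration": {
        "job_manifest": {
            "spec": {"template": {"spec": {"containers": [{"image": "{{ image }}", "env": "{{ env }}"}]}}}
        }
    },
    "variables": {},
}
patched = add_env_from(template, "app-secrets", "app-envs")
```

The patched dictionary can then be written back with `json.dump` and passed to `prefect work-pool update --base-job-template`.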
c
Ah yes!
s
@Chris Guidry maybe you know what could be the issue with GitHub runners? When I deploy locally it works well, but when I run the same script from GitHub Actions I get:
Could not find flow 'start_research' in 'src/research/router.py'
$ cat src/research/router.py | grep start_research

@flow(name="start_research")
async def start_research(
        f"deployment: {f'start_research/research-{app_settings.environment}'}"
        name=f"start_research/research-{app_settings.environment}",
my actions:
- name: Run Prefect Deploy
  run: |
    pip install prefect
    echo "Updated Prefect envs:"
    echo "PREFECT_API_KEY=$PREFECT_API_KEY"
    echo "PREFECT_CLOUD_URL=$PREFECT_CLOUD_URL"

    prefect --no-prompt cloud login --key $PREFECT_API_KEY --workspace ismailsuperagentsh/superagent
    prefect deployment ls
    prefect --no-prompt deploy --prefect-file ./prefect.yaml --name research-${{ inputs.environment }}
c
Hi Simonas, do you have all of the dependencies you need installed in the GHA container? I believe if we get any kind of `ImportError` while trying to load the flow, we'd give that message.
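A quick way to check for that in CI is to import the flow file directly and print whatever exception comes back. The sketch below is a standalone diagnostic with a made-up helper name, not how Prefect itself loads flows; files that rely on relative imports will fail when loaded this way regardless of installed packages.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def try_import(path: str) -> Optional[str]:
    """Import a Python file by path; return None on success or a description of the error."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    if spec is None or spec.loader is None:
        return f"cannot build an import spec for {path}"
    module = importlib.util.module_from_spec(spec)
    try:
        # Executes the module body, which triggers its top-level imports
        spec.loader.exec_module(module)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None
```

Running something like `print(try_import("src/research/router.py") or "import OK")` in the GitHub Actions step, before `prefect deploy`, would show whether a missing package is behind the "Could not find flow" message.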
s
do you maybe have a reference list of the dependencies?
c
Oh I'm sorry, I mean the dependencies of your script, what Python packages it depends on
s
but it just updates the deployment, nothing more
c
Hmm, without more context I'm not sure I can see the issue
s
Basically, I want to update the deployment image URI in CI. I was following this guide: https://docs.prefect.io/latest/guides/ci-cd/ Maybe you are right and I need to install all the poetry dependencies, will try that