# ask-community
r
I find it a bit clunky that we have to either write all our code in one giant file, or rebuild + upload a docker image every time any of the other files change. Is Orion going to have a cleaner way to structure flows in multiple files?
👀 1
upvote 2
c
If you are using KubernetesAgent with a custom job template, you can add a Kubernetes command (i.e. entrypoint) for the job container that pip installs your Git repo. With this setup, every time a new flow run is executed, the KubernetesAgent spawns a new job which first pip installs the latest user-defined code directly from Git. This decouples user code (which changes often and is pulled on every new flow run) from external dependencies (which are baked into the job's image and do not have to be rebuilt after every user file change).
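A rough sketch of how such a custom template can be wired up in Prefect 1.x (the template path and image name here are placeholders, and the template could instead be supplied to the agent itself):
```python
# Minimal sketch, Prefect 1.x assumed: point the flow's run config at the custom
# job template so every flow-run job pip installs the latest code from Git first.
from prefect.run_configs import KubernetesRun

run_config = KubernetesRun(
    job_template_path="job_template.yaml",  # template with the pip-install entrypoint
    image="my-registry/flow-deps:latest",   # image only carries the heavy dependencies
)
```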
🤔 1
Not sure about Orion though! Would like a breakdown for that
e
https://skaffold.dev/ makes rebuilding and updating a lot less painful
a
@Ryan Sattler It could be just a matter of packaging. I assume you use DockerStorage, correct? You could have a look at using script-based storage with either:
• one of the Git storage classes (GitHub, Bitbucket, GitLab, Git)
• one of the cloud storage classes (S3, GCS, Azure)
Then you could pass your container image (containing all dependencies needed by the flow) to your run configuration (e.g. `KubernetesRun`), and Prefect will grab the flow code either from Git or from cloud storage during the flow run. This way, you don't have to rebuild your Docker image when the flow code changes. Does that make sense for your use case? When it comes to Orion, you're right that it will likely be easier, since Orion decouples a flow from its deployment and allows attaching multiple deployments to a flow.
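A minimal sketch of that combination in Prefect 1.x, assuming an S3 bucket and a dependencies-only image (all names below are placeholders):
```python
# Minimal sketch, Prefect 1.x assumed: the flow file is stored as a script in S3,
# while the image passed to KubernetesRun only carries dependencies, so code
# changes never require an image rebuild.
from prefect import Flow, task
from prefect.run_configs import KubernetesRun
from prefect.storage import S3

@task
def say_hello():
    print("hello from a script-based flow")

with Flow(
    "example-flow",
    storage=S3(
        bucket="my-prefect-flows",                  # placeholder bucket
        stored_as_script=True,
        local_script_path="flows/example_flow.py",  # uploaded at registration time
    ),
    run_config=KubernetesRun(image="my-registry/flow-deps:latest"),
) as flow:
    say_hello()
```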
z
We'll also be revisiting user/local dependencies in Orion to streamline this experience. The flow data attached to a deployment is just a blob: it can be a pickle of the flow or an encoded location to pull the flow from. In the future, it could be a tarball of the flow and its dependencies, a description of a virtual environment, a deep-cloudpickle that contains all of the dependencies inline, etc. The design is better suited to handling this use case; we just need to determine the best way to expose it.
upvote 1
r
Thanks Anna, I'm currently using S3 storage so that part is OK; the problem is when local Python files other than the main flow script itself change. Michael, something along those lines sounds good.
👍 1
c
`job_template.yaml` (custom job template for the K8s agent's jobs):
```yaml
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
      - name: flow
        # Run prepare.sh (below) before handing off to the Prefect flow-run command
        command: ["tini", "-g", "--", "/usr/bin/prepare.sh"]
        env:
          # Repo details come from a ConfigMap, so the image never needs rebuilding
          - name: IS_PIP_PACKAGE
            valueFrom:
              configMapKeyRef:
                name: flow-env-vars
                key: is-pip-package
          - name: REPO_NAME
            valueFrom:
              configMapKeyRef:
                name: flow-env-vars
                key: repo-name
          - name: GIT_REF
            valueFrom:
              configMapKeyRef:
                name: flow-env-vars
                key: git-ref
          - name: GITHUB_ACCESS_TOKEN
            valueFrom:
              secretKeyRef:
                name: github-auth
                key: password
```
With `prepare.sh`:
```bash
#!/bin/bash
set -x

# If the flow code is packaged as a pip package, install the latest version
# straight from Git before the flow run starts
if [ "$IS_PIP_PACKAGE" ]; then
    echo "IS_PIP_PACKAGE environment variable found."
    "$CONDA_DIR/envs/$conda_env/bin/pip" install "git+https://$GITHUB_ACCESS_TOKEN@github.com/$REPO_NAME.git@$GIT_REF"
fi

# Run the original container command (the Prefect flow-run entrypoint)
exec "$@"
```
Hello @Ryan Sattler, if you are using Kubernetes you can consider using the setup above. Package up all your Python and non-Python code with `setuptools`, then set $GIT_REF in the K8s namespace's ConfigMap to "main" or the PR ref that the developer is working on, and use absolute imports for everything (in development you can install your user-defined package with `pip install -e .`).
This seems to work really well alongside a CI/CD pipeline with automated K8s agent deployments. A new agent is deployed to a separate namespace every time a new PR is created (with the $GIT_REF value in the ConfigMap changed to "feat/pr-16-branch", for example). I'm using Helm to manage these ConfigMap deployments. Moreover, your "prod" agent with GIT_REF=main will always pull the latest changes packaged with `setuptools` every time it spins up a new job to execute a flow run. These changes can be Python modules used in your flow or even non-Python files (https://setuptools.pypa.io/en/latest/userguide/datafiles.html). Hope this clarifies my previous reply!
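For anyone following along, a minimal sketch of the kind of `setup.py` this implies (the package name, version, and requirements are placeholders):
```python
# Minimal sketch of the setuptools packaging described above; everything here is
# a placeholder for your own repo layout.
from setuptools import setup, find_packages

setup(
    name="my_flows",              # installed by prepare.sh via pip install git+https://...@$GIT_REF
    version="0.1.0",
    packages=find_packages(),     # picks up my_flows/ and its submodules (use absolute imports)
    include_package_data=True,    # also ship non-Python files declared in MANIFEST.in
    install_requires=[],          # heavy dependencies stay baked into the Docker image
)
```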
r
Thanks - for now we are just installing our code (which is configured as a Python package with a setup.py) at runtime by including the internal GitHub URL (in pip format) in the `EXTRA_PIP_PACKAGES` env var in KubernetesRun, which seems to work.
This does require the code to be committed to Git, but at least it avoids rebuilding the Docker image.
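Roughly what that looks like on the run config (Prefect 1.x; the image name and repository URL are placeholders for an internal GitHub):
```python
# Rough sketch: Prefect's images pip install whatever is listed in EXTRA_PIP_PACKAGES
# when the container starts, so the flow package is pulled fresh from Git on every run.
from prefect.run_configs import KubernetesRun

run_config = KubernetesRun(
    image="my-registry/flow-deps:latest",
    env={
        "EXTRA_PIP_PACKAGES": "git+https://github.internal.example.com/my-org/my-flows.git@main",
    },
)
```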
s
^ I was considering exactly this @Ryan Sattler! Do you require that the code be merged to your default branch when you do this, or can you install the package from a particular branch in the source repo w/ a corresponding GitHub URL?
z
The $GIT_REF can be a commit or branch
r
@Sean Talia You can install from a branch; see the first answer here for the URL syntax: https://stackoverflow.com/questions/20101834/pip-install-from-git-repo-branch