# ask-community
h
Anyone using editable / source dependencies in your flows?
k
Kind of. You mean like a module that is always changing, right? I have seen someone git clone and pip install in the ENTRYPOINT of their Dockerfile
h
It works from the Dockerfile, but there's a check when building the Docker storage
```
Step 26/26 : RUN python /opt/prefect/healthcheck.py '["/opt/prefect/flows/dbt-data-pipelines.prefect", "/opt/prefect/flows/exchange-rates.prefect", "/opt/prefect/flows/nightly-mmm.prefect"]' '(3, 8)'
 ---> Running in 9613f643e8e1
/opt/prefect/healthcheck.py:152: UserWarning: Flow uses module which is not importable. Refer to documentation on how to import custom modules https://docs.prefect.io/api/latest/storage.html#docker
  flows = cloudpickle_deserialization_check(flow_file_paths)
Traceback (most recent call last):
  File "/opt/prefect/healthcheck.py", line 152, in <module>
    flows = cloudpickle_deserialization_check(flow_file_paths)
  File "/opt/prefect/healthcheck.py", line 44, in cloudpickle_deserialization_check
    flows.append(cloudpickle.loads(flow_bytes))
ModuleNotFoundError: No module named 'mmm.model'
```
k
Ah you can defer the import by putting it inside a task
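Something like this, roughly (a sketch; the task name is made up, `mmm.model` is the module from your traceback):
```
from prefect import task, Flow

@task
def run_model():
    # deferring the import to run time means cloudpickle doesn't
    # have to resolve `mmm.model` during the storage healthcheck
    import mmm.model
    ...

with Flow("nightly-mmm") as flow:
    run_model()
```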
h
You mean all imports for that module?
What about the `prefect_utils` I use to construct tasks?
It's also a source module
Docs say: "Otherwise the modules can be set independently when using a custom base image prior to the build here."
I'd like that
I already have a custom dockerfile
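i.e. something like this, if I read the docs right (a sketch, names assumed):
```
from prefect.storage import Docker

# hypothetical pre-built base image that already contains the
# editable installs, so the healthcheck can import them
docker = Docker(
    base_image="my-custom-base:py3.8",
    registry_url="europe-docker.pkg.dev/ex/cd",
    image_name="data-pipelines",
)
```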
Do I have to ignore all health checks if I use editable modules?
k
Ah no. If you put the editable library inside the Dockerfile it should work. Do you have the library installed locally?
h
Will try this in a bit
h
In the repo root (Pipfile):
```
cd infer/prefect_utils && pip install -e .
cd ../..
pipenv install
```
Dockerfile:
```
FROM prefecthq/prefect:0.15.4-python3.8

RUN pip install --upgrade pip setuptools wheel twine \
    && pip install pipenv \
    && apt-get update \
    && apt-get install -y --no-install-recommends curl gcc python3-dev libssl-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /opt/prefect

COPY Pipfile* packages.yml profiles.yml .user.yml .python-version dbt_project.yml postinstall.py ./
COPY infer ./infer
COPY dbt ./dbt

# https://docs.pipenv.org/en/latest/advanced.html#managing-system-dependencies
# or else /opt/prefect/healthcheck.py:152: UserWarning: Flow uses module which is not importable.
# Refer to documentation on how to import custom modules
# https://docs.prefect.io/api/latest/storage.html#docker
RUN PIPENV_VENV_IN_PROJECT=1 pipenv install --deploy
ENV PATH="/opt/prefect/.venv/bin:$PATH"

RUN python postinstall.py

COPY flows /opt/prefect/flows
```
k
Wait, if you want to make your own Docker image with the dependencies, you can do that and just run the `docker build` yourself. And then on the flow side, just use `DockerRun` with the image you built. Do you want the flow inside that image, or outside, like S3 or GitHub?
h
Inside the image. What's DockerRun?
k
The RunConfiguration for the flow, so that when the flow starts, it will load that image and run the flow on top of it.
h
But I still do
```
from pathlib import Path
from prefect.utilities.storage import extract_flow_from_file

flows = sorted(Path("flows").glob("*.py"))

# Add flows to the Docker storage (`docker` and `logger` come from elsewhere in the script)
for file in flows:
    flow = extract_flow_from_file(file_path=file)
    logger.info("Extracted flow from file before build")

    docker.add_flow(flow)
```
?
Guess not
Only flow.register?
But I'm running on the KubernetesAgent?
k
No. You can pair `Local` storage with `DockerRun` and provide a path to the file.
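Roughly like this (a sketch; the in-image path and the image tag are assumed):
```
from prefect.storage import Local
from prefect.run_configs import DockerRun

# `flow` is your Flow object; the flow file is assumed to be
# baked into the image at this path
flow.storage = Local(path="/opt/prefect/flows/nightly-mmm.py", stored_as_script=True)
flow.run_config = DockerRun(image="europe-docker.pkg.dev/ex/cd/data-pipelines:1.2.3")
```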
h
But I still need to do `flow = extract_flow_from_file(file_path=file)`, otherwise I can't assign the storage to the flow, right?
and it's the pickling that causes the errors
Do you have an example somewhere?
k
I think you can pair `Local` storage with `KubernetesRun`, and if the image is already created and the flow is already inside, you just point your storage like `Local(path=..., stored_as_script=True)` and then the flow path will be relative to the image.
I do this for `DockerRun` and `Local` storage. One sec
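The k8s flavour is the same idea, just swapping the run config (same sketch assumptions as above):
```
from prefect.run_configs import KubernetesRun

flow.run_config = KubernetesRun(image="europe-docker.pkg.dev/ex/cd/data-pipelines:1.2.3")
```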
h
But IMO `KubernetesRun` is not configured here; it's configured using the cloud k8s agent
k
You mean you specify the image on the agent side instead of the RunConfig?
h
No, but I specify the JobTemplate and such
So when I register the flow, do I have to register it by declaring its runtime inside of it?
It feels so coupled in a way that doesn't even correspond to how I run it locally (because I can run it both locally and with k8s)
k
Yeah, you're right. I'll look deeper at this tomorrow and see what causes the failing healthcheck. That's when building the Docker storage, right?
h
Yes
I could disable them and deploy though
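Something like this, I assume (untested sketch; `Docker` storage takes an `ignore_healthchecks` flag):
```
from prefect.storage import Docker

docker = Docker(
    dockerfile="Dockerfile",                     # the custom Dockerfile above
    registry_url="europe-docker.pkg.dev/ex/cd",  # registry assumed
    image_name="data-pipelines",
    ignore_healthchecks=True,  # skip the cloudpickle import check at build time
)
```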
But I think my current version of the script adds the flow files twice: once in Python form and once pickled + deps
Maybe not the deps, since it's not crashing; they may be outside
It's crashing too, yes
And I'm building all my flows into one image, to keep the 1.4 GiB from having to be re-fetched for every flow
k
Gotcha, ok, will try to replicate that failed healthcheck with Docker storage tomorrow
h
But really, what is DockerRun? I'm not running anything on Docker; it's running on GKE with containerd / runc on COS
You could have a look here if you'd like, to save yourself from having to repro it: https://prod.liveshare.vsengsaas.visualstudio.com/join?BC675319BDD2D2337BF1DCAC0A95FBD4E759
k
DockerRun pairs with the Docker agent and then you specify an image so that the Flow will be loaded and run in that container. For example you can load a flow from S3 and then run it on an image you specify. So you can pack all the dependencies if they don't change frequently and then just run flows on top of it.
The default RunConfig is a UniversalRun, but for Kubernetes, there is the KubernetesRun where you can also specify the image.
h
But can the k8s agent just pull an image then?
I don't want to declare any job templates next to the flows because it's not their concern
k
You can. You're doing it that way currently right?
h
and if the k8s / cloud agent pulls an image, is it still DockerRun?
yes, but I'm using UniversalRun
With DockerStorage
k
It is not DockerRun. Ignore DockerRun if you are on Kubernetes. The Kubernetes agent combines its job template with the KubernetesRun RunConfig when deploying a flow, so you can indeed just set it on the agent side
That link you gave doesn't work for me btw in the browser. Do you know if I need to use VS Code to use it?
h
Yes you need VSCode
It's a code sharing session
I can show you here in Slack otherwise
k
Sure
h
Hmm, no huddle / video buttons?
k
Oh I thought you meant paste the code. I'm about to sleep 😅
h
😄
Get some sleep, I'll figure something out 😉
k
Lol. Leave a message though. I'll look into it tomorrow
h
You've helped me enough 🙂
Like now, again, the lack of separation bites:
```
run_config=DockerRun(image="europe-docker.pkg.dev/ex/cd/data-pipelines:latest"),
```
I never want to run the `latest` container; I always want to pin a very specific version
and the flow should be possible to declare stand-alone, without it knowing its image tag
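What I'd want is to inject the tag at registration time, outside the flow file; something like this (a sketch, `IMAGE_TAG` is a hypothetical CI-provided variable):
```
import os
from prefect.run_configs import KubernetesRun

# the flow module itself never sees the tag; the registration
# script injects it from the environment
flow.run_config = KubernetesRun(
    image=f"europe-docker.pkg.dev/ex/cd/data-pipelines:{os.environ['IMAGE_TAG']}"
)
flow.register(project_name="data-pipelines")  # project name assumed
```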