Hello I have a workflow setup that works when I m running on Prefect Community #ask-community

Kim Pevey

08/18/2021, 3:21 PM

Hello! I have a workflow setup that works when I’m running on a local server. However, when I switch to production which is running on a KubernetesAgent, the flow fails at runtime with

Failed to load and execute Flow's environment: ValueError('Flow is not contained in this Storage')

I’m not sure how to debug this. What are the possible causes of this error?

Kevin Kho

08/18/2021, 3:25 PM

Hey @Kim Pevey, what Storage did you use? This error is saying that it’s looking for the Flow but can’t find it.

Kim Pevey

08/18/2021, 3:25 PM

It's on GCP

Kim Pevey

08/18/2021, 3:26 PM

I’m using the GCS Storage object

Kevin Kho

08/18/2021, 3:26 PM

Could you show me how you attached it to your Flow?

Kim Pevey

08/18/2021, 4:05 PM

Ah. I was accidentally overriding my KubernetesRun() config parameters such that I didn’t have the proper credentials/image defined. I have now gotten past that error but have encountered another. We are using the mambaforge docker image for execution. This image uses a config script as an entrypoint to make sure the appropriate environment is activated when the container runs. Our logs indicate that this entrypoint may have been overwritten. There is an error about tini not being able to find the prefect program. Should we be using DockerRun instead of KubernetesRun? Or is there a way to prevent the entrypoint being overwritten? Or should we just try to make sure that the environment is activated in a different way?

Kevin Kho

08/18/2021, 4:14 PM

I will double check but I believe we overwrite ENTRYPOINTs. The environment would have to be activated before then, but activating environments in a container is also tricky (at least for conda it is). We normally suggest just not using the environment because the container provides isolation anyway. Are you using conda, pipenv, or something else?

Kim Pevey

08/18/2021, 4:31 PM

@Kevin Kho thank you so much for your help! I’m bringing @John Lee into the conversation to help articulate our setup.

John Lee

08/18/2021, 4:37 PM

Thanks Kim.

activating environments in a container is also tricky

I agree it is tricky 🙂 We are using conda. It's a large set of dependencies that we have to manage elsewhere with conda so it would be great if we could use the same here.

Kevin Kho

08/18/2021, 4:40 PM

This is the best article for that. I understand though. Some libraries can only be properly installed through conda.

John Lee

08/18/2021, 4:41 PM

Ah very nice. Thank you. We'll try that out and let you know how it goes.

John Lee

08/20/2021, 9:14 AM

Just checking back in on this. We are still getting the same error on the prefect runner that is launched:

[FATAL tini (7)] exec prefect failed: No such file or directory

From that I inferred that the environment was not activated (because we have prefect in our environment). Based on the link you sent I modified the container. I extended it slightly based on this table to accommodate /bin/sh, both interactive and login, as I believe the Bourne shell is being used due to this line. Here are the relevant lines of our Dockerfile for this:

...
ENV CONDA_ENV=myenv
SHELL ["/bin/bash", "--login", "-c"]

# Make RUN commands and the end containers use the new environment
# See <https://en.wikipedia.org/wiki/Unix_shell#Configuration_files>:
RUN export SETUP=". /opt/conda/etc/profile.d/conda.sh && conda activate ${CONDA_ENV}"; \
    for f in ~/.profile ~/.bashrc ~/.shrc /etc/profile /etc/bash.bashrc /etc/sh.shrc ; do \
    echo "$SETUP" >> $f; \
    done
ENV ENV="/etc/profile"
...

Our image should now be robust against the entrypoint or command being overwritten and /bin/sh being used but we are still getting the same error. Either there is a silly error somewhere (which I will try to hunt down over the coming hours) or env variables are completely overwritten in the container. Is there a way I can debug this locally to see changes made to the image before it is used or perhaps the container as it is launched? I don't have a handle on how I should go about debugging this sort of error.

John Lee

08/20/2021, 10:14 AM

I've reproduced the problem locally now with an image (here named cba):

$ docker run --rm -ti cba /bin/sh -c prefect
/bin/sh: 1: prefect: not found

Or with tini:

$ docker run --rm -ti --init -e TINI_SUBREAPER="" cba /bin/sh -c prefect

It seems the conda activation is not working fully with /bin/sh:

$ docker run --rm -ti   cba /bin/sh 
/bin/sh: 5: /opt/conda/envs/Colombia/etc/conda/activate.d/activate-binutils_linux-64.sh: Syntax error: "(" unexpected

The above doesn't seem to stop the "Colombia" environment being successfully activated when I set the shell as the entrypoint though...

$ docker run --rm -ti --entrypoint /bin/sh  cba 
/bin/sh: 5: /opt/conda/envs/Colombia/etc/conda/activate.d/activate-binutils_linux-64.sh: Syntax error: "(" unexpected
(Colombia)

Perhaps "set -e" is being used somewhere that prevents the environment from being activated in our desired use case.
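
The difference between the two /bin/sh invocations above can be checked outside Docker. What follows is a minimal Python sketch, assuming a POSIX-style /bin/sh such as dash; the MARKER variable and the throwaway "profile" file are made up for illustration and are not part of our image. POSIX shells consult the file named in ENV only when they are interactive, which would be consistent with "/bin/sh -c prefect" not seeing the conda environment while an interactive /bin/sh does.

```python
import os
import subprocess
import tempfile


def run_sh(args, env_file):
    """Run /bin/sh with ENV pointing at env_file and return its stdout."""
    env = dict(os.environ, ENV=env_file)
    env.pop("MARKER", None)  # make sure the marker is not inherited
    result = subprocess.run(
        ["/bin/sh", *args],
        env=env,
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for /etc/profile with the "conda activate" line appended.
    env_file = os.path.join(tmp, "profile")
    with open(env_file, "w") as f:
        f.write("export MARKER=activated\n")

    # Non-interactive shell, as in `docker run ... /bin/sh -c prefect`.
    non_interactive = run_sh(["-c", "echo ${MARKER:-unset}"], env_file)

    # Same shell, but sourcing the file explicitly before the command.
    explicit = run_sh(["-c", '. "$ENV"; echo ${MARKER:-unset}'], env_file)

print("sh -c without sourcing:", non_interactive)
print("sh -c with explicit source:", explicit)
```

If the first value comes back as "unset", the startup files are simply never read in the non-interactive case, so appending the activation line to them cannot help when a command is exec'd directly.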

John Lee

08/20/2021, 1:54 PM

My most recent inference is that prefect is being set as the entrypoint (with --init to the docker run command triggering the use of tini). I directly added the conda env to the PATH and that worked... I imagine this is not a robust solution though. I still can't spot where in the code this stuff is done; my above link about the interactive Bourne shell is for the ECS run config but we are using KubernetesRun.
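
A minimal Python sketch of why the PATH change is enough, under the assumption that the program is exec'd directly with no shell involved: executable lookup depends only on PATH, not on whether the environment was activated. The directory layout and the "prefect-demo-cli" name are hypothetical stand-ins for the conda env's bin directory and the prefect executable.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a conda env's bin directory.
    env_bin = os.path.join(tmp, "envs", "myenv", "bin")
    os.makedirs(env_bin)

    # Fake executable standing in for the real `prefect` program.
    fake_cli = os.path.join(env_bin, "prefect-demo-cli")
    with open(fake_cli, "w") as f:
        f.write("#!/bin/sh\necho ok\n")
    os.chmod(fake_cli, 0o755)

    base_path = "/usr/bin:/bin"

    # Without the env's bin directory on PATH the lookup fails,
    # which is the situation tini reports as "No such file or directory".
    before = shutil.which("prefect-demo-cli", path=base_path)

    # Prepending the env's bin directory makes the lookup succeed,
    # without any shell or activation script being involved.
    after = shutil.which("prefect-demo-cli", path=env_bin + os.pathsep + base_path)
    found_in_env = after == fake_cli

print("found before PATH change:", before is not None)
print("found after PATH change:", found_in_env)
```

In a Dockerfile the equivalent would presumably be an ENV PATH line that prepends the environment's bin directory. That skips whatever else the activate.d scripts set up, which may be why it feels less robust than a real activation.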

Kevin Kho

08/20/2021, 2:09 PM

Still going through this but what base image are you using?

John Lee

08/20/2021, 2:10 PM

condaforge/mambaforge:4.10.3-1

Kevin Kho

08/20/2021, 2:25 PM

I think I need to ask someone else on the team for this one. Will get back to you in a bit.

John Lee

08/20/2021, 2:25 PM

Sounds good. Thanks for the help.

Zanie

08/20/2021, 3:05 PM

Hey John, I'm not sure I entirely grasp what's going on here but I can definitely help you debug

Zanie

08/20/2021, 3:06 PM

If you use a DockerRun and a docker agent on your own machine, you should be able to disable the auto_remove setting and inspect the container after it has finished.

Zanie

08/20/2021, 3:06 PM

For example

from prefect import Flow
from prefect.run_configs import DockerRun
from prefect.storage import Docker


with Flow("test-docker") as flow:
    pass


flow.storage = Docker()
flow.run_config = DockerRun(host_config=dict(auto_remove=False))

Zanie

08/20/2021, 3:09 PM

prefect register -p flow.py --project test
prefect agent docker start --label run-here
prefect run --name test-docker --label run-here --run-name test-docker-run
docker inspect test-docker-run

Zanie

08/20/2021, 3:16 PM

There is also a --disable-job-deletion flag on prefect agent kubernetes start so you can inspect the job spec

Zanie

08/20/2021, 3:16 PM

If you'd like to see what's going on with the Kubernetes run specifically (which may behave slightly differently than a local docker agent)

John Lee

08/20/2021, 3:44 PM

Excellent. Thanks for this. I'll take these debugging approaches for a spin, and they will make it easier to debug these issues in future. For now we will use the PATH hack but let me know if there is anyone internally who has a preferred way of making sure Prefect is available from a conda environment.
