Hello! I have a workflow setup that works when I’m...
# ask-community
k
Hello! I have a workflow setup that works when I’m running on a local server. However, when I switch to production which is running on a KubernetesAgent, the flow fails at runtime with
Failed to load and execute Flow's environment: ValueError('Flow is not contained in this Storage')
I’m not sure how to debug this. What are the possible causes of this error?
k
Hey @Kim Pevey, what Storage did you use? This error is saying that it’s looking for the Flow but can’t find it.
k
It's on GCP. I'm using the GCS Storage object.
k
Could you show me how you attached it to your Flow?
k
Ah. I was accidentally overriding my KubernetesRun() config parameters, so I didn't have the proper credentials/image defined. I have now gotten past that error but have encountered another. We are using the mambaforge docker image for execution. This image uses a config script as an entrypoint to make sure the appropriate environment is activated when the container runs. Our logs indicate that this entrypoint may have been overwritten. There is an error about tini not being able to find the prefect program. Should we be using DockerRun instead of KubernetesRun? Or is there a way to prevent the entrypoint from being overwritten? Or should we just try to make sure that the environment is activated in a different way?
k
I will double check but I believe we overwrite ENTRYPOINTs. The environment would have to be activated before then, but activating environments in a container is also tricky (at least for conda it is). We normally suggest just not using the environment because the container provides isolation anyway. Are you using conda or pipenv or something else?
k
@Kevin Kho thank you so much for your help! I’m bringing @John Lee into the conversation to help articulate our setup.
j
Thanks Kim.
"activating environments in a container is also tricky"
I agree it is tricky 🙂 We are using conda. It's a large set of dependencies that we have to manage elsewhere with conda so it would be great if we could use the same here.
k
This is the best article for that. I understand though. Some libraries can only be properly installed through conda.
j
Ah very nice. Thank you. We'll try that out and let you know how it goes.
Just checking back in on this. We are still getting the same error on the prefect runner that is launched:
[FATAL tini (7)] exec prefect failed: No such file or directory
From that I inferred that the environment was not activated (because we have prefect in our environment). Based on the link you sent I modified the container. I extended it slightly based on this table to accommodate /bin/sh, both interactive and login, as I believe the Bourne shell is being used due to this line. Here are the relevant lines of our Dockerfile for this:
...
ENV CONDA_ENV=myenv
SHELL ["/bin/bash", "--login", "-c"]

# Make RUN commands and the end containers use the new environment
# See <https://en.wikipedia.org/wiki/Unix_shell#Configuration_files>:
RUN export SETUP=". /opt/conda/etc/profile.d/conda.sh && conda activate ${CONDA_ENV}"; \
    for f in ~/.profile ~/.bashrc ~/.shrc /etc/profile /etc/bash.bashrc /etc/sh.shrc ; do \
    echo "$SETUP" >> $f; \
    done
ENV ENV="/etc/profile"
...
Our image should now be robust against the entrypoint or command being overwritten and /bin/sh being used but we are still getting the same error. Either there is a silly error somewhere (which I will try to hunt down over the coming hours) or env variables are completely overwritten in the container. Is there a way I can debug this locally to see changes made to the image before it is used or perhaps the container as it is launched? I don't have a handle on how I should go about debugging this sort of error.
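One way to check this kind of failure from inside a container is sketched below. It assumes Python is available in the image; the function name `diagnose_executable` is made up for illustration. The sketch does a plain PATH lookup with `shutil.which`, which is roughly what happens when a program is exec'd by name without a shell having sourced any profile files first.

```python
import os
import shutil


def diagnose_executable(name, path=None):
    """Report whether `name` is found by a plain PATH lookup, and which dirs were searched."""
    # Fall back to the PATH of the current process when none is given.
    search_path = os.environ.get("PATH", "") if path is None else path
    resolved = shutil.which(name, path=search_path)
    return {
        "name": name,
        "resolved": resolved,  # None means the lookup failed
        "searched": [d for d in search_path.split(os.pathsep) if d],
    }


if __name__ == "__main__":
    report = diagnose_executable("prefect")
    print("resolved:", report["resolved"])
    for directory in report["searched"]:
        print("  searched:", directory)
```

If `resolved` is `None` and the conda environment's `bin` directory is missing from the searched list, the environment was not on PATH for that process, regardless of what the shell startup files contain.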
I've reproduced the problem locally now with an image (here named cba):
$ docker run --rm -ti cba /bin/sh -c prefect
/bin/sh: 1: prefect: not found
Or with tini:
docker run --rm -ti  --init -e TINI_SUBREAPER="" cba /bin/sh -c prefect
It seems the conda activation is not working fully with /bin/sh:
$ docker run --rm -ti   cba /bin/sh 
/bin/sh: 5: /opt/conda/envs/Colombia/etc/conda/activate.d/activate-binutils_linux-64.sh: Syntax error: "(" unexpected
The above doesn't seem to stop the "Colombia" environment being successfully activated when I set the shell as the entrypoint though...
$ docker run --rm -ti --entrypoint /bin/sh  cba 
/bin/sh: 5: /opt/conda/envs/Colombia/etc/conda/activate.d/activate-binutils_linux-64.sh: Syntax error: "(" unexpected
(Colombia)
Perhaps "set -e" is being used somewhere that prevents the environment from being activated in our desired use case.
My most recent inference is that prefect is being set as the entrypoint (with --init passed to the docker run command, triggering the use of tini). I directly added the conda env to the PATH and that worked... I imagine this is not a robust solution though. I still can't spot where in the code this is done; my above link about the interactive Bourne shell is for the ECS run config but we are using KubernetesRun.
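The PATH workaround amounts to putting the environment's `bin` directory first, which can be sketched as follows. The helper name `conda_env_bin_first` is hypothetical, and the `/opt/conda/envs/<name>` layout is an assumption based on the paths in the error messages above.

```python
def conda_env_bin_first(env_name, current_path, conda_prefix="/opt/conda"):
    """Return a PATH string with the conda env's bin directory placed first."""
    # Assumed layout: <conda_prefix>/envs/<env_name>/bin
    bin_dir = f"{conda_prefix}/envs/{env_name}/bin"
    # Drop empty entries and any existing copy so the result is idempotent.
    rest = [p for p in current_path.split(":") if p and p != bin_dir]
    return ":".join([bin_dir] + rest)


if __name__ == "__main__":
    print(conda_env_bin_first("Colombia", "/usr/local/bin:/usr/bin:/bin"))
```

The resulting value could be baked into the image, for example with an `ENV PATH=...` line in the Dockerfile. Note that this only makes the executables findable; it does not run the `activate.d` scripts, so any variables those scripts would set remain unset, which is one reason the approach may not be robust.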
k
Still going through this but what base image are you using?
j
condaforge/mambaforge:4.10.3-1
k
I think I need to ask someone else on the team for this one. Will get back to you in a bit.
j
Sounds good. Thanks for the help.
z
Hey John, I'm not sure I entirely grasp what's going on here but I can definitely help you debug
If you use a DockerRun and a docker agent on your own machine, you should be able to disable auto_remove and inspect the container after it has finished.
For example:
from prefect import Flow
from prefect.run_configs import DockerRun
from prefect.storage import Docker


with Flow("test-docker") as flow:
    pass


flow.storage = Docker()
flow.run_config = DockerRun(host_config=dict(auto_remove=False))
prefect register -p flow.py --project test
prefect agent docker start --label run-here
prefect run --name test-docker --label run-here --run-name test-docker-run
docker inspect test-docker-run
There is also a --disable-job-deletion flag on prefect agent kubernetes start, so you can inspect the job spec if you'd like to see what's going on with the Kubernetes run specifically (which may behave slightly differently than a local docker agent).
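When inspecting a kept container, the fields most relevant to an overwritten entrypoint are the entrypoint, the command, and the PATH the process was given. A minimal sketch for pulling those out of saved `docker inspect` JSON is below; the function name `summarize_container` is made up for illustration, and it assumes the usual inspect layout (a JSON array whose first element has `Path`, `Args`, and a `Config` object with `Entrypoint`, `Cmd`, and `Env`).

```python
import json


def summarize_container(inspect_output):
    """Extract the fields that show what a container actually executed."""
    container = json.loads(inspect_output)[0]  # docker inspect prints a JSON array
    config = container.get("Config") or {}
    # Env is a list of "KEY=value" strings; turn it into a dict.
    env = dict(
        item.split("=", 1) for item in (config.get("Env") or []) if "=" in item
    )
    return {
        "entrypoint": config.get("Entrypoint"),
        "cmd": config.get("Cmd"),
        "path": container.get("Path"),
        "args": container.get("Args"),
        "env_path": env.get("PATH"),
    }
```

For example, the output of `docker inspect test-docker-run` could be saved to a file, read back as a string, and passed to this function to see whether the image's own entrypoint survived and whether the conda environment's directory appears in `env_path`.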
j
Excellent. Thanks for this. I'll take these debugging approaches for a spin, and they will make it easier to debug these issues in the future. For now we will use the PATH hack, but let me know if there is anyone internally who has a preferred way of making sure Prefect is available from a conda environment.