Hello, I'm currently trying to register a flow to ...
# ask-community
j
Hello, I'm currently trying to register a flow to amazon ECR using the docker storage. I want to include an existing dockerfile to handle the different modules the flow needs, but when I add the dockerfile parameter when setting the flow's storage it causes the process to get stuck on
prefect.Docker:Building the flow's Docker storage...
after creating a new temp directory with the dockerfile and
healthcheck.py
k
That’s weird let me look at the source
Looks like there are no more logs after that line. What happens when you build the image yourself with the Docker CLI? Does it work?
j
If I build it myself then the portion of the existing docker file works fine, however the line
RUN pip install pip --upgrade
that is added to the end of it by prefect throws the error
Could not install packages due to an OSError: [Errno 13] Permission denied: '/.venv/bin/pip'
. Could this be related to some conflict between pip and pipenv, which I've been using to handle dependencies?
k
Oh I guess so. What is your base image? Might be easier to use the prefect base image?
j
Here's the dockerfile generated by prefect. Everything before the
ENV PREFECT__USER_CONFIG_PATH='/opt/prefect/config.toml'
was a part of the dockerfile I've been using to build the project before I tried integrating it with prefect's docker storage. I'd really prefer to keep the compatibility with pipenv if possible
Is there any way I can configure the part of the dockerfile auto generated by prefect?
k
Probably not because the agent really goes through those lines of code. You could just build your image though and make Prefect not build if it helps? Just use your own image with the Flow
j
If I try to use my existing image as the base image it still throws
Could not install packages due to an OSError: [Errno 13] Permission denied: '/.venv/bin/pip'
Here's the portion of the image building where it fails
Step 3/8 : RUN pip install pip --upgrade
---> Running in 19aa18cb4c49
Requirement already satisfied: pip in /.venv/lib/python3.9/site-packages (22.0.4)
Collecting pip
Downloading pip-22.1.1-py3-none-any.whl (2.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 8.3 MB/s eta 0:00:00
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 22.0.4
Uninstalling pip-22.0.4:
k
Not as the base image. More like you build it already and then just point to it in the
DockerRun
or you do something like this with
flow.register(..., build=False)
and Prefect won’t try to build an image.
j
So are you saying that instead of trying to store the flow I should just pull from the image using DockerRun and then register the flow without building it?
k
Yes in the example there, the Docker storage is pointing to a path inside the Flow where it already exists. Prefect is not building any image there.
j
When I try using a docker agent to run it after following those steps then I get the following error
marshmallow.exceptions.ValidationError: {'_schema': 'Invalid data type: None'}
Seems like the error starts at
.venv\lib\site-packages\prefect\agent\agent.py", line 388, in _deploy_flow_run
deployment_info = self.deploy_flow(flow_run)
k
Could I see the current code?
j
flow.run_config = DockerRun(image="{image link}")
client = prefect.Client(api_key=os.environ["{api key}"])
id = client.register(flow=flow, project_name="{project name}", build=False)
launch_task_thread = threading.Thread(target=launch_task, args=(id,))
launch_task_thread.start()
DockerAgent().start()
The launch task thread waits 20 seconds before calling
create_flow_run
with the newly registered id, allowing the flow to run after the Docker Agent is up and running
k
Could you show me the current storage?
j
Like the image storage? It's hosted on AWS
k
The Prefect storage
flow.storage = ...
just redact any sensitive info. or did you not define it anymore?
j
Yeah I thought you were saying that defining the storage isn't necessary when you're running it from a docker image like this
If I use local storage it throws the error
ValueError('Flow is not contained in this Storage')
k
It is necessary like this just to point to the Flow file inside the container, otherwise Prefect won’t know where to grab it (I think that causes the marshmallow error, but not sure). This won’t build, but it’s still stored so Prefect knows where to look for the Flow. Does that make sense?
j
Just set it up and now when I run it it seems to throw either
Failed to load and execute flow run: FileNotFoundError(2, 'No such file or directory')
or a ModuleNotFoundError, depending on how I configure the path and prefect_directory parameters in the Docker storage to point to the file where the flow is written
In the image being used the flow is stored in
/home/appuser/src/app.py
k
Ah ok we’re nearly there. I think it’s just a configuration piece. Can I see the current Docker storage definition? Just redact sensitive info
j
root:/home/appuser# ls
resources  src
root:/home/appuser# cd src
root:/home/appuser/src# ls
__init__.py  app.py  transform  transformation_tasks.py  util.py  validation
There's nothing stored in the
opt
directory which prefect says it looks for by default
k
I mean the Storage
Docker(...)
looking for the path
j
docker_storage = Docker(
    image_name="ingestion-pipeline",
    image_tag="prefect-test",
    registry_url="{url}",
    stored_as_script=True,
    prefect_directory="/home",
    path="/home/appuser/src/app.py",
)
This throws
Failed to load and execute flow run: ModuleNotFoundError("No module named 'src'")
k
I think I know what it is. You don’t need
prefect_directory
but the Flow itself in
app.py
actually started and has an import that is failing. Maybe you have something like
from src.transform import ...
but it can’t find src right? I have calls for the next hour, but basically this has to be installed as a Python module so it can be used in the container. Have you done that before?
j
Oh yeah I understand what you're saying, I've had problems before with Python code not being able to detect relative imports like that. I fixed it previously by configuring the project path to include those additional modules, but it seems like you want me to take a different approach?
k
Yeah you can manipulate the Python path or you can create a
setup.py
file to make the module pip installable. I have a walkthrough here
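As a rough illustration of the first option (putting the project root on the Python path), here is a minimal, standard-library-only sketch. The package name `pipeline_src` and the helper names are made up for this example and stand in for the `src` directory discussed above; nothing here is Prefect-specific.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE = "pipeline_src"  # hypothetical stand-in for the real `src` package


def make_project(root: Path) -> None:
    """Create <root>/pipeline_src/ with an empty __init__.py and a util module."""
    pkg = root / PACKAGE
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "util.py").write_text("def greet():\n    return 'hello'\n")


def can_import(module_name: str) -> bool:
    """Return True if module_name imports cleanly with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        return False
    return True


def forget_package() -> None:
    """Drop the demo package from the module cache so the demo is repeatable."""
    for name in list(sys.modules):
        if name == PACKAGE or name.startswith(PACKAGE + "."):
            del sys.modules[name]


def demo() -> tuple:
    """Try the import before and after adding the project root to sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_project(root)
        before = can_import(f"{PACKAGE}.util")
        # Inserting the parent directory of the package has the same effect
        # as listing that directory in PYTHONPATH.
        sys.path.insert(0, str(root))
        try:
            after = can_import(f"{PACKAGE}.util")
        finally:
            sys.path.remove(str(root))
            forget_package()
    return before, after
```

Calling `demo()` should return `(False, True)`: the import fails until the directory that contains the package is on the path. In the container from this thread, that directory would be `/home/appuser`, the parent of `src`. The alternative is to pip install the package via a `setup.py`, which removes the dependency on the path.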
j
Hmm so I got it configured and set up so it can handle the imports, but for some reason, it reverts back to the
Failed to load and execute flow run: ModuleNotFoundError("No module named 'src'")
if I try to pass in an environment variable dictionary into the run config. Confused why this would be the case
k
That….doesn’t sound related. So you got a working run but changed the RunConfig and it broke? Could you try removing the env var and seeing if it works?
j
So if I define my RunConfig as
run_config=DockerRun(
    image="{image}",
    labels=["docker"],
),
it works up until it throws the key error for the missing env variable
but if I define it as
run_config=DockerRun(
    image="{image}",
    env=env_dict,
    labels=["docker"],
)
then it throws the module not found error
This happens both when I try to override the run config in the
create_flow_run
parameters and when I change how it is originally defined
k
Is there anything in your env_dict that can affect imports?
j
oh yeah you're right. My apologies I should have caught that
k
Oh what is it I’m curious what you’re doing?
j
I had pythonpath buried in the env variable dict
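The likely mechanism, stated as an assumption rather than something confirmed in the thread: environment variables passed to a container at run time take precedence over values baked into the image, so a `PYTHONPATH` entry in `env_dict` would replace whatever path made `src` importable. A minimal sketch of a pre-flight check follows. The helper names are hypothetical, and `/home/appuser` is assumed to be the directory the image relies on, based on the flow living at `/home/appuser/src/app.py`.

```python
# Variables that change how the Python interpreter locates modules.
IMPORT_AFFECTING_VARS = ("PYTHONPATH", "PYTHONHOME")


def import_affecting_keys(env: dict) -> list:
    """List the keys in env that can change how Python resolves imports."""
    return sorted(key for key in env if key in IMPORT_AFFECTING_VARS)


def with_extended_pythonpath(env: dict, image_pythonpath: str) -> dict:
    """Return a copy of env whose PYTHONPATH keeps the image's entry first.

    image_pythonpath is whatever the image needs on the path (an assumption
    the caller has to supply; this helper cannot inspect the image).
    """
    merged = dict(env)
    parts = [image_pythonpath]
    if merged.get("PYTHONPATH"):
        parts.append(merged["PYTHONPATH"])
    # ":" is the path separator inside a Linux container.
    merged["PYTHONPATH"] = ":".join(parts)
    return merged
```

For example, with `env_dict = {"API_TOKEN": "redacted", "PYTHONPATH": "/some/other/dir"}`, `import_affecting_keys(env_dict)` flags `PYTHONPATH`, and `with_extended_pythonpath(env_dict, "/home/appuser")` produces a copy that could be passed as `env=` instead. Simply dropping the key from the dictionary is the other obvious fix.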