# prefect-community
Hey all, I am running into a weird issue. We are trying to use Prefect to build a Docker container that contains a module that depends on Pandas. We were able to build it successfully and log "hello" to the UI. However, now that we have begun using the module, we can no longer rebuild the Docker container because it cannot find Pandas. The error is: ModuleNotFoundError: No module named 'pandas'. How can we resolve this?
Hi @Matthew Blau, by "build a docker container" do you mean you are writing a flow that calls `docker build`? Or writing a flow that uses Docker storage?
@Zanie It uses Docker storage. This is the flow:
import prefect
from prefect import Flow, config
from prefect.run_configs import DockerRun
from prefect.storage import Docker

if __name__ == '__main__':
    logger = prefect.context.get("logger")
    with Flow(
        name="example",
        storage=Docker(
            dockerfile="/home/lookup/integration/Dockerfile",
            ignore_healthchecks=False,
        ),
    ) as flow:
        result = write_all_files()
    # Pass the locally configured Prefect secrets into the flow-run container
    flow.run_config = DockerRun(env={f"PREFECT__CONTEXT__SECRETS__{k}": v for k, v in config.context.secrets.items()})
Great, do you have a `RUN pip install pandas` or similar in that Dockerfile?
@Zanie we do not. Is that what we need to do?
That's a quick solution, although I think you're better off having a `setup.py` in the module that you're putting into the Docker container and using `pip install -e /path/to/your/module` or similar.
Another quick solution is to pass your requirements to the `Docker` storage class, which has a kwarg for this:
python_dependencies (List[str], optional): list of pip installable dependencies for the image
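For illustration, here is a minimal sketch of how that kwarg might be combined with the storage settings from the flow above. It assumes Prefect 1.x, where `Docker` is importable from `prefect.storage`. The `STORAGE_KWARGS` dict and the `build_storage` helper are hypothetical names used only for this sketch, and the import is placed inside the helper so the snippet can still be loaded on a machine without Prefect.

```python
# Keyword arguments for Prefect 1.x Docker storage, kept in a plain dict.
# The dockerfile path and ignore_healthchecks value come from the flow above;
# python_dependencies is the kwarg quoted from the docs.
STORAGE_KWARGS = {
    "dockerfile": "/home/lookup/integration/Dockerfile",
    "ignore_healthchecks": False,
    "python_dependencies": ["pandas"],  # pip-installable packages for the image
}


def build_storage():
    """Create the Docker storage object (needs Prefect 1.x installed)."""
    from prefect.storage import Docker  # assumed Prefect 1.x import path

    return Docker(**STORAGE_KWARGS)
```

The packages listed there are installed inside the image when it is built. They are not installed on the machine that runs the build.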
@Zanie Ahh, I have seen that. I get the ModuleNotFoundError when I run from the CLI:
python3 integration.py
It doesn't begin to build; it gives the error before doing any building.
Ah, so pandas is missing from your local machine and the flow is importing it so it fails?
@Zanie Correct. We have a Dockerfile that pulls in the custom module, and one of the files within the module uses Pandas. It built fine before we used the module. Now that our code is using the module, it fails with a ModuleNotFoundError.
Your file that's calling
needs to be able to run, and Python is going to complain if the module is not available. You can wrap the module imports in a `try/except` block and ignore the exception when you're just registering the flow, or you can put the imports into their respective tasks so they are not attempted until flow runtime.
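A rough sketch of both options, using only plain Python so it runs with or without pandas installed. The `row_count` function is a hypothetical stand-in for a task body; in a real flow it would presumably carry Prefect's `@task` decorator, which is left out here to keep the example self-contained.

```python
# Option 1: guard the import so the file still loads where pandas is absent,
# for example on the machine that only builds or registers the flow.
try:
    import pandas as pd
except ImportError:
    pd = None  # pandas is expected to exist inside the Docker image instead


# Option 2: import inside the function body, so the import is attempted
# only when the function is actually called (at flow runtime).
def row_count(records):
    """Return the number of rows in a DataFrame built from `records`."""
    import pandas  # resolved at call time, not when this file is loaded

    return len(pandas.DataFrame(records))
```

With the guarded import, any code that uses `pd` still fails if it runs where pandas is missing, so the guard only helps when that code runs inside the container. The in-function import avoids the module-level placeholder at the cost of repeating the import in each task that needs it.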
@Zanie The try/except block seems to work so that is the route I think we will take, thanks for the quick response!
Wonderful, you're welcome!