
Matthew Blau

02/12/2021, 9:09 PM
Hey all, I am running into a weird issue. We are trying to use Prefect to build a Docker container that contains a module that depends on pandas. We have been able to build it successfully and log "hello" to the UI. However, once we start using the module, we can no longer rebuild the Docker container because it is looking for pandas. The error it gives is `ModuleNotFoundError: No module named 'pandas'`. How can we resolve this?

Zanie

02/12/2021, 9:12 PM
Hi @Matthew Blau, by "build a docker container" do you mean you are writing a flow that calls `docker build`? Or writing a flow that uses Docker storage?

Matthew Blau

02/12/2021, 9:13 PM
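@Zanie It uses Docker storage; this is the flow:
```python
import prefect
from prefect import Flow, config
from prefect.run_configs import DockerRun
from prefect.storage import Docker

# slack_notifier (state handler) and write_all_files (task) are defined elsewhere in this file

if __name__ == '__main__':
    logger = prefect.context.get("logger")
    with Flow(
        name="example",
        # schedule=schedule,
        state_handlers=[slack_notifier],
        storage=Docker(
            dockerfile="/home/lookup/integration/Dockerfile",
            ignore_healthchecks=False,
        ),
    ) as flow:
        result = write_all_files()
    # Pass the local secrets into the flow-run container as environment variables
    flow.run_config = DockerRun(
        env={f"PREFECT__CONTEXT__SECRETS__{k}": v for k, v in config.context.secrets.items()}
    )
    flow.register(project_name="test")
```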

Zanie

02/12/2021, 9:13 PM
Great, do you have a `RUN pip install pandas` or similar in that Dockerfile?

Matthew Blau

02/12/2021, 9:14 PM
@Zanie we do not. Is that what we need to do?

Zanie

02/12/2021, 9:15 PM
That's a quick solution, although I think you're better off having a `setup.py` in the module that you're putting into the Docker container and using `pip install -e /path/to/your/module` or similar.
Another quick solution is to pass your requirements to the `Docker` storage class, which has a kwarg for this:
```
python_dependencies (List[str], optional): list of pip installable dependencies for the image
```
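If the module already keeps its dependencies in a `requirements.txt`, one way to feed them to `python_dependencies` is a small parser like the one below. This is a sketch under assumptions: the file name and the `parse_requirements` / `read_requirements` helpers are hypothetical, and only plain `name` or `name==version` lines are handled (pip options such as `-r` or `-e` are skipped).

```python
from pathlib import Path
from typing import List


def parse_requirements(text: str) -> List[str]:
    """Turn requirements.txt-style text into a list usable as python_dependencies."""
    deps = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, full-line comments and pip options like -r / -e
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        # Drop an inline " # comment" after the requirement
        deps.append(line.split(" #", 1)[0].strip())
    return deps


def read_requirements(path: str) -> List[str]:
    """Hypothetical convenience wrapper that reads the file from disk."""
    return parse_requirements(Path(path).read_text())
```

It could then be wired in as `Docker(dockerfile="/home/lookup/integration/Dockerfile", python_dependencies=read_requirements("requirements.txt"))`. Note that this installs the packages into the image only; the local interpreter that runs the registration script still needs anything that script imports at the top level.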

Matthew Blau

02/12/2021, 9:16 PM
@Zanie Ahh, I have seen that. I get the `ModuleNotFoundError` when I run this from the CLI:
```
python3 integration.py
```
It doesn't begin to build; it just gives the error before doing any building.

Zanie

02/12/2021, 9:18 PM
Ah, so pandas is missing from your local machine and the flow is importing it so it fails?

Matthew Blau

02/12/2021, 9:20 PM
@Zanie Correct. We have a Dockerfile that pulls in the custom module, and one of the files within the module uses pandas. It built fine before our code used the module. Now that our code is using the module, it fails with a `ModuleNotFoundError`.
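To confirm which imports are missing from the local interpreter that runs `python3 integration.py`, a check like the following can be run before registering. This is a minimal sketch: `missing_modules` is a hypothetical helper, and it only handles top-level module names such as `pandas`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that this interpreter cannot find."""
    # find_spec locates a module without importing it; None means it is not installed here
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(missing_modules(["pandas", "prefect"]))
```

Run with the same interpreter used for registration, this should list `pandas` in the situation described in this thread, even though the Docker image itself may have it installed.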

Zanie

02/12/2021, 9:21 PM
Your file that's calling `flow.register()` needs to be able to run, and Python is going to complain if the module is not available. You can wrap the module imports in a `try/except` block and ignore the exception when you're just registering the flow, or you can put the imports into their respective tasks so they are not attempted until flow runtime.
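A minimal sketch of both suggestions, assuming pandas is the import that fails. The function names and the `rows` argument are hypothetical; in a real flow these functions would be decorated with Prefect's `@task`.

```python
# Option 1: guard the module-level import so the file can still be loaded
# (e.g. to call flow.register()) on a machine where pandas is not installed.
try:
    import pandas as pd
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    pd = None


def summarize_guarded(rows):
    """Hypothetical task body that relies on the guarded module-level import."""
    if pd is None:
        raise RuntimeError("pandas is not installed in this environment")
    return pd.DataFrame(rows).shape


# Option 2: defer the import into the task body, so it is only attempted
# when the task actually runs (inside the Docker image), not at registration.
def summarize_deferred(rows):
    """Hypothetical task body; the import runs on each call, at flow runtime."""
    import pandas

    return pandas.DataFrame(rows).shape
```

One design note, hedged: if the task functions are pickled by value (which can happen for functions defined in the same script that calls `flow.register()`), the `pd = None` value from the registration machine may be what gets captured, so the deferred import in option 2 is arguably the more robust of the two.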

Matthew Blau

02/12/2021, 9:37 PM
@Zanie The try/except block seems to work, so that is the route I think we will take. Thanks for the quick response!

Zanie

02/12/2021, 9:54 PM
Wonderful, you're welcome!