
Atul Anand

04/01/2022, 1:12 AM
Hi, inside a task I am using a third-party library (another Python module/script), and I bound the volume to the agent. I still get a ModuleNotFoundError. Is there any way to solve this, or do tasks have some restriction that they cannot call a library out of scope? In short: how can we use external modules?

Chu Lục Ninh

04/01/2022, 1:25 AM
Your agent runs in a container, right? You can create a customized container to run the agent. In your Dockerfile:
FROM prefecthq/prefect
RUN python3 -m pip install somelib
And run that image instead of the original Prefect image.

Atul Anand

04/01/2022, 1:48 AM
I think this is not the issue. Actually I have two files, say a.py and b.py. In a.py I have the flow registration code and one task. Inside the task I use methods from b.py (which is not a task), and the ModuleNotFoundError comes from there.
If I comment out that line it works perfectly:
prefect.hello-flow | Unexpected error occured in FlowRunner: ModuleNotFoundError("No module named 'twitter'")
twitter is a Python file which is mounted into the agent as well.
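(For context, the failure mode above can be reproduced with the standard library alone. The module name `twitter_helper` and the temporary directory below are hypothetical stand-ins for the mounted twitter.py: Python only finds a mounted .py file if its directory is on sys.path, e.g. via PYTHONPATH inside the container.)

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical stand-in for the mounted helper module (like twitter.py):
# a plain .py file in a directory the interpreter knows nothing about.
mounted_dir = tempfile.mkdtemp()
with open(os.path.join(mounted_dir, "twitter_helper.py"), "w") as f:
    f.write(textwrap.dedent("""
        def fetch():
            return "tweets"
    """))

# Without the directory on sys.path, the import fails exactly like the flow run.
try:
    import twitter_helper  # noqa: F401
    imported_before = True
except ModuleNotFoundError:
    imported_before = False

# Adding the mount point to sys.path (equivalent to setting PYTHONPATH on the
# agent container) makes the module importable.
sys.path.insert(0, mounted_dir)
importlib.invalidate_caches()  # make sure the new path entry is rescanned
import twitter_helper

print(imported_before)         # False
print(twitter_helper.fetch())  # tweets
```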

Chu Lục Ninh

04/01/2022, 1:52 AM
Ah I see, what kind of agent are you using?

Atul Anand

04/01/2022, 1:53 AM
It's local agent.

Chu Lục Ninh

04/01/2022, 1:54 AM
Is the local agent running in a Docker container?

Atul Anand

04/01/2022, 1:58 AM
Yes, I mounted the volume for code availability.

Chu Lục Ninh

04/01/2022, 2:00 AM
Prefect serializes your flow into ~/.prefect/flows/. Did you register on the host machine? What location did you mount the flow to?

Atul Anand

04/01/2022, 2:00 AM
Then how can Prefect work for a single Python file?
volumes:
    - /srv/docker/prefect/flows:/root/.prefect/flows
- type: bind
  source: ./config.toml
  target: /root/.prefect/config.toml
  read_only: true
# debug mode
debug = true

# base configuration directory (typically you won't change this!)
home_dir = "~/.prefect"

backend = "server"

[server]
host = "http://172.17.0.1"
port = "4200"
host_port = "4200"
endpoint = "${server.host}:${server.port}"

Kevin Kho

04/01/2022, 2:39 AM
You need this custom module installed inside the image, or you need to specify the working directory for flow execution to point to a place it can import this volume from (see the LocalRun example at the bottom of the docs, with the working directory). If you don't install it as a module or add that folder to the Python path, your flow won't be able to import it.
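(A minimal sketch of that suggestion, assuming Prefect 1.x and its `prefect.run_configs.LocalRun`; the /opt/flows path and flow/task names are placeholders for wherever the code volume is actually mounted.)

```python
from prefect import Flow, task
from prefect.run_configs import LocalRun

@task
def use_helper():
    # Resolvable once the mount point is on PYTHONPATH.
    import twitter
    ...

with Flow("hello-flow") as flow:
    use_helper()

# /opt/flows is a placeholder for wherever the code volume is mounted
# in the agent container.
flow.run_config = LocalRun(
    working_dir="/opt/flows",
    env={"PYTHONPATH": "/opt/flows"},
)
```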

Atul Anand

04/01/2022, 6:22 AM
Thanks for answering @Kevin Kho. I tried this but it's not working. The agent is running inside a Docker container and I have mounted the code into the agent (named volume). I tried debugging inside an SSH session on the agent: the mounted code is present and accessible to the agent, but I still get the module-not-found error. Do the Dask distributed scheduler and workers also need code or volume access?

Kevin Kho

04/01/2022, 6:35 AM
Ah yeah, those likely need the module installed as well. At that point, you probably need it in your image. I have an article that might help with that here.

Atul Anand

04/02/2022, 5:24 AM
@Kevin Kho Hi there! I have solved that issue. But I have a few questions. Suppose we have multiple microservices as workflows (A, B, C), and each workflow needs 100s of Dask workers. The question is: a Dask worker needs all its dependencies (libraries) before executing code, so the workers need all the dependencies of A, B, and C if I want to execute the flows of A, B, and C. This creates a huge dependency of workers on services. I just want to make the workers and schedulers independent of the code.

Kevin Kho

04/02/2022, 2:43 PM
Yes they do, even if you don't use Prefect and just use Dask. Your code is serialized with cloudpickle on the client side, sent to the worker (through the scheduler), and unpickled there. To unpickle correctly and execute the code, you need consistent package versions. You can have a separate image for each of A, B, and C if it helps, but workers will never be independent of the code on your client.
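(The pickling-by-reference behavior can be seen with the standard library's pickle, which cloudpickle builds on: a module-level function is serialized as just its module and attribute name, so the unpickling side, here standing in for a Dask worker, must be able to import that module itself.)

```python
import pickle
import math

# A module-level callable is pickled *by reference*: the payload carries
# only the module name and attribute name, not the function's code.
payload = pickle.dumps(math.sqrt)
assert b"math" in payload and b"sqrt" in payload

# Unpickling works only because this process can `import math`.
# A Dask worker unpickling a task that references your own module
# likewise needs that module importable (installed or on PYTHONPATH).
restored = pickle.loads(payload)
print(restored(16.0))  # 4.0
```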

Atul Anand

04/02/2022, 3:55 PM
@Kevin Kho Thanks! Got it! So each service has its own set of workers, and it is good practice to keep that separation.
What do you suggest @Kevin Kho? :)

Kevin Kho

04/02/2022, 5:22 PM
I think that sounds expensive to maintain? You could use ephemeral clusters instead, defined with an image to spin them up, and then have different images for the different jobs.
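(One way to sketch that suggestion, assuming Prefect 1.x with dask-kubernetes installed; the image name, registry, and cluster class are examples, not a prescription. Each flow run spins up its own short-lived cluster from a job-specific image, so no long-lived worker pool has to carry every service's dependencies.)

```python
from prefect import Flow, task
from prefect.executors import DaskExecutor
from dask_kubernetes import make_pod_spec  # classic dask-kubernetes API

@task
def work():
    ...

with Flow("service-a") as flow:
    work()

# An ephemeral cluster is created when the flow runs and torn down after.
# Service B and C flows would point at their own images instead.
flow.executor = DaskExecutor(
    cluster_class="dask_kubernetes.KubeCluster",
    cluster_kwargs={
        "pod_template": make_pod_spec(image="registry.example.com/service-a:latest"),
    },
    adapt_kwargs={"minimum": 1, "maximum": 100},
)
```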