# ask-community
m
Hi, how do I make Prefect send all the required side libraries to the underlying Dask scheduler? For example, when I run the 02_etl_flow.py example from the tutorial and send the flow to a remote Dask scheduler, I get the following error:
ModuleNotFoundError: No module named 'aircraftlib'
[2021-05-13 09:27:29+0200] ERROR - prefect.etl | Unexpected error occured in FlowRunner: ModuleNotFoundError("No module named 'aircraftlib'")
aircraftlib is a module from the tutorial that the flow imports. It works when I run everything locally, but it does not work when I send the flow to a remote Dask scheduler. I could put it all in a single file "flow.py", but is there a more elegant way to do this? Thanks.
k
Just confirming that you have no problem with worker packages? This is for the scheduler side?
m
Hi, I have a module on the local machine that the workflow uses, and I am using a Dask executor. The Dask executor does not "receive" the side module, only the workflow definition. This is the tutorial ETL flow (https://docs.prefect.io/core/tutorial/03-parameterized-flow.html), and the workflow's Python file imports:
import aircraftlib as aclib
If I try to run this flow on a Dask executor, it fails because aircraftlib is not installed.
I am just following step 2 of the tutorial and trying to run it on a Dask executor. The tutorial file I am referring to is here: https://github.com/PrefectHQ/prefect/blob/master/examples/tutorial/02_etl_flow.py
k
I know what you mean. I’m looking for a way to get it to the scheduler
m
Perhaps I am misunderstanding the concept of Prefect. Maybe there should be a "prefect agent" that connects to the Dask executor for this to work seamlessly? Is there documentation on how the agents shown in the Prefect dashboard can connect to the Dask executor?
k
This is more of a Dask setup question than a Prefect one. Dask needs packages installed on both the scheduler and the workers (and the versions need to match, for the most part). The answer to the question is that the Dask and Prefect images both accept the EXTRA_PIP_PACKAGES env variable we tried earlier.
You need to add aircraftlib to both the scheduler and the workers in the values.yaml.
I’m not 100% sure you need it on the scheduler, so you can try adding it to just the workers first.
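For context on why the workers need the module at all: a function that lives in an importable module is pickled as a reference (module name plus function name), not as source code, so whoever unpickles it has to be able to import that module. The sketch below reproduces this with the standard library's pickle and no Dask. sidelib_demo is a hypothetical stand-in for aircraftlib, and the assumption is that cloudpickle, which Dask uses, treats importable modules the same way.

```python
import importlib
import os
import pickle
import sys
import tempfile


def pickle_then_load_without_module():
    """Pickle a function from a throwaway module, then unpickle it once the module is gone."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for aircraftlib: a one-function module on disk.
        with open(os.path.join(tmp, "sidelib_demo.py"), "w") as fh:
            fh.write("def double(x):\n    return 2 * x\n")
        sys.path.insert(0, tmp)
        try:
            module = importlib.import_module("sidelib_demo")
            # The payload records only the module and function names, not the code.
            payload = pickle.dumps(module.double)
        finally:
            # Simulate a worker that never had the module installed.
            sys.path.remove(tmp)
            sys.modules.pop("sidelib_demo", None)
    importlib.invalidate_caches()
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return str(exc)
    return None


print(pickle_then_load_without_module())
```

The message it reports has the same shape as the aircraftlib error at the top of the thread, which is why installing the module on the worker side is the fix.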
m
I mean this is a tutorial code and the "module" is just a side python file. I might as well just put everything inside a single python file.
k
You mean copy and paste the contents of aircraftlib into the Python file?
m
exactly
k
Yeah that should work. Or adding it to the container like we talked about previously so it gets installed through the helm chart.
m
I understand this is a Dask issue. How do you handle it in complex Prefect flows that import outside dependencies? Do you preinstall them on all workers?
k
Through the Docker image, normally. This varies from user to user, because some users have a cluster that’s already up that they just connect to, while other users spin up the Dask cluster on demand. You can make a container with all the dependencies and then pass it to DaskExecutor.
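For a single pure-Python side module, a lighter-weight route than rebuilding the image may be Dask's Client.upload_file, which sends a .py, .egg, or .zip file to the workers. The sketch below covers only the packaging step, using the standard library. zip_package is a hypothetical helper name, and it assumes the package has no compiled extensions or third-party dependencies of its own.

```python
import os
import zipfile


def zip_package(package_dir, archive_path):
    """Zip a pure-Python package so that the package folder is the top-level entry."""
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in sorted(files):
                if name.endswith(".py"):
                    full_path = os.path.join(root, name)
                    # Store paths relative to the parent, e.g. aircraftlib/__init__.py.
                    zf.write(full_path, os.path.relpath(full_path, parent))
    return archive_path
```

Assuming a dask.distributed Client already connected to the remote scheduler, the upload would then be client.upload_file("aircraftlib.zip"). Depending on the distributed version, workers that join after the call may or may not receive the file, so the Docker image route stays the more predictable one.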
m
Aha, so you can build a container dynamically in code and pass it as an argument to DaskExecutor?
k
DaskExecutor can take in a Callable, and some callables take an image as an argument to spin up the cluster with that image.
Example with Coiled:
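# Prefect 0.14+ exposes DaskExecutor from prefect.executors;
# prefect.engine.executors is the older, deprecated import path.
from prefect.executors import DaskExecutor
import coiled

# DaskExecutor calls coiled.Cluster(**cluster_kwargs) to create a temporary cluster for the run.
coiled_executor = DaskExecutor(
    cluster_class=coiled.Cluster,
    cluster_kwargs=dict(name="prefect-demo", configuration="my-acct/demo-cluster-config"),
)
# flow is the Flow object defined earlier, e.g. the tutorial's ETL flow.
flow_state = flow.run(executor=coiled_executor)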
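Other cluster classes follow the same pattern: DaskExecutor accepts cluster_class either as the class itself or as its import path in a string, and forwards cluster_kwargs to it when it creates the temporary cluster. Here is a minimal sketch of passing a Docker image that way, assuming dask_cloudprovider's FargateCluster (which takes image and n_workers keyword arguments) and a placeholder image name. fargate_executor_kwargs is a hypothetical helper that only builds the arguments and does not start anything.

```python
def fargate_executor_kwargs(image, n_workers=2):
    """Build DaskExecutor keyword arguments for a cluster started from a Docker image."""
    return {
        # Import path as a string; the class object itself would also be accepted.
        "cluster_class": "dask_cloudprovider.aws.FargateCluster",
        # Forwarded to the cluster class when the temporary cluster is created.
        "cluster_kwargs": {"image": image, "n_workers": n_workers},
    }


# Placeholder image name; the image is assumed to have aircraftlib installed.
executor_kwargs = fargate_executor_kwargs("my-registry/prefect-with-aircraftlib:latest")
print(executor_kwargs["cluster_kwargs"]["image"])
```

The result would then be passed as DaskExecutor(**executor_kwargs) and handed to flow.run(executor=...) in the same way as the Coiled executor.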