# ask-community
m
Hi all, I am trying to run a Prefect flow on a Dask executor. I am getting the following error:
[2021-05-11 19:11:14+0200] ERROR - prefect.FlowRunner | Unexpected error: ModuleNotFoundError("No module named 'prefect'")
It seems that the workers do not have Prefect installed. Dask is running in a Kubernetes cluster, so is there a smart way to install the prefect module on all the Dask workers?
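One way to confirm where the module is missing is to test importability inside each worker process rather than on the client. This is a minimal sketch: module_available is a hypothetical helper built on the standard library's importlib.util.find_spec, and the remote check assumes dask.distributed's Client.run, which runs a function on every worker and returns a dict keyed by worker address. The scheduler address is the placeholder used in this thread.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported in this process."""
    # find_spec returns None when the module cannot be located on sys.path
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Local check: does the environment that submits the flow have prefect?
    print("prefect available locally:", module_available("prefect"))

    # Remote check, assuming dask.distributed is installed and the (placeholder)
    # scheduler address is reachable; returns {worker_address: True/False}:
    # from dask.distributed import Client
    # client = Client("tcp://dask-scheduler-hostname:8786")
    # print(client.run(module_available, "prefect"))
```

If the local check prints True but the workers report False, the fix belongs in the worker image or environment, not on the client.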
k
Hey @Matej, can you show me how you set up the DaskExecutor?
m
I use the following from the tutorials:
executor = DaskExecutor(address="tcp://dask-scheduler-hostname:8786")
flow.run(executor=executor)
dask-scheduler-hostname is the host where the Dask scheduler runs in the Kubernetes cluster.
k
I think if you do it like this, the cluster already exists, so Prefect has to be installed on that cluster, whereas the link I sent spins up the cluster and installs the image on it.
m
I used helm install to set up the Dask cluster. As a next step, I'll look into changing the cluster dynamically. Is there a simple way to pip install prefect on every Dask worker?
k
I think this is what you need
You can set EXTRA_PIP_PACKAGES and the Dask cluster will install them: https://github.com/dask/dask-docker#environment-variables
You can also use the base Prefect image. It has Dask.
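For a Helm-managed cluster, the variable goes into the worker's environment in the values file passed to helm. The sketch below builds that fragment as a Python dict and prints it as JSON, which YAML parsers also accept. It assumes the dask/dask chart reads name/value pairs from worker.env; extra_packages_values is a hypothetical helper and the package names are only examples.

```python
import json


def extra_packages_values(pip_packages, conda_packages=None):
    """Build a Helm values fragment that sets the dask-docker install variables.

    Assumes the dask/dask chart takes a list of {name, value} pairs in worker.env.
    """
    env = [{"name": "EXTRA_PIP_PACKAGES", "value": " ".join(pip_packages)}]
    if conda_packages:
        env.append({"name": "EXTRA_CONDA_PACKAGES", "value": " ".join(conda_packages)})
    return {"worker": {"env": env}}


if __name__ == "__main__":
    # JSON is valid YAML, so this output could be saved as config.yaml
    print(json.dumps(extra_packages_values(["prefect"]), indent=2))
```

Saved as config.yaml, the output could be applied with helm upgrade <release> dask/dask -f config.yaml.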
m
Hi Kevin, I've set EXTRA_CONDA_PACKAGES on the worker in the YAML file:
- name: EXTRA_CONDA_PACKAGES
  value: prefect -c conda-forge
and I still get the same error, ModuleNotFoundError for prefect:
[2021-05-12 18:57:11+0200] INFO - prefect.DaskExecutor | Stopping executor, waiting for 1 active tasks to complete
[2021-05-12 18:57:11+0200] ERROR - prefect.FlowRunner | Unexpected error: ModuleNotFoundError("No module named 'prefect'")
Traceback (most recent call last):
File "/home/m/repo/prefect/src/prefect/engine/runner.py", line 48, in inner
new_state = method(self, state, *args, **kwargs)
File "/home/m/repo/prefect/src/prefect/engine/flow_runner.py", line 643, in get_flow_run_state
final_states = executor.wait(
File "/home/m/repo/prefect/src/prefect/executors/dask.py", line 414, in wait
return self.client.gather(futures)
File "/home/m/miniconda3/envs/prefect/lib/python3.8/site-packages/distributed/client.py", line 1975, in gather
return self.sync(
File "/home/m/miniconda3/envs/prefect/lib/python3.8/site-packages/distributed/client.py", line 843, in sync
return sync(
File "/home/m/miniconda3/envs/prefect/lib/python3.8/site-packages/distributed/utils.py", line 353, in sync
raise exc.with_traceback(tb)
File "/home/m/miniconda3/envs/prefect/lib/python3.8/site-packages/distributed/utils.py", line 336, in f
result[0] = yield future
File "/home/m/miniconda3/envs/prefect/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/home/m/miniconda3/envs/prefect/lib/python3.8/site-packages/distributed/client.py", line 1840, in _gather
raise exception.with_traceback(traceback)
File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
ModuleNotFoundError: No module named 'prefect'
[2021-05-12 18:57:11+0200] ERROR - prefect.My First Flow | Unexpected error occured in FlowRunner: ModuleNotFoundError("No module named 'prefect'")
What am I doing wrong? It is a simple hello-world example.
k
That looks right. Can I see your config file?
m
You mean values.yaml?
k
I thought the link said to spin it up first with values.yaml, then create a new config.yaml and apply it with an upgrade, like:
helm upgrade bald-eel dask/dask -f config.yaml
You tried to add it directly to the values.yaml?
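When editing values for helm upgrade, it helps to know how the -f file combines with the chart's defaults. The function below is a simplified model, not Helm's actual code: nested maps are merged key by key, while lists such as worker.env are replaced as a whole, so an env list in config.yaml must contain every variable you want. merge_values and the example values are hypothetical.

```python
import copy
import json


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Simplified model of how Helm layers a -f values file over chart defaults.

    Nested mappings are merged key by key; any other value, lists included,
    replaces the default outright.
    """
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


if __name__ == "__main__":
    # Hypothetical example values, not the chart's real defaults
    defaults = {"worker": {"replicas": 3, "env": [{"name": "EXAMPLE", "value": "1"}]}}
    overrides = {"worker": {"env": [{"name": "EXTRA_PIP_PACKAGES", "value": "prefect"}]}}
    # replicas survives the merge, but the env list is replaced: EXAMPLE is gone
    print(json.dumps(merge_values(defaults, overrides), indent=2))
```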
m
worker.yaml
Yes, I upgraded with the new values.yaml:
helm upgrade bald-eel dask/dask -f config.yaml
This is the worker part of config.yaml.
When I ssh into the worker pod, the module is not installed.
I think the problem is that I have to run Dask rootless, hence the securityContext setting for the worker:
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
So user 1000 cannot run conda install, because it does not have root privileges:
NotWritableError: The current user does not have write permissions to a required path.
path: /opt/conda/pkgs/urls.txt
uid: 1000
gid: 1000
If you feel that permissions on this path are set incorrectly, you can manually
change them by executing
$ sudo chown 1000:1000 /opt/conda/pkgs/urls.txt
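The error shows the startup install trying to write under /opt/conda, which user 1000 cannot write to. A quick way to check this from inside a pod is to ask whether the running environment's prefix is writable. A minimal sketch, assuming a Linux container (os.getuid is POSIX-only); can_install_into is a hypothetical helper.

```python
import os
import sys


def can_install_into(prefix: str) -> bool:
    """Return True if the current user may write to `prefix` (False if it does not exist)."""
    return os.access(prefix, os.W_OK)


if __name__ == "__main__":
    # sys.prefix is the root of the running Python environment, e.g. /opt/conda in the dask image
    print(f"uid={os.getuid()} gid={os.getgid()} prefix={sys.prefix} writable={can_install_into(sys.prefix)}")
```

If this prints writable=False, install-at-startup variables cannot work under that securityContext, which points toward an image that already contains the packages.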
k
OK, I think this might be easier if you use the Prefect base image for the workers, because it already contains Dask.
m
prefecthq/prefect:latest
for the worker image?
k
What Python version are you on?
For 3.7, yes, that looks right.
m
3.8 is my local Python.
Basically, how do I change worker.yaml so that it uses the Prefect base image?
k
Try prefecthq/prefect:latest-python3.8. I think the image is set here: https://github.com/dask/helm-chart/blob/main/dask/values.yaml#L64-L71
I guess you should use the same image for the scheduler as well.
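Dask generally expects the client, scheduler and workers to run matching Python and package versions, so the image tag should carry the same Python minor version as the environment that submits the flow. The sketch below derives that tag from the running interpreter and builds a values fragment that points both scheduler and workers at one image. It assumes the dask/dask chart exposes scheduler.image and worker.image with repository and tag keys; both helpers are hypothetical.

```python
import json
import sys


def prefect_image_tag(prefect_version: str = "latest") -> str:
    """Build a prefecthq/prefect tag whose Python suffix matches this interpreter."""
    return f"{prefect_version}-python{sys.version_info.major}.{sys.version_info.minor}"


def image_values(repository: str, tag: str) -> dict:
    """Helm values fragment that points the scheduler and workers at the same image.

    Assumes the chart's scheduler.image and worker.image take repository and tag keys.
    """
    return {
        "scheduler": {"image": {"repository": repository, "tag": tag}},
        "worker": {"image": {"repository": repository, "tag": tag}},
    }


if __name__ == "__main__":
    # Run this with the same Python you use to submit the flow
    print(json.dumps(image_values("prefecthq/prefect", prefect_image_tag("latest")), indent=2))
```

On a local Python 3.8, prefect_image_tag("0.14.17") gives "0.14.17-python3.8". Passing an explicit version rather than "latest" keeps the cluster from changing underneath you when a new image is published.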
m
Is there any other way to install prefect on the Dask workers if they run as non-root?
k
What error did you get in this case?
m
The pods crash completely, both the scheduler and the workers.
k
Any logs or errors?
m
Traceback (most recent call last):
File "/usr/local/bin/dask-scheduler", line 5, in <module>
from distributed.cli.dask_scheduler import go
File "/usr/local/lib/python3.8/site-packages/distributed/cli/dask_scheduler.py", line 120, in <module>
def main(
File "/usr/local/lib/python3.8/site-packages/click/decorators.py", line 247, in decorator
_param_memo(f, OptionClass(param_decls, **option_attrs))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 2467, in __init__
super().__init__(param_decls, type=type, multiple=multiple, **attrs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 2106, in __init__
raise ValueError(
ValueError: 'default' must be a list when 'multiple' is true.
Those are the scheduler pod logs.
(base) m@desktop:~/repo/kubernetes/dask$ k logs dask-worker-7b55c8bb7-4fsft
Traceback (most recent call last):
File "/usr/local/bin/dask-worker", line 8, in <module>
sys.exit(go())
File "/usr/local/lib/python3.8/site-packages/distributed/cli/dask_worker.py", line 461, in go
check_python_3()
File "/usr/local/lib/python3.8/site-packages/distributed/cli/utils.py", line 32, in check_python_3
_unicodefun._verify_python3_env()
AttributeError: module 'click._unicodefun' has no attribute '_verify_python3_env'
And those are the worker pod logs.
Thanks for your patience.
k
Sure! Can you try this image instead: prefecthq/prefect:0.14.17-python3.8. Just dropping the version down.
m
In the tag, correct?
k
Yes, that's right. 0.14.17-python3.8 is the tag.
m
It is running. I'll test the hello world.
Perfect! Now it works! Thanks so much. What do you suggest for adding packages such as numpy or pandas to the workers when the pods run as non-root? Thanks.
k
Nice! If this approach doesn't work, I guess the way to go is to build a new image and host it somewhere like Docker Hub so the cluster can pull it.
m
Aha, so I prepare my own image with all the required packages and then set it up the same way, because now even my dashboard does not work:
Dask needs bokeh >= 0.13.0 for the dashboard.
k
Yes it seems like that would work
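A custom image can be as small as the Prefect base image plus one install step. The sketch below renders such a Dockerfile from a base tag and a package list; it includes bokeh>=0.13.0 because of the dashboard message above. render_dockerfile is hypothetical, and it assumes the base image has pip and builds as root, so the packages are baked in at build time and non-root pods never need write access to site-packages.

```python
import shlex
from typing import Sequence


def render_dockerfile(base_image: str, pip_packages: Sequence[str]) -> str:
    """Render a minimal Dockerfile that bakes extra packages into a base image."""
    lines = [f"FROM {base_image}"]
    if pip_packages:
        # shlex.quote keeps specifiers such as bokeh>=0.13.0 from being read as shell redirects
        quoted = " ".join(shlex.quote(p) for p in pip_packages)
        lines.append(f"RUN pip install --no-cache-dir {quoted}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("prefecthq/prefect:0.14.17-python3.8", ["numpy", "pandas", "bokeh>=0.13.0"]), end="")
```

The result could then be built and pushed with docker build -t <registry>/<image>:<tag> . and docker push <registry>/<image>:<tag>, and that repository and tag set for both the scheduler and the workers in the values file.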
m
Perhaps I can set up a worker preload script that creates a non-root conda environment where a non-root user can install packages?
k
Maybe that is worth a shot
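A sketch of that idea, with one substitution: instead of creating a separate conda environment, it installs with pip --target into a directory the non-root user can write to and prepends that directory to sys.path. It relies on distributed's preload mechanism, where a module or file passed to dask-worker --preload may define dask_setup(worker), which the worker calls at startup. TARGET_DIR, PACKAGES and pip_command are hypothetical, and how to add the --preload flag through the Helm chart is not covered here.

```python
import os
import subprocess
import sys

# Hypothetical settings: a directory the non-root user can write to,
# and the packages to install there at worker startup.
TARGET_DIR = "/tmp/extra-packages"
PACKAGES = ["numpy", "pandas"]


def pip_command(packages, target):
    """Build a pip invocation that installs into `target` instead of site-packages."""
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", "--target", target, *packages]


def dask_setup(worker):
    """Called by distributed when this file is loaded with dask-worker --preload."""
    os.makedirs(TARGET_DIR, exist_ok=True)
    subprocess.check_call(pip_command(PACKAGES, TARGET_DIR))
    # Make the freshly installed packages importable in this worker process
    if TARGET_DIR not in sys.path:
        sys.path.insert(0, TARGET_DIR)
```

The trade-off is that every worker start reinstalls the packages and depends on network access to the package index, which is why a prebuilt image is usually the more reproducible option.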
m
OK, I'll try. Thanks for the help.
Any idea why the newest Prefect base image fails? (I wanted to try the artifact store.) 0.14.19-python3.7 fails on both the scheduler and the worker with the following exception:
Traceback (most recent call last):
File "/usr/local/bin/dask-scheduler", line 5, in <module>
from distributed.cli.dask_scheduler import go
File "/usr/local/lib/python3.7/site-packages/distributed/cli/dask_scheduler.py", line 119, in <module>
@click.version_option()
File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 247, in decorator
_param_memo(f, OptionClass(param_decls, **option_attrs))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 2467, in __init__
super().__init__(param_decls, type=type, multiple=multiple, **attrs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 2108, in __init__
) from None
ValueError: 'default' must be a list when 'multiple' is true.
k
Yeah, this issue was just fixed. It's because of version conflicts with click, the CLI package.
m
So 0.14.19-python3.7 works now, right?
k
No, I don't think so, because the PR that sets an upper bound on the click version hasn't been released yet.
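Until that fix is released, one option is to check, and if needed pin, click inside a custom image. The helpers below compare the installed click version against an upper bound using only the standard library, with a pkg_resources fallback for the Python 3.7 images. The bound of 8 is an assumption: click 8.0 was released in May 2021 and these tracebacks look like its API changes, but the thread does not name the version. In a custom image, the pin could be a line such as RUN pip install "click<8".

```python
def version_tuple(version: str) -> tuple:
    """Turn a release string like '7.1.2' into (7, 1, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # e.g. '0rc1': keep 0, ignore the rest
            break
    return tuple(parts)


def below_bound(installed: str, upper: str) -> bool:
    """True if `installed` is strictly lower than `upper`."""
    return version_tuple(installed) < version_tuple(upper)


def installed_version(dist: str):
    """Return the installed version of distribution `dist`, or None if it is missing."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:  # Python 3.7 images: fall back to setuptools' pkg_resources
        import pkg_resources

        try:
            return pkg_resources.get_distribution(dist).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed bound: click 8 is the release these errors appear to come from
    click_version = installed_version("click")
    if click_version is None:
        print("click is not installed")
    elif below_bound(click_version, "8"):
        print(f"click {click_version} is below 8")
    else:
        print(f"click {click_version} may be too new for this distributed release")
```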