# ask-community
c
Hey all! Has anyone attempted, or run into issues with, the default spin-up of a local dask cluster (`DaskTaskRunner`) within a Google Cloud Run job? I seem to be running into some issues with the execution of the tasks that use dask. I tested this out locally with no issues, but when using default dask within Cloud Run I run into problems. This may also be a bad choice for reasons unknown to me at the moment! A better approach may be to actually set up a dask cluster separately, but I'm just trying this approach for now… For the Google Cloud Run testing, the tasks that are not using dask run fine. Moreover, if I switch to the sequential task runner, all tasks work fine.
j
hey, could you describe the problems/errors some more? I'm not familiar with running dask on Cloud Run specifically, but off the top of my head it might have to do with the type of machine being used or missing dependencies?
c
hey @Jake Kaplan One issue I've run into is that the tasks harnessing dask hit module import errors, despite working when they are run via the sequential task runner. The modules are our own custom internal modules within the same package.
I think there is the potential that this has to do with the environment variables of the Cloud Run job: the `PYTHONPATH` set via the Cloud Run job work pool configuration is not passed over into dask's execution space? I'm not entirely sure how dask works underneath the hood. @Jake Kaplan
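A minimal stdlib-only sketch of the suspected mechanism (the `/opt/mypkg` path here is hypothetical, and a plain child interpreter stands in for a dask worker): dask's default local cluster runs workers as separate processes, and a fresh interpreter rebuilds `sys.path` from `PYTHONPATH` at startup, so runtime `sys.path` edits in the parent never reach a worker, while the env var does.

```python
import os
import subprocess
import sys

def child_sees_path(extra_dir: str) -> bool:
    """Start a child interpreter (stand-in for a dask worker process)
    and report whether extra_dir ended up on its sys.path."""
    code = f"import sys; print({extra_dir!r} in sys.path)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, env=dict(os.environ),
    )
    return result.stdout.strip() == "True"

sys.path.append("/opt/mypkg")            # runtime edit: visible to the parent only
print(child_sees_path("/opt/mypkg"))     # False

os.environ["PYTHONPATH"] = "/opt/mypkg"  # env var: read by the child at startup
print(child_sees_path("/opt/mypkg"))     # True
```

This is only an analogy for how dask spawns workers, but it shows why an env var that genuinely reaches the container should also reach worker processes, whereas anything done to `sys.path` after startup will not.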
j
If you set it on the work pool it should be set as an ENV on your cloud run job (you can confirm in the cloud run job console that your env var is set on the container)
I do think you're on the right track that the module isn't in your python path
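One way to check this directly (the helper below is illustrative, not Prefect API): print the interpreter's import-related state once at the top of the flow and once inside a dask-executed task, then diff the two printouts to see whether the workers share the flow process's `PYTHONPATH` and `sys.path`.

```python
import os
import sys

def report_import_env(label: str) -> dict:
    # Snapshot of the state that decides whether an import succeeds.
    # Call in the flow process and again inside a dask-run task,
    # then compare the two printouts.
    snapshot = {
        "label": label,
        "pid": os.getpid(),
        "executable": sys.executable,
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "sys.path": list(sys.path),
    }
    print(snapshot)
    return snapshot
```

If the two snapshots differ in `PYTHONPATH` or `sys.path`, that points at the worker processes; if they match, the problem is elsewhere (e.g. the module files missing from the image).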
c
yup! it is.
However, this error doesn't occur when using the sequential task runner in the Cloud Run job: the specific task no longer causes this error. This led me to think there might be something going on within each parallel job in dask for that specific task.
Moreover, if I execute this locally with dask (within the python venv), there are no module import errors for those tasks either. It only seems to occur when using dask with Cloud Run. The small difference so far that has the potential to be the issue is `PYTHONPATH` not being visible to each parallel job. I could be way off track here, though.
j
FWIW I think you would have this issue without prefect, just using Dask in Cloud Run directly (not sure if that helps you test). I found some related issues on Stack Overflow.
c
ahh interesting. Thanks! I'll see if adding it to the image itself makes a difference, hopefully!
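For reference, "adding it to the image itself" could look like this hypothetical Dockerfile fragment (the `/app` layout and base image are assumptions, not from the thread); an `ENV` baked into the image is inherited by every process the container starts, dask worker processes included.

```dockerfile
# Hypothetical layout: the internal package lives under /app.
FROM python:3.11-slim
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
# Baked into the image, so the flow process and dask workers all start with it.
ENV PYTHONPATH=/app
```

Alternatively, installing the internal package into the image (`pip install .` with a proper `pyproject.toml`) avoids depending on `PYTHONPATH` at all.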