# prefect-community

Sven Teresniak

07/15/2020, 4:38 PM
I have a local Prefect test cluster with a single dask-scheduler and a dask-worker doing the work. I submit jobs by calling `flow.register()` and it works fine; I can see the work in the Dask UI and in the Prefect UI. Now the question: is there an easy way to modularize a flow, i.e. to allow imports of local modules (e.g. boilerplate code) so code can be re-used across different flows? In Spark I can bundle a boilerplate.jar and publish the jar together with the job for a single job run (or use the same jar for every job). Is there any comparable mechanism in Prefect, or do I have to deploy the boilerplate code to every dask-worker in advance?
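For context, a minimal sketch of the setup being described, assuming Prefect 0.x and a hypothetical shared module named `boilerplate` with a `clean()` helper (both names are illustrative, not from the thread):

```python
# A minimal sketch of the setup described above, assuming Prefect 0.x and a
# hypothetical local module `boilerplate` providing a `clean()` helper.
from prefect import Flow, task

from boilerplate import clean  # shared local module, re-used across flows

@task
def extract():
    return ["raw-1", "raw-2"]

@task
def transform(records):
    # Re-use the shared helper instead of copying it into every flow.
    return [clean(r) for r in records]

with Flow("modular-flow") as flow:
    transform(extract())

# Registers the flow with the Prefect backend; the Dask workers that execute it
# later still need to be able to import `boilerplate` themselves.
flow.register()
```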

josh

07/15/2020, 4:52 PM
Hi @Sven Teresniak, yes: when using Dask, any importable module that your flow needs to run must be present/available on all workers. We do have some resources around using something like DaskKubernetes and Docker storage to bundle your flow and its dependencies into an image that is then available to all workers. For your local use case, you will need to make sure that wherever the Dask workers are running, they are able to import whichever modules the flow uses (I believe generally by having them available on the `PYTHONPATH`).
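One way to check that, sketched here with the plain dask.distributed API (the scheduler address and the module name `boilerplate` are hypothetical): `Client.run` executes a function on every worker, so it can report whether the shared module is importable on each of them.

```python
# A small sketch for verifying that a shared module is importable on every Dask
# worker; the scheduler address and the module name `boilerplate` are hypothetical.
import importlib.util

from dask.distributed import Client

def can_import_boilerplate():
    # Runs inside each worker process, so it reflects that worker's PYTHONPATH.
    return importlib.util.find_spec("boilerplate") is not None

client = Client("tcp://dask-scheduler:8786")   # hypothetical scheduler address
print(client.run(can_import_boilerplate))      # e.g. {'tcp://10.0.0.5:36789': True, ...}
```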

Sven Teresniak

07/15/2020, 5:04 PM
Okay, I see. The problem is that I cannot spawn Docker containers or pods. My cluster is running in K8s, but for security reasons Prefect has no permission to communicate with the Kubernetes master. I see the benefits of building images and then running them: it's much easier to isolate dependencies and it scales better in the cloud. However, in my scenario I can still scale the dask-workers to achieve more parallelism where needed. The good thing is that our jobs are CPU-friendly, because 99% of the time our Prefect flows only use Presto and Spark for the heavy lifting.
So, to stay a bit flexible, I could mount EFS/NFS into every dask-worker pod and deploy the boilerplate code that way (with some PYTHONPATH magic).
Or integrate the boilerplate into the dask-worker image itself.
@josh now the question: is it possible to let dask-worker processes (not pods, just the Python processes) die after some time of inactivity? Because when I preload boilerplate.py in the dask-worker (there is a command-line argument for that), or import boilerplate.py in my flow file (during execution on the dask-worker), I cannot easily un-import it; that's not easy in Python. So a single process per flow (or per task?) would be great.
As an alternative, I could restart the dask-worker after deploying new boilerplate.
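A sketch of the EFS/NFS idea using a dask-worker preload script rather than raw PYTHONPATH tweaks, with a hypothetical mount point and file path; the preload hook runs once per worker process and puts the shared directory on `sys.path`:

```python
# Hypothetical preload module, e.g. /opt/preload/add_boilerplate_path.py, started with
#   dask-worker tcp://dask-scheduler:8786 --preload /opt/preload/add_boilerplate_path.py
# It runs once per worker process and puts the EFS/NFS mount on sys.path, so
# `import boilerplate` works without baking the code into the worker image.
import sys

SHARED_CODE_DIR = "/mnt/shared"  # hypothetical mount point inside the worker pod

def dask_setup(worker):
    # Hook called by dask-worker when the preload module is loaded.
    if SHARED_CODE_DIR not in sys.path:
        sys.path.insert(0, SHARED_CODE_DIR)
```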

josh

07/15/2020, 5:13 PM
Hmm 🤔 looks like there are a few worker options depending on what you want https://docs.dask.org/en/latest/setup/cli.html#dask-worker such as
```
--death-timeout
--lifetime
--lifetime-restart
```
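For reference, a sketch of the `--lifetime` behaviour through the Python API (the scheduler address and durations are made up); the automatic restart itself is handled by the nanny process that the `dask-worker` CLI starts by default:

```python
# Sketch of the lifetime option via the Python API (distributed 2.x); the
# scheduler address and durations are illustrative. The worker shuts down
# gracefully once its lifetime elapses; the automatic restart is done by the
# nanny process, which the dask-worker CLI uses by default (--lifetime-restart).
import asyncio

from dask.distributed import Worker

async def main():
    worker = await Worker(
        "tcp://dask-scheduler:8786",   # hypothetical scheduler address
        lifetime="1 hour",             # close the worker after roughly an hour
        lifetime_stagger="5 minutes",  # random offset so workers don't all stop at once
    )
    await worker.finished()            # returns once the worker has shut down

asyncio.run(main())
```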

Sven Teresniak

07/15/2020, 6:17 PM
Hmmm, I found the documentation, but nothing states anything like "kill the worker after an idle time of n seconds". And "Seconds to wait for a scheduler before closing" doesn't sound like "seconds to wait AFTER a SCHEDULED JOB before closing".
Maybe trial and error will help. 🙂
thanks