02/03/2023, 6:51 AM
Hi everyone, I am using the local Dask task runner to run my flow with 8 workers:
task_runner=DaskTaskRunner(cluster_class=distributed.LocalCluster, cluster_kwargs={"n_workers": 8})
When I run the same code in a Jupyter notebook (as normal functions), the tasks complete in under a minute, whereas it takes over 10 minutes to run in Prefect with the above task_runner. Can someone help with what might be wrong?
I tried using the default Concurrent task runner as well, and tasks take far longer to complete than they actually need. I had 2 queries, both using SQLAlchemy blocks for DB connections; after the first query finishes, the next step is to run the second query, but there was a gap of 1 minute between the first query completing and the second query starting. Is there some configuration I am missing? And not just this: the queries themselves take longer than running the same task as a plain function in Jupyter.
I am using a KubernetesJob and this issue only happens there. I tried running the flow from Jupyter with the same env and tasks complete faster, but there is still a 30-second lag between queries.
I switched back to Dask and checked the logs in the k8s pod, and got this. Is this related to Prefect?
14:41:17.588 | INFO    | distributed.core - Event loop was unresponsive in Worker for 4.40s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
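For context on what that log line means, here is a stdlib-only sketch (not Prefect or Dask code) of the "GIL-holding function" failure mode it describes: a single C-level call that holds the GIL for its whole duration, here `pickle.dumps` of a large object, stalls every other thread in the process, including an event-loop thread like the one Dask's worker runs. The payload size is illustrative:

```python
import pickle
import threading
import time

def measure_loop_stall(build_payload):
    """Return the largest gap between "heartbeats" of one thread while
    the main thread serializes a payload. pickle.dumps is one C-level
    call that holds the GIL throughout, so the heartbeat thread cannot
    run until it finishes."""
    gaps = []
    done = threading.Event()

    def heartbeat():
        last = time.perf_counter()
        while True:
            time.sleep(0.001)  # normally ticks every few milliseconds
            now = time.perf_counter()
            gaps.append(now - last)
            last = now
            if done.is_set():
                break

    t = threading.Thread(target=heartbeat)
    t.start()
    payload = build_payload()
    pickle.dumps(payload)  # GIL held here; the heartbeat thread stalls
    done.set()
    t.join()
    return max(gaps)

# A few million small ints: cheap to build, slow enough to pickle
# that the stall is clearly visible.
stall = measure_loop_stall(lambda: list(range(3_000_000)))
print(f"worst heartbeat gap: {stall * 1000:.1f} ms")
```

Dask's "Event loop was unresponsive" warning fires when its own internal heartbeat sees a gap like this, which is why it points at GIL-holding functions and large data movement.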

Christopher Boyd

02/03/2023, 3:37 PM
Concurrent is not true concurrency; it's async unless it becomes blocked, so that checks out
For Dask, the log line is from distributed.core, which is a Dask module, not Prefect. It depends on what kind and size of data you are passing around


02/04/2023, 8:25 AM
I am passing pandas DataFrames, both large (over 10M rows x 5 columns) and small (0.1M x 2), and either way it takes a long time to free up the Dask workers
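One way to see why large DataFrames are slow to move: Dask serializes task inputs and results whenever they travel between the flow process and the workers, so the cost scales with the pickled size of the object, not just the row count. A rough stdlib sketch, with a plain list of floats standing in for one DataFrame column (sizes scaled down from the 10M-row case to keep the demo quick):

```python
import pickle
import random

# Rough stand-in for shipping a DataFrame between Dask workers: the
# object is pickled on the sender and unpickled on the receiver, so
# the transfer cost grows with serialized size.
rows = 1_000_000  # illustrative; the poster's large frames are ~10x this
column = [random.random() for _ in range(rows)]

blob = pickle.dumps(column, protocol=pickle.HIGHEST_PROTOCOL)
print(f"{rows:,} floats serialize to {len(blob) / 1e6:.1f} MB")
```

The usual Dask guidance for this situation is to avoid pushing large objects through the scheduler at all: load the data inside the task that needs it, or hand it to the cluster once with `Client.scatter` and pass around the resulting future instead.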