Dnyaneshwar
07/10/2020, 1:38 PM
I am using a DaskExecutor with a YarnCluster. This gives me a KilledWorker error. I am not able to log more data, as the debug=True option also doesn't add more information.
However, when I try the same tasks on a DaskExecutor with address=None, I do not get any error.
What am I missing?
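For context, a setup like the one described usually looks roughly like the sketch below. This is not Dnyaneshwar's actual configuration: the environment archive and worker settings are placeholders, and it cannot run without a real YARN cluster.

```python
from dask_yarn import YarnCluster
from prefect.engine.executors import DaskExecutor

# Hypothetical configuration: environment/worker settings are placeholders.
cluster = YarnCluster(environment="environment.tar.gz", worker_vcores=2)
cluster.scale(2)

# Point Prefect's DaskExecutor at the YARN-backed scheduler.
executor = DaskExecutor(address=cluster.scheduler_address)

# flow.run(executor=executor)  # tasks would then run on the YARN workers
```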
nicholas
KilledWorker sounds like something misconfigured on the Hadoop end.

Dnyaneshwar
07/10/2020, 1:51 PM
With address=None the flow executes as expected, without any errors or warnings. (I really liked the way the logs were structured, so I could actually see how each worker performed. Thanks 🙂)
When I use the same YarnCluster in plain Python (without any flow), it runs as expected. Inside the flow, however, I get the KilledWorker error before the tasks are even mapped.
Dnyaneshwar
07/10/2020, 2:21 PM
The error seems to come from the self.client.gather() function with the asynchronous argument.

Jim Crist-Harif
07/10/2020, 6:07 PM
Have you tried using a YarnCluster outside of Prefect? You might try:
from dask_yarn import YarnCluster
from dask.distributed import Client
cluster = YarnCluster(...) # Create a cluster, with whatever configuration you want
cluster.scale(1) # Scale to one worker
client = Client(cluster)
client.submit(lambda x: x + 1, 1).result()  # Should return 2

Jim Crist-Harif
07/10/2020, 6:09 PM
> I am not able to log more data, as the debug=True option also doesn't add more information.
The debug option only applies when running with a local cluster (which happens if address=None). It would be helpful to get the logs from the failed cluster. You can do this with:
yarn logs -applicationId <your cluster application id>

Dnyaneshwar
07/12/2020, 2:41 PM
I used client.map() and as_completed() from dask to map and reduce the tasks.
It is working without any error.
Whenever I use the YarnCluster inside Prefect, I get the KilledWorker error.
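For reference, the map-and-reduce shape described above can be sketched with the standard library's concurrent.futures, whose futures API dask.distributed deliberately mirrors (square is a stand-in for a real task):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(x):
    return x * x

# Map: one future per input, mirroring client.map(square, range(5)).
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(square, x) for x in range(5)]
    # Reduce: consume results as they complete, in completion order.
    total = sum(f.result() for f in as_completed(futures))

print(total)  # 0 + 1 + 4 + 9 + 16 = 30
```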
Jim Crist-Harif
07/12/2020, 3:15 PM

Dnyaneshwar
07/14/2020, 12:23 PM
ERROR:prefect.FlowRunner:Unexpected error: TypeError("can't pickle _mysql_connector.MySQL objects",)
I am getting this error for both the DaskExecutor with address=None and the YarnCluster.

Jim Crist-Harif
07/14/2020, 1:11 PM
Are you sharing a MySQL object from one task, probably as a database connection used by other tasks? One option would be to recreate the connection in every task that needs it, and close the connection after use. We're still working on good patterns for these use cases.