after my flow completes successfully `[2020-02-04 ...
# prefect-community
r
after my flow completes successfully
[2020-02-04 05:17:43,587] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
I try to terminate the Dask distributed cluster cleanly, but always get:
2020-02-04 05:17:44,467 INFO stopping Dask cluster
distributed.scheduler - INFO - Scheduler closing...
distributed.scheduler - INFO - Scheduler closing all comms
distributed.scheduler - INFO - Remove worker <Worker 'tcp://100.96.120.2:37227', name: tcp://100.96.120.2:37227, memory: 0, processing: 0>
distributed.core - INFO - Removing comms to tcp://100.96.120.2:37227
distributed.batched - INFO - Batched Comm Closed: in <closed TCP>: Stream is closed
distributed.scheduler - INFO - Remove worker <Worker 'tcp://100.96.120.2:43873', name: tcp://100.96.120.2:43873, memory: 0, processing: 0>
followed by:
distributed.scheduler - INFO - Lost all workers
2020-02-04 05:17:45,604 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3856212470>: Failed to establish a new connection: [Errno 111] Connection refused',)': /api/v1/namespaces/logflow/pods?labelSelector=app%3Ddask%2Ccomponent%3Dworker%2Cdask.org%2Fcluster-name%3Ddask-root-b81ddee0-5%2Cuser%3Droot
2020-02-04 05:17:45,604 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f384f59eb38>: Failed to establish a new connection: [Errno 111] Connection refused',)': /api/v1/namespaces/logflow/pods?labelSelector=app%3Ddask%2Ccomponent%3Dworker%2Cdask.org%2Fcluster-name%3Ddask-root-b81ddee0-5%2Cuser%3Droot
2020-02-04 05:17:45,605 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f384f59e0b8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /api/v1/namespaces/logflow/pods?labelSelector=app%3Ddask%2Ccomponent%3Dworker%2Cdask.org%2Fcluster-name%3Ddask-root-b81ddee0-5%2Cuser%3Droot
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 157, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 672, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 376, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 994, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 300, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 169, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f3874be2fd0>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/weakref.py", line 624, in _exitfunc
f()
File "/usr/lib/python3.6/weakref.py", line 548, in __call__
return info.func(*info.args, **(info.kwargs or {}))
File "/usr/local/lib/python3.6/dist-packages/dask_kubernetes/core.py", line 623, in _cleanup_resources
pods = core_api.list_namespaced_pod(namespace, label_selector=format_labels(labels))
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/core_v1_api.py", line 12372, in list_namespaced_pod
(data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/core_v1_api.py", line 12472, in list_namespaced_pod_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 334, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 168, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 355, in request
headers=headers)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 231, in GET
query_params=query_params)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 205, in request
headers=headers)
File "/usr/local/lib/python3.6/dist-packages/urllib3/request.py", line 76, in request
method, url, fields=fields, headers=headers, **urlopen_kw
File "/usr/local/lib/python3.6/dist-packages/urllib3/request.py", line 97, in request_encode_url
return self.urlopen(method, url, **extra_kw)
File "/usr/local/lib/python3.6/dist-packages/urllib3/poolmanager.py", line 330, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 760, in urlopen
**response_kw
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 760, in urlopen
**response_kw
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 760, in urlopen
**response_kw
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 720, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py", line 436, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='localhost', port=443): Max retries exceeded with url: /api/v1/namespaces/logflow/pods?labelSelector=app%3Ddask%2Ccomponent%3Dworker%2Cdask.org%2Fcluster-name%3Ddask-root-b81ddee0-5%2Cuser%3Droot (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3874be2fd0>: Failed to establish a new connection: [Errno 111] Connection refused',))
is there something specific I need to do to properly tear down the dask cluster?
c
Hi Ryan; how are you tearing down your dask cluster right now? It’s not entirely surprising to see errors during a teardown, but if you want to avoid them I’d recommend starting by tearing down all workers and then the scheduler
r
KubeCluster.close()