Cooper Marcus
10/09/2021, 3:20 PM
04:24:37
WARNING
CloudFlowRunner
Error getting flow run info
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 382, in _make_request
self._validate_conn(conn)
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
conn.connect()
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/connection.py", line 416, in connect
self.sock = ssl_wrap_socket(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/home/ec2-user/SageMaker/.pyenv/versions/3.8.6/lib/python3.8/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/home/ec2-user/SageMaker/.pyenv/versions/3.8.6/lib/python3.8/ssl.py", line 1040, in _create
self.do_handshake()
File "/home/ec2-user/SageMaker/.pyenv/versions/3.8.6/lib/python3.8/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1124)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 783, in urlopen
return self.urlopen(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 783, in urlopen
return self.urlopen(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 783, in urlopen
return self.urlopen(
[Previous line repeated 3 more times]
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/urllib3/util/retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.prefect.io', port=443): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1124)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/prefect/engine/cloud/flow_runner.py", line 188, in interrupt_if_cancelling
flow_run_info = self.client.get_flow_run_info(flow_run_id)
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/prefect/client/client.py", line 1145, in get_flow_run_info
result = self.graphql(query).data.flow_run_by_pk # type: ignore
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/prefect/client/client.py", line 298, in graphql
result = self.post(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/prefect/client/client.py", line 213, in post
response = self._request(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/prefect/client/client.py", line 459, in _request
response = self._send_request(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/prefect/client/client.py", line 351, in _send_request
response = session.post(
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/requests/sessions.py", line 590, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/home/ec2-user/SageMaker/PGE-Dx-Risk/.venv/lib/python3.8/site-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='api.prefect.io', port=443): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1124)')))
Kevin Kho
10/09/2021, 3:40 PM
Marko Jamedzija
10/11/2021, 4:33 PM
[2021-10-10 19:10:51+0000] WARNING - prefect.CloudFlowRunner | Error getting flow run info
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.7/http/client.py", line 1373, in getresponse
response.begin()
File "/usr/local/lib/python3.7/http/client.py", line 319, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.7/http/client.py", line 280, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 756, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 447, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 337, in _raise_timeout
self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='prefect-apollo.prefect', port=4200): Read timed out. (read timeout=15)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/prefect/engine/cloud/flow_runner.py", line 188, in interrupt_if_cancelling
flow_run_info = self.client.get_flow_run_info(flow_run_id)
File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 1562, in get_flow_run_info
result = self.graphql(query).data.flow_run_by_pk # type: ignore
File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 554, in graphql
retry_on_api_error=retry_on_api_error,
File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 458, in post
retry_on_api_error=retry_on_api_error,
File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 738, in _request
session=session, method=method, url=url, params=params, headers=headers
File "/usr/local/lib/python3.7/site-packages/prefect/client/client.py", line 606, in _send_request
timeout=prefect.context.config.cloud.request_timeout,
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 590, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
I have started running into situations like this quite often. The flow executes only 3 RunNamespacedJob tasks in parallel. The first two finish, but from time to time the last one doesn’t get confirmation that it is done and gets stuck, even though the underlying k8s job finished. Maybe I need to increase this read timeout?
Kevin Kho
10/11/2021, 4:41 PM
Marko Jamedzija
10/11/2021, 4:42 PM
Kevin Kho
10/11/2021, 4:43 PM
Marko Jamedzija
10/11/2021, 4:43 PM
Kevin Kho
10/11/2021, 4:47 PM
prefect.context.config.cloud.request_timeout
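Prefect 1.x lets a config value such as this one be overridden through an environment variable that starts with PREFECT__ and uses a double underscore between nested sections. The sketch below only illustrates that naming convention. The helper env_var_to_config_path is hypothetical and is not part of Prefect; it shows why the variable confirmed later in this thread contains no CONTEXT or CONFIG segment.

```python
# Hypothetical helper (not a Prefect API): illustrates the naming convention only.
PREFIX = "PREFECT__"


def env_var_to_config_path(name: str) -> str:
    """Translate an environment variable name into a dotted config path."""
    if not name.startswith(PREFIX):
        raise ValueError(f"{name!r} does not start with {PREFIX!r}")
    # A double underscore separates nested sections; single underscores stay in the key.
    sections = name[len(PREFIX):].split("__")
    return ".".join(section.lower() for section in sections)


# Maps onto the cloud.request_timeout key of the config.
print(env_var_to_config_path("PREFECT__CLOUD__REQUEST_TIMEOUT"))
# Would point at a nested context.config.cloud section instead, which is not the intended key.
print(env_var_to_config_path("PREFECT__CONTEXT__CONFIG__CLOUD__REQUEST_TIMEOUT"))
```

The variable has to be set in the environment where the flow run actually executes, and the value is in seconds (the traceback above reports read timeout=15).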
Marko Jamedzija
10/11/2021, 4:55 PM
PREFECT__CONTEXT__CONFIG__CLOUD__REQUEST_TIMEOUT? Sorry I haven’t set any of these so far 🙂 I’m not sure if I should put CONFIG, this doc is a bit ambiguous 🙂
Kevin Kho
10/11/2021, 4:55 PM
Marko Jamedzija
10/11/2021, 4:56 PM
PREFECT__CLOUD__REQUEST_TIMEOUT 🙂 I tested it and the value is set correctly. I’ll see if it helps to mitigate the problem 🙂
Kevin Kho
10/12/2021, 12:44 PM
Marko Jamedzija
10/12/2021, 2:52 PM
RunNamespacedJob was stuck in running state even though the underlying pod finished a long time ago. And these warnings started coming about 5 hours after that, so they are not the cause of the issue.
Kevin Kho
10/12/2021, 4:09 PM
Marko Jamedzija
10/13/2021, 10:42 AM
LocalDaskExecutor("processes") for executing these 3 RunNamespacedJob tasks in parallel 🙂 Do you know how I could check this (unclosed connections)? Thanks 🙂
Kevin Kho
10/13/2021, 4:59 PM
Marko Jamedzija
10/14/2021, 8:56 AM