https://prefect.io logo
Title
s

Samuel Kohlleffel

01/23/2023, 6:22 PM
Issue: We just received an error where our Prefect agent failed to submit a flow run. The agent continued to submit all flow runs after this specific one. Error details in thread. Prefect Version: 2.7.8 Prefect Agent: AKS
17:59:40.136 | ERROR   | prefect.agent - Failed to submit flow run '0a11c778-7abf-4368-bf4b-b55d73f267fa' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/agent.py", line 424, in _submit_run_and_capture_errors
    result = await infrastructure.run(task_status=task_status)
  File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/kubernetes.py", line 277, in run
    job = await run_sync_in_worker_thread(self._create_job, manifest)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/kubernetes.py", line 622, in _create_job
    job = batch_client.create_namespaced_job(self.namespace, job_manifest)
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api/batch_v1_api.py", line 210, in create_namespaced_job
    return self.create_namespaced_job_with_http_info(namespace, body, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api/batch_v1_api.py", line 309, in create_namespaced_job_with_http_info
    return self.api_client.call_api(
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 391, in request
    return <http://self.rest_client.POST|self.rest_client.POST>(url,
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 276, in POST
    return self.request("POST", url,
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 169, in request
    r = self.pool_manager.request(
  File "/usr/local/lib/python3.10/site-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
  File "/usr/local/lib/python3.10/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/usr/local/lib/python3.10/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
17:59:40.144 | INFO    | prefect.agent - Completed submission of flow run '0a11c778-7abf-4368-bf4b-b55d73f267fa'
17:59:42.045 | WARNING | urllib3.connectionpool - Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /api/v1/namespaces/default/pods/papaya-cockatoo-9ckbx-r7gsg/log?follow=True
Shortly before this issue, one of our Prefect jobs crashed as well.
17:54:01.860 | INFO    | prefect.agent - Submitting flow run 'c138b823-54b0-463e-baec-a363870cafcf'

17:54:02.365 | INFO    | prefect.infrastructure.kubernetes-job - Job 'spry-limpet-764c2': Pod has status 'Pending'.
17:54:02.428 | INFO    | prefect.agent - Completed submission of flow run 'c138b823-54b0-463e-baec-a363870cafcf'
17:55:02.362 | ERROR   | prefect.infrastructure.kubernetes-job - Job 'spry-limpet-764c2': Pod never started.
17:55:02.495 | INFO    | prefect.agent - Reported flow run 'c138b823-54b0-463e-baec-a363870cafcf' as crashed: Flow run infrastructure exited with non-zero status code -1.