chara

12/13/2022, 1:50 PM
Hello, I am getting the following error in job runs that take place in Kubernetes. My guess is that the error happens when the request for the job status fails (the traceback ends in read_namespaced_job_status). After that error, the task is killed even though the job was running without problems. The error seems to occur when I run 3 or more jobs concurrently, although that should not be a heavy load for my nodes. Does anybody know what could be causing it and how it could be fixed?
Task 'TaskPreprocessing': Exception encountered during task execution!
Traceback (most recent call last):
  File "my-file.py", line 105, in run_job
    job.RunNamespacedJob(
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/tasks.py", line 456, in method
    return run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/tasks/kubernetes/job.py", line 730, in run
    job = api_client_job.read_namespaced_job_status(
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api/batch_v1_api.py", line 1393, in read_namespaced_job_status
    return self.read_namespaced_job_status_with_http_info(name, namespace, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api/batch_v1_api.py", line 1480, in read_namespaced_job_status_with_http_info
    return self.api_client.call_api(
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 239, in GET
    return self.request("GET", url,
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Audit-Id': '<Audit-Id>', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '<X-Kubernetes-Pf-Flowschema-Uid>', 'X-Kubernetes-Pf-Prioritylevel-Uid': '<X-Kubernetes-Pf-Prioritylevel-Uid>', 'Date': 'Tue, 13 Dec 2022 11:02:40 GMT', 'Content-Length': '150'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"rpc error: code = Unavailable desc = transport is closing","code":500}
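
The 500 in the traceback above is returned by the Kubernetes API server during a status poll, not by the job itself, so one possible workaround is to retry that call a few times before giving up. The following is only a minimal sketch under stated assumptions, not a confirmed fix for the root cause: `call_with_retry` is a hypothetical helper (it is not part of Prefect or the Kubernetes client), the retryable status codes and delays are arbitrary choices, and it relies only on the fact that `kubernetes.client.exceptions.ApiException` exposes the HTTP code as `.status`.

```python
import time


def call_with_retry(fn, *, attempts=5, base_delay=1.0,
                    retry_statuses=(500, 502, 503, 504), sleep=time.sleep):
    """Call fn() and retry with exponential backoff on transient API errors.

    Hypothetical helper: an exception is treated as transient when it has a
    `.status` attribute (as kubernetes ApiException does) whose value is in
    `retry_statuses`. Anything else is re-raised immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status", None)
            # Give up on non-transient errors or when attempts are exhausted.
            if status not in retry_statuses or attempt == attempts:
                raise
            # Wait 1s, 2s, 4s, ... (with the default base_delay) before retrying.
            sleep(base_delay * 2 ** (attempt - 1))


# Possible usage when polling the job status yourself
# (job_name and namespace are placeholder variables):
# job = call_with_retry(
#     lambda: api_client_job.read_namespaced_job_status(
#         name=job_name, namespace=namespace
#     )
# )
```

Since the failing call sits inside Prefect's own RunNamespacedJob task, the helper cannot be dropped in there directly. It would apply if the status polling were done in your own task code, and it does not explain why the API server returns "transport is closing" in the first place.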