# prefect-server
j
Hello 👋 I have an ETL Flow, using a LocalDaskExecutor, that maps the API extraction function across a list of thousands of parameters used to query the API endpoint. It's running on open-source Prefect Server on a VM instance. Everything runs okay (extracts and loads into Snowflake) until the error in the thread reply occurs once; then it slowly spirals down from there, erroring out more and more frequently.
```
Error getting flow run info
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.8/http/client.py", line 1344, in getresponse
    response.begin()
  File "/usr/local/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.8/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='apollo', port=4200): Read timed out. (read timeout=15)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/cloud/flow_runner.py", line 188, in interrupt_if_cancelling
    flow_run_info = self.client.get_flow_run_info(flow_run_id)
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 1148, in get_flow_run_info
    result = self.graphql(query).data.flow_run_by_pk  # type: ignore
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 298, in graphql
    result = self.post(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 213, in post
    response = self._request(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 459, in _request
    response = self._send_request(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 351, in _send_request
    response = session.post(
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='apollo', port=4200): Read timed out. (read timeout=15)
```
I did see this thread, but I'm not sure what to make of it for this specific case (I'm not using Cloud, so setting the timeout in the config didn't seem like it would make sense).
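(For context, a minimal sketch of the kind of flow described at the top of the thread, assuming Prefect 1.x with a LocalDaskExecutor; the task names, parameters, and retry settings below are illustrative placeholders, not the actual flow code from this thread.)

```python
from datetime import timedelta

from prefect import Flow, task
from prefect.executors import LocalDaskExecutor


@task(max_retries=2, retry_delay=timedelta(seconds=30))
def extract(param):
    """Query the external API endpoint for a single parameter (placeholder)."""
    ...


@task
def load_to_snowflake(records):
    """Load the extracted records into Snowflake (placeholder)."""
    ...


with Flow("api-etl", executor=LocalDaskExecutor(scheduler="threads")) as flow:
    # In the real flow this list holds thousands of query parameters.
    params = ["param-1", "param-2", "param-3"]
    records = extract.map(params)
    load_to_snowflake(records)
```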
k
It seems like the concurrent requests are potentially causing a bottleneck in the API. You could try bumping up the resources for that, or you could try the thread you linked. Stephan is on Server too.
It does say Cloud, but it's applicable to Server.
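(In case it helps: the read timeout=15 in the traceback looks like the default request_timeout from the [cloud] section of Prefect's config, which the client appears to use for Server calls as well, so the fix from that thread should translate. A minimal sketch of overriding it via Prefect 1.x's environment-variable config convention; 60 is just an illustrative value.)

```python
import os

# Prefect 1.x reads config overrides from PREFECT__<SECTION>__<KEY> environment
# variables. Set this in the environment where the flow runs, before prefect is
# imported, to raise the client's GraphQL read timeout (default 15 seconds, the
# same value shown in the traceback above). Equivalently, set request_timeout
# under [cloud] in ~/.prefect/config.toml.
os.environ["PREFECT__CLOUD__REQUEST_TIMEOUT"] = "60"  # illustrative value
```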
j
A bottleneck in which API? The error points to Apollo, or are you thinking of the API that I'm querying in the Flow? And resources as in memory of the VM?
k
I mean the container/pod that apollo runs on (the Prefect API). Yes, memory/CPU of the VM.