# ask-marvin
j
@Marvin I am running a worker on Kubernetes and calling a dbt Cloud job. It reliably gives this error after one hour.
```
Error occurred while streaming logs - Job will continue to run but logs will no longer be streamed to stdout.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client_proto.py", line 92, in connection_lost
    uncompleted = self._parser.feed_eof()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "aiohttp/_http_parser.pyx", line 508, in aiohttp._http_parser.HttpParser.feed_eof
aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 1041, in _stream_job_logs
    async for line in logs.content:
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 52, in __anext__
    rv = await self.read_func()
         ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 352, in readline
    return await self.readuntil()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 386, in readuntil
    await self._wait("readuntil")
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 347, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
```
m
The error you're encountering, `aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed`, is a known issue when running long flows or jobs in Prefect, particularly with Kubernetes workers. It typically arises from a timeout or connection issue while streaming logs; it doesn't stop the execution of your flow, but it does cut off log streaming. Here are some insights and potential solutions:

1. Timeouts and keep-alive: The error may be related to default timeouts in the `aiohttp` library or to connection keep-alive settings. A related issue was alleviated by enabling TCP keep-alive on the async Kubernetes client, so you might check that your environment or custom Docker images enable keep-alive on long-lived connections (see the sketch below).
2. Prefect and aiohttp updates: Make sure you are using the latest versions of Prefect and its dependencies. There have been updates and discussions around this issue, and newer versions may include fixes; for example, a pull request added a `KeepAliveClientRequest` class to help with exactly this.
3. Logging and debugging: Increase the logging level to `DEBUG` to collect more detailed logs, which may help identify the source of the timeout. You can set this via the Prefect configuration or environment variables.
4. Kubernetes configuration: Check your Kubernetes job and worker configuration. Ensure the worker has the correct permissions and that no network policies or firewall rules interfere with long-running connections.

For more detailed troubleshooting, you might want to refer to these resources:
- Prefect Cloud Troubleshooting
- Kubernetes Worker Configuration

If you continue facing issues, consider reaching out on Prefect's GitHub issues or community forums for further assistance.
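For reference, here is a minimal sketch of the TCP keep-alive pattern mentioned in item 1, assuming an aiohttp-based client. The class name mirrors the `KeepAliveClientRequest` idea from the pull request above, but the socket options are Linux-specific and the intervals are illustrative, not Prefect's actual implementation.

```python
import socket

import aiohttp
from aiohttp.client_reqrep import ClientRequest


class KeepAliveClientRequest(ClientRequest):
    """Turn on TCP keep-alive for the socket backing each request.

    Sketch only: the option names are Linux-specific and the intervals are
    illustrative, not values taken from Prefect's worker.
    """

    async def send(self, conn):
        sock = conn.protocol.transport.get_extra_info("socket")
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)   # seconds idle before first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 6)   # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before dropping
        return await super().send(conn)


def make_session() -> aiohttp.ClientSession:
    # Any session built this way keeps idle streaming connections alive, which
    # helps when a proxy or load balancer silently drops quiet connections.
    return aiohttp.ClientSession(request_class=KeepAliveClientRequest)
```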
j
@Marvin where can I find these timeouts? considering it is exactly one hour. It must be set somewhere.
m
The timeout you're experiencing might be related to settings within Prefect or the underlying libraries it uses. Here are some places where these timeouts could be configured:

1. Prefect settings:
   - `server.database.sqlalchemy.pool_recycle` defaults to 3600 seconds (1 hour), which could relate to connection recycling.
   - `pool_timeout` defaults to 30 seconds and controls how long to wait for a connection from the pool.
2. Heartbeat and polling intervals:
   - `fallback_heartbeat_interval_seconds` defaults to 30 seconds and is used for online/offline evaluation when a worker's heartbeat interval is not set.
   - `deployment_last_polled_timeout_seconds` defaults to 60 seconds; a deployment that hasn't been polled within that window is marked not ready.
   These settings can be configured via environment variables or the Prefect configuration file; see the Prefect settings reference for details.
3. aiohttp timeout settings: If the issue is with `aiohttp`, look into the connection timeout settings in your code or configuration. aiohttp has several timeout parameters that can be set when making requests or establishing connections (see the sketch below).

To address the issue, try adjusting these timeouts, either extending them or aligning them with your job's expected execution time.
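As a rough illustration of point 3, here is a minimal sketch of how aiohttp's timeout parameters can be set on a streaming request. The URL and values are made up for the example; they are not Prefect's or the Kubernetes client's actual configuration.

```python
import asyncio

import aiohttp

# Sketch only: total=None removes the overall deadline (aiohttp's default is
# 300 seconds) so a long-lived log stream isn't cut off, while sock_read still
# guards against a stream that has gone completely silent.
TIMEOUT = aiohttp.ClientTimeout(
    total=None,       # no overall cap on the request
    sock_connect=30,  # fail fast if the connection cannot be established
    sock_read=120,    # abort only if no bytes arrive for two minutes
)


async def stream_logs(url: str) -> None:
    async with aiohttp.ClientSession(timeout=TIMEOUT) as session:
        async with session.get(url) as resp:
            async for line in resp.content:
                print(line.decode(errors="replace").rstrip())


if __name__ == "__main__":
    # Hypothetical endpoint purely for demonstration.
    asyncio.run(stream_logs("http://localhost:8080/logs?follow=true"))
```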