<@ULVA73B9P> What could be causing this error when...
# ask-marvin
r
@Marvin What could be causing this error when my kubernetes worker submits a flow run:
Failed to submit flow run '545eeace-192d-4a4c-b577-9b1ecddde61c' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 440, in ssl_wrap_socket
    context.load_verify_locations(ca_certs, ca_cert_dir, ca_cert_data)
OSError: [Errno 24] Too many open files

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.100.0.1', port=443): Max retries exceeded with url: /apis/batch/v1/namespaces/default/jobs (Caused by SSLError(OSError(24, 'Too many open files')))
m
The error message you're experiencing, `OSError: [Errno 24] Too many open files`, typically occurs when the application or process has reached the limit of file descriptors it can open simultaneously. This could be due to a misconfigured system or an environment running out of resources. Possible reasons and solutions for this error include:

1. File Descriptor Limits on the Host Machine:
   - The operating system has a limit on the maximum number of file descriptors available. You can check and raise this limit using the `ulimit` command on Unix-based systems.
   - Check the current limit with: `ulimit -n`
   - To increase the limit, use: `ulimit -n <new_limit>` (requires appropriate permissions)
2. Excessive Connections:
   - Review the code and system for potential resource leaks, such as opened files or network connections that are not being closed properly.
3. Kubernetes Configuration:
   - Ensure that the Kubernetes configuration (such as limits and requests for CPU and memory resources) is appropriately set to handle the workload.
4. Environment Issues:
   - Check whether there are any known issues specific to the Prefect version you're using or its interaction with Kubernetes.

To assist you further, I need to know which version of Prefect you are using. Can you provide that information, or should I search for potential issues and resolutions in both Prefect 2.x and 3.x?
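For checking the descriptor limit from inside the Python process itself (for example, inside the worker container, where the shell's `ulimit` may not reflect what the process sees), a minimal sketch using only the standard library could look like the following. It assumes a Linux environment with `/proc` mounted. The helper names `fd_usage` and `raise_soft_limit` are hypothetical and not part of Prefect or Kubernetes.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd has one entry per open descriptor (Linux only).
    # The count includes the descriptor briefly opened to list the directory.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


def raise_soft_limit(target):
    """Raise the soft limit toward `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. The limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    open_fds, soft, hard = fd_usage()
    print(f"open descriptors: {open_fds}, soft limit: {soft}, hard limit: {hard}")
```

If the open-descriptor count keeps climbing toward the soft limit over time, that points to a leak (item 2 above) rather than a limit that is simply set too low, and raising the limit would only delay the error.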
r
@Marvin I'm getting this while running on Kubernetes. Restarting the CoreDNS and worker deployments has helped resolve this issue. What could this mean?