Prefect Community #ask-community

Omar Sultan

03/20/2022, 9:27 AM

Hi everyone, we have a Prefect server running on Kubernetes; it was set up using the Helm chart. Everything runs smoothly, but occasionally we get this error:

File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 341, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='prefect-apollo.prefect', port=4200): Read timed out. (read timeout=15)

This happens especially when we use the StartFlowRun task. It does not happen very often, but I was wondering if there is a way to force a retry, or if anyone knows why this would be happening? Thanks

Anna Geller

03/20/2022, 12:44 PM

This error means that some action on the backend took too long to respond to the API request. E.g. it could be that the task run state updates are taking too long. Here is how you can increase the read timeout on Server:
• https://discourse.prefect.io/t/is-it-possible-to-increase-the-graphql-api-request-timeout/219
• https://discourse.prefect.io/t/how-to-check-a-graphql-query-timeout-settings-on-server-how-to-increase-that-timeout-value/496
To set retry on the StartFlowRun task, you can do:

from datetime import timedelta

from prefect import Flow
from prefect.tasks.prefect import StartFlowRun

# Retry the task up to 3 times, waiting 5 minutes between attempts.
start_flow_run = StartFlowRun(
    project_name="PROJECT_NAME",
    wait=True,  # block until the child flow run finishes
    max_retries=3,
    retry_delay=timedelta(minutes=5),
)

with Flow("FLOW_NAME") as flow:
    staging = start_flow_run(flow_name="child_flow_name")
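
To make the effect of `max_retries` and `retry_delay` concrete, here is a minimal plain-Python sketch of the same idea: run a callable, and on failure wait and try again a limited number of times. It is only an illustration and not Prefect's implementation; the helper name `call_with_retries` and its `retry_on` parameter are hypothetical.

```python
import time
from datetime import timedelta


def call_with_retries(fn, max_retries=3, retry_delay=timedelta(seconds=0),
                      retry_on=(Exception,)):
    """Call fn(); on a listed exception, wait retry_delay and try again.

    Makes at most max_retries + 1 attempts, then re-raises the last error.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                # Retries are exhausted: surface the original error.
                raise
            attempt += 1
            time.sleep(retry_delay.total_seconds())
```

With `max_retries=3`, a call that hits a transient read timeout twice and then succeeds would return normally on the third attempt; a call that always fails would raise after the fourth.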

Omar Sultan

03/20/2022, 4:00 PM

Thank you so much for that

👍 1

Omar Sultan

03/21/2022, 5:51 AM

Hey Anna, quick follow-up: I applied the change in the Helm values and applied the configuration, and I can now see the env variable being assigned to the Apollo pods. However, when I print

prefect.context.config.cloud.request_timeout

from any task that runs on the server, it still shows 15. Is there anything else I need to apply? Do I need to restart the agent pod, for example?
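
One possible explanation, offered as an assumption rather than something confirmed in this thread: the value printed inside a task comes from the configuration of the process that runs the flow, so an environment variable set only on the Apollo pods would not change it. In Prefect 1.x, variables of the form `PREFECT__SECTION__KEY` override config keys. The sketch below mimics that mapping in a simplified form; the helper names `config_from_env` and `effective_request_timeout` are hypothetical and not part of Prefect, and the default of 15 is taken from the error message above.

```python
def config_from_env(environ, prefix="PREFECT__"):
    """Build a nested dict from variables such as PREFECT__CLOUD__REQUEST_TIMEOUT.

    Simplified: keys are lowercased and values are left as strings.
    """
    config = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # e.g. a name with a single underscore after PREFECT
        keys = name[len(prefix):].lower().split("__")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


def effective_request_timeout(environ, default=15):
    """Return the timeout this environment would yield, else the default."""
    cloud = config_from_env(environ).get("cloud", {})
    return int(cloud.get("request_timeout", default))
```

Passing `os.environ` to `effective_request_timeout` from inside a task would show whether the variable actually reached the flow run's environment, as opposed to the Apollo pod's.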

Anna Geller

03/21/2022, 10:35 AM

You may restart the Apollo pod so that it can pick up the changes. But not sure what the max allowed value is here, what did you set?

Omar Sultan

03/23/2022, 5:42 AM

I set it to 60; I believe the documentation said that the max allowed value was 60.

Anna Geller

03/23/2022, 8:53 AM

I see. So if this doesn't help I'm afraid we have to find out the root cause for those timeouts :) can you share your flow with StartFlowRun that times out? Could it be that the child flow run gets stuck and the parent flow run polling for its status at some point times out?

Omar Sultan

03/23/2022, 9:23 AM

Hey Anna, so I've been doing some investigating, and it seems that this error only happens right around the time the Hasura pod tries to submit telemetry. Because I am working in a closed environment without internet access, this operation times out, and it seems to be causing the flows to time out as well. Not sure if that makes any sense. I can see from the Hasura documentation that telemetry can be disabled; however, I'm not sure how to do that in the Helm values file. Any ideas?

Anna Geller

03/23/2022, 10:11 AM

Nice work finding that out! There is an easy way to disable it, check out this page: https://docs.prefect.io/orchestration/server/telemetry.html

🙏 1

Anna Geller

03/23/2022, 10:21 AM

On a Helm chart you should be able to set an environment variable with name: PREFECT__SERVER__TELEMETRY__ENABLED and value: false

Omar Sultan

03/23/2022, 1:41 PM

So I followed the link above; it disabled telemetry for Prefect, and the change was reflected correctly. But I believe the issue is still coming from the Hasura image itself: it needs an env variable in the pod called HASURA_GRAPHQL_ENABLE_TELEMETRY set to false. I'm looking at the values.yaml of the Helm chart but can't find anywhere to pass env variables to the service.

Omar Sultan

03/23/2022, 1:41 PM

sorry to keep bugging you with this

Anna Geller

03/23/2022, 1:47 PM

I have a meeting with a Prefect employee who knows more about this telemetry thing in just 10 min 😄 will ask and update you afterwards

Omar Sultan

03/23/2022, 1:47 PM

Wow, nice, thank you so much 🙂

Anna Geller

03/23/2022, 9:53 PM

@Omar Sultan sorry for getting back to you a little later than planned. I checked, and this is the part of the code where the telemetry info is set: https://github.com/PrefectHQ/server/blob/master/services/apollo/src/index.js#L19 You can see here that the environment variable

PREFECT_SERVER__TELEMETRY__ENABLED

is the right one. I seem to have mistyped the underscores earlier (there is only one underscore between PREFECT and SERVER), which may be the reason why this didn't work for you. Also, once you set that, you may need to restart the Apollo pod to make sure the changes get applied. To set it on the Helm chart, see this option, e.g.:

# Disable telemetry on the Apollo service
helm upgrade \
    $NAME \
    prefecthq/prefect-server \
    --set apollo.options.telemetryEnabled=false
