Ron Meshulam
04/26/2022, 12:16 PM
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='prefect-apollo.prefect', port=4200): Read timed out. (read timeout=15)
I thought this was a scale issue so I've replicated the services as follows:
• agent: 3
• UI: 2
• apollo: 3
• graphql: 2
• hasura: 2
• towel: 2
We are using an external Postgres (GCP managed).
I've seen in earlier messages that I should configure:
1. PREFECT__CLOUD__REQUEST_TIMEOUT = 60 (set as an env var on the apollo pod)
2. PREFECT_SERVER__TELEMETRY__ENABLED = false (set as an env var on the agent pod)
3. PREFECT__CLOUD__HEARTBEAT_MODE = thread (set as an env var on the agent pod)
I've attached the values.yaml
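For what it's worth, one quick way to verify that a setting like this actually reached a process is to print the live Prefect config from inside the pod - a minimal sketch, assuming Prefect 1.x:

    import prefect

    # PREFECT__CLOUD__REQUEST_TIMEOUT maps to prefect.config.cloud.request_timeout;
    # this should print 60 if the env var was picked up by the process
    print(prefect.config.cloud.request_timeout)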
Can anyone help - what else can I do, and what could the problem be?

Anna Geller
Ron Meshulam
04/26/2022, 1:12 PM

Anna Geller
Ron Meshulam
04/26/2022, 1:14 PM
HTTPConnectionPool(host='prefect-apollo.prefect', port=4200): Read timed out. (read timeout=15)
Shouldn't it be 60?

Kevin Kho
Ron Meshulam
04/26/2022, 2:33 PM

Kevin Kho
--env flag so you can try passing it to the flow through the RunConfig:

    flow.run_config = KubernetesRun(..., env={"PREFECT__CLOUD__REQUEST_TIMEOUT": 60})
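Spelled out, that pattern might look like the following - a sketch, assuming Prefect 1.x; the flow name and body are illustrative:

    from prefect import Flow
    from prefect.run_configs import KubernetesRun

    with Flow("example-flow") as flow:  # flow name is illustrative
        ...

    # Kubernetes env var values are strings, so "60" is passed as a string
    flow.run_config = KubernetesRun(env={"PREFECT__CLOUD__REQUEST_TIMEOUT": "60"})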
Ron Meshulam
04/26/2022, 2:40 PM

Kevin Kho
Ron Meshulam
04/26/2022, 2:45 PM

Anna Geller
> sometimes it's working and sometimes not.
my favorite types of problems 😂
Ron Meshulam
04/26/2022, 2:50 PM

Anna Geller
Ron Meshulam
04/26/2022, 2:51 PM

Anna Geller
Ron Meshulam
04/26/2022, 3:00 PM

Anna Geller
Ron Meshulam
04/26/2022, 3:33 PM

Khen Price
04/26/2022, 3:37 PM

Kevin Kho
Ron Meshulam
04/26/2022, 7:35 PM

davzucky
04/26/2022, 10:18 PM
> agent: 3
> UI: 2
> apollo: 3
> graphql: 2
> hasura: 2
> towel: 2
• Are all these pods behind a K8s Service to ensure the routing is done properly?
• Is your agent connecting to the public URL or the local (in-cluster) k8s URL?
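A quick way to check the second point is to print the agent's resolved API endpoint - a minimal sketch, assuming Prefect 1.x and the service name from the error above:

    import prefect

    # The agent and flows read the Apollo endpoint from config.cloud.api
    # (settable via the PREFECT__CLOUD__API env var); for in-cluster routing
    # this would typically be http://prefect-apollo.prefect:4200
    print(prefect.config.cloud.api)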
Ron Meshulam
04/27/2022, 6:56 AM

Anna Geller
Ron Meshulam
04/28/2022, 7:28 AM

Anna Geller
> HTTPConnectionPool(host='prefect-apollo.prefect', port=4200): Read timed out. (read timeout=15)
It could be some network latency issue. Can you say more about where you run this and what your networking setup looks like?

Ron Meshulam
04/28/2022, 12:57 PM
request to <http://prefect-hasura.prefect:3000/v1alpha1/graphql> failed, reason: connect ECONNREFUSED 10.10.11.76:3000
So I'm adding replicas of the GraphQL pod, because it seems it can't handle the workload. (After that, I'm planning to replicate the Hasura pod as well.)
Can you share your thoughts on how and when to scale up the GraphQL and Hasura pods? Do they expose any metrics I could watch in order to auto-scale them?

Kevin Kho
Anna Geller
> Can you share your thoughts on how and when to scale up the GraphQL and Hasura pods? Do they expose any metrics I could watch in order to auto-scale them?
Hard to say; perhaps you could measure the CPU and network utilization of the underlying instance? This is how e.g. AWS seems to trigger autoscaling policies.
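If CPU utilization turns out to be a workable signal, a standard Kubernetes HorizontalPodAutoscaler could act on it - a sketch only; the deployment name, namespace, and thresholds below are assumed, not taken from the attached values.yaml:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: prefect-graphql        # assumed deployment name
      namespace: prefect
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: prefect-graphql      # assumed deployment name
      minReplicas: 2
      maxReplicas: 5
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70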
> Production - 2021.07.06
> Testing - 2022.04.14
> In the production env we are not getting these errors, and it's running perfectly (for now - we had the same workload a few days ago with 12 executions in parallel).
This error actually implies that something may be wrong in your Hasura setup, since the testing version is using Hasura 2.0. I'd definitely cross-check that. I've added some ideas for debugging similar issues here - perhaps those can help you too. Keep us posted on how it goes!
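One quick way to cross-check which Hasura version is actually running is its version endpoint - a sketch, reusing the in-cluster service URL from the ECONNREFUSED error above:

    import requests

    # Hasura reports its running version at GET /v1/version
    r = requests.get("http://prefect-hasura.prefect:3000/v1/version", timeout=5)
    print(r.json())  # e.g. {"version": "v2.1.1"}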
Ron Meshulam
05/01/2022, 8:30 AM