
Josh Greenhalgh

01/27/2021, 5:27 PM
I am having a lot of trouble with the liveness and readiness probes of the graphql and hasura deployments. This is the current state of my pods:
prefect-agent-55775d5684-qml7k     1/1     Running   0          5h21m
prefect-apollo-b5d9c4cd8-82txc     1/1     Running   1          28h
prefect-graphql-5f77cd4674-wmfbj   0/1     Running   21         22h
prefect-hasura-bf4dd5d95-z2j8x     0/1     Running   24         28h
prefect-job-e63a7de9-nwrxg         1/1     Running   0          20m
prefect-postgresql-0               1/1     Running   0          28h
prefect-towel-5dc84cb477-zx2rg     1/1     Running   0          28h
prefect-ui-8bd5ff9f8-2scsz         1/1     Running   0          22h
They keep restarting and going into CrashLoopBackOff with errors like:
Warning  Unhealthy  9m39s (x45 over 22h)   kubelet  Readiness probe failed: Get http://10.40.14.208:4201/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  9m37s (x41 over 22h)   kubelet  Liveness probe failed: Get http://10.40.14.208:4201/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  5m18s (x35 over 109m)  kubelet  Liveness probe failed: Get http://10.40.14.208:4201/health: dial tcp 10.40.14.208:4201: connect: connection refused
Has anybody had similar problems and did they manage to solve them?

Zanie

01/27/2021, 5:51 PM
Hi @Josh Greenhalgh — I haven’t seen this before. What overrides do you have? Are the nodes large enough for them? Can you look at the logs for the pods?

Josh Greenhalgh

01/27/2021, 5:52 PM
Overrides are pretty basic:
agent:

  # enabled determines if the Prefect Kubernetes agent is deployed
  enabled: true
  prefectLabels: ['prefect-agent']

ui:
  apolloApiUrl: http://***:4200/graphql

postgresql:
  postgresqlPassword: ***

serviceAccount:
  name: "***"
Very small nodes...
That may be it, I suppose: 1 CPU, 4 GB.
So, looking in the Helm chart, I can't find where each deployment specifies its resource requirements. What should the minimum node size be?

Mariia Kerimova

01/27/2021, 7:07 PM
Hey Josh. I doubt that it's an issue with CPU and memory constraints; you should be able to run Prefect Server with that node size. Are you able to post some pod logs?

Josh Greenhalgh

01/28/2021, 11:43 AM
@Mariia Kerimova I think the nodes were being shared with Composer Airflow, so it wasn't getting anywhere close to the full node's resources. I'm going to try an isolated node pool and see how that goes. If I still have the same issues, I will dig into the logs and share them.
👍 1
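The kubelet events quoted earlier in the thread show two different failure modes: probe timeouts (the request typically went out but no response headers came back in time) and refused connections (nothing was listening on port 4201, for example while the container was restarting). Below is a minimal sketch, assuming the events are pasted as plain text in the format shown above, that tallies them by probe and failure type. The helper names `classify` and `summarize` are hypothetical and not part of Prefect or Kubernetes.

```python
import re
from collections import Counter

# Matches event lines such as:
#   Warning  Unhealthy  9m39s (x45 over 22h)  kubelet  Readiness probe failed: ...
EVENT_RE = re.compile(
    r"\(x(?P<count>\d+) over [^)]+\)\s+kubelet\s+"
    r"(?P<probe>Readiness|Liveness) probe failed: (?P<detail>.*)"
)


def classify(detail: str) -> str:
    """Map a probe failure message to a coarse failure type."""
    if "Client.Timeout exceeded" in detail:
        return "timeout"
    if "connection refused" in detail:
        return "refused"
    return "other"


def summarize(events: str) -> Counter:
    """Sum the reported repeat counts per (probe, failure type) pair."""
    totals = Counter()
    for line in events.splitlines():
        match = EVENT_RE.search(line)
        if match:
            key = (match["probe"].lower(), classify(match["detail"]))
            totals[key] += int(match["count"])
    return totals


# The three events quoted in the thread.
EVENTS = """\
Warning  Unhealthy  9m39s (x45 over 22h)   kubelet  Readiness probe failed: Get http://10.40.14.208:4201/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  9m37s (x41 over 22h)   kubelet  Liveness probe failed: Get http://10.40.14.208:4201/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  5m18s (x35 over 109m)  kubelet  Liveness probe failed: Get http://10.40.14.208:4201/health: dial tcp 10.40.14.208:4201: connect: connection refused
"""

if __name__ == "__main__":
    for (probe, kind), count in sorted(summarize(EVENTS).items()):
        print(f"{probe:<10} {kind:<8} {count}")
```

A tally dominated by timeouts would be consistent with the pods being starved of CPU on shared nodes, as suspected in the thread, though the pod logs would still be needed to confirm it.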