# prefect-server
g
Hi all! I'm having issues with my Hasura pods. They keep restarting every once in a while and I'm not sure why. The values I'm using for Hasura in the Helm chart are the following:
hasura:
  image:
    name: hasura/graphql-engine
    tag: v1.3.3
    pullPolicy: IfNotPresent
    pullSecrets: []

  service:
    type: ClusterIP
    port: 3000

  labels: {}
  annotations: {}
  replicas: 2
  strategy: {}
  podSecurityContext: {}
  securityContext: {}
  env: []
  resources:
    limits:
      cpu: "500m"
      memory: "1Gi"
    requests:
      cpu: "100m"
      memory: "256Mi"
  nodeSelector: {}
  tolerations: []
  affinity: {}
Both pods crash at the same time and restart a few times before they're back up, so my Server gets a significant amount of downtime. I don't think the resources are underestimated, but I do see a pattern in RAM usage when this happens (I've sent a screenshot). I've also attached some logs for the pods, but I can't find any relevant information in them. Is there anywhere else I could gather information for this issue?
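One more place that might be worth checking is the last termination state Kubernetes records for each container, since it can show a reason such as OOMKilled even when the application logs stay quiet. Below is a rough sketch, not a definitive diagnosis tool: the function names `summarize_restarts` and `fetch_pods` are made up for illustration, and the namespace and label selector in the usage comment are assumptions that would need to match the actual deployment.

```python
import json
import subprocess


def summarize_restarts(pod_list):
    """Return one row per container with its restart count and last termination details.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    """
    rows = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # lastState.terminated is only present if the container has exited before
            terminated = cs.get("lastState", {}).get("terminated") or {}
            rows.append({
                "pod": pod_name,
                "container": cs["name"],
                "restarts": cs.get("restartCount", 0),
                "reason": terminated.get("reason"),
                "exit_code": terminated.get("exitCode"),
                "finished_at": terminated.get("finishedAt"),
            })
    return rows


def fetch_pods(namespace, selector):
    """Run kubectl and return the parsed pod list (requires kubectl access to the cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hypothetical usage; the namespace and selector are assumptions:
# for row in summarize_restarts(fetch_pods("prefect", "app.kubernetes.io/component=hasura")):
#     print(row)
```

If the reason comes back as OOMKilled, that would point at the 1Gi memory limit rather than at networking. `kubectl describe pod` shows the same fields along with recent events, and `kubectl logs --previous` prints the logs of the container instance that crashed rather than the one that replaced it.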
k
I looked at the error, and I am not sure if it’s related to scaling down? Didn’t see any issues around those logs.
Wondering if upgrading to Hasura 2.0 might help you
Will see if the team has any other ideas
The guys who’d have a clue are out today unfortunately
a
I agree that the logs don't provide much useful information, especially because there is no error there. This is just an event log:
unlocking events that are locked by the HGE
which doesn't indicate anything suspicious per se
the pods crash at the same time and keep restarting a few times before it's back up
I remember that your networking setup is quite involved. Given that Hasura pods are coming back up quickly again, it might be some transient networking issue in your Kubernetes service
btw I remember you promised a blog post or a code repository about your setup on Kubernetes - I'm counting on it! 😄 no pressure though, only if you find time to share your learnings
g
it might be some transient networking issue in your Kubernetes service
I did think about that, but the thing is none of the other stuff we host on this cluster went down, which is weird. I've scaled it down to a single replica and now it looks like it's stable, but I still don't understand why
btw I remember you promised a blog post or a code repository about your setup on Kubernetes
yeah! we'll work on it soon, I hope! I'll be glad to share it when it's ready!