Prefect Community #prefect-server

Josh Greenhalgh

03/09/2021, 1:09 PM

Hey, can anyone help me understand this?

NAME                                     READY   STATUS    RESTARTS   AGE
prefect-agent-c58b946f9-9r59j            1/1     Running   280        21h
prefect-server-apollo-78c9b8cbfb-bd69r   1/1     Running   0          3d12h
prefect-server-graphql-875f7ddc-pntjp    1/1     Running   0          3d12h
prefect-server-hasura-7897f76bcf-mtphx   1/1     Running   0          3d12h
prefect-server-towel-6d9c9748f4-q9mrc    1/1     Running   0          3d12h
prefect-server-ui-55f4bcb597-mmz4c       1/1     Running   0          3d12h

The agent consistently restarts every 5 minutes or so. Is this expected? If not, any idea how to solve it? I am using the output of `prefect agent kubernetes install` as the spec for the deployment.

Mariia Kerimova

03/09/2021, 1:25 PM

Hello Josh! There are a couple of reasons that can trigger pod restarts. Can you provide the following information? What version of Prefect are you using? Can you describe the pod and share its events (run `kubectl describe po prefect-agent-c58b946f9-9r59j`)? Do you set memory limits on the agent?

Josh Greenhalgh

03/09/2021, 2:26 PM

Version: prefecthq/prefect:0.14.11-python3.8. Describe output:

Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Normal   Pulled     33m (x291 over 22h)     kubelet  Successfully pulled image "prefecthq/prefect:0.14.11-python3.8"
  Warning  Unhealthy  13m (x590 over 22h)     kubelet  Liveness probe failed: Get http://10.32.1.19:8080/api/health: dial tcp 10.32.1.19:8080: connect: connection refused
  Warning  BackOff    8m47s (x3591 over 22h)  kubelet  Back-off restarting failed container
  Normal   Pulling    3m39s (x298 over 22h)   kubelet  Pulling image "prefecthq/prefect:0.14.11-python3.8"

memory limits: nope

Josh Greenhalgh

03/09/2021, 2:28 PM

This is the full definition using the Terraform Kubernetes provider (if it helps):

resource "kubernetes_deployment" "prefect_agent" {
  metadata {
    name      = "prefect-agent"
    namespace = kubernetes_namespace.prefect.metadata.0.name
    labels = {
      app = "prefect-agent"
    }
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "prefect-agent"
      }
    }

    template {
      metadata {
        labels = {
          app = "prefect-agent"
        }
      }

      spec {
        node_selector = {
          "cloud.google.com/gke-nodepool" = google_container_node_pool.fixed_compute.name
        }
        container {
          name    = "agent"
          image   = "prefecthq/prefect:0.14.11-python3.8"
          command = ["/bin/bash", "-c"]
          args    = ["prefect agent kubernetes start"]

          env {
            name  = "PREFECT__CLOUD__API"
            value = "http://<HIDDEN>:4200/graphql"
          }

          env {
            name  = "NAMESPACE"
            value = "prefect"
          }

          env {
            name  = "PREFECT__CLOUD__AGENT__LABELS"
            value = "['prefect-agent']"
          }

          env {
            name  = "PREFECT__BACKEND"
            value = "server"
          }

          resources {
            limits = {
              cpu    = "100m"
              memory = "128Mi"
            }
          }

          liveness_probe {
            http_get {
              path = "/api/health"
              port = "8080"
            }

            initial_delay_seconds = 40
            period_seconds        = 40
            failure_threshold     = 2
          }

          image_pull_policy = "Always"
        }
      }
    }
  }
}

Zanie

03/09/2021, 3:01 PM

I believe you need to set the local address for the health check server so that it actually runs, i.e. https://github.com/PrefectHQ/server/blob/master/helm/prefect-server/templates/agent/deployment.yaml#L68-L69
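Translated into the Terraform definition above, that suggestion would look roughly like the fragment below. This is a sketch, not a verified config: the variable name `PREFECT__CLOUD__AGENT__AGENT_ADDRESS` and the `http://0.0.0.0:8080` value are assumptions about what the linked Helm template sets, so compare them against that file. The point it illustrates is that the address's port has to match the port the existing liveness probe checks.

```hcl
container {
  # ... name, image, command, args and the existing env blocks stay as they are ...

  # Assumed setting: when present, the agent starts its health check
  # server at this address; without it nothing listens on port 8080.
  env {
    name  = "PREFECT__CLOUD__AGENT__AGENT_ADDRESS"
    value = "http://0.0.0.0:8080" # assumed value: all interfaces, port 8080
  }

  # Existing probe, unchanged: it targets the same port as the address above.
  liveness_probe {
    http_get {
      path = "/api/health"
      port = "8080"
    }

    initial_delay_seconds = 40
    period_seconds        = 40
    failure_threshold     = 2
  }
}
```

With the address set, a GET on `/api/health` at port 8080 should get a response instead of "connection refused", so the kubelet stops killing the container. If you would rather not run the health server at all, the other option is to drop the `liveness_probe` block entirely.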

Josh Greenhalgh

03/09/2021, 3:54 PM

Hmm, OK. I removed that since it appeared to be related to the `cloud` version?

Josh Greenhalgh

03/09/2021, 3:55 PM

Thanks!

Zanie

03/09/2021, 4:00 PM

There are a few instances where `CLOUD` settings are just "backend" settings that we didn't have a better name for when Server was split out.

Josh Greenhalgh

03/09/2021, 4:04 PM

OK, so the issue is that this env var is required to start the endpoint that the health check probes? In my case nothing is listening there, so it keeps failing?
