Pedro Martins

12/14/2020, 5:23 PM
The PREFECT__CLOUD__AGENT__AGENT_ADDRESS was wrong in my YAML. Prefect Kubernetes agent pod failing liveness probe. I deployed a k8s agent using the deployment YAML provided by Prefect. I updated PREFECT__CLOUD__API to point to my own server, but the agent pod keeps restarting because the liveness probe fails:
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  103s  default-scheduler  Successfully assigned default/prefect-agent-c4b68bfd9-dj8cf to ip-10-0-1-20.eu-west-1.compute.internal
  Normal   Pulling    102s  kubelet            Pulling image "prefecthq/prefect:0.13.19-python3.6"
  Normal   Pulled     101s  kubelet            Successfully pulled image "prefecthq/prefect:0.13.19-python3.6"
  Normal   Created    101s  kubelet            Created container agent
  Normal   Started    101s  kubelet            Started container agent
  Warning  Unhealthy  37s   kubelet            Liveness probe failed: Get http://10.0.1.64:8080/api/health: dial tcp 10.0.1.64:8080: connect: connection refused
Note that the agent can successfully register with the UI. The deployment.yaml is in the thread. Does anyone know how to fix it?
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prefect-agent
  name: prefect-agent
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prefect-agent
  template:
    metadata:
      labels:
        app: prefect-agent
    spec:
      containers:
      - args:
        - prefect agent kubernetes start
        command:
        - /bin/bash
        - -c
        env:
        - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
          value: ''
        - name: PREFECT__CLOUD__API
          value: http://prefect-server-apollo.default.svc.cluster.local
        - name: NAMESPACE
          value: default
        - name: IMAGE_PULL_SECRETS
          value: ''
        - name: PREFECT__CLOUD__AGENT__LABELS
          value: '[]'
        - name: JOB_MEM_REQUEST
          value: ''
        - name: JOB_MEM_LIMIT
          value: ''
        - name: JOB_CPU_REQUEST
          value: ''
        - name: JOB_CPU_LIMIT
          value: ''
        - name: IMAGE_PULL_POLICY
          value: ''
        - name: SERVICE_ACCOUNT_NAME
          value: ''
        - name: PREFECT__BACKEND
          value: server
        - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
          value: http://localhost:8080
        image: prefecthq/prefect:0.13.19-python3.6
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          httpGet:
            path: /api/health
            port: 8080
          initialDelaySeconds: 40
          periodSeconds: 40
        name: agent
        resources:
          limits:
            cpu: 100m
            memory: 128Mi

Sagun Garg

12/21/2020, 7:45 AM
@Pedro Martins How can I modify this YAML file to reflect this change? I am also receiving the same error.

Pedro Martins

12/21/2020, 6:51 PM
My only change was to set PREFECT__CLOUD__AGENT__AGENT_ADDRESS to the following:
- name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
  value: http://:8080
Also, make sure your local Prefect is pointing to your server. See the details here: https://github.com/PrefectHQ/server/tree/master/helm/prefect-server#connecting-to-your-server
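
For anyone wondering why dropping the hostname helps, here is a minimal Python sketch. It assumes (this is not confirmed anywhere in the thread) that the agent splits the address into host and port with the standard library's urlparse and binds its health server to that host. bind_target is a hypothetical helper for illustration, not a Prefect function. Under that assumption, localhost would make the server listen only on the loopback interface, so the kubelet's probe against the pod IP (10.0.1.64:8080 in the events above) gets connection refused, while an empty host would mean listening on all interfaces.

```python
from urllib.parse import urlparse


def bind_target(agent_address):
    """Hypothetical helper: split an agent address into (host, port).

    An empty host is taken here to mean "listen on all interfaces".
    This mirrors an assumed behavior, not a documented Prefect API.
    """
    parsed = urlparse(agent_address)
    if parsed.port is None:
        raise ValueError("agent address must include a port: %r" % agent_address)
    # urlparse returns hostname=None when the URL has no host part
    return (parsed.hostname or "", parsed.port)


print(bind_target("http://localhost:8080"))  # ('localhost', 8080) -> loopback only
print(bind_target("http://:8080"))           # ('', 8080) -> all interfaces
```

Either way, the port in the address has to match the port in the livenessProbe (8080 in the deployment above).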