    Scott Zelenka

    2 years ago
    Has anyone had success registering the Prefect Agent on a Private Subnet on AWS EKS? We can get the Pod running, and it registers with Prefect Cloud to pick up pending Flows just fine. The challenge we're running into is that the agent container is failing liveness probes, which kills the Pod:
    Warning Unhealthy 43s (x2 over 83s) kubelet, .internal Liveness probe failed: Get http://10.0.28.4:8080/api/health : dial tcp 10.0.28.4:8080: connect: connection refused
    I'm chatting with AWS right now, and they're asserting that because kubelet is attempting to connect on port 8080, the Deployment YAML for the agent should be exposing that port. But I don't have this problem when deploying the same YAML on GCP, OpenShift, or bare-metal K8s.
    josh

    2 years ago
    I have not run into this before, but have you tried updating the port in the deployment YAML to see if it takes?
    - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
      value: "http://:8080"
    I don’t know for certain why EKS is talking to that port while other k8s providers aren’t.
    Scott Zelenka

    2 years ago
    The latest theory is that the agent's health endpoint appears to be listening only on localhost:
    netstat -antlp | grep 8080
    tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      1/python
    So perhaps the other K8s deployments issue the liveness probe against localhost, while AWS EKS uses its VPC subnets for Pods and tries to reach the Pod's actual IP on port 8080. Because the service isn't listening on all addresses, the connection is refused.
    I'm on Prefect version 0.12.6 ... is there a way to update PREFECT__CLOUD__AGENT__AGENT_ADDRESS with the IP from the Pod?
    josh

    2 years ago
    Hmm, not one directly, but using k8s you may be able to achieve it by setting something like:
    - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    (from https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/) I also wonder if you can set the liveness probe host directly:
    livenessProbe:
      httpGet:
        host: pod_ip_here
        path: /api/health
        port: 8080
      initialDelaySeconds: 40
      periodSeconds: 40
      failureThreshold: 2
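    (Note that status.podIP yields a bare IP, while the agent address elsewhere in this thread is a full URL. One possible variation, assuming the agent expects a URL, is Kubernetes' dependent environment variable syntax, where a later variable can reference an earlier one:)
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
      # Sketch only: builds the agent URL from the Pod IP; POD_IP must be declared above it
      value: "http://$(POD_IP):8080"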
    Scott Zelenka

    2 years ago
    Thanks! I looked at PREFECT__CLOUD__AGENT__AGENT_ADDRESS on the other K8s clusters, and it seems they were set to http://:8080, which listens on all addresses of the Pod. For whatever reason, when I ran prefect agent install kubernetes ... for the AWS deployment, it changed that value to http://127.0.0.1:8080. Manually changing this to http://0.0.0.0:8080 allowed it to pass the liveness probes!!
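    In the agent Deployment manifest, that fix would look roughly like this (a sketch based on the thread, not the exact YAML used):
    env:
      - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
        # Bind the agent's health endpoint on all interfaces so kubelet can reach it via the Pod IP
        value: "http://0.0.0.0:8080"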
    Sagun Garg

    1 year ago
    @Scott Zelenka How can I manually change these values for PREFECT__CLOUD__AGENT__AGENT_ADDRESS? Can you please share the commands you used on the CLI to make this happen in the YAML file? (one possible approach is sketched below, after the pod description)
    @Scott Zelenka Here is what I get when I describe my pod:
    Name:                 prefect-agent-7bcdb4f975-pftnw
    Namespace:            default
    Priority:             2000001000
    Priority Class Name:  system-node-critical
    Node:                 fargate-ip-10-0-231-193.ap-southeast-1.compute.internal/10.0.231.193
    Start Time:           Mon, 21 Dec 2020 14:37:38 +0800
    Labels:               app=prefect-agent
                          eks.amazonaws.com/fargate-profile=fp-default
                          pod-template-hash=7bcdb4f975
    Annotations:          CapacityProvisioned: 0.25vCPU 0.5GB
                          Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
                          kubernetes.io/psp: eks.privileged
    Status:               Running
    IP:                   10.0.231.193
    IPs:
      IP:           10.0.231.193
    Controlled By:  ReplicaSet/prefect-agent-7bcdb4f975
    Containers:
      agent:
        Container ID:   containerd://9becbcb74d4ebc66c0c93a0fd40f5e3a15ee0c276831457c1514c752188b5c21
        Image:          prefecthq/prefect:0.14.0-python3.6
        Image ID:       docker.io/prefecthq/prefect@sha256:3ebe46f840d46044b9521c9380aa13bd0755670d06e2fdfe0e23c69de5a78fc0
        Port:           <none>
        Host Port:      <none>
        Command:
          /bin/bash
          -c
        Args:
          prefect agent kubernetes start
        State:          Waiting
          Reason:       CrashLoopBackOff
        Last State:     Terminated
          Reason:       Error
          Exit Code:    1
          Started:      Mon, 21 Dec 2020 16:07:26 +0800
          Finished:     Mon, 21 Dec 2020 16:08:48 +0800
        Ready:          False
        Restart Count:  26
        Limits:
          cpu:     100m
          memory:  128Mi
        Requests:
          cpu:     100m
          memory:  128Mi
        Liveness:  http-get http://:8080/api/health delay=40s timeout=1s period=40s #success=1 #failure=2
        Environment:
          PREFECT__CLOUD__AGENT__AUTH_TOKEN:     gNZkGzQJohYgunuMh_okKw
          PREFECT__CLOUD__API:                   https://api.prefect.io
          NAMESPACE:                             default
          IMAGE_PULL_SECRETS:
          PREFECT__CLOUD__AGENT__LABELS:         []
          JOB_MEM_REQUEST:
          JOB_MEM_LIMIT:
          JOB_CPU_REQUEST:
          JOB_CPU_LIMIT:
          IMAGE_PULL_POLICY:
          SERVICE_ACCOUNT_NAME:
          PREFECT__BACKEND:                      cloud
          PREFECT__CLOUD__AGENT__AGENT_ADDRESS:  http://:8080
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-4tqz5 (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      default-token-4tqz5:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-4tqz5
        Optional:    false
    QoS Class:       Guaranteed
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                     node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type     Reason     Age                    From     Message
      ----     ------     ----                   ----     -------
      Warning  Unhealthy  4m43s (x27 over 94m)   kubelet  Liveness probe failed: Get http://10.0.231.193:8080/api/health: dial tcp 10.0.231.193:8080: connect: connection refused
      Warning  BackOff    18s (x270 over 90m)    kubelet  Back-off restarting failed container
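    One possible way to change that value without re-running prefect agent install is to patch the environment variable on the live Deployment (a sketch; the Deployment name prefect-agent is inferred from the Pod name above):
    # Update the agent address and let the Deployment roll out a new Pod
    kubectl set env deployment/prefect-agent -n default \
        PREFECT__CLOUD__AGENT__AGENT_ADDRESS=http://0.0.0.0:8080
    Alternatively, save the manifest generated by prefect agent install kubernetes ... to a file, edit the value there, and kubectl apply -f it again.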