# prefect-community
s
Has anyone had success registering the Prefect Agent on a private subnet on AWS EKS? We can get the Pod running, and it will register with Prefect Cloud and pick up pending Flows just fine. The challenge we're running into is that the `agent` container is failing its liveness probes, which kills the Pod.
```
Warning  Unhealthy  43s (x2 over 83s)  kubelet, .internal  Liveness probe failed: Get http://10.0.28.4:8080/api/health: dial tcp 10.0.28.4:8080: connect: connection refused
```
I'm chatting with AWS right now, and they're asserting that because kubelet is attempting to connect on port 8080, the Deployment YAML for the `agent` should be exposing that port. But I don't have this problem when deploying the same YAML on GCP, OpenShift, or bare-metal Kubernetes.
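(For context, the probe that is failing comes from the Deployment manifest generated by `prefect agent install kubernetes`. A sketch of that stanza, reconstructed from the `kubectl describe` output later in the thread rather than copied from the exact manifest:)
```yaml
# Approximate livenessProbe from the generated agent Deployment;
# values match the "Liveness:" line in the describe output further down.
livenessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 40
  periodSeconds: 40
  failureThreshold: 2
```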
j
I have not run into this before, but have you tried updating the port in the deployment YAML to see if it takes?
```yaml
- name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
  value: "http://:8080"
```
I don't know for certain why EKS is talking to that port and the other k8s providers aren't.
s
The latest theory is that the agent's health endpoint is listening only on localhost:
```
netstat -antlp | grep 8080
tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      1/python
```
So perhaps the other K8s deployments are issuing the liveness probe against `localhost`, while AWS EKS uses its VPC subnets for Pods and is attempting to reach the actual IP of the Pod on port 8080. Because the service doesn't appear to be listening on all addresses, the connection is refused.
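(A quick way to confirm that theory from outside the container, assuming the `app=prefect-agent` label from the generated manifest and that `netstat` is available in the image:)
```sh
# Grab the agent pod name via its label, then inspect the listener.
POD=$(kubectl get pod -l app=prefect-agent -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD" -- netstat -antlp | grep 8080

# Try the health endpoint against the pod IP, the way the EKS kubelet does.
# Run the curl from somewhere that can reach the pod network (another pod or a node).
POD_IP=$(kubectl get pod "$POD" -o jsonpath='{.status.podIP}')
curl -v "http://${POD_IP}:8080/api/health"
```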
I'm on Prefect version 0.12.6 ... is there a way to update `PREFECT__CLOUD__AGENT__AGENT_ADDRESS` with the IP from the Pod?
j
Hmm, not one directly, but using k8s you may be able to achieve it by setting something like:
```yaml
- name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
```
(from https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/) I also wonder if you can set the liveness probe host directly:
```yaml
livenessProbe:
  httpGet:
    host: pod_ip_here
    path: /api/health
    port: 8080
  initialDelaySeconds: 40
  periodSeconds: 40
  failureThreshold: 2
```
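(Not something tried in this thread, but those two suggestions could be combined: expose the pod IP through the downward API and reference it with standard Kubernetes `$(VAR)` dependent-env-var expansion, so the agent address carries both the pod IP and the port. A sketch:)
```yaml
env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  # $(POD_IP) is expanded by Kubernetes because POD_IP is defined above it.
  - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
    value: "http://$(POD_IP):8080"
```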
s
Thanks! I looked at `PREFECT__CLOUD__AGENT__AGENT_ADDRESS` on the other K8s clusters, and it seems they were set to `http://:8080`, which appears to listen on all addresses of the Pod. For whatever reason, when I ran `prefect agent install kubernetes ...` for the AWS deployment, it changed that value to `http://127.0.0.1:8080`. Manually changing this to `http://0.0.0.0:8080` allowed it to pass the liveness probes!!
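(For the question below about how to apply this from the CLI, two possible ways, sketched with the deployment name taken from the describe output further down:)
```sh
# Option 1: regenerate the manifest, edit PREFECT__CLOUD__AGENT__AGENT_ADDRESS
# to http://0.0.0.0:8080 in the file, then apply it.
prefect agent install kubernetes ... > agent.yaml
kubectl apply -f agent.yaml

# Option 2: patch the env var on the running Deployment directly.
kubectl set env deployment/prefect-agent \
  PREFECT__CLOUD__AGENT__AGENT_ADDRESS=http://0.0.0.0:8080
```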
s
@Scott Zelenka How can I manually change these values for `PREFECT__CLOUD__AGENT__AGENT_ADDRESS`? Can you please share the commands you used on the CLI to make this change in the YAML file?
@Scott Zelenka Here is what I get when I describe my pod:
```
Name:                 prefect-agent-7bcdb4f975-pftnw
Namespace:            default
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 fargate-ip-10-0-231-193.ap-southeast-1.compute.internal/10.0.231.193
Start Time:           Mon, 21 Dec 2020 14:37:38 +0800
Labels:               app=prefect-agent
                      eks.amazonaws.com/fargate-profile=fp-default
                      pod-template-hash=7bcdb4f975
Annotations:          CapacityProvisioned: 0.25vCPU 0.5GB
                      Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
                      kubernetes.io/psp: eks.privileged
Status:               Running
IP:                   10.0.231.193
IPs:
  IP:  10.0.231.193
Controlled By:  ReplicaSet/prefect-agent-7bcdb4f975
Containers:
  agent:
    Container ID:  containerd://9becbcb74d4ebc66c0c93a0fd40f5e3a15ee0c276831457c1514c752188b5c21
    Image:         prefecthq/prefect:0.14.0-python3.6
    Image ID:      docker.io/prefecthq/prefect@sha256:3ebe46f840d46044b9521c9380aa13bd0755670d06e2fdfe0e23c69de5a78fc0
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
    Args:
      prefect agent kubernetes start
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 21 Dec 2020 16:07:26 +0800
      Finished:     Mon, 21 Dec 2020 16:08:48 +0800
    Ready:          False
    Restart Count:  26
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
    Liveness:  http-get http://:8080/api/health delay=40s timeout=1s period=40s #success=1 #failure=2
    Environment:
      PREFECT__CLOUD__AGENT__AUTH_TOKEN:     gNZkGzQJohYgunuMh_okKw
      PREFECT__CLOUD__API:                   https://api.prefect.io
      NAMESPACE:                             default
      IMAGE_PULL_SECRETS:
      PREFECT__CLOUD__AGENT__LABELS:         []
      JOB_MEM_REQUEST:
      JOB_MEM_LIMIT:
      JOB_CPU_REQUEST:
      JOB_CPU_LIMIT:
      IMAGE_PULL_POLICY:
      SERVICE_ACCOUNT_NAME:
      PREFECT__BACKEND:                      cloud
      PREFECT__CLOUD__AGENT__AGENT_ADDRESS:  http://:8080
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4tqz5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-4tqz5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4tqz5
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From     Message
  ----     ------     ----                   ----     -------
  Warning  Unhealthy  4m43s (x27 over 94m)   kubelet  Liveness probe failed: Get http://10.0.231.193:8080/api/health: dial tcp 10.0.231.193:8080: connect: connection refused
  Warning  BackOff    18s (x270 over 90m)    kubelet  Back-off restarting failed container
```
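(One more diagnostic worth noting: the describe output above also shows `Exit Code: 1` with a CrashLoopBackOff and 26 restarts, so the container is exiting on its own as well as failing the probe. The previous container's logs usually say why; pod name as shown above:)
```sh
kubectl logs prefect-agent-7bcdb4f975-pftnw --previous
```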