Scott Zelenka
07/29/2020, 3:31 PM
The agent container is failing liveness probes, and the Pod is being killed.
Warning Unhealthy 43s (x2 over 83s) kubelet, .internal Liveness probe failed: Get http://10.0.28.4:8080/api/health : dial tcp 10.0.28.4:8080: connect: connection refused
I'm chatting with AWS right now, and they're asserting that because kubelet is attempting to connect on port 8080, the Deployment YAML for the agent should be exposing that port. But I don't have this problem when deploying the same YAML on GCP, OpenShift, or bare-metal K8s.
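For context, a minimal sketch of what AWS support is suggesting in the agent Deployment (names are illustrative; the probe values match the probe reported by kubectl describe later in the thread):
containers:
  - name: agent
    ports:
      - containerPort: 8080   # declares the port; by itself this does not change what the process binds to
    livenessProbe:
      httpGet:
        path: /api/health
        port: 8080
      initialDelaySeconds: 40
      periodSeconds: 40
      failureThreshold: 2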
josh
07/29/2020, 3:38 PM
- name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
  value: "http://:8080"
Scott Zelenka
07/29/2020, 3:46 PM
netstat -antlp | grep 8080
tcp 0 0 127.0.0.1:8080 0.0.0.0:* LISTEN 1/python
So perhaps the other K8s deployments are issuing the liveness probe to localhost, while AWS' EKS uses their VPC subnets for Pods and attempts to reach the actual Pod IP on port 8080. Because the service doesn't appear to be listening on all addresses, the connection is refused. Should PREFECT__CLOUD__AGENT__AGENT_ADDRESS be set with the IP from the Pod?
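To double-check that theory against the running agent, something like the following would show the bind address from outside the container (the Deployment name prefect-agent is an assumption; netstat is already available in the image, as shown above):
# run netstat inside the agent container via kubectl (kubectl 1.18+ accepts deploy/ targets)
kubectl exec deploy/prefect-agent -c agent -- netstat -antlp | grep 8080
# 127.0.0.1:8080 -> only reachable via localhost inside the Pod
# 0.0.0.0:8080   -> reachable on the Pod IP, which is what EKS's kubelet probes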
josh
07/29/2020, 3:50 PM
- name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
(from https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/)
I also wonder if you can set the liveness probe host directly:
livenessProbe:
  httpGet:
    host: pod_ip_here
    path: /api/health
    port: 8080
  initialDelaySeconds: 40
  periodSeconds: 40
  failureThreshold: 2
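A related option, combining the Downward API reference above with Kubernetes' $(VAR) dependent-variable expansion so the advertised address carries the Pod IP explicitly (a sketch, not something verified in this thread; POD_IP must be declared before the variable that references it):
- name: POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
  value: "http://$(POD_IP):8080"  # $(POD_IP) is expanded by Kubernetes at container start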
Scott Zelenka
07/29/2020, 4:05 PM
I checked PREFECT__CLOUD__AGENT__AGENT_ADDRESS on the other K8s clusters, and it seems it was set to http://:8080, which appears to listen on all addresses of the Pod. For whatever reason, when I ran prefect agent install kubernetes ... for the AWS deployment, it changed that value to http://127.0.0.1:8080. Manually changing this to http://0.0.0.0:8080 allowed it to pass the liveness probes!!
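For completeness, the working configuration described above, as it would appear in the agent container's env block (a minimal sketch; the rest of the manifest is unchanged):
- name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
  value: "http://0.0.0.0:8080"  # bind on all interfaces so kubelet can reach the Pod IP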
Sagun Garg
12/21/2020, 7:47 AM
Regarding PREFECT__CLOUD__AGENT__AGENT_ADDRESS, can you please share the commands you used on the CLI to make this happen in the YAML file?
Name: prefect-agent-7bcdb4f975-pftnw
Namespace: default
Priority: 2000001000
Priority Class Name: system-node-critical
Node: fargate-ip-10-0-231-193.ap-southeast-1.compute.internal/10.0.231.193
Start Time: Mon, 21 Dec 2020 14:37:38 +0800
Labels: app=prefect-agent
eks.amazonaws.com/fargate-profile=fp-default
pod-template-hash=7bcdb4f975
Annotations: CapacityProvisioned: 0.25vCPU 0.5GB
Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.0.231.193
IPs:
IP: 10.0.231.193
Controlled By: ReplicaSet/prefect-agent-7bcdb4f975
Containers:
agent:
Container ID: containerd://9becbcb74d4ebc66c0c93a0fd40f5e3a15ee0c276831457c1514c752188b5c21
Image: prefecthq/prefect:0.14.0-python3.6
Image ID: docker.io/prefecthq/prefect@sha256:3ebe46f840d46044b9521c9380aa13bd0755670d06e2fdfe0e23c69de5a78fc0
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
prefect agent kubernetes start
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 21 Dec 2020 16:07:26 +0800
Finished: Mon, 21 Dec 2020 16:08:48 +0800
Ready: False
Restart Count: 26
Limits:
cpu: 100m
memory: 128Mi
Requests:
cpu: 100m
memory: 128Mi
Liveness: http-get http://:8080/api/health delay=40s timeout=1s period=40s #success=1 #failure=2
Environment:
PREFECT__CLOUD__AGENT__AUTH_TOKEN: gNZkGzQJohYgunuMh_okKw
PREFECT__CLOUD__API: https://api.prefect.io
NAMESPACE: default
IMAGE_PULL_SECRETS:
PREFECT__CLOUD__AGENT__LABELS: []
JOB_MEM_REQUEST:
JOB_MEM_LIMIT:
JOB_CPU_REQUEST:
JOB_CPU_LIMIT:
IMAGE_PULL_POLICY:
SERVICE_ACCOUNT_NAME:
PREFECT__BACKEND: cloud
PREFECT__CLOUD__AGENT__AGENT_ADDRESS: http://:8080
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4tqz5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-4tqz5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4tqz5
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4m43s (x27 over 94m) kubelet Liveness probe failed: Get http://10.0.231.193:8080/api/health: dial tcp 10.0.231.193:8080: connect: connection refused
Warning BackOff 18s (x270 over 90m) kubelet Back-off restarting failed container
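Re: the CLI question above, a hedged sketch of one way to reproduce the manual edit Scott describes when generating the manifest (flag support for prefect agent install kubernetes varies across Prefect 0.x releases, and the sed pattern assumes the generated file contains the 127.0.0.1 address):
# generate the agent manifest, force the agent to listen on all interfaces, then apply it
prefect agent install kubernetes > prefect-agent.yaml
sed -i 's|http://127.0.0.1:8080|http://0.0.0.0:8080|' prefect-agent.yaml
kubectl apply -f prefect-agent.yaml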