Sagun Garg
12/21/2020, 7:27 AM
$ prefect agent kubernetes install -t <MY_TOKEN> --rbac | kubectl apply -f -
Warning Unhealthy kubelet Liveness probe failed :8080/api/health
Logs of the failing pod, which Kubernetes keeps restarting:
Warning LoggingDisabled 48m fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled 48m fargate-scheduler Successfully assigned default/prefect-agent-7bcdb4f975-pftnw to fargate-ip-10-0-231-193.ap-southeast-1.compute.internal
Normal Created 43m (x4 over 47m) kubelet Created container agent
Normal Started 43m (x4 over 47m) kubelet Started container agent
Normal Killing 42m (x2 over 45m) kubelet Container agent failed liveness probe, will be restarted
Normal Pulling 41m (x5 over 48m) kubelet Pulling image "prefecthq/prefect:0.14.0-python3.6"
Normal Pulled 41m (x5 over 47m) kubelet Successfully pulled image "prefecthq/prefect:0.14.0-python3.6"
Warning Unhealthy 12m (x13 over 46m) kubelet Liveness probe failed: Get http://10.0.231.193:8080/api/health: dial tcp 10.0.231.193:8080: connect: connection refused
Warning BackOff 3m3s (x113 over 43m) kubelet Back-off restarting failed container
Name: prefect-agent-7bcdb4f975-pftnw
Namespace: default
Priority: 2000001000
Priority Class Name: system-node-critical
Node: fargate-ip-10-0-231-193.ap-southeast-1.compute.internal/10.0.231.193
Start Time: Mon, 21 Dec 2020 14:37:38 +0800
Labels: app=prefect-agent
eks.amazonaws.com/fargate-profile=fp-default
pod-template-hash=7bcdb4f975
Annotations: CapacityProvisioned: 0.25vCPU 0.5GB
Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.0.231.193
IPs:
IP: 10.0.231.193
Controlled By: ReplicaSet/prefect-agent-7bcdb4f975
Containers:
agent:
Container ID: containerd://9becbcb74d4ebc66c0c93a0fd40f5e3a15ee0c276831457c1514c752188b5c21
Image: prefecthq/prefect:0.14.0-python3.6
Image ID: docker.io/prefecthq/prefect@sha256:3ebe46f840d46044b9521c9380aa13bd0755670d06e2fdfe0e23c69de5a78fc0
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
prefect agent kubernetes start
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 21 Dec 2020 16:07:26 +0800
Finished: Mon, 21 Dec 2020 16:08:48 +0800
Ready: False
Restart Count: 26
Limits:
cpu: 100m
memory: 128Mi
Requests:
cpu: 100m
memory: 128Mi
Liveness: http-get http://:8080/api/health delay=40s timeout=1s period=40s #success=1 #failure=2
Environment:
PREFECT__CLOUD__AGENT__AUTH_TOKEN: gNZkGzQJohYgunuMh_okKw
PREFECT__CLOUD__API: https://api.prefect.io
NAMESPACE: default
IMAGE_PULL_SECRETS:
PREFECT__CLOUD__AGENT__LABELS: []
JOB_MEM_REQUEST:
JOB_MEM_LIMIT:
JOB_CPU_REQUEST:
JOB_CPU_LIMIT:
IMAGE_PULL_POLICY:
SERVICE_ACCOUNT_NAME:
PREFECT__BACKEND: cloud
PREFECT__CLOUD__AGENT__AGENT_ADDRESS: http://:8080
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4tqz5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-4tqz5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4tqz5
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4m43s (x27 over 94m) kubelet Liveness probe failed: Get http://10.0.231.193:8080/api/health: dial tcp 10.0.231.193:8080: connect: connection refused
Warning BackOff 18s (x270 over 90m) kubelet Back-off restarting failed container
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 170, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/usr/local/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 382, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 353, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fbb61ef5fd0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 796, in urlopen
**response_kw
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 796, in urlopen
**response_kw
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 796, in urlopen
**response_kw
[Previous line repeated 3 more times]
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 573, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.prefect.io', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fbb61ef5fd0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/prefect", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/prefect/cli/agent.py", line 282, in start
start_agent(KubernetesAgent, image_pull_secrets=image_pull_secrets, **kwargs)
File "/usr/local/lib/python3.6/site-packages/prefect/cli/agent.py", line 109, in start_agent
agent.start()
File "/usr/local/lib/python3.6/site-packages/prefect/agent/agent.py", line 221, in start
self._verify_token(self.client.get_auth_token())
File "/usr/local/lib/python3.6/site-packages/prefect/agent/agent.py", line 175, in _verify_token
result = self.client.graphql(query="query { auth_info { api_token_scope } }")
File "/usr/local/lib/python3.6/site-packages/prefect/client/client.py", line 303, in graphql
retry_on_api_error=retry_on_api_error,
File "/usr/local/lib/python3.6/site-packages/prefect/client/client.py", line 219, in post
retry_on_api_error=retry_on_api_error,
File "/usr/local/lib/python3.6/site-packages/prefect/client/client.py", line 445, in _request
session=session, method=method, url=url, params=params, headers=headers
File "/usr/local/lib/python3.6/site-packages/prefect/client/client.py", line 345, in _send_request
response = session.post(url, headers=headers, json=params, timeout=30)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 590, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.prefect.io', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fbb61ef5fd0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
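[Editor's note] The root error in the traceback above is socket.gaierror: [Errno -3] Temporary failure in name resolution, i.e. the pod cannot resolve api.prefect.io at all, which points at cluster DNS / VPC configuration on the Fargate subnet rather than at Prefect itself. A minimal, illustrative Python sketch of the same check the agent effectively performs (not Prefect's own code):

```python
import socket

def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if `host` resolves to at least one address.

    Mirrors the socket.getaddrinfo call that fails in the traceback
    above; a False result for api.prefect.io from inside the pod
    points at DNS/VPC settings rather than at the Prefect agent.
    """
    try:
        return len(socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)) > 0
    except socket.gaierror:
        return False

# "localhost" should always resolve, even with no external network.
print(can_resolve("localhost"))  # True
```

Running this (or `nslookup api.prefect.io`) from a debug pod in the same Fargate profile would confirm whether DNS is the blocker.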
Kyle Moon-Wright
12/21/2020, 6:29 PM
Pedro Martins
12/21/2020, 6:53 PM
prefect backend cloud
- name: PREFECT__CLOUD__API
  value: https://api.prefect.io
Sagun Garg
12/22/2020, 7:16 AM
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prefect-agent
  name: prefect-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prefect-agent
  template:
    metadata:
      labels:
        app: prefect-agent
    spec:
      containers:
      - args:
        - prefect agent kubernetes start
        command:
        - /bin/bash
        - -c
        env:
        - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
          value: kXe_*#%#%#^$^$^$%^BdgULWw
        - name: PREFECT__CLOUD__API
          value: https://api.prefect.io
        - name: NAMESPACE
          value: default
        - name: IMAGE_PULL_SECRETS
          value: ''
        - name: PREFECT__CLOUD__AGENT__LABELS
          value: '[]'
        - name: JOB_MEM_REQUEST
          value: ''
        - name: JOB_MEM_LIMIT
          value: ''
        - name: JOB_CPU_REQUEST
          value: ''
        - name: JOB_CPU_LIMIT
          value: ''
        - name: IMAGE_PULL_POLICY
          value: ''
        - name: SERVICE_ACCOUNT_NAME
          value: ''
        - name: PREFECT__BACKEND
          value: cloud
        - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
          value: http://0.0.0.0:8080
        image: prefecthq/prefect:0.14.0-python3.6
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          httpGet:
            path: /api/health
            port: 8080
          initialDelaySeconds: 40
          periodSeconds: 40
        name: agent
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-agent-rbac
  namespace: default
rules:
- apiGroups:
  - batch
  - extensions
  resources:
  - jobs
  verbs:
  - '*'
- apiGroups:
  - ''
  resources:
  - events
  - pods
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: prefect-agent-rbac
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prefect-agent-rbac
subjects:
- kind: ServiceAccount
  name: default
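[Editor's note] A key difference between the failing and the working deployment is the agent address: the crashing pod had PREFECT__CLOUD__AGENT__AGENT_ADDRESS set to http://:8080, whose URL has no host component at all, whereas http://0.0.0.0:8080 gives the health endpoint an explicit wildcard bind address that the kubelet can reach via the pod IP. The standard library makes the difference visible (illustrative only, not Prefect's own parsing logic):

```python
from urllib.parse import urlparse

# The address from the crashing deployment: the host component is empty,
# so urlparse reports no hostname at all.
broken = urlparse("http://:8080")
print(broken.hostname, broken.port)  # None 8080

# The corrected address: an explicit wildcard bind host.
fixed = urlparse("http://0.0.0.0:8080")
print(fixed.hostname, fixed.port)    # 0.0.0.0 8080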
Pedro Martins
12/22/2020, 12:39 PM
http://:8080. Try modifying and let me know the results.
Also, post the description of the pod that is crashing.
Sagun Garg
12/22/2020, 2:19 PM
http://:8080
kubectl describe pod prefect-agent-f4c6f6545-jsl8t
Name: prefect-agent-f4c6f6545-jsl8t
Namespace: default
Priority: 2000001000
Priority Class Name: system-node-critical
Node: fargate-ip-10-0-241-95.ap-southeast-1.compute.internal/10.0.241.95
Start Time: Tue, 22 Dec 2020 22:15:22 +0800
Labels: app=prefect-agent
eks.amazonaws.com/fargate-profile=fp-default
pod-template-hash=f4c6f6545
Annotations: CapacityProvisioned: 0.25vCPU 0.5GB
Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.0.241.95
IPs:
IP: 10.0.241.95
Controlled By: ReplicaSet/prefect-agent-f4c6f6545
Containers:
agent:
Container ID: containerd://4908f35c04f6348bba698aa1aaa0c83b15fbaef8782382a1b14db65cfcf859c7
Image: prefecthq/prefect:0.14.0-python3.6
Image ID: docker.io/prefecthq/prefect@sha256:3ebe46f840d46044b9521c9380aa13bd0755670d06e2fdfe0e23c69de5a78fc0
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
prefect agent kubernetes start
State: Running
Started: Tue, 22 Dec 2020 22:15:41 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 128Mi
Requests:
cpu: 100m
memory: 128Mi
Liveness: http-get http://:8080/api/health delay=40s timeout=1s period=40s #success=1 #failure=2
Environment:
PREFECT__CLOUD__AGENT__AUTH_TOKEN: kXe_E5u6IS2d7AUBdgULWw
PREFECT__CLOUD__API: https://api.prefect.io
NAMESPACE: default
IMAGE_PULL_SECRETS:
PREFECT__CLOUD__AGENT__LABELS: []
JOB_MEM_REQUEST:
JOB_MEM_LIMIT:
JOB_CPU_REQUEST:
JOB_CPU_LIMIT:
IMAGE_PULL_POLICY:
SERVICE_ACCOUNT_NAME:
PREFECT__BACKEND: cloud
PREFECT__CLOUD__AGENT__AGENT_ADDRESS: http://:8080
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4tqz5 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-4tqz5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4tqz5
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled 101s fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled 68s fargate-scheduler Successfully assigned default/prefect-agent-f4c6f6545-jsl8t to fargate-ip-10-0-241-95.ap-southeast-1.compute.internal
Normal Pulling 68s kubelet Pulling image "prefecthq/prefect:0.14.0-python3.6"
Normal Pulled 50s kubelet Successfully pulled image "prefecthq/prefect:0.14.0-python3.6"
Normal Created 49s kubelet Created container agent
Normal Started 49s kubelet Started container agent
(.venv) Saguns-MacBook-Pro:ingress sagungargs$ kubectl get pods
NAME READY STATUS RESTARTS AGE
prefect-agent-f4c6f6545-jsl8t 1/1 Running 0 119s
(.venv) Saguns-MacBook-Pro:ingress sagungargs$ kubectl get pods
NAME READY STATUS RESTARTS AGE
prefect-agent-f4c6f6545-jsl8t 1/1 Running 0 2m9s
(.venv) Saguns-MacBook-Pro:ingress sagungargs$ kubectl describe pod prefect-agent-f4c6f6545-jsl8t
Name: prefect-agent-f4c6f6545-jsl8t
Namespace: default
Priority: 2000001000
Priority Class Name: system-node-critical
Node: fargate-ip-10-0-241-95.ap-southeast-1.compute.internal/10.0.241.95
Start Time: Tue, 22 Dec 2020 22:15:22 +0800
Labels: app=prefect-agent
eks.amazonaws.com/fargate-profile=fp-default
pod-template-hash=f4c6f6545
Annotations: CapacityProvisioned: 0.25vCPU 0.5GB
Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.0.241.95
IPs:
IP: 10.0.241.95
Controlled By: ReplicaSet/prefect-agent-f4c6f6545
Containers:
agent:
Container ID: containerd://4908f35c04f6348bba698aa1aaa0c83b15fbaef8782382a1b14db65cfcf859c7
Image: prefecthq/prefect:0.14.0-python3.6
Image ID: docker.io/prefecthq/prefect@sha256:3ebe46f840d46044b9521c9380aa13bd0755670d06e2fdfe0e23c69de5a78fc0
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
prefect agent kubernetes start
State: Running
Started: Tue, 22 Dec 2020 22:15:41 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 128Mi
Requests:
cpu: 100m
memory: 128Mi
Liveness: http-get http://:8080/api/health delay=40s timeout=1s period=40s #success=1 #failure=2
Environment:
PREFECT__CLOUD__AGENT__AUTH_TOKEN: kXe_E5u6IS2d7AUBdgULWw
PREFECT__CLOUD__API: https://api.prefect.io
NAMESPACE: default
IMAGE_PULL_SECRETS:
PREFECT__CLOUD__AGENT__LABELS: []
JOB_MEM_REQUEST:
JOB_MEM_LIMIT:
JOB_CPU_REQUEST:
JOB_CPU_LIMIT:
IMAGE_PULL_POLICY:
SERVICE_ACCOUNT_NAME:
PREFECT__BACKEND: cloud
PREFECT__CLOUD__AGENT__AGENT_ADDRESS: http://:8080
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4tqz5 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-4tqz5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4tqz5
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled 2m14s fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled 101s fargate-scheduler Successfully assigned default/prefect-agent-f4c6f6545-jsl8t to fargate-ip-10-0-241-95.ap-southeast-1.compute.internal
Normal Pulling 101s kubelet Pulling image "prefecthq/prefect:0.14.0-python3.6"
Normal Pulled 83s kubelet Successfully pulled image "prefecthq/prefect:0.14.0-python3.6"
Normal Created 82s kubelet Created container agent
Normal Started 82s kubelet Started container agent
Warning Unhealthy 3s kubelet Liveness probe failed: Get http://10.0.241.95:8080/api/health: dial tcp 10.0.241.95:8080: connect: connection refused
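[Editor's note] The final Unhealthy event shows the kubelet probing the pod IP on port 8080 and getting "connection refused" until the agent's health server is listening; with failureThreshold: 2 and periodSeconds: 40, one early failure is harmless. An httpGet liveness probe is essentially a GET with a short timeout, which can be sketched in self-contained Python (the local server below is a stand-in for the agent's health endpoint, not Prefect code):

```python
import http.server
import threading
import urllib.error
import urllib.request

def probe(url: str, timeout: float = 1.0) -> bool:
    """Approximate a kubelet httpGet liveness probe: GET the URL and
    treat any non-2xx response or connection error as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False

class Health(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/health":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_error(404)

    def log_message(self, *args):  # keep output quiet
        pass

# Bind like the fixed deployment (0.0.0.0, i.e. all interfaces) so the
# endpoint is reachable on any local address; port 0 picks a free port.
server = http.server.HTTPServer(("0.0.0.0", 0), Health)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

healthy = probe(f"http://127.0.0.1:{port}/api/health")
refused = probe("http://127.0.0.1:1/api/health")  # nothing listening there
print(healthy, refused)  # True False
server.shutdown()
```

This mirrors the event log: "connection refused" simply means nothing is listening on the probed port yet.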