
Menekse Tok

07/11/2023, 6:02 PM
Hello, trying to run a flow from a deployment, but it goes into a "Late" state and then the work queue goes into an unhealthy state. Any suggestions on how to troubleshoot this? The deployment has a GitHub storage block and a Kubernetes Job infrastructure block. My Prefect agent is running on an AWS EKS cluster.

Christopher Boyd

07/11/2023, 6:06 PM
Late means that it hasn't been picked up by the agent
I'd start by looking at the agent logs
Is the deployment pointing to the right work pool / queue?
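(A quick way to check that from the CLI, assuming it is logged in to the same workspace; the 'flow-name/deployment-name' identifier here comes from the code shared in the next message:)
Copy code
# show the deployment, including which work queue it targets
prefect deployment inspect 'dog/dog-deploy-from-python'

# list work queues and their status as the server sees them
prefect work-queue ls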

Menekse Tok

07/11/2023, 6:39 PM
Copy code
from prefect.deployments import Deployment
from prefect.filesystems import GitHub
from prefect.infrastructure.kubernetes import KubernetesJob

from dog_flow import dog

github_block = GitHub.load("my-git-repo")
kubernetes_job_block = KubernetesJob.load("k8s")

deployment = Deployment.build_from_flow(
    flow=dog,
    name="dog-deploy-from-python",
    parameters={"num_barks": 7},
    storage=github_block,
    infrastructure=kubernetes_job_block,
)
deployment.apply()
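(For reference, a deployment applied this way can then be triggered from the CLI; the identifier below assumes the flow function dog keeps its default flow name:)
Copy code
prefect deployment run 'dog/dog-deploy-from-python'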
I am using the default agent and work queue
What would be the best way to see the agent logs, since the agent is running inside the cluster? @Christopher Boyd

Nate

07/11/2023, 6:51 PM
you can switch to your EKS context where your agent is running
Copy code
$ kubectl get pods | grep agent
 prefect-agent-podxyz

$ kubectl logs prefect-agent-podxyz > somefile.txt
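(If kubectl is not yet pointed at the cluster, or the agent lives in its own namespace, a sketch of the variants; <cluster-name> is a placeholder, and the prefect namespace matches the output Menekse posts further down:)
Copy code
# point kubectl at the EKS cluster
aws eks update-kubeconfig --region us-east-1 --name <cluster-name>

# stream the agent logs live while triggering a flow run
kubectl logs -f deploy/prefect-agent -n prefect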
๐Ÿ‘ 1

Menekse Tok

07/11/2023, 7:13 PM
Now I see this error in the flow run details:
Submission failed. ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b9fcbcb5-47c7-4d18-83a7-eea26175277b', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 11 Jul 2023 18:43:54 GMT', 'Content-Length': '129'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
@Nate

Christopher Boyd

07/11/2023, 7:36 PM
Your API key is wrong
Can you verify the new API key works for your API URL? And that those same values are what your agent in k8s is using as well (via the env variables / secret)? A 401 is very specifically that the key is not valid for that URL. It could be that you are using a different account ID or workspace ID with a valid API key, but I think that's a 403, not a 401
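(A sketch of that cross-check, assuming the Prefect 2.x CLI locally; the secret/key names are the ones the agent pod uses, per the pod spec later in this thread:)
Copy code
# which URL and key is the local CLI using?
prefect config view --show-secrets

# any authenticated call will do; a 401 here would implicate the key/URL pair
prefect work-queue ls

# compare against the key the agent pod actually reads from the cluster secret
kubectl get secret prefect-api-key -n prefect -o jsonpath='{.data.key}' | base64 -d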

Menekse Tok

07/11/2023, 8:42 PM
Yes, I did verify. Response:
Returned status code: 200
Valid API Key and URL
Now I am running the flow again and it appears in the Late state.

Christopher Boyd

07/11/2023, 9:01 PM
What do the agent logs show, separately from running a flow? They should show something like successfully connected to your API URL and listening for work; does that appear?
If the agent is OK and polling successfully, it's possible the service account you are using is the issue on the cluster
The API exception could be getting returned by the cluster API when it fails to submit a job in the cluster
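(One way to test the service-account theory; the service account and namespace names come from the agent pod spec in the next message:)
Copy code
# can the agent's service account create Jobs, i.e. submit flow runs?
kubectl auth can-i create jobs -n prefect \
  --as=system:serviceaccount:prefect:prefect-agent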

Menekse Tok

07/12/2023, 12:15 AM
Copy code
~ on ☁️ (us-east-1)
❯ kubectl cluster-info
Kubernetes control plane is running at https://8C633A2B6C37F512CD23E113E2BC6855.gr7.us-east-1.eks.amazonaws.com
CoreDNS is running at https://8C633A2B6C37F512CD23E113E2BC6855.gr7.us-east-1.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

~ on ☁️ (us-east-1)
❯ kubectl get pods -n prefect
NAME                             READY   STATUS             RESTARTS         AGE
prefect-agent-7c6fc454b4-mz8pm   0/1     CrashLoopBackOff   25 (2m15s ago)   105m

~ on ☁️ (us-east-1)
❯ kubectl describe pod prefect-agent-7c6fc454b4-mz8pm -n prefect
Name:             prefect-agent-7c6fc454b4-mz8pm
Namespace:        prefect
Priority:         0
Service Account:  prefect-agent
Node:             ip-10-212-26-238.ec2.internal/10.212.26.238
Start Time:       Tue, 11 Jul 2023 18:05:48 -0400
Labels:           app.kubernetes.io/component=agent
                  app.kubernetes.io/instance=prefect-agent
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=prefect-agent
                  helm.sh/chart=prefect-agent-2023.06.29
                  pod-template-hash=7c6fc454b4
                  prefect-version=2.10.18-python3.10
Annotations:      <none>
Status:           Running
IP:               10.212.24.230
IPs:
  IP:  10.212.24.230
Controlled By:  ReplicaSet/prefect-agent-7c6fc454b4
Containers:
  prefect-agent:
    Container ID:  containerd://3bb545f229acbb0d13de760d219da2a27eb2d0aa31d05a5e23124a17d8d984c3
    Image:         prefecthq/prefect:2.10.18-python3.10
    Image ID:      docker.io/prefecthq/prefect@sha256:a18bd1c34326d954a7c829722dca856427506c6c193ae56fbe3ed8a49d680bd3
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/tini
      -g
      --
      /opt/prefect/entrypoint.sh
    Args:
      prefect
      agent
      start
      --work-queue
      default
      --limit
      None
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 11 Jul 2023 19:49:08 -0400
      Finished:     Tue, 11 Jul 2023 19:49:10 -0400
    Ready:          False
    Restart Count:  25
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     100m
      memory:  256Mi
    Environment:
      HOME:                            /home/prefect
      PREFECT_AGENT_PREFETCH_SECONDS:  10
      PREFECT_AGENT_QUERY_INTERVAL:    5
      PREFECT_API_ENABLE_HTTP2:        true
      PREFECT_API_URL:                 https://api.prefect.cloud/api/accounts/d4f379a8-953a-418b-b0c4-55288e594884/workspaces/36bbbb1c-f96e-41e5-ace7-3efbab3babd2
      PREFECT_KUBERNETES_CLUSTER_UID:  e45ddd7d-c95c-4603-85bf-68818f44f603
      PREFECT_API_KEY:                 <set to the key 'key' in secret 'prefect-api-key'>  Optional: false
      PREFECT_DEBUG_MODE:              false
    Mounts:
      /home/prefect from scratch (rw)
      /tmp from scratch (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2npfj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  scratch:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-2npfj:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Warning  BackOff  72s (x484 over 105m)  kubelet  Back-off restarting failed container

~ on ☁️ (us-east-1)
❯ kubectl logs prefect-agent-7c6fc454b4-mz8pm -n prefect -p
Usage: prefect agent start [OPTIONS] [WORK_QUEUE]
Try 'prefect agent start --help' for help.
╭─ Error ──────────────────────────────────────────────────────────────╮
│ Invalid value for '-l' / '--limit': 'None' is not a valid integer.   │
╰──────────────────────────────────────────────────────────────────────╯

Nate

07/12/2023, 12:17 AM
i think the limit passed to the agent start command wants to be an integer; it looks like it's the literal string None

Menekse Tok

07/12/2023, 12:20 AM
How can I resolve this problem?

Christopher Boyd

07/12/2023, 12:21 AM
Your args literally has:
Copy code
Args:
      prefect
      agent
      start
      --work-queue
      default
      --limit
      None
Is --limit None something you need?

Menekse Tok

07/12/2023, 12:26 AM
I am using this recipe with a few changes. This issue was fixed; wouldn't that fix solve my problem as well?

Nate

07/12/2023, 12:31 AM
thanks for linking that! if you're just setting up, i'd recommend this guide on setting up workers on k8s via helm; in my view it's the easiest way to get started quickly and eventually the most configurable if you need it. otherwise, i think just removing the limit flag entirely might get you what you want
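(For reference, the worker setup from that guide looks roughly like this; the chart and value names are taken from the prefect-helm repo and the IDs are placeholders, so treat it as a sketch:)
Copy code
helm repo add prefect https://prefecthq.github.io/prefect-helm
helm repo update

# assumes the prefect-api-key secret already exists in the namespace
helm install prefect-worker prefect/prefect-worker -n prefect \
  --set worker.cloudApiConfig.accountId=<account-id> \
  --set worker.cloudApiConfig.workspaceId=<workspace-id> \
  --set worker.config.workPool=<work-pool-name>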

Christopher Boyd

07/12/2023, 12:32 AM
As an immediate short-term workaround you can just edit the deployment with kubectl and remove the --limit None
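(A sketch of that workaround; the Deployment name comes from the ReplicaSet shown earlier, and note the flag is --limit with two hyphens:)
Copy code
# open the agent Deployment and delete the "--limit" and "None" args,
# or replace None with an integer
kubectl edit deployment prefect-agent -n prefect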
Long term - I haven't used that recipe yet, so I'm not sure if it's fixed or still needs a fix