# prefect-community
b
Hello again, we've been able to successfully run our Dask workers in our Kubernetes EKS cluster (by installing a Kubernetes agent). Although the Prefect Cloud logs show that the flow executed successfully, we still got a log entry at the very end stating:
response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services is forbidden: User \"system:serviceaccount:default:default\" cannot list resource \"services\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"services"},"code":403}
PS: We can see "successful" response bodies, but should we ignore this last entry / is it intended? Thanks in advance.
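A minimal sketch of how the 403 body above can be read programmatically to see exactly which permission is missing. The function name `parse_forbidden` and the regular expression are illustrative assumptions, not part of Prefect or Kubernetes; the pattern only matches messages worded like the one in this log.

```python
import json
import re

# Response body copied verbatim from the log entry above (raw string keeps the JSON escapes).
body = r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services is forbidden: User \"system:serviceaccount:default:default\" cannot list resource \"services\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"services"},"code":403}'

# Hypothetical pattern for this message wording; other Kubernetes versions may phrase it differently.
FORBIDDEN_PATTERN = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)


def parse_forbidden(raw_body):
    """Return the denied user/verb/resource/group/namespace, or None if the body is not a match."""
    status = json.loads(raw_body)
    if status.get("code") != 403:
        return None
    match = FORBIDDEN_PATTERN.search(status.get("message", ""))
    return match.groupdict() if match else None


denied = parse_forbidden(body)
print(denied)
```

Here the denied subject is the `default` service account in the `default` namespace, the verb is `list`, and the resource is `services` in the core (empty) API group.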
l
Hi @bruno.corucho! This looks like an EKS log to me at first glance, and implies something is up with your RBAC at some point when Kubernetes is trying to do some Kubernetes thing. FWIW I haven’t seen this error in our Kubernetes runs myself. What environment are you using? And what is the last task runner or flow runner log before this EKS log you see? (so I can triangulate more precisely where we are in the pipeline and see if I can figure out what Kubernetes thing Kubernetes is trying to do, haha)
b
Hi again @Laura Lorenz (she/her)! The previous log was _(data changed)_:
response body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"dask-root-c92b93f5-bw7s8p","generateName":"dask-root-c92b93f5-b12345","namespace":"default","selfLink":"/api/v1/namespaces/default/pods/dask-root-c34hs723-wd231","uid":"0f48a253-a694-4054-bf49-79fc0abcde","resourceVersion":"7332123","creationTimestamp":"2020-07-09T10:23:00Z","deletionTimestamp":"2020-07-09T10:25:50Z","deletionGracePeriodSeconds":30,"labels":{"app":"dask","dask.org/cluster-name":"dask-root-c92b93f5-b","dask.org/component":"worker","prefect.io/flow_run_id":"9c46ccb0-577b-4658-93fc-ec753cs75","prefect.io/identifier":"a666ba1a-60d7-4c4f-8a73-de0efba50edd","user":"root"},"annotations":{"kubernetes.io/psp":"eks.privileged"}},"spec":{"volumes":[{"name":"default-token-jmnsl","secret":{"secretName":"default-token-jmnsl","defaultMode":420}}],"containers":[{"name":"dask-worker","image":"65999014212345.dkr.ecr.eu-west-1.amazonaws.com/strdata-flow:2020-07-09t10-21-22-745632-00-00","args":["dask-worker","--no-bokeh","--death-timeout","60","--name","0"],"env":[{"name":"PREFECT__CLOUD__GRAPHQL","value":"https://api.prefect.io/graphql"},{"name":"PREFECT__CLOUD__AUTH_TOKEN","value":"bG2A7CEA123456789"},{"name":"PREFECT__CONTEXT__FLOW_RUN_ID","value":"9c46ccb0-577b-4658-93fc-32cdf652xcd"},{"name":"PREFECT__CLOUD__USE_LOCAL_SECRETS","value":"false"},{"name":"PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS","value":"prefect.engine.cloud.CloudFlowRunner"},{"name":"PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS","value":"prefect.engine.cloud.CloudTaskRunner"},{"name":"PREFECT__ENGINE__EXECUTOR__DEFAULT_CLASS","value":"prefect.engine.executors.DaskExecutor"},{"name":"PREFECT__LOGGING__LOG_TO_CLOUD","value":"true"},{"name":"PREFECT__LOGGING__LEVEL","value":"DEBUG"},{"name":"PREFECT__DEBUG","value":"true"},{"name":"DASK_DISTRIBUTED__SCHEDULER__BLOCKED_HANDLERS","value":"['feed', 'run_function']"},{"name":"PREFECT__LOGGING__EXTRA_LOGGERS","value":"['dask_kubernetes.core', 'distributed.deploy.adaptive', 'kubernetes', 'dask_kubernetes.core', 'distributed.deploy.adaptive', 'kubernetes']"},{"name":"DASK_SCHEDULER_ADDRESS","value":"tcp://10.12.34.56.78:12345"}],"resources":{"limits":{"cpu":"500m"},"requests":{"cpu":"500m"}},"volumeMounts":[{"name":"default-token-jmnsl","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Never","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","nodeName":"ip-10.12.34.56.78.eu-west-1.compute.internal","securityContext":{},"affinity":{"nodeAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"weight":100,"preference":{"matchExpressions":[{"key":"k8s.dask.org/node-purpose","operator":"In","values":["worker"]}]}}]}},"schedulerName":"default-scheduler","tolerations":[{"key":"k8s.dask.org/dedicated","operator":"Equal","value":"worker","effect":"NoSchedule"},{"key":"k8s.dask.org_dedicated","operator":"Equal","value":"worker","effect":"NoSchedule"},{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}],"priority":0,"enableServiceLinks":true},"status":{"phase":"Running","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2020-07-09T10:23:00Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2020-07-09T10:23:02Z"},{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2020-07-09T10:23:02Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2020-07-09T10:23:00Z"}],"hostIP":"10.12.34.56.78","podIP":"10.12.34.56.78","podIPs":[{"ip":"10.12.34.56.78"}],"startTime":"2020-07-09T10:23:00Z","containerStatuses":[{"name":"dask-worker","state":{"running":{"startedAt":"2020-07-09T10:23:01Z"}},"lastState":{},"ready":true,"restartCount":0,"image":"659990142216.dkr.ecr.eu-west-1.amazonaws.com/strdata-flow:2020-07-09t10-21-22-745632-00-00","imageID":"docker-pullable://123456789.dkr.ecr.eu-west-1.amazonaws.com/strdata-flow@sha256:e1ac7a8fd8acb6173e1c3d9617e0da4494c1d885","containerID":"docker://0af5315fda9959bdfb267cb4bbc9b742805da6b4c347479da8579","started":true}],"qosClass":"Burstable"}}
Deleted pod: dask-root-c92b93f5-bw7s8p (pod above)
then
response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services is forbidden: User \"system:serviceaccount:default:default\" cannot list resource \"services\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"services"},"code":403}
We're using the Dask Kubernetes Environment (very much like the orchestration presentation you did, but on AWS!)
If you mean flow info, the last messages I got were:
Task 'store_stream[12]': finished task run for task with final state: 'Success'

Flow run SUCCESS: all reference tasks succeeded
Idk if this answers your question?
l
Ok gotcha. My guess is that something during or after pod cleanup on EKS requires this list services permission, which it thinks it doesn’t have. Let me double check on what our standard RBAC is from prefect agent install kubernetes
Ok, my guess is that you should add the permission to list services in the RBAC for your agent. I’m guessing you used the default one from https://docs.prefect.io/orchestration/agents/kubernetes.html#requirements, so you would need to edit this or otherwise create a new file with a Role and RoleBinding with this permission and apply it to your Kubernetes cluster’s default namespace. We test mostly on GCP, so my suspicion is that EKS permission requirements are slightly different.
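For illustration, the Role and RoleBinding described above could be generated like this. It is a sketch under assumptions, not the manifest that was actually applied in this thread: the name `dask-list-services` is hypothetical, the namespace and service account are taken from the 403 message, and only the `list` verb named in that message is granted (a real setup may need more).

```python
import json

# Taken from the 403 message: system:serviceaccount:default:default in namespace "default".
NAMESPACE = "default"
SERVICE_ACCOUNT = "default"
# Hypothetical name chosen for this example; pick whatever fits your cluster.
ROLE_NAME = "dask-list-services"

# Role granting only the permission the error says is missing.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": ROLE_NAME, "namespace": NAMESPACE},
    "rules": [
        # "" is the core API group, which is where services live.
        {"apiGroups": [""], "resources": ["services"], "verbs": ["list"]}
    ],
}

# RoleBinding attaching that Role to the service account from the error.
role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": ROLE_NAME, "namespace": NAMESPACE},
    "subjects": [
        {"kind": "ServiceAccount", "name": SERVICE_ACCOUNT, "namespace": NAMESPACE}
    ],
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "Role",
        "name": ROLE_NAME,
    },
}

# kubectl accepts JSON as well as YAML, so both objects are wrapped in a single List.
manifest = {"apiVersion": "v1", "kind": "List", "items": [role, role_binding]}
print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as `make_rbac.py` (a made-up filename), the output can be piped to `kubectl apply -f -`, and `kubectl auth can-i list services --as=system:serviceaccount:default:default -n default` should then report whether the permission is in place.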
b
@Laura Lorenz (she/her) aaand you were right! I updated it and I'm no longer having such a problem. Thank you Laura!
l
Awesome! 🙂