
Vadym Dytyniak

12/09/2022, 10:34 AM
Hi. After upgrading the agent to 2.7.1, I'm getting the error provided in the thread.
10:28:28.260 | ERROR   | prefect.agent - Failed to submit flow run 'a7e88141-ff19-4c87-90d5-48a82995d75f' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/agent.py", line 417, in _submit_run_and_capture_errors
    result = await infrastructure.run(task_status=task_status)
  File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/kubernetes.py", line 277, in run
    pid = await run_sync_in_worker_thread(self._get_infrastructure_pid, job)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 69, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(call, cancellable=True)
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/kubernetes.py", line 359, in _get_infrastructure_pid
    cluster_uid = self._get_cluster_uid()
  File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/kubernetes.py", line 384, in _get_cluster_uid
    namespace = client.read_namespace("kube-system")
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 22476, in read_namespace
    return self.read_namespace_with_http_info(name, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 22555, in read_namespace_with_http_info
    return self.api_client.call_api(
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 241, in GET
    return self.request("GET", url,
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'fd00e1d8-48df-4c8d-8c08-835bc03928f7', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '58043781-9ba3-433c-8746-28fbf1be655d', 'X-Kubernetes-Pf-Prioritylevel-Uid': '5862da58-0cf2-4368-ba81-8ac98e0a2c38', 'Date': 'Fri, 09 Dec 2022 10:28:28 GMT', 'Content-Length': '357'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"namespaces \"kube-system\" is forbidden: User \"system:serviceaccount:prefect-agents:prefect-agent-dev\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"kube-system\"","reason":"Forbidden","details":{"name":"kube-system","kind":"namespaces"},"code":403}
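Editor's note: the useful part of the traceback is the JSON `Status` body at the end. As a minimal sketch (standard library only), the snippet below pulls out which user was denied which verb on which resource. The function name `summarize_forbidden` is hypothetical, and the regular expression assumes the message wording seen in this one log; other API servers or versions may phrase it differently.

```python
import json
import re

# Response body copied from the ApiException in the traceback above.
BODY = r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"namespaces \"kube-system\" is forbidden: User \"system:serviceaccount:prefect-agents:prefect-agent-dev\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"kube-system\"","reason":"Forbidden","details":{"name":"kube-system","kind":"namespaces"},"code":403}'


def summarize_forbidden(body: str) -> dict:
    """Extract the denied user, verb, and resource from a Status body.

    Hypothetical helper: the regex is an assumption based on the message
    format in this log, not a documented contract.
    """
    status = json.loads(body)
    message = status.get("message", "")
    match = re.search(r'User "([^"]+)" cannot (\w+) resource "([^"]+)"', message)
    user, verb, resource = match.groups() if match else (None, None, None)
    return {
        "code": status.get("code"),
        "reason": status.get("reason"),
        "user": user,
        "verb": verb,
        "resource": resource,
        "name": status.get("details", {}).get("name"),
    }


summary = summarize_forbidden(BODY)
print(summary)
```

Read this way, the failure is a plain RBAC denial: the agent's service account lacks `get` on `namespaces`, which it needs in order to read `kube-system`.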

Andreas Nigg

12/09/2022, 11:22 AM
Hey, yes, I can confirm this. I assume the issue is this change: https://github.com/PrefectHQ/prefect/pull/7747
Prefect seems to require the prefect k8s service account to have access to the `kube-system` namespace.
Question to the prefect team: Is there anything we can do about that? I REALLY don't want to allow my prefect service account to query the kube-system namespace. 😄 For the time being: Can someone provide a service account configuration with all the required permissions?
🙌 1

Vadym Dytyniak

12/09/2022, 11:25 AM
@Andreas Nigg Thanks for sharing more details.

Andreas Nigg

12/09/2022, 11:38 AM
For anybody experiencing issues with running prefect agents 2.7.1 on kubernetes: it seems we need additional service account permissions to get and list namespaces in kubernetes. We need to create a cluster role which can get and list namespaces, and then bind this role to our prefect service account. The example below should be sufficient.
• prefect-orion-agent: Name of my prefect service account (change to the name of your SA)
• namespace: prefect: Change to whatever ns your agent is running in
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # ClusterRoles are cluster-scoped, so metadata takes no namespace
  name: prefect-ns-watcher
rules:
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prefect-ns-watcher-role-binding
subjects:
- kind: ServiceAccount
  name: prefect-orion-agent
  # namespace the service account (and agent) lives in
  namespace: prefect
roleRef:
  kind: ClusterRole
  name: prefect-ns-watcher
  apiGroup: rbac.authorization.k8s.io
Disclaimer: Please make sure to check the security implications on your side before applying the fix. (I'm also posting this answer to the channel, as it might help others.)
🙏 2
That being said: I have one agent running in a high-security-environment where there is no chance of getting a ClusterRole created (without me getting promoted 4 times which takes too long 😂 ). Being able to disable this namespace checking for high-security envs like that would be cool.
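Editor's note: for anyone managing several agents, the ClusterRole and ClusterRoleBinding described in this message can be generated per service account instead of hand-edited. The sketch below is an illustration only: the helper names are hypothetical, and the resource names are the ones used in this thread. It puts the namespace on the ServiceAccount subject, which is where a ClusterRoleBinding expects it, and emits JSON wrapped in a `List` so the output can be piped to `kubectl apply -f -`.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def build_namespace_reader_rbac(
    service_account: str,
    namespace: str,
    role_name: str = "prefect-ns-watcher",
) -> list[dict]:
    """Return a ClusterRole and ClusterRoleBinding granting get/list on namespaces.

    Hypothetical helper; names default to the ones used in this thread.
    """
    cluster_role = {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "ClusterRole",
        # Cluster-scoped object: no namespace in metadata.
        "metadata": {"name": role_name},
        "rules": [
            {"apiGroups": [""], "resources": ["namespaces"], "verbs": ["get", "list"]}
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{role_name}-role-binding"},
        "subjects": [
            # The namespace belongs on the ServiceAccount subject.
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {"kind": "ClusterRole", "name": role_name, "apiGroup": RBAC_API_GROUP},
    }
    return [cluster_role, binding]


def render(items: list[dict]) -> str:
    """Wrap the objects in a v1 List and serialize as JSON."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    print(render(build_namespace_reader_rbac("prefect-orion-agent", "prefect")))
```

The same security caveat applies: this grants read access to every namespace object in the cluster, so review it before applying.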

Zanie

12/09/2022, 3:42 PM
👍 thanks! cc @Peyton Runyan can you coordinate with @Jamie Zieziula to get this documented?
@Andreas Nigg if you have any suggestions for a better way to uniquely identify a cluster let us know!

Andreas Nigg

12/09/2022, 8:40 PM
@Zanie I'd love to say I have a better solution... but not really. One brute-force option would be to allow setting the namespace (and things prefect currently needs from the cluster-level) via env variable/settings. If these vars are set, prefect takes these values. If they are not set, prefect does what it does now - requiring the clusterrole. Not sure if this makes any sense.

Zanie

12/09/2022, 8:42 PM
Hm I do kind of like a special environment variable
PREFECT_KUBERNETES_CLUSTER_UID=…

Andreas Nigg

12/09/2022, 8:49 PM
Sounds fair. Does Prefect attempt to stop jobs? Or why do you need this info in the first place? Just thinking about whether the action itself might have some permission impacts. But I guess that's fine. 🤔

Zanie

12/09/2022, 8:52 PM
If you cancel a run we’ll stop the job. We want this to be possible after agent restarts, so an agent that did not submit the run needs to be able to stop it. But we don’t want to kill a job in the wrong cluster, so we need some scoping.
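Editor's note: the scoping described here can be pictured as an identifier that carries the cluster UID alongside the job's namespace and name, so that any agent can check whether a job belongs to the cluster it is connected to before stopping it. The sketch below is an assumption-laden illustration: the `cluster_uid:namespace:job_name` format and all function names are hypothetical and not confirmed by this thread.

```python
from typing import NamedTuple


class JobRef(NamedTuple):
    cluster_uid: str
    namespace: str
    job_name: str


def make_infrastructure_pid(cluster_uid: str, namespace: str, job_name: str) -> str:
    """Build an identifier that is unique across clusters (assumed format)."""
    return f"{cluster_uid}:{namespace}:{job_name}"


def parse_infrastructure_pid(pid: str) -> JobRef:
    """Split the identifier back into its three parts."""
    cluster_uid, namespace, job_name = pid.split(":", 2)
    return JobRef(cluster_uid, namespace, job_name)


def may_stop_job(pid: str, current_cluster_uid: str) -> bool:
    """Only allow stopping a job that was submitted to the current cluster."""
    return parse_infrastructure_pid(pid).cluster_uid == current_cluster_uid


# Placeholder UIDs, made up for illustration.
pid = make_infrastructure_pid("uid-cluster-a", "prefect", "flow-run-job")
print(may_stop_job(pid, "uid-cluster-a"))
print(may_stop_job(pid, "uid-cluster-b"))
```

An agent restarted after the run was submitted has no memory of the job, so the identifier itself has to say which cluster the job lives in.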

James Zhang

12/17/2022, 12:49 PM
I have the same problem. I can’t even create the ClusterRole and ClusterRoleBinding because I don’t have the rights to do that at the cluster level… I’m stuck on version 2.7.0 for now

Andreas Nigg

12/17/2022, 12:56 PM
Hey James, prefect 2.7.2 introduces a new environment variable that lets you set the kubernetes cluster uid (see the release notes for the exact env name). If it is set, you don't need the cluster role any more. If it is not set, prefect falls back to the 2.7.1 behaviour.
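Editor's note: the fallback described in this message can be sketched as follows. This is not Prefect's implementation. The variable name is the one floated earlier in the thread and should be checked against the release notes, and `resolve_cluster_uid` plus the injected reader function are hypothetical.

```python
import os
from typing import Callable, Mapping

# Name as suggested earlier in this thread; the release notes are the authority.
ENV_VAR = "PREFECT_KUBERNETES_CLUSTER_UID"


def resolve_cluster_uid(
    read_kube_system_uid: Callable[[], str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Prefer the environment override; otherwise ask the cluster.

    `read_kube_system_uid` stands in for the API call that needs get access
    to namespaces (the one that failed with 403 in this thread).
    """
    override = environ.get(ENV_VAR)
    if override:
        return override
    return read_kube_system_uid()


def forbidden_reader() -> str:
    # Simulates a service account without the ClusterRole.
    raise PermissionError("namespaces \"kube-system\" is forbidden")


# With the variable set, the cluster is never queried, so no ClusterRole is needed.
print(resolve_cluster_uid(forbidden_reader, {ENV_VAR: "example-uid"}))
```

Without the variable, the call falls through to the reader and fails exactly as it did on 2.7.1 when the permissions are missing.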

James Zhang

12/17/2022, 12:57 PM
oh does it mean if i set this env then i don’t need the cluster level rights?

Andreas Nigg

12/17/2022, 1:09 PM
Yes that's the plan. I didn't upgrade my prod agents yet, but on a quick staging test I could see that this seems to work now - without the clusterrole
🙏 1

James Zhang

12/17/2022, 1:12 PM
that sounds good 👍 just some noob questions tho: 1. where can i get this cluster uid? 2. i set it in the prefect-agent deployment manifest e.g. `prefect kubernetes manifest agent` right?
I managed to answer those questions myself… 1. “cluster uid” is the `metadata.uid` in the `kube-system` namespace manifest 2. yes
🙌 1
👍 1
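Editor's note: as a sketch of the answer above, the snippet below reads `metadata.uid` from the JSON form of the `kube-system` namespace and builds the matching `env` entry for the agent container. The sample document and its UID are made-up placeholders, the helper names are hypothetical, and the variable name is the one suggested earlier in this thread (check the release notes).

```python
import json

# Name as suggested earlier in this thread; verify against the release notes.
ENV_VAR = "PREFECT_KUBERNETES_CLUSTER_UID"

# Placeholder namespace object; the uid is made up for illustration.
SAMPLE = """
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "kube-system",
    "uid": "00000000-0000-0000-0000-000000000000"
  }
}
"""


def cluster_uid_from_namespace_json(text: str) -> str:
    """Return metadata.uid, checking the document really is kube-system."""
    namespace = json.loads(text)
    metadata = namespace.get("metadata", {})
    if namespace.get("kind") != "Namespace" or metadata.get("name") != "kube-system":
        raise ValueError("expected the kube-system Namespace object")
    return metadata["uid"]


def agent_env_entry(cluster_uid: str) -> dict:
    """Entry for the `env` list of the agent container in the Deployment."""
    return {"name": ENV_VAR, "value": cluster_uid}


uid = cluster_uid_from_namespace_json(SAMPLE)
print(json.dumps(agent_env_entry(uid)))
```

Someone with read access to namespaces can produce the input with `kubectl get namespace kube-system -o json` and hand the resulting UID to whoever deploys the agent.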

Peyton Runyan

12/17/2022, 3:07 PM
Quick heads up - 2.7.2 has a bug in it that causes an error with custom task names and task names with underscores. 2.7.3 is the same as 2.7.2 but with the fix
🙏 1

Aleksandr Liadov

01/13/2023, 3:44 PM
@Jean-David Fiquet