Hi. After upgrading agent to 2.7.1 getting the err...
# prefect-community
v
Hi. After upgrading agent to 2.7.1 getting the error provided in thread.
10:28:28.260 | ERROR   | prefect.agent - Failed to submit flow run 'a7e88141-ff19-4c87-90d5-48a82995d75f' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/agent.py", line 417, in _submit_run_and_capture_errors
    result = await infrastructure.run(task_status=task_status)
  File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/kubernetes.py", line 277, in run
    pid = await run_sync_in_worker_thread(self._get_infrastructure_pid, job)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 69, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(call, cancellable=True)
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/kubernetes.py", line 359, in _get_infrastructure_pid
    cluster_uid = self._get_cluster_uid()
  File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/kubernetes.py", line 384, in _get_cluster_uid
    namespace = client.read_namespace("kube-system")
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 22476, in read_namespace
    return self.read_namespace_with_http_info(name, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 22555, in read_namespace_with_http_info
    return self.api_client.call_api(
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 241, in GET
    return self.request("GET", url,
  File "/usr/local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'fd00e1d8-48df-4c8d-8c08-835bc03928f7', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '58043781-9ba3-433c-8746-28fbf1be655d', 'X-Kubernetes-Pf-Prioritylevel-Uid': '5862da58-0cf2-4368-ba81-8ac98e0a2c38', 'Date': 'Fri, 09 Dec 2022 10:28:28 GMT', 'Content-Length': '357'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"namespaces \"kube-system\" is forbidden: User \"system:serviceaccount:prefect-agents:prefect-agent-dev\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"kube-system\"","reason":"Forbidden","details":{"name":"kube-system","kind":"namespaces"},"code":403}
a
Hey, yes I can confirm this. I assume the issue is this change: https://github.com/PrefectHQ/prefect/pull/7747 Prefect seems to require the prefect k8s service account to have access to the ``kube-system`` namespace.
Question to the prefect team: Is there anything we can do about that? I REALLY don't want to allow my prefect service account to query the kube-system namespace. 😄 For the time being: Can someone provide a service account configuration with all the required permissions?
🙌 1
v
@Andreas Nigg Thanks for sharing more details.
a
For anybody experiencing issues with running prefect agents 2.7.1 on kubernetes: it seems we need additional service account permissions to get and list namespaces. We need to create a cluster role which can get and list namespaces, and then bind this role to our prefect service account. The example below should be sufficient.
• prefect-orion-agent: name of my prefect service account (change to the name of your SA)
• namespace: prefect: change to whatever namespace your agent is running in
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # ClusterRoles are cluster-scoped, so metadata carries no namespace
  name: prefect-ns-watcher
rules:
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prefect-ns-watcher-role-binding
subjects:
- kind: ServiceAccount
  name: prefect-orion-agent
  # namespace the agent's service account lives in (required for ServiceAccount subjects)
  namespace: prefect
roleRef:
  kind: ClusterRole
  name: prefect-ns-watcher
  apiGroup: rbac.authorization.k8s.io
Disclaimer: Please make sure to check the security implications on your side before applying the fix. (I'm also posting this answer to the channel, as it might help others.)
🙏 2
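Editor's sketch: one way to check whether the binding above took effect is `kubectl auth can-i` with service account impersonation. The helper names below are hypothetical, the defaults simply mirror the example manifest, and the check assumes `kubectl` is on the PATH and that whoever runs it is allowed to impersonate service accounts.

```python
import subprocess
from typing import List


def can_i_command(namespace: str, service_account: str,
                  verb: str = "get",
                  resource: str = "namespaces/kube-system") -> List[str]:
    """Build a `kubectl auth can-i` call that impersonates a service account."""
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return ["kubectl", "auth", "can-i", verb, resource, "--as", subject]


def is_allowed(namespace: str, service_account: str) -> bool:
    """Run the check; kubectl prints "yes" or "no" on stdout."""
    result = subprocess.run(
        can_i_command(namespace, service_account),
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() == "yes"


if __name__ == "__main__":
    # Names taken from the example manifest; change them to match your setup.
    print(is_allowed("prefect", "prefect-orion-agent"))
```

If this prints False after applying the manifest, the subject name or namespace in the binding is the first thing to compare against the user named in the 403 message.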
That being said: I have one agent running in a high-security-environment where there is no chance of getting a ClusterRole created (without me getting promoted 4 times which takes too long 😂 ). Being able to disable this namespace checking for high-security envs like that would be cool.
z
👍 thanks! cc @Peyton Runyan can you coordinate with @Jamie Zieziula to get this documented?
@Andreas Nigg if you have any suggestions for a better way to uniquely identify a cluster let us know!
a
@Zanie I'd love to say I have a better solution... but not really. One brute-force option would be to allow setting the namespace (and things prefect currently needs from the cluster-level) via env variable/settings. If these vars are set, prefect takes these values. If they are not set, prefect does what it does now - requiring the clusterrole. Not sure if this makes any sense.
z
Hm I do kind of like a special environment variable `PREFECT_KUBERNETES_CLUSTER_UID=…`
a
Sounds fair. Does Prefect attempt to stop jobs? Or why do you need this info in the first place? Just thinking about whether the action itself might have some permission impacts. But I guess that's fine. 🤔
z
If you cancel a run we’ll stop the job. We want this to be possible after agent restarts, so an agent that did not submit the run needs to be able to stop it. But we don’t want to kill a job in the wrong cluster, so we need some scoping.
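Editor's sketch of the scoping idea described above: if the identifier stored for a run includes the cluster UID, any agent can check that it is talking to the same cluster before stopping the job. The `cluster_uid:namespace:job_name` layout and all names here are assumptions for illustration, not confirmed Prefect internals.

```python
from typing import NamedTuple


class JobRef(NamedTuple):
    cluster_uid: str
    namespace: str
    job_name: str


def encode_pid(ref: JobRef) -> str:
    """Pack the job reference into a single string stored with the run."""
    return f"{ref.cluster_uid}:{ref.namespace}:{ref.job_name}"


def decode_pid(pid: str) -> JobRef:
    """Split on the first two colons only, leaving the job name intact."""
    cluster_uid, namespace, job_name = pid.split(":", 2)
    return JobRef(cluster_uid, namespace, job_name)


def may_stop(pid: str, current_cluster_uid: str) -> bool:
    """Only allow a stop when the job was created in the current cluster."""
    return decode_pid(pid).cluster_uid == current_cluster_uid
```

An agent that restarted, or a different agent entirely, can then call `may_stop(pid, its_own_cluster_uid)` and refuse to delete a same-named job that lives in another cluster.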
j
I have the same problem, can’t even create the ClusterRole and ClusterRoleBinding because I don’t have the rights to do that at the cluster level… I’m stuck on version 2.7.0 for now
a
Hey James, prefect 2.7.2 introduces a new environment variable which lets you set the kubernetes cluster uid (see the release notes for the exact env name). If it is set, you don't need the cluster role any more. If it is not set, prefect falls back to the 2.7.1 behaviour.
j
oh does it mean if i set this env then i don’t need the cluster level rights?
a
Yes that's the plan. I didn't upgrade my prod agents yet, but on a quick staging test I could see that this seems to work now - without the clusterrole
🙏 1
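Editor's sketch of the fallback described above: prefer the environment variable, and only query the cluster when it is unset. The variable name is the one floated earlier in the thread (check the release notes for the exact name), and `resolve_cluster_uid` and its parameters are hypothetical, not Prefect's actual implementation.

```python
import os
from typing import Callable, Mapping, Optional

# Name as suggested earlier in the thread; treat it as an assumption.
ENV_VAR = "PREFECT_KUBERNETES_CLUSTER_UID"


def resolve_cluster_uid(
    read_kube_system_uid: Callable[[], str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the cluster UID from the environment, else ask the cluster.

    `read_kube_system_uid` stands in for the API call that needs the
    ClusterRole; it is only invoked when the variable is unset or empty.
    """
    env = os.environ if environ is None else environ
    override = env.get(ENV_VAR)
    if override:
        return override
    return read_kube_system_uid()
```

With the variable set, the callable is never invoked, which is why the cluster-level permission is no longer needed in that case.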
j
that sounds good 👍 just some noob questions tho:
1. where can i get this cluster uid?
2. i set it in the prefect-agent deployment manifest e.g. `prefect kubernetes manifest agent` right?
I managed to answer those questions myself…
1. “cluster uid” is the `metadata.uid` in the `kube-system` namespace manifest
2. yes
🙌 1
👍 1
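Editor's sketch for looking up that value: read the `kube-system` namespace as JSON and pull out `metadata.uid`. The function names are hypothetical, and the lookup assumes `kubectl` is on the PATH and is run by someone who is allowed to read namespaces (for example a cluster admin, once).

```python
import json
import subprocess


def uid_from_namespace_json(text: str) -> str:
    """Extract metadata.uid from a namespace object serialized as JSON."""
    return json.loads(text)["metadata"]["uid"]


def fetch_kube_system_uid() -> str:
    """Ask kubectl for the kube-system namespace and return its UID."""
    completed = subprocess.run(
        ["kubectl", "get", "namespace", "kube-system", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return uid_from_namespace_json(completed.stdout)


if __name__ == "__main__":
    print(fetch_kube_system_uid())
```

The printed value is what goes into the agent's environment variable in the deployment manifest.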
p
Quick heads up - 2.7.2 has a bug in it that causes an error with custom task names and task names with underscores. 2.7.3 is the same as 2.7.2 but with the fix
🙏 1
a
@Jean-David Fiquet