Hi all I am currently trying to get k8s agent work...
# prefect-community
j
Hi all, I am currently trying to get the k8s agent working on a GCP Autopilot cluster and have run into the following error:
jobs.batch is forbidden: User \"system:serviceaccount:default:default\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"prefect\"
The agent deployment I used is the following:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prefect-agent
  name: prefect-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prefect-agent
  template:
    metadata:
      labels:
        app: prefect-agent
    spec:
      containers:
      - args:
        - prefect agent kubernetes start
        command:
        - /bin/bash
        - -c
        env:
        - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
          value: <MY_KEY>
        - name: PREFECT__CLOUD__API
          value: https://api.prefect.io
        - name: NAMESPACE
          value: prefect
        - name: IMAGE_PULL_SECRETS
          value: ''
        - name: PREFECT__CLOUD__AGENT__LABELS
          value: '[''test'']'
        - name: JOB_MEM_REQUEST
          value: ''
        - name: JOB_MEM_LIMIT
          value: ''
        - name: JOB_CPU_REQUEST
          value: ''
        - name: JOB_CPU_LIMIT
          value: ''
        - name: IMAGE_PULL_POLICY
          value: ''
        - name: SERVICE_ACCOUNT_NAME
          value: ''
        - name: PREFECT__BACKEND
          value: cloud
        - name: PREFECT__CLOUD__AGENT__AGENT_ADDRESS
          value: http://:8080
        - name: PREFECT__CLOUD__API_KEY
          value: <MY_KEY>
        - name: PREFECT__CLOUD__TENANT_ID
          value: ''
        image: prefecthq/prefect:1.1.0-python3.7
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          httpGet:
            path: /api/health
            port: 8080
          initialDelaySeconds: 40
          periodSeconds: 40
        name: agent
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-agent-rbac
  namespace: prefect
rules:
- apiGroups:
  - batch
  - extensions
  resources:
  - jobs
  verbs:
  - '*'
- apiGroups:
  - ''
  resources:
  - events
  - pods
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-agent-rbac
  namespace: prefect
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prefect-agent-rbac
subjects:
- kind: ServiceAccount
  name: default
I think it's probably a namespace issue: the agent is in default but I want the jobs to run in prefect. Perhaps the agent needs to be in the same namespace?
j
uncertain if it's related, but I ran into this after a recent upgrade (version pinning as specified fixed it for me on EKS): https://github.com/dask/dask-kubernetes/issues/419
🙏 1
relevant prefect issue (looks like the upstream was fixed): https://github.com/PrefectHQ/prefect/issues/5573
j
Yeah, I think it's most likely a very silly setup issue...
Ok, so what is the most recent version? I am using prefecthq/prefect:1.1.0-python3.7 in my config (the default). Perhaps this is not the most recent?
1.1.0 does seem to be most recent on dockerhub 🤷
unless the fix is in the 2.0 beta?
a
if you ask about the latest tag, it's always the lowest supported version of Python (currently Python 3.7) and the latest Prefect release, which as of now is 1.1.0
m
@Joshua Greenhalgh: the github issue was fixed in a dependent package. So that shouldn’t be the issue. From what I can see, it looks like a permission issue
upvote 1
j
@Matthias and @Anna Geller - thanks both, I will dig a bit deeper into the RBAC stuff and see if I can work out the problem. I pretty much just followed the instructions here -> https://docs.prefect.io/orchestration/agents/kubernetes.html#rbac and the only mention of possible additional permissions is for S3 on AWS, so maybe an update to this documentation would be useful? I know k8s is a beast, but maybe it's worth a section for the main public clouds?
m
Does the namespace prefect exist in your cluster?
j
yep, created it before the agent:
(prefect-k8s) ➜  prefect-k8s kubectl get ns                    
NAME              STATUS   AGE
default           Active   3h14m
kube-node-lease   Active   3h14m
kube-public       Active   3h14m
kube-system       Active   3h14m
prefect           Active   165m
gonna try just running everything in default to see if that solves it, then I'll try to work out how to put stuff in a particular ns
👍 1
Yeah, sticking everything in default worked. My expectation was that the --namespace flag would produce config that set up both the agent and the RBAC to work in that namespace, but it seems to merely control the namespace that the jobs run in...
💪 1
a
nice work!
m
So then it is indeed a permission issue…
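For anyone who wants to keep the agent in default while running jobs in prefect, here is a minimal sketch of one way the RoleBinding from the manifest above might be adjusted. The error names system:serviceaccount:default:default, while the RoleBinding lives in prefect and lists the subject without a namespace, so spelling out the subject's namespace removes any ambiguity about which default service account is meant. The names are copied from the manifest in this thread; the assumption that the agent pod runs under the default service account of the default namespace comes only from the error message, and this was not tested in the thread.

```yaml
# Sketch only, not the official Prefect manifest.
# Lets the agent's service account (in "default") use the Role defined in "prefect".
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-agent-rbac
  namespace: prefect        # namespace where the flow-run jobs are created
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prefect-agent-rbac  # the Role from the manifest above
subjects:
- kind: ServiceAccount
  name: default
  namespace: default        # assumption: namespace the agent pod runs in, per the error message
# One way to check the result after applying:
#   kubectl auth can-i create jobs.batch \
#     --as=system:serviceaccount:default:default --namespace prefect
```

The alternative is to keep all three resources in one namespace, which is what ended up working here with everything in default.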