# ask-community
d
Hi folks, I have a kubernetes infrastructure setup with the namespace set to `prefect2-worker-dev` [image below]. When I try to execute a flow run in that infrastructure, it gives the following error.
Submission failed. kubernetes.client.exceptions.ApiException: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Audit-Id': 'e6<<SOME STUFF HERE>>37:14 GMT', 'Content-Length': '330'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:prefect2-worker-dev:prefect-worker\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"default\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}
I am unsure as to what actually determines which namespace the job is created in. [prefect v2.10.4 on python 3.10]
👀 1
d
How do I get the cluster UID? 😅
r
I think you have to set it in helm
d
Unsure, but the link you have posted seems like a different issue. In my use case, the worker is trying to create a job in the `default` NS and not the NS I have explicitly provided, which results in an exception because the SA has no access to the `default` NS. My question is: why is the SA trying to run the job in the `default` NS and not the one I explicitly provided, and how do I change it?
r
Yeah, I am looking back at when I had similar error messages (4/5 months ago) - at the time my fix was to create a ClusterRoleBinding for the service account,
before they introduced what the link describes.
d
I'm checking prefect-helm for the worker and it does have a cluster role binding set by default: https://github.com/PrefectHQ/prefect-helm/blob/main/charts/prefect-worker/templates/rolebinding.yaml
r
I am sure @jawnsy will assist when he is around
👀 1
d
OK, I've run some tests. I deployed 2 sets of workers in 2 NSs. NS#1: helm_workers [uses helm-deployed workers]. NS#2: test_workers [uses the prefect kubernetes manifest command worker]. When I only have helm workers, it always tries to run jobs in the `default` NS no matter what I put in the KubernetesJob during deployment. With the same deployment, when I turn down the helm workers and switch to the manifest worker, the SA automatically picks the NS specified in the KubernetesJob during deployment. So either there's something wrong in the helm chart OR I am not deploying the helm chart correctly. I use
helm install --namespace=prefect2-worker-dev --values=./worker.dev.yaml prefect-worker prefect/prefect-worker
to deploy.
j
That's very strange; the jobs that get created are controlled by the Kubernetes worker (via KubernetesJob, as you say), so as long as the namespace is set in the work pool when creating it (and you're not overriding it in the job you create - see screenshot), they should go to the correct namespace. What version of prefect & prefect-kubernetes do you have?
d
Prefect is 2.10.4
I don't think I use prefect-kubernetes
Yes, I just checked - my pyproject config doesn't have prefect-kubernetes specified
from prefect.infrastructure.kubernetes import KubernetesJob
I am using this as infrastructure.
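(For context, a minimal sketch of pinning the namespace on this block - the block name, image tag, and save call here are illustrative, not from the thread:)
```python
from prefect.infrastructure.kubernetes import KubernetesJob

# Hypothetical example: set the namespace explicitly and save the block
# so deployments can reference it by name.
k8s_job = KubernetesJob(
    image="prefecthq/prefect:2.10.4-python3.10",  # assumed image tag
    namespace="prefect2-worker-dev",              # namespace from this thread
)
k8s_job.save("worker-dev-infra", overwrite=True)
```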
a
Hey @Deceivious, if you're using the `KubernetesJob` infrastructure block, then you'll need to use an agent instead of the Kubernetes worker. You can use the Prefect agent helm chart to deploy an agent instead of the worker helm chart.
d
This is a bit confusing to me. I'm new to Kubernetes and helm. But isn't a Prefect worker a native concept of Prefect that has nothing to do with Kubernetes? If I can deploy a prefect agent with helm, why is it that deploying a worker with helm causes issues?
@alex
a
Workers and agents are Prefect-specific concepts. Agents work with infrastructure blocks like the `KubernetesJob` block and poll for flow runs from `prefect-agent` typed work pools. Workers are a newer concept (they're still in beta), but they are like an agent and an infrastructure block combined, and they poll for flow runs from typed work pools. All this means that you either need to use an agent + infra block or a worker.
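(To make the agent-side pairing concrete, a rough sketch under Prefect 2.10.x; the flow and deployment names are made up:)
```python
from prefect import flow
from prefect.deployments import Deployment
from prefect.infrastructure.kubernetes import KubernetesJob

@flow
def my_flow():
    print("hello")

# Agent path: the KubernetesJob block travels with the deployment, and an
# agent polling a prefect-agent typed work pool submits the run.
Deployment.build_from_flow(
    flow=my_flow,
    name="agent-based",  # hypothetical deployment name
    infrastructure=KubernetesJob(namespace="prefect2-worker-dev"),
    apply=True,
)
```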
d
Thanks @alex 🫡
s
@alex I'm facing a similar issue on my machine, where I'm not using any helm chart. Can you suggest a way forward to overcome this error?
Submission failed. kubernetes.client.exceptions.ApiException: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Audit-Id': '85efeb11-33b6-4a58-91ad-d0a60bffce1a', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '513745c7-aa7c-4245-8349-ec7b488ba2ba', 'X-Kubernetes-Pf-Prioritylevel-Uid': '17e7873d-25c0-45c1-955e-4c4692a6bb21', 'Date': 'Fri, 21 Apr 2023 18:25:18 GMT', 'Content-Length': '311'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:prefect:prefect\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"default\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}
d
@sjammula If you aren't using the helm chart but are using the output of `prefect kubernetes manifest agent`, you need to specify which namespace to run it in [there's a CLI parameter for it]. And when you deploy the flow, you need to ensure that the KubernetesJob has the same namespace.
Hi @alex, sorry to ping again. I am still a bit confused. I kind of get the difference between agent and worker. BUT why is it that deploying the worker with the `prefect kubernetes manifest` command works with the correct queue while the `helm` deployment fails? If both workers are equivalent regardless of deployment method, either both should work or both should fail. I'm unsure about the change in behavior based on deployment method.
a
The manifest that gets generated from `prefect kubernetes manifest` is a manifest for an agent deployment. We haven't added a worker manifest to that CLI command yet. You might be deploying an agent with one method and a worker with the other, depending on which helm chart you're using.
🙌 1
d
Thanks
r
Just so that I am clear, what are the advantages of a worker in a k8s env?
👀 1
j
@alex I would really appreciate some more detail on the differences between an agent and a worker.
Are there some docs?
j
@Joshua Greenhalgh Agents were our first-generation system for collecting/running work. Workers are an updated version. If you're just starting out today, use a worker. There are docs here: https://docs.prefect.io/2.10.12/concepts/work-pools/
upvote 1
a
To add on to what @jawnsy said, workers are scoped to a specific type of infrastructure and offer more customization compared to agents. Workers also have more observability features compared to agents when used with Prefect Cloud.
👍 1
j
Ok so I tried a worker and I got exactly the same issue as the OP - it tried to create the Job in the default ns, but the service account only has permissions for the "prefect" namespace?
j
When you create the worker, it will default to the 'default' namespace; you need to change that to 'prefect' or the namespace you deployed the worker into.
Apologies that I haven’t been following this whole thread but I’ve created an issue for the default namespace problem here: https://github.com/PrefectHQ/prefect/issues/9845 This is something we can improve :)
r
@alex will agents be deprecated/removed any time soon, or are we safe for a while?
a
We have a 6 month deprecation period before removing functionality, so there will be plenty of notice before agents are removed.
👍 1
j
It's not that the agent is in the wrong namespace; it's that the job that's created is in the wrong namespace?
I used the following raw manifests:
---
# Source: prefect-worker/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prefect-worker
  namespace: "prefect"
---
# Source: prefect-worker/templates/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-worker
  namespace: "prefect"
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/status"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ]
---
# Source: prefect-worker/templates/rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-worker
  namespace: "prefect"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prefect-worker
subjects:
  - kind: ServiceAccount
    name: prefect-worker
    namespace: "prefect"
---
# Source: prefect-worker/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefect-worker
  namespace: "prefect"
  labels:
    app: prefect-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prefect-worker
  template:
    metadata:
      labels:
        app: prefect-worker
    spec:
      serviceAccountName: prefect-worker
      securityContext:
        fsGroup: 1001
        runAsNonRoot: true
        runAsUser: 1001
      containers:
        - name: prefect-worker
          image: "prefecthq/prefect:2.10.12-python3.9-kubernetes"
          imagePullPolicy: IfNotPresent
          command:
            - prefect
            - worker
            - start
            - --type
            - kubernetes
            - --pool
            - default-agent-pool
            - --work-queue
            - default
          workingDir: /home/prefect
          env:
            - name: HOME
              value: /home/prefect
            - name: PREFECT_AGENT_PREFETCH_SECONDS
              value: "10"
            - name: PREFECT_AGENT_QUERY_INTERVAL
              value: "5"
            - name: PREFECT_API_ENABLE_HTTP2
              value: "true"
            - name: PREFECT_API_URL
              value: "<http://host.docker.internal:4200/api>"
            - name: PREFECT_KUBERNETES_CLUSTER_UID
              value: ""
            - name: PREFECT_DEBUG_MODE
              value: "false"
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 256Mi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1001
          volumeMounts:
            - mountPath: /home/prefect
              name: scratch
              subPathExpr: home
            - mountPath: /tmp
              name: scratch
              subPathExpr: tmp
      volumes:
        - name: scratch
          emptyDir: {}
Working on a local minikube at the moment.
I assume this is the line that's causing the job to run in default? https://github.com/PrefectHQ/prefect-kubernetes/blob/8c33171a7dbe1e2cd304162fcd1331d48cb5248d/prefect_kubernetes/worker.py#L229 - I just have no idea how I'm supposed to override this. It should be an arg I can pass to the worker, no?
This is where the job gets created - https://github.com/PrefectHQ/prefect-kubernetes/blob/8c33171a7dbe1e2cd304162fcd1331d48cb5248d/prefect_kubernetes/worker.py#LL622C23-L622C57 - it passes in `configuration.namespace` - I need that to be different from 'default' 😞
Tried setting it here:
infra = KubernetesJob(
    image=f"{BASE_IMAGE}:production",
    image_pull_policy=IMAGE_PULL_POLICY,
    namespace="prefect",
)
Still get:
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:prefect:prefect-worker\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"default\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}
r
@Joshua Greenhalgh you will need a ClusterRole + ClusterRoleBinding
j
This is all in place - I have all the roles correct, but the roles grant permissions in the prefect namespace, which is where I want the jobs to run. The jobs are started in default, which the worker does not have permission to do.
r
ah ok, you might need an infra_overrides["namespace"] on the deployment
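(If I'm reading that suggestion right, it would look roughly like this - names are hypothetical, and this assumes Prefect 2.10.x:)
```python
from prefect import flow
from prefect.deployments import Deployment

@flow
def my_flow():
    print("hello")

# Worker path: the job template comes from the work pool; per-deployment
# values such as the namespace are supplied via infra_overrides.
Deployment.build_from_flow(
    flow=my_flow,
    name="worker-based",                       # hypothetical deployment name
    work_pool_name="default-agent-pool",       # pool from the manifest above
    infra_overrides={"namespace": "prefect"},  # override the template default
    apply=True,
)
```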
j
Yeah, that did something, but still not quite there... 😞 - thanks - I'm really finding this move to V2 very, very difficult.
r
show me your deployment if you can
j
@redsquare - I'll set up a mini repo and share.
👍 1
r
Looks the same as mine - if you add output = 'deployment_build_output.yaml' to the deployment, do you see the correct namespace in the generated file?
j
I think I have half solved it (more issues though 😞), but I feel this is completely undocumented (outside of looking at the source: https://github.com/PrefectHQ/prefect-kubernetes/blob/8c33171a7dbe1e2cd304162fcd1331d48cb5248d/prefect_kubernetes/worker.py#L684). So the manifests come from the helm chart, basically - if you don't specify "PREFECT_KUBERNETES_CLUSTER_UID", it attempts to construct a UID from something that lives in the kube-system namespace - but the manifests do not construct a role that can read that namespace - so I have just generated a uuid and set that value in the manifest. I am hoping it just needs to be any unique ID? Now the pod runs! However, it can't find my flow inside the container, but I think I can probably work this out...
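(Generating a stand-in UID like the one described is a one-liner; this assumes, as confirmed below, that the value only needs to be unique per cluster:)
```python
import uuid

# Paste the output into the PREFECT_KUBERNETES_CLUSTER_UID env var
# in the worker manifest.
print(uuid.uuid4())
```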
r
cool - yeah, your path is probably wrong
j
Thanks for the help!
r
good luck with it all
j
The background is that we need a way to uniquely identify clusters in order to support cancellation, but there's no great general way to do that in Kubernetes. The idea to use the UID of the kube-system namespace came from Tim Hockin (one of the early Kubernetes maintainers, who did a lot of the networking stuff). We do the lookup during helm install time to avoid needing to grant the service account cluster-wide read permissions on namespaces, which would be necessary for the code-based lookup to work.

Something we didn't anticipate when implementing this feature is that some systems like ArgoCD, or running `helm template`, won't actually run the lookup, but also won't emit an error - it'll just return an empty value instead. So the worker has no choice but to try to look up the UID at runtime, which fails due to missing permissions, which we also don't want to add (by default we don't add a ClusterRole or ClusterRoleBinding for the worker service account). This is definitely a decision we should revisit, though! Sorry about the rough edges here.
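(For anyone curious, the runtime lookup described above amounts to reading the UID of the kube-system namespace; a minimal sketch with the kubernetes Python client, assuming a service account that can read that namespace:)
```python
from kubernetes import client, config

config.load_incluster_config()  # use load_kube_config() outside the cluster
v1 = client.CoreV1Api()

# The kube-system namespace UID is stable for the life of a cluster,
# which is why it can serve as a cluster identifier.
print(v1.read_namespace("kube-system").metadata.uid)
```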
j
NP - it just needs to be some random ID then? So a self-generated uuid4 is fine?
And yeah, I used template to get the manifests - I don't really want to have to hook up helm to my terraform setup.
j
Yeah, I think it just needs to be unique for each cluster you’re running an agent in
j
thanks!
j
I wrote up an issue here; however, I think it's low priority because the lookup works alright for most users. Feel free to add a comment if you disagree - it helps us prioritize: https://github.com/PrefectHQ/prefect/issues/9851
j
All I would add is the k8s API error to the issue - then it's probably findable:
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"namespaces \"kube-system\" is forbidden: User \"system:serviceaccount:prefect:prefect-worker\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"kube-system\"","reason":"Forbidden","details":{"name":"kube-system","kind":"namespaces"},"code":403}
👍 1