Thomas Nyegaard-Signori

    1 year ago
    Hi all, I am still dipping my toes in the Prefect ocean and have some questions about flows whose tasks all run a custom Docker image on a Kubernetes (AKS) cluster. I have set up an in-cluster Kubernetes agent with the `prefect agent kubernetes install --rbac ...` command, so RBAC is functioning on the agent. When I start a very simple flow consisting of a single `RunNamespacedJob` task that runs the custom Docker image, the job pod that starts the flow runs into RBAC issues, but the `RunNamespacedJob` task pod runs fine. My question is: how should I handle job pods that are going to spawn several jobs on Kubernetes, and the RBAC issues that arise on these pods? Am I thinking about this incorrectly? The error, for reference:
    HTTP response headers: HTTPHeaderDict({'Audit-Id': '67b79e3e-ab13-45ee-8ad5-2ae1769c6a7f', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Thu, 10 Jun 2021 09:21:02 GMT', 'Content-Length': '372'})
    HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"cmems-historical\" is forbidden: User \"system:serviceaccount:prefect-zone:default\" cannot get resource \"jobs/status\" in API group \"batch\" in the namespace \"prefect-zone\"","reason":"Forbidden","details":{"name":"cmems-historical","group":"batch","kind":"jobs"},"code":403}
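The response body above already names everything needed to diagnose the denial: which service account made the request, and which verb, resource, API group, and namespace were refused. A minimal sketch of pulling those fields out with the Python standard library follows; the regular expression is an assumption based on the wording of this one message, not a documented Kubernetes format guarantee, and `parse_forbidden` is a hypothetical helper name.

```python
import json
import re

# Response body copied verbatim from the error above
# (raw strings keep the \" escapes intact for json.loads).
BODY = (
    r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    r'"message":"jobs.batch \"cmems-historical\" is forbidden: '
    r'User \"system:serviceaccount:prefect-zone:default\" cannot get resource '
    r'\"jobs/status\" in API group \"batch\" in the namespace \"prefect-zone\"",'
    r'"reason":"Forbidden","details":{"name":"cmems-historical","group":"batch",'
    r'"kind":"jobs"},"code":403}'
)

# Assumed message shape, inferred from the single example in this thread.
FORBIDDEN = re.compile(
    r'User "system:serviceaccount:(?P<sa_namespace>[^:"]+):(?P<service_account>[^"]+)" '
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<api_group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)


def parse_forbidden(body: str) -> dict:
    """Extract who was denied what from a Kubernetes 403 Status body."""
    status = json.loads(body)
    match = FORBIDDEN.search(status["message"])
    if match is None:
        raise ValueError("not a service-account 'forbidden' message")
    return match.groupdict()


info = parse_forbidden(BODY)
print(info)
```

Here the denied identity is the namespace's `default` service account, which suggests the flow-run pod was not running under the agent's RBAC-enabled account.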
ciaran

    1 year ago
    Hi @Thomas Nyegaard-Signori, not sure if this will help, but here's my `prefect_agent_conf.yaml`. I think the prefect command actually misses some parts when it generates the conf. https://github.com/pangeo-forge/pangeo-forge-azure-bakery/blob/main/prefect_agent_conf.yaml
Thomas Nyegaard-Signori

    1 year ago
    Hi @ciaran, nice to have a reference agent conf, thanks! I tried adding the `pods/log` and `services` resources, but the job pods still seem to error out with RBAC issues, while the `RunNamespacedJob` pod starts up and runs, no problem.
ciaran

    1 year ago
    Hmmm, I'll be honest, that's as far as my k8s experience goes 🤣 Getting that bakery up and running was a tough nut to crack haha! You're definitely including the `namespace` entry in your configurations? And the values are the same?
Thomas Nyegaard-Signori

    1 year ago
    I was including the `namespace`, yeah. For now, I have fixed it by binding the role created for the `agent` to the `default` service account for the namespace, which all pods spawned by the agent seem to be using. Whether this is a big no-no in Kubernetes, I don't know; I'm still trying to learn this weird and wonderful Kubernetes stuff 🤖 Thanks @ciaran 🙏
ciaran

    1 year ago
    Haha no worries! Kubernetes is the wild west to me 🤣
Tyler Wanner

    1 year ago
    Hi Thomas! And thanks for jumping in, Ciaran! I believe I can explain what's happening here. Thomas, could you explain what you mean by "the `RunNamespacedJob` task pod runs fine"?
    Just to follow up ahead of that answer: your best solution is to create an additional service account, role, and role binding within the namespace for your flows (Prefect can't do this for you). Then you can specify this service account in the `KubernetesRun` run config, and it should have the proper permissions. If you need help with that, let me know; it should be straightforward.
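A rough sketch of the three objects Tyler describes (a service account, a role, and a role binding in the flows' namespace), generated as JSON that `kubectl apply -f` can read. The names `prefect-flows` and `prefect-flow-jobs` are hypothetical, and the rule list is an assumption about what a flow using `RunNamespacedJob` might need; only the `get` on `jobs/status` in the `batch` group is confirmed by the error in this thread.

```python
import json

NAMESPACE = "prefect-zone"          # namespace taken from the error in this thread
SERVICE_ACCOUNT = "prefect-flows"   # hypothetical name
ROLE = "prefect-flow-jobs"          # hypothetical name


def flow_rbac_manifests(namespace: str, service_account: str, role: str) -> dict:
    """Build a ServiceAccount, Role and RoleBinding wrapped in a v1 List."""
    meta = {"namespace": namespace}
    account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": service_account, **meta},
    }
    role_obj = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role, **meta},
        "rules": [
            # Assumed: the flow creates, polls and cleans up batch jobs.
            {"apiGroups": ["batch"], "resources": ["jobs"],
             "verbs": ["create", "get", "list", "watch", "delete"]},
            # Confirmed by the 403 above: reading the job's status subresource.
            {"apiGroups": ["batch"], "resources": ["jobs/status"], "verbs": ["get"]},
            # Assumed: the flow inspects the job's pods and their logs.
            {"apiGroups": [""], "resources": ["pods", "pods/log"],
             "verbs": ["get", "list", "watch"]},
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role, **meta},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role,
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [account, role_obj, binding]}


manifests = flow_rbac_manifests(NAMESPACE, SERVICE_ACCOUNT, ROLE)
print(json.dumps(manifests, indent=2))
```

Assuming the printed output is saved to a file and applied, the flow would then reference the account through the run config's `service_account_name` argument, e.g. `KubernetesRun(service_account_name="prefect-flows")`.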
Thomas Nyegaard-Signori

    1 year ago
    Hi Tyler 👋 So, when starting the flow, a total of 2 pods get started: the `prefect-job...` pod is what I referred to as the job pod, which was the one failing with the RBAC error, and the `cmems-historical...` pod is the one started by the `RunNamespacedJob` task in the flow. The task pod started running just fine, but the RBAC issues came up when the job pod was checking logs/status (?) of the task pod, which failed the flow while the task pod kept chugging along nicely.
    And thanks for the tip, much appreciated!
Tyler Wanner

    1 year ago
    Ah, OK. You can start by assigning the agent's service account to the flow's run config (as I believe you've done already?). That will have most of the permissions you need to get going. Otherwise, follow my advice above about designing the additional RBAC config. If you want to be super granular and best-practicey, start very restrictive and keep running and failing, adding permissions each time it fails on RBAC. The agent's service account should already have the batch `jobs/status` permission in the `prefect-zone` namespace, so I think you're good to go now? If not, I'm here to help.
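The "start restrictive, run, fail, add a permission" loop can be sketched as a small helper that folds each observed denial into the smallest matching set of Role rules. This is only an illustration: `rules_from_denials` is a hypothetical helper, and apart from the `jobs/status` denial reported in this thread, the entries in `denials` are made up to show the merging.

```python
from collections import defaultdict


def rules_from_denials(denials):
    """Turn (api_group, resource, verb) denials into minimal Role rules.

    Each distinct (api_group, resource) pair gets one rule listing only
    the verbs that were actually denied.
    """
    verbs_by_target = defaultdict(set)
    for api_group, resource, verb in denials:
        verbs_by_target[(api_group, resource)].add(verb)
    return [
        {"apiGroups": [api_group], "resources": [resource], "verbs": sorted(verbs)}
        for (api_group, resource), verbs in sorted(verbs_by_target.items())
    ]


denials = [
    ("batch", "jobs/status", "get"),  # the denial reported in this thread
    ("batch", "jobs", "create"),      # hypothetical later denial
    ("batch", "jobs", "delete"),      # hypothetical later denial
    ("batch", "jobs/status", "get"),  # repeats are merged away
]

for rule in rules_from_denials(denials):
    print(rule)
```

Each run that fails on RBAC adds one tuple to the list, and the resulting rules go into the `rules` section of the flow's Role.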
Thomas Nyegaard-Signori

    1 year ago
    For now I had just added the roles needed to the default serviceaccount for the namespace, but that seemed a little too open. I'll play around with the assigned roles and start off restrictive. Great help, thanks @Tyler Wanner 👍