# ask-community
t
Hi all, I am still dipping my toes in the Prefect-ocean and I have some questions regarding flows that consist of tasks that all run some custom Docker image on a Kubernetes (AKS) cluster. Currently I have set up an in-cluster Kubernetes agent with the 
prefect agent kubernetes install --rbac ...
 command, so the RBAC is functioning on the agent. When starting a very simple flow that consists of a single 
RunNamespacedJob
 task running the custom Docker image the job pod starting the flow runs into RBAC issues, but the 
RunNamespacedJob
 task pod runs fine. My question is, how to handle job pods that are going to spawn several jobs on Kubernetes and the issues that arise with the RBAC on these pods. Am I thinking about this incorrectly? The error, for reference:
HTTP response headers: HTTPHeaderDict({'Audit-Id': '67b79e3e-ab13-45ee-8ad5-2ae1769c6a7f', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Thu, 10 Jun 2021 09:21:02 GMT', 'Content-Length': '372'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"cmems-historical\" is forbidden: User \"system:serviceaccount:prefect-zone:default\" cannot get resource \"jobs/status\" in API group \"batch\" in the namespace \"prefect-zone\"","reason":"Forbidden","details":{"name":"cmems-historical","group":"batch","kind":"jobs"},"code":403}
c
Hi @Thomas Nyegaard-Signori, not sure if this will help but here's my
prefect_agent_conf.yaml
, I think the prefect command actually misses some parts when it generates the conf. https://github.com/pangeo-forge/pangeo-forge-azure-bakery/blob/main/prefect_agent_conf.yaml
t
Hi @ciaran, nice with a reference agent conf, thanks! I tried adding the
pods/log
and
services
resources but the job pods still seem to error out with RBAC issues while
RunNamespacedJob
pod starts up and runs, no problem.
c
Hmmm, I'll be honest, that's as far as my k8s experience goes 🤣 Getting that bakery up and running was a tough nut to crack haha! You're definitely including the
namespace
entry in your configurations? And the values are the same?
t
I was including the
namespace
, yeah. For now, I have fixed it by binding the role created for the
agent
to the
default
service account for the namespace, which all pods spawned by the agent seem to be using. Whether this is a big no-no in Kubernetes, I don't know; I'm still trying to learn this weird and wonderful Kubernetes stuff 🤖 Thanks @ciaran 🙏
c
Haha no worries! Kubernetes is the wild west to me 🤣
t
Hi Thomas! And thanks for jumping in, Ciaran! I believe I can explain what's happening here. Thomas, could you just explain to me what you mean by
the RunNamespacedJob task pod runs fine.
Just to follow up ahead of that answer here, your best solution is to create an additional serviceaccount, role, and rolebinding within the namespace for your flows (Prefect can't do this for you). Then you can specify this service account in the KubernetesRun run config and it should have the proper permissions. If you need help with that, let me know; it should be straightforward.
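For anyone following along, here is a rough sketch of what that serviceaccount, role, and rolebinding could look like, built as plain Python dicts and dumped to JSON that `kubectl apply -f` accepts. The names `prefect-flow-runner` and the exact verbs and resources are assumptions for illustration, not something Prefect generates; only the `prefect-zone` namespace and the `jobs/status` resource come from the error above.

```python
import json

NAMESPACE = "prefect-zone"                # namespace taken from the error message
SERVICE_ACCOUNT = "prefect-flow-runner"   # hypothetical name, pick your own
ROLE = "prefect-flow-runner"              # hypothetical name, pick your own


def build_rbac_manifests(namespace, service_account, role):
    """Return ServiceAccount, Role and RoleBinding manifests as plain dicts."""
    sa = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": service_account, "namespace": namespace},
    }
    role_manifest = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role, "namespace": namespace},
        # Assumed rule set: enough to create jobs and read their status and logs.
        "rules": [
            {
                "apiGroups": ["batch"],
                "resources": ["jobs", "jobs/status"],
                "verbs": ["get", "list", "watch", "create", "delete"],
            },
            {
                "apiGroups": [""],
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch"],
            },
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role, "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role,
        },
    }
    return [sa, role_manifest, binding]


manifests = build_rbac_manifests(NAMESPACE, SERVICE_ACCOUNT, ROLE)

if __name__ == "__main__":
    # Wrap the three objects in a List so one file can be applied at once.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it should create all three objects. On the flow side, I believe the run config parameter is `service_account_name` (as in `KubernetesRun(service_account_name="prefect-flow-runner")`), but that is worth checking against your Prefect version.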
t
Hi Tyler 👋 So, when starting the flow there are a total of 2 pods that get started up: the
prefect-job...
pod is what I referred to as the job pod which was the one that was failing with the RBAC and the
cmems-historical...
is the pod started by the
RunNamespacedJob
task in the flow. The task pod started running just fine, but the RBAC issues came up when the job pod was checking logs/status (?) of the task pod, failing the flow while the task pod was still chugging along nicely.
And thanks for the tip, much appreciated!
t
ah OK. You can start by assigning the agent's serviceaccount to the flow's run config (as I believe you've done already?). That will have most of the permissions you need to get going. Otherwise, follow my advice above about designing the additional RBAC config. If you want to be super granular and best practicey, start very restrictive and keep running and failing, adding additional permissions each time it fails on RBAC. The agent's serviceaccount should have the batch jobs/status permission in prefect-zone namespace already, so I think you're good to go now? If not, I'm here to help
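If you go the run-and-fail route, a small helper can turn each 403 body into the rule that was missing. This is only a sketch: the function name `rule_from_forbidden` is made up, and the regex assumes the message wording matches the error posted earlier in this thread.

```python
import json
import re

# Matches the wording of the Forbidden message quoted earlier in the thread.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\S+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)


def rule_from_forbidden(body):
    """Parse a Kubernetes 403 Status body into (user, missing Role rule)."""
    status = json.loads(body)
    match = FORBIDDEN_RE.search(status["message"])
    if match is None:
        raise ValueError("message does not look like an RBAC Forbidden error")
    rule = {
        "apiGroups": [match["group"]],
        "resources": [match["resource"]],
        "verbs": [match["verb"]],
    }
    return match["user"], rule


# The response body from the original question, kept as raw JSON text.
EXAMPLE_BODY = (
    r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    r'"message":"jobs.batch \"cmems-historical\" is forbidden: '
    r'User \"system:serviceaccount:prefect-zone:default\" cannot get resource '
    r'\"jobs/status\" in API group \"batch\" in the namespace \"prefect-zone\"",'
    r'"reason":"Forbidden","details":{"name":"cmems-historical","group":"batch",'
    r'"kind":"jobs"},"code":403}'
)

user, rule = rule_from_forbidden(EXAMPLE_BODY)

if __name__ == "__main__":
    print(user)
    print(json.dumps(rule))
```

For the error in this thread, that gives the `default` serviceaccount as the user and a rule granting `get` on `jobs/status` in the `batch` group, which you would then add to the role bound to whichever serviceaccount the flow run uses.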
t
For now I had just added the roles needed to the default serviceaccount for the namespace, but that seemed a little too open. I'll play around with the assigned roles and start off restrictive. Great help, thanks @Tyler Wanner 👍