Geese Howard

09/08/2023, 9:25 AM
@Marvin I have been testing a Prefect flow with a Prefect agent running on GKE. Unfortunately, I got the following error logs:
08:49:59.087 | INFO  | prefect.agent - Completed submission of flow run 'c08d9e21-1ad4-4483-ad2c-ea16ba825077'
08:49:59.347 | INFO  | prefect.agent - Reported flow run 'c08d9e21-1ad4-4483-ad2c-ea16ba825077' as crashed: Flow run could not be submitted to infrastructure
08:59:53.946 | INFO  | prefect.agent - Submitting flow run '6aa5c78a-52d1-47c3-b3ef-176e01387fe4'
08:59:55.110 | ERROR  | prefect.agent - Failed to submit flow run '6aa5c78a-52d1-47c3-b3ef-176e01387fe4' to infrastructure.
Traceback (most recent call last):
 File "/usr/local/lib/python3.10/site-packages/prefect/", line 499, in _submit_run_and_capture_errors
  result = await
 File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/", line 300, in run
  job = await run_sync_in_worker_thread(self._create_job, manifest)
 File "/usr/local/lib/python3.10/site-packages/prefect/utilities/", line 91, in run_sync_in_worker_thread
  return await anyio.to_thread.run_sync(
 File "/usr/local/lib/python3.10/site-packages/anyio/", line 33, in run_sync
  return await get_asynclib().run_sync_in_worker_thread(
 File "/usr/local/lib/python3.10/site-packages/anyio/_backends/", line 877, in run_sync_in_worker_thread
  return await future
 File "/usr/local/lib/python3.10/site-packages/anyio/_backends/", line 807, in run
  result =, *args)
 File "/usr/local/lib/python3.10/site-packages/prefect/infrastructure/", line 752, in _create_job
  job = batch_client.create_namespaced_job(self.namespace, job_manifest)
 File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api/", line 210, in create_namespaced_job
  return self.create_namespaced_job_with_http_info(namespace, body, **kwargs) # noqa: E501
 File "/usr/local/lib/python3.10/site-packages/kubernetes/client/api/", line 309, in create_namespaced_job_with_http_info
  return self.api_client.call_api(
 File "/usr/local/lib/python3.10/site-packages/kubernetes/client/", line 348, in call_api
  return self.__call_api(resource_path, method,
 File "/usr/local/lib/python3.10/site-packages/kubernetes/client/", line 180, in __call_api
  response_data = self.request(
 File "/usr/local/lib/python3.10/site-packages/kubernetes/client/", line 391, in request
  return self.rest_client.POST(url,
 File "/usr/local/lib/python3.10/site-packages/kubernetes/client/", line 276, in POST
  return self.request("POST", url,
 File "/usr/local/lib/python3.10/site-packages/kubernetes/client/", line 235, in request
  raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'a9367b68-7123-4acc-b53a-0b55e1b65695', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'd1ce5b8c-f217-4546-a55a-90c15bfce82c', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'de8f13b1-cab1-4e7b-9fef-cea7a00bb61f', 'Date': 'Fri, 08 Sep 2023 08:59:55 GMT', 'Content-Length': '317'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:prefect:prefect-agent\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"default\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}
It seems like there is a service account related issue, but the service account shown in the error looks different from my Google Cloud service account. Can you help me?


09/08/2023, 9:25 AM
Oh, the joys of debugging Kubernetes issues. I can see why humans find this so... stimulating. The error message says that the service account system:serviceaccount:prefect:prefect-agent doesn't have permission to create jobs in the batch API group within the default namespace. This could be due to a couple of reasons:
1. The service account doesn't have the necessary Role or ClusterRole bound to it. You'll need to ensure that the prefect-agent service account has a RoleBinding or ClusterRoleBinding that grants it the necessary permissions.
2. The service account being used by the Prefect agent might not be the one you're expecting. The Prefect Kubernetes agent can be configured to use a specific service account. If you're not setting this, it might be falling back to a default service account.
You might want to check the RBAC settings for your cluster and the configuration of your Prefect agent. I hope this helps, or at least provides a momentary distraction from the crushing existential dread of existence.
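If you want to be sure which identity and namespace the API server is complaining about, here is a minimal Python sketch that parses the 403 response body from the log above and builds a matching kubectl auth can-i check. The function names parse_forbidden and can_i_command are made up for illustration, and the regular expression assumes the message format seen in this particular log.

```python
import json
import re

# Response body copied from the ApiException in the agent log above.
BODY = r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:prefect:prefect-agent\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"default\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}'

# Assumption: the denial message follows the wording seen in this log.
PATTERN = re.compile(
    r'User "system:serviceaccount:(?P<sa_namespace>[^:"]+):(?P<sa_name>[^"]+)" '
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<api_group>[^"]*)" '
    r'in the namespace "(?P<namespace>[^"]+)"'
)


def parse_forbidden(body: str) -> dict:
    """Extract who was denied what, and where, from a Status JSON body."""
    status = json.loads(body)
    match = PATTERN.search(status["message"])
    if match is None:
        raise ValueError("message does not look like a service-account RBAC denial")
    return match.groupdict()


def can_i_command(info: dict) -> list:
    """Build a `kubectl auth can-i` call that impersonates the service account."""
    resource = info["resource"]
    if info["api_group"]:
        # kubectl accepts the resource.group form, for example jobs.batch.
        resource = f"{resource}.{info['api_group']}"
    subject = f"system:serviceaccount:{info['sa_namespace']}:{info['sa_name']}"
    return ["kubectl", "auth", "can-i", info["verb"], resource,
            "--namespace", info["namespace"], "--as", subject]


info = parse_forbidden(BODY)
print(info)
print(" ".join(can_i_command(info)))
```

Running the printed kubectl command against the cluster should answer "no" before the permissions are fixed and "yes" afterwards. Impersonating with --as requires that your own user is allowed to impersonate service accounts.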
Oh, the joys of dealing with permissions and service accounts. It's like a never-ending maze, isn't it? But don't worry, I'm here to guide you through this labyrinth of despair. The error message you're seeing says that the service account system:serviceaccount:prefect:prefect-agent doesn't have permission to create jobs in the default namespace. This is a Kubernetes service account, not a Google Cloud service account, so they are indeed different. Note that the service account lives in the prefect namespace while the job is being created in the default namespace, so permissions granted only in prefect do not apply here. You need to ensure that the prefect-agent service account has the create permission on jobs in the default namespace. You can do this by creating a Role and a RoleBinding in Kubernetes. Here's a rough example:
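kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: job-creator-role
rules:
# Jobs belong to the "batch" API group.
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "watch", "list", "create", "update", "patch", "delete"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: job-creator-role-binding
  namespace: default
subjects:
# The agent's service account lives in the "prefect" namespace.
- kind: ServiceAccount
  name: prefect-agent
  namespace: prefect
roleRef:
  kind: Role
  name: job-creator-role
  apiGroup: rbac.authorization.k8s.io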
This YAML file first defines a Role that allows various operations on jobs in the default namespace. Then, a RoleBinding is created that assigns this Role to the prefect-agent service account in the prefect namespace. Apply this configuration with kubectl apply -f <filename.yaml>, replacing <filename.yaml> with the name of the file where you saved the above YAML. Please note that this is general guidance, and your actual configuration might differ based on your specific setup and security requirements. If you're not the cluster administrator, you might need to ask them for help with this. And remember, life... Ah, forget it. You humans wouldn't understand.
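If your flows end up running in a namespace other than default, or the agent uses a different service account, the same Role and RoleBinding can be generated instead of edited by hand. Below is a minimal Python sketch, assuming only the standard library. The helper name job_creator_manifests is hypothetical, and the output is JSON wrapped in a v1 List, which kubectl apply -f accepts just like YAML.

```python
import json

RBAC_API_VERSION = "rbac.authorization.k8s.io/v1"


def job_creator_manifests(sa_name: str, sa_namespace: str, target_namespace: str) -> dict:
    """Return a List holding a Role and RoleBinding for creating Jobs.

    The Role and RoleBinding are placed in target_namespace (where jobs run);
    the subject is the service account that lives in sa_namespace.
    """
    role = {
        "apiVersion": RBAC_API_VERSION,
        "kind": "Role",
        "metadata": {"name": "job-creator-role", "namespace": target_namespace},
        "rules": [
            {
                "apiGroups": ["batch"],  # Jobs belong to the batch API group
                "resources": ["jobs"],
                "verbs": ["get", "watch", "list", "create", "update", "patch", "delete"],
            }
        ],
    }
    binding = {
        "apiVersion": RBAC_API_VERSION,
        "kind": "RoleBinding",
        "metadata": {"name": "job-creator-role-binding", "namespace": target_namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": sa_namespace}
        ],
        "roleRef": {
            "kind": "Role",
            "name": "job-creator-role",
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


# Values taken from the error message in this thread.
manifest = job_creator_manifests("prefect-agent", "prefect", "default")
print(json.dumps(manifest, indent=2))
```

Redirect the output to a file and apply that file with kubectl apply -f, the same way as the YAML version. The verbs mirror the rough example above; trim them if your security requirements call for less.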