# prefect-community

Leon Kozlowski

09/14/2022, 2:43 PM
Hi all, I am running into issues with my agent being able to submit flows to my infrastructure.
Prefect Version: 2.3.0
Infrastructure: KubernetesJob
Traceback:
```
| ERROR   | prefect.agent - Failed to submit flow run 'FLOW_RUN_ID' to infrastructure.

HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"The POST operation against Job.batch could not be completed at this time, please try again.","reason":"ServerTimeout","details":{"name":"POST","group":"batch","kind":"Job"},"code":500}
```
More configuration in thread
The k8s client is throwing:
```
kubernetes.client.exceptions.ApiException: (500)
Reason: Internal Server Error
```
Role Definition:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    meta.helm.sh/release-name: prefect-orion-agent
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2022-09-12T19:51:03Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: prefect-orion-agent-rbac
  namespace: default
  resourceVersion: "82173189"
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - pods/log
  - pods/status
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
```
RoleBinding:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    meta.helm.sh/release-name: prefect-orion-agent
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2022-09-12T19:51:03Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: prefect-orion-agent-rbac
  namespace: default
  resourceVersion: "82173190"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prefect-orion-agent-rbac
subjects:
- kind: ServiceAccount
  name: prefect-orion-agent
```
Block definition:
```python
from prefect.infrastructure import KubernetesJob

# Build the KubernetesJob block from a custom job template file
k8s_job = KubernetesJob(
    image=PRIVATE_ECR_IMAGE,
    job=KubernetesJob.job_from_file(config.KUBE_JOB_TEMPLATE_PATH),
)

# Save block with name of flow
_ = k8s_job.save(
    name=BLOCK_NAME, overwrite=True
)
```
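For context, `KubernetesJob.job_from_file` loads a full Job manifest from disk; the file at `config.KUBE_JOB_TEMPLATE_PATH` would contain something along the lines of the minimal sketch below (the field values here are illustrative assumptions, not the exact template from this thread):
```yaml
# Minimal sketch of a job template that KubernetesJob.job_from_file() can load.
# Values are illustrative; Prefect fills in the image and flow-run details at submission time.
apiVersion: batch/v1
kind: Job
metadata:
  labels: {}
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: prefect-job
          env: []
```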
This can be marked as resolved
👍 1

Nick Coy

09/19/2022, 8:21 PM
@Leon Kozlowski how did you resolve this? I am running into the same issue

Leon Kozlowski

09/19/2022, 8:23 PM
Hey @Nick Coy yes - my issue was that I had a pod name hardcoded in the label of my flow job CRD. It might not be the same issue; the error messages from k8s were cryptic since it was just throwing internal server errors.
Every time the agent tried to kick off a job using `KubernetesJob`, the pod name was the same.
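Concretely, the problem pattern looked something like the sketch below (names are illustrative, not the exact template from this thread): a fixed name baked into the template meant every job the agent submitted reused the same name, and the API only surfaced that as cryptic 500s.
```yaml
# Illustrative sketch only -- not the exact template from this thread.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-flow-job          # hardcoded name: every submitted job reuses it
  labels:
    pod-name: my-flow-pod    # hardcoded pod name in a label, as described above
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: prefect-job
          env: []
# Fix: remove the hardcoded metadata.name and pod-name label so each flow run
# gets a uniquely generated job/pod name.
```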

Nick Coy

09/19/2022, 8:31 PM
@Leon Kozlowski thank you so much, that solved the issue for me

Leon Kozlowski

09/19/2022, 8:33 PM
Awesome!