I am getting this error ```prefect.exceptions.Fail...
# prefect-kubernetes
n
I am getting this error
prefect.exceptions.FailedRun: Submission failed. kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'af4884b6-f13d-4813-9b63-79f859e962d0', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '07cd00e8-f26e-4ed1-b441-b0eb2115b3b7', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'f9d0e8bc-cf15-4ed5-8169-c4cddb53fff8', 'Date': 'Mon, 29 May 2023 15:05:49 GMT', 'Content-Length': '323'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:sandbox-bidw:prefect-agent\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"dev-bidw\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}
Does anybody know how to fix this? I do not see batch options in the roles/permissions section.
j
your agent is in a different namespace than the one where you are trying to run your flows. Put them in the same namespace, or add additional permissions to your agent's service account
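For the "add additional permissions" route, here is a minimal sketch, assuming the service account and namespaces are the ones named in the 403 message. It builds a Role in the flow namespace and a RoleBinding to the agent's service account. The object name `prefect-agent-jobs` is hypothetical, and every verb beyond `create` on `jobs` (plus the `pods` rule) is an assumption about what the agent needs, so trim or extend as required.

```python
import json


def build_rbac(sa_name, sa_namespace, target_namespace):
    """Build a Role + RoleBinding letting a service account manage Jobs in target_namespace."""
    role_name = "prefect-agent-jobs"  # hypothetical name, pick your own
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": target_namespace},
        "rules": [
            {
                # "create" is the verb the 403 complains about; the rest are assumptions.
                "apiGroups": ["batch"],
                "resources": ["jobs"],
                "verbs": ["create", "get", "list", "watch", "delete"],
            },
            {
                # Assumption: the agent also reads pod status and logs.
                "apiGroups": [""],
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch"],
            },
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": target_namespace},
        "subjects": [
            # The subject may live in a different namespace than the Role.
            {"kind": "ServiceAccount", "name": sa_name, "namespace": sa_namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    # Values copied from the error message in this thread.
    print(json.dumps(build_rbac("prefect-agent", "sandbox-bidw", "dev-bidw"), indent=2))
```

The JSON output can be piped into `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML.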
n
I verified both are in the same namespace, so that doesn't look like the issue. Which permission do I need to add?
Another weird issue is that when I do a quick run of the deployment it fails immediately, but when I retry the same failed run it works.
j
your error message tells me they are not in the same namespace. your agent's service account is in sandbox-bidw while your flow is being submitted to dev-bidw
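To make that reading of the error explicit, here is a small sketch that pulls both namespaces out of the 403 response body. The helper name `namespaces_from_403` is hypothetical, and the regexes assume the message keeps the wording shown in this thread.

```python
import json
import re


def namespaces_from_403(body):
    """Extract the caller's namespace and the target namespace from a Forbidden body."""
    message = json.loads(body)["message"]
    # Service account users are formatted as system:serviceaccount:<namespace>:<name>.
    caller = re.search(r'system:serviceaccount:([^:"]+):([^"]+)', message)
    target = re.search(r'in the namespace "([^"]+)"', message)
    if caller is None or target is None:
        return None
    return {
        "agent_namespace": caller.group(1),
        "service_account": caller.group(2),
        "target_namespace": target.group(1),
    }


# Response body copied from the error in this thread.
BODY = (
    r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    r'"message":"jobs.batch is forbidden: User '
    r'\"system:serviceaccount:sandbox-bidw:prefect-agent\" cannot create resource '
    r'\"jobs\" in API group \"batch\" in the namespace \"dev-bidw\"",'
    r'"reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}'
)

if __name__ == "__main__":
    info = namespaces_from_403(BODY)
    if info and info["agent_namespace"] != info["target_namespace"]:
        print("agent and flow namespaces differ:", info)
```

If the two namespaces differ, the agent that picked up the run is not the one you think it is, or it needs cross-namespace permissions.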
n
Hmm, you are right. Do I need to set the namespace somewhere on the Prefect side too? I mean in the UI.
Because I checked the cluster: the agent is running in dev-bidw and the deployments are using the same namespace.
j
can you verify if you have a second agent running in the sandbox namespace?
n
strangely enough when I do
kubectl get namespace
I do not even see that namespace.
j
do you have multiple agents polling the work pool that your flows are being submitted to?
n
yeah two of them
j
are you still receiving the original 403 error that specifies the sandbox-bidw namespace?
n
It's failing right away; not sure what went wrong this time. I am getting no logs either.
In the state message I got this again
Submission failed. kubernetes.client.exceptions.ApiException: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ab19faf5-61fe-4bcd-8a8e-1e7abb8c7ec0', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '07cd00e8-f26e-4ed1-b441-b0eb2115b3b7', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'f9d0e8bc-cf15-4ed5-8169-c4cddb53fff8', 'Date': 'Tue, 30 May 2023 14:42:54 GMT', 'Content-Length': '323'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:sandbox-bidw:prefect-agent\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"dev-bidw\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}
j
i’m wondering if you have a second cluster somewhere that could be running the agent in the namespace sandbox-bidw?
n
yeah I am trying to find that out. I will let you know. Thank you very much for your help.
Finally I was able to find the rogue agent, but now I am getting this error
File "/opt/prefect/flows/venv/lib/python3.11/site-packages/prefect/results.py", line 398, in get
    raise MissingResult("The result was not persisted and is no longer available.")
prefect.exceptions.MissingResult: The result was not persisted and is no longer available.
j
did you get that when retrying the flow or on a fresh run?
n
These are all simple dummy tasks, so I am not sure why a retry would be needed, but let me check. I am doing this
deployment = run_deployment("transform-data/poc-prefect-multiple-job-test-main-transform")

logger().info("Deployment status: transform-data/poc-prefect-multiple-job-test-main-transform")

logger().info(deployment.state.result())

return {"result": deployment.state.result()}
Looks like that's what is causing the problem, since I am not using a storage block here, so the result of the deployed flow run is not persisted.
Yeah, looks like that was it: calling deployment.state.result() was causing the retry. It's working after removing it.
Thank you once again 🙂
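If the result is still wanted whenever it happens to be available, one option is to guard the call instead of removing it. This is a minimal sketch, not something from the thread: `result_or_none` is a hypothetical helper, it assumes a synchronous call to `state.result()`, and the fallback class exists only so the snippet runs where Prefect is not installed. Configuring result persistence on the called flow is the other route and is not shown here.

```python
try:
    # The exception class named in the traceback above.
    from prefect.exceptions import MissingResult
except ImportError:
    class MissingResult(Exception):
        """Stand-in so this sketch runs without Prefect installed."""


def result_or_none(state):
    """Return state.result(), or None when the result was never persisted."""
    try:
        return state.result()
    except MissingResult:
        # Nothing was persisted for this run, so there is nothing to fetch.
        return None
```

In the snippet above this would replace the direct calls, for example `return {"result": result_or_none(deployment.state)}`.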
j
np!