chara
07/27/2023, 2:43 PM

imcom
07/30/2023, 9:20 AM
How is the on_cancellation state change hook supposed to be used? From what I've seen so far, if I try to catch except (asyncio.exceptions.CancelledError, prefect.exceptions.TerminationSignal) and then return Cancelled(message=f"Cancelled: {task_id} is cancelled") in the flow logic, I end up with no cancellation hook fired at all. I've also tried returning a Cancelling state, still no go. Could anyone shed some light on the cancellation hook?
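(For reference, a minimal sketch of how the hook is attached in Prefect 2 - the hook fires when the orchestrator moves the run through a cancellation state, e.g. a cancel from the UI, which may be why returning a Cancelled state from inside the flow body never triggers it. The names here are illustrative:)
from prefect import flow

def notify_cancelled(flow, flow_run, state):
    # called with the flow, the flow run, and the cancellation state
    print(f"{flow_run.name} was cancelled: {state.message}")

@flow(on_cancellation=[notify_cancelled])
def my_flow():
    ...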

imcom
07/30/2023, 9:38 AM
How can I distinguish a spot instance shutdown from a user-initiated cancel in k8s? Assume I have an ASG with spot instances; when the nodes are being released, I would get SIGTERM in the running pods too.
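(Not Prefect-specific, but one way to tell the two apart on AWS is to check for a spot interruption notice inside a SIGTERM handler - a sketch that assumes the pod can reach the EC2 instance metadata endpoint:)
import signal
import urllib.request

SPOT_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def handle_sigterm(signum, frame):
    # The metadata endpoint returns 200 only when a spot reclaim is
    # scheduled, so a user-initiated cancel falls through to the except.
    try:
        with urllib.request.urlopen(SPOT_URL, timeout=1):
            print("SIGTERM caused by spot instance reclaim")
    except OSError:
        print("SIGTERM likely from a user-initiated cancel")

signal.signal(signal.SIGTERM, handle_sigterm)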

John Horn
07/31/2023, 9:03 PM

Kiley Roberson
08/02/2023, 7:23 PM
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:prefect:prefect-worker\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"prefect\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}
The kubernetes work pool has the namespace set to prefect and the service account name set to prefect-worker. At first I was able to run jobs and it was working, but then I had to make edits to the role to allow it to read secrets, and then this started happening. Yaml files I used for the Roles are in the thread! Would really appreciate any insight into this - thanks!
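(A sketch of a Role that keeps the job permissions while adding secret reads - the name and namespace mirror the error message above, and the exact verb lists are illustrative:)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prefect-worker
  namespace: prefect
rules:
  - apiGroups: ["batch"]   # dropping this rule produces the 403 above
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/log", "pods/status"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]        # the newly added secret access
    resources: ["secrets"]
    verbs: ["get", "list"]
A RoleBinding must still tie the Role to the prefect-worker service account; replacing the Role without re-applying the binding is another way to end up with exactly this 403.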

Christine Chen
08/02/2023, 11:45 PM

Noam Banay
08/07/2023, 10:06 AM

Slackbot
08/08/2023, 11:45 AM

Ofir
08/08/2023, 7:08 PM
Say we run a prefect-server and a prefect-agent.
What if 90% of the day the prefect-agent (which is running on a GPU node on the cluster) is idle?
This means it’s underutilized and we waste money for no good reason.
Reference: Airflow provides Kubernetes Executor - on-demand/ad-hoc worker pods.
Since Prefect thought of everything - I’m sure there is either a built-in capability for that or a design pattern for achieving that.
Thanks!
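(One pattern here is to keep the agent on a cheap CPU node and let every flow run launch its own short-lived pod that requests the GPU, much like Airflow's KubernetesExecutor. A sketch using a Prefect 2 KubernetesJob infrastructure block - image, namespace, and block name are placeholders:)
from prefect.infrastructure import KubernetesJob

gpu_job = KubernetesJob(
    image="my-registry/my-flow:latest",  # placeholder image
    namespace="prefect",
    customizations=[
        {
            # the GPU is requested only for the lifetime of the flow-run pod
            "op": "add",
            "path": "/spec/template/spec/containers/0/resources",
            "value": {"limits": {"nvidia.com/gpu": 1}},
        }
    ],
)
gpu_job.save("gpu-job", overwrite=True)
The agent itself then needs no GPU; pods come and go with the runs.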

Luke Segars
08/10/2023, 6:24 PM

Oshri
08/11/2023, 6:00 PM

Mikaël Ferreira de Almeida
08/14/2023, 1:11 PM

Dominick Olivito
08/14/2023, 3:17 PM
2023-08-12T20:00:48.907217+00:00 SCHEDULED Scheduled
2023-08-12T20:00:51.114967+00:00 PENDING Pending
2023-08-12T20:01:52.804776+00:00 CRASHED Crashed
2023-08-12T20:02:05.900502+00:00 RUNNING Running
2023-08-12T20:02:31.613829+00:00 COMPLETED Completed

Kiley Roberson
08/15/2023, 4:21 PM

Gregory Hunt
08/23/2023, 4:42 PM

Gregory Hunt
08/24/2023, 5:01 PM
kubernetes.client.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Audit-Id': '146e8148-7a8d-485c-827e-659f48837db7', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Thu, 24 Aug 2023 16:58:24 GMT', 'Content-Length': '218'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get \\"https://10.0.90.47:10250/containerLogs/prefect/gabby-starling-r6fbg-tszq6/prefect-job?follow=true\\": No agent available","code":500}\n'

Tom Klein
08/27/2023, 7:47 PM
We keep getting OOMKilled pods for flow runs on k8s.
So - putting aside for a moment the reason why it’s OOM to begin with - what happens is that the flow almost completely succeeds (or completely succeeds), but somehow still finds itself being run over and over and over to no end.
There are no retries defined on the flow itself, and the kube manifest yaml is set to restartPolicy = never
Yet still, we get this:
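(One k8s-level detail that can produce exactly this: with restartPolicy: Never the Job controller does not restart containers, but it still creates replacement pods for failed ones until the Job's backoffLimit is exhausted, and backoffLimit defaults to 6. A sketch of the relevant manifest fields, assuming the flow runs as a plain batch Job:)
apiVersion: batch/v1
kind: Job
spec:
  backoffLimit: 0           # fail the Job after the first OOMKilled pod
  template:
    spec:
      restartPolicy: Never  # governs containers in the pod, not pod replacement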

Mikaël Ferreira de Almeida
08/28/2023, 3:24 PM
I define my KubernetesJob block with the namespace set explicitly:
k8s_jb = KubernetesJob(
    ...
    namespace="prefect",
)
After saving the block I can see in the UI that the namespace is correctly defined.
But sometimes, when I run a deployment flow, I get these errors:
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:prefect:prefect-worker\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"default\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}
The error message indicates that the job landed in the wrong namespace. If I use the "Retry" button, at some point it will be executed in the "prefect" namespace.
Sometimes I don't get the error, sometimes I have to retry 5 times.
Does anyone have the same problem?
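(If the deployment was ever built against a stale or default infrastructure block, runs can intermittently pick up the wrong namespace; one way to rule that out is to pin an override on the deployment itself. A sketch - the flow body and block name are placeholders:)
from prefect import flow
from prefect.deployments import Deployment
from prefect.infrastructure import KubernetesJob

@flow
def my_flow():  # placeholder flow
    ...

deployment = Deployment.build_from_flow(
    flow=my_flow,
    name="my-flow-prefect-ns",
    infrastructure=KubernetesJob.load("k8s-jb"),  # placeholder block name
    infra_overrides={"namespace": "prefect"},     # takes precedence over the block's saved value
)
deployment.apply()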

Gregory Hunt
09/01/2023, 1:33 PM

Michelle Brochmann
09/11/2023, 6:58 PM

Eduardo Mota
09/11/2023, 10:33 PM

Frost Ouyang
09/12/2023, 5:53 PM
Worker 'KubernetesWorker 7763f002-66a8-475f-90cf-b40e048d3e02' submitting flow run '83edf78a-26ec-4f1a-870b-c82b96f25db2'
Creating Kubernetes job...
Job 'axiomatic-oyster-lcgft': Pod has status 'Pending'.
Completed submission of flow run '83edf78a-26ec-4f1a-870b-c82b96f25db2'
Job 'axiomatic-oyster-lcgft': Pod has status 'Running'.
Crash detected! Execution was interrupted by an unexpected exception: PrefectHTTPStatusError: Client error '403 Forbidden' for url 'https://api.prefect.cloud/api/accounts/xxxx/workspaces/xxxx/block_schemas/'
Response: {'detail': 'Workspace scopes missing: manage_blocks'}
For more information check: https://httpstatuses.com/403
Job 'axiomatic-oyster-lcgft': Job reached backoff limit.
I masked our account id and workspace id. Does anyone have any ideas? Thanks.

Nick Hoffmann
09/13/2023, 5:41 PM

Russell Brooks
09/13/2023, 8:02 PM

Geese Howard
09/14/2023, 8:28 AM
namespaceOverride: prefect
worker:
  cloudApiConfig:
    accountId: HIDE
    workspaceId: HIDE
  config:
    workPool: gke
    workPool: gke-cpu-5000m-4Gi
serviceAccount:
  create: false
  name: "HIDE"
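(Worth noting: config: lists workPool twice; a YAML mapping may not repeat a key, and most parsers silently keep only the last one. A sketch of the deduplicated block, assuming gke-cpu-5000m-4Gi is the intended pool:)
worker:
  config:
    workPool: gke-cpu-5000m-4Gi  # keep a single workPool key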

Geese Howard
09/14/2023, 8:54 AM

Florent VanDeMoortele
09/15/2023, 5:03 PMError: parse error at (prefect-worker/charts/common/templates/_labels.tpl:14): unclosed action
Is there a breaking change?Nathan
09/17/2023, 7:53 PM
I get a FileNotFoundError every single time a flow is triggered (tracebacks in the comments). I know for a fact these files/directories exist because I shell into the pods and verify it. Fair to say it's driving me nuts.
Here's how I'm set up:
1. Official Prefect Helm chart >> https://github.com/PrefectHQ/prefect-helm/tree/main/charts/prefect-worker
   a. Nothing modified except the values it asks for.
2. Official Docker image >> prefecthq/prefect:2.12.0-python3.11-kubernetes
3. I have a flows directory with sub-modules for each flow.
4. Alongside the flows directory, I have two standalone python files that build Blocks using KubernetesJob() and .save() and deploy Flows using Deployment.build_from_flow() (see the sketch after this message).
   a. In the Dockerfile, I just copy all of this into /opt/prefect so the resulting directory is /opt/prefect/flows/, blocks.py, deploy.py
5. I have a custom entrypoint.sh script that logs into Prefect, runs the files to build blocks and deploy flows, and then starts the worker. It does not modify the file structure at all and everything runs as expected, i.e. Blocks built, Flows deployed, and Worker started in k8s.
Any help or pointers as to what is going wrong would be greatly appreciated. I've been trying to debug this for a few days now.

Theis Ferré Hjortkjær
09/18/2023, 7:09 AM
from prefect import flow, task
from prefect_ray.task_runners import RayTaskRunner
from prefect_ray.context import remote_options

@task
def process(x):
    return x + 1

@flow(task_runner=RayTaskRunner("ray://<my-ray-service>:10001"))
def my_flow():
    # equivalent to setting @ray.remote(num_cpus=4)
    with remote_options(num_cpus=4):
        process.submit(42)
It fails with a runtime error:
RuntimeError: There is no current event loop in thread 'ray_client_server_1'.
If I do not specify resources for my task, everything works as intended.
I have submitted an issue, but it has not been picked up yet, so I thought maybe I could get some help here 🙂
https://github.com/PrefectHQ/prefect/issues/10542

Joe Nelson
09/20/2023, 8:18 PM
Is there a way to get labels from the job to appear on the pod it generates? Prefect 1.x put useful labels on the pod, e.g. prefect.io/flow_id, which are no longer present in any Prefect 2 pod and only appear on the job.
job container labels don’t work for our purposes - we’re trying to use logs to set off alerts, and the label fields have to be on the pod.
Happy to provide additional information, if it would be helpful!
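(A sketch of one way to do this in Prefect 2: the KubernetesJob block accepts JSON 6902 patches via customizations, and anything added under the Job's pod template metadata lands on the pod itself. The label values are placeholders:)
from prefect.infrastructure import KubernetesJob

k8s_job = KubernetesJob(
    customizations=[
        {
            # /spec/template/metadata targets the pod template, not the Job,
            # so these labels appear on the generated pod; note this replaces
            # any pod-template labels the base job manifest already sets
            "op": "add",
            "path": "/spec/template/metadata/labels",
            "value": {"team": "data-eng", "alerting": "enabled"},
        }
    ],
)
k8s_job.save("labeled-k8s-job", overwrite=True)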