Dominick Olivito

08/14/2023, 3:17 PM

hi all, we're using k8s on GKE to run our flows. they're triggered from deployments in Prefect Cloud. we're running prefect 2.10.20 with an Agent on k8s. we occasionally see flow runs transition into a CRASHED state before RUNNING and then COMPLETED. the Run Count is 1 in this case. does anyone have suggestions of what we can check? here's an example set of transitions:

2023-08-12T20:00:48.907217+00:00 SCHEDULED Scheduled
2023-08-12T20:00:51.114967+00:00 PENDING Pending
2023-08-12T20:01:52.804776+00:00 CRASHED Crashed
2023-08-12T20:02:05.900502+00:00 RUNNING Running
2023-08-12T20:02:31.613829+00:00 COMPLETED Completed

Dominick Olivito

08/14/2023, 3:38 PM

i see that it's going into CRASHED state after 60 seconds, which is the default value for pod_watch_timeout_seconds. i'm going to try increasing that
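A quick way to check that observation is to compute the gap between each pair of consecutive states from the transitions posted above. This is a minimal sketch using only the standard library; the helper name `seconds_between_states` is made up for illustration. For the example run, the PENDING to CRASHED gap comes out at roughly 61.7 seconds, which lines up with a 60-second timeout.

```python
from datetime import datetime

# State transitions copied from the example flow run in the thread.
TRANSITIONS = """\
2023-08-12T20:00:48.907217+00:00 SCHEDULED Scheduled
2023-08-12T20:00:51.114967+00:00 PENDING Pending
2023-08-12T20:01:52.804776+00:00 CRASHED Crashed
2023-08-12T20:02:05.900502+00:00 RUNNING Running
2023-08-12T20:02:31.613829+00:00 COMPLETED Completed
"""


def seconds_between_states(log):
    """Return (from_state, to_state, seconds) for each consecutive pair of states."""
    parsed = []
    for line in log.splitlines():
        if not line.strip():
            continue
        # Each line is "<ISO timestamp> <STATE TYPE> <state name>"; keep the first two fields.
        timestamp, state, *_ = line.split()
        parsed.append((datetime.fromisoformat(timestamp), state))
    return [
        (prev_state, next_state, (next_ts - prev_ts).total_seconds())
        for (prev_ts, prev_state), (next_ts, next_state) in zip(parsed, parsed[1:])
    ]


for prev_state, next_state, seconds in seconds_between_states(TRANSITIONS):
    print(f"{prev_state} -> {next_state}: {seconds:.1f}s")
```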

Sunny Pachunuri

08/17/2023, 9:50 PM

Hey @Dominick Olivito: Have you figured out what is causing this? I am running my Agent in EKS and it is running fine. But when I run a flow, it always goes into CRASHED status first, and then after a couple of minutes it goes into COMPLETED. No idea why this is happening

Sunny Pachunuri

08/17/2023, 10:04 PM

In my case the crash is happening instantaneously

Dominick Olivito

08/18/2023, 12:53 AM

i haven't seen it again since i increased the value of pod_watch_timeout_seconds to 600. it looked like our pods were sometimes taking a few minutes to start up, especially if we started several flows at the same time. if it's going into CRASHED state immediately, i would just check that pod_watch_timeout_seconds is not set to 0. beyond that, i'm not sure of the other possible causes
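For anyone landing here later, raising the timeout might look roughly like this. It's a sketch that assumes Prefect 2.x (the thread mentions 2.10.20) and that the deployment uses a `KubernetesJob` infrastructure block; the thread doesn't say how the value was actually set. The block name in the commented-out line is hypothetical.

```python
# Assumes Prefect 2.x is installed; KubernetesJob is the agent-era infrastructure block.
from prefect.infrastructure import KubernetesJob

# pod_watch_timeout_seconds defaults to 60; 600 is the value reported in the thread.
k8s_job = KubernetesJob(pod_watch_timeout_seconds=600)

# Sanity check for the "crashes immediately" case: the timeout should not be 0.
assert k8s_job.pod_watch_timeout_seconds > 0

# Hypothetical block name; saving needs a reachable Prefect API, so it is left commented out.
# k8s_job.save("k8s-long-pod-watch", overwrite=True)
```

The saved block would then need to be referenced by the deployment for the new timeout to apply to its flow runs.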

Sunny Pachunuri

08/18/2023, 1:50 PM

Thanks a lot Dominick
