# ask-community
f
We started getting a lot of "late runs", our k8s setup is running other jobs with no issue, we can manually start the same flows and they run. Only scheduled jobs become "late". No labels that block, universal runner too. Can I get some help here? @Kevin Kho?
a
@Filip Lindvall 1. Do you use Prefect Cloud or Server? 2. Could it have something to do with DST and a schedule in a specific time zone?
f
Prefect Cloud
What is DST?
a
Daylight Saving Time
f
They are marked as late in the cloud.prefect.io console
They run every 15 minutes, and the runs from the last 2-3 hours are marked with the "late" flag, i.e. they were not executed when expected.
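One simplified way to think about the "late" flag is that a run counts as late once its scheduled start time has passed and it still has not been picked up. This is an assumption for illustration, not Prefect Cloud's documented logic. The `is_late` helper and its `grace` parameter below are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_late(scheduled_start: datetime, started: bool, now: datetime,
            grace: timedelta = timedelta(0)) -> bool:
    """Hypothetical, simplified rule: late = should have started by now but has not."""
    return not started and now > scheduled_start + grace


now = datetime(2021, 11, 8, 13, 0, tzinfo=timezone.utc)
# A run every 15 minutes over the last two hours, none of which was picked up.
runs = [now - timedelta(minutes=15 * i) for i in range(1, 9)]
late = [r for r in runs if is_late(r, started=False, now=now)]
print(f"{len(late)} of {len(runs)} scheduled runs are late")
```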
a
DST is just one possible cause and there may be others, but since it ended in us-east yesterday, it was the first thing I wanted to check: DST ended at 02:00 on Sunday, 7 November.
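DST can matter for schedules expressed in local wall-clock time. When US Eastern clocks fell back at 02:00 on 7 November 2021, the local time 01:30 occurred twice, one hour apart. A local-time schedule can therefore fire twice, or skip a slot in spring. This sketch hardcodes the EDT/EST offsets to stay self-contained. Real code would normally use `zoneinfo.ZoneInfo("America/New_York")` instead.

```python
from datetime import datetime, timedelta, timezone

# Fixed US Eastern offsets around the end of DST in 2021 (hardcoded for illustration).
EDT = timezone(timedelta(hours=-4), "EDT")  # in effect until 02:00 local on 7 Nov 2021
EST = timezone(timedelta(hours=-5), "EST")  # in effect after clocks fell back to 01:00

# The same wall-clock time, 01:30, happened once under each offset.
first = datetime(2021, 11, 7, 1, 30, tzinfo=EDT)
second = datetime(2021, 11, 7, 1, 30, tzinfo=EST)

print(first.astimezone(timezone.utc))   # 2021-11-07 05:30:00+00:00
print(second.astimezone(timezone.utc))  # 2021-11-07 06:30:00+00:00
print(second - first)                   # 1:00:00
```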
So you use an IntervalSchedule?
f
Yes
DST should not be an issue here
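If run times are computed by adding a fixed interval to a UTC anchor, they keep their spacing across a DST change. This is one reason an interval schedule is less likely to be affected. Below is a minimal sketch. `next_runs` is a hypothetical helper, not Prefect's `IntervalSchedule` implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def next_runs(start: datetime, interval: timedelta, after: datetime, n: int) -> List[datetime]:
    """Hypothetical helper: next n run times strictly after `after`, on a grid anchored at `start`."""
    if after < start:
        return [start + i * interval for i in range(n)]
    # Whole intervals elapsed since the anchor, plus one to land strictly after `after`.
    k = (after - start) // interval + 1
    return [start + (k + i) * interval for i in range(n)]


start = datetime(2021, 11, 1, tzinfo=timezone.utc)
after = datetime(2021, 11, 7, 5, 50, tzinfo=timezone.utc)  # 01:50 EDT, just before clocks fell back
runs = next_runs(start, timedelta(minutes=15), after, 4)
for t in runs:
    print(t.isoformat())
```

The four run times straddle 06:00 UTC (02:00 EDT, when clocks fell back) and stay exactly 15 minutes apart.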
We also have cron schedules, which also become "late". Seems like they are not being dispatched properly to our workers. But creating the runs manually works.
a
> Seems like they are not being dispatched properly to our workers.
Can you elaborate? How do you distribute work across your K8s cluster? Do you have just one agent or multiple agents?
Can you check the flow concurrency limits on those flows? Perhaps the runs are stuck in a queue because a concurrency limit is blocking new ones?
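Here is a way to picture the concurrency-limit hypothesis. With a limit of N, runs beyond the first N wait in a queue until a slot frees up. From the outside, that looks like runs not starting on time. The toy model below uses a hypothetical `ConcurrencyLimiter` class, not Prefect's implementation.

```python
from collections import deque


class ConcurrencyLimiter:
    """Toy model: at most `limit` runs are active; the rest wait in FIFO order."""

    def __init__(self, limit: int):
        self.limit = limit
        self.active = set()
        self.queued = deque()

    def submit(self, run_id: str) -> str:
        if len(self.active) < self.limit:
            self.active.add(run_id)
            return "running"
        self.queued.append(run_id)
        return "queued"

    def finish(self, run_id: str) -> None:
        self.active.discard(run_id)
        # Promote the oldest queued run into the freed slot, if there is one.
        if self.queued and len(self.active) < self.limit:
            self.active.add(self.queued.popleft())


limiter = ConcurrencyLimiter(limit=1)
print([limiter.submit(r) for r in ["run-1", "run-2", "run-3"]])  # ['running', 'queued', 'queued']
limiter.finish("run-1")
print(sorted(limiter.active), list(limiter.queued))  # ['run-2'] ['run-3']
```

If an earlier run never finishes, for example because it is stuck, the queue never drains. That is why limits are worth checking even when the cluster has spare capacity.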
f
One agent, creating k8s jobs per run
We have run 10+ in parallel before, so that should not be the issue; there is plenty of capacity to spare.
a
Another thing worth checking is the logs on the agent pod; they may give some hints as well: `kubectl logs pod_name`
f
For those specific jobs, there are no current runs.
```
2021-11-08 11:38:01.041 CET
INFO:agent:Deploying flow run f48fbae8-ad13-453d-9438-d1e2295a20a8 to execution environment...
2021-11-08 11:38:01.305 CET
INFO:agent:Completed deployment of flow run f48fbae8-ad13-453d-9438-d1e2295a20a8
2021-11-08 11:48:01.037 CET
INFO:agent:Deploying flow run bcc6921d-5843-4160-b92c-ccc2cc0c4178 to execution environment...
2021-11-08 11:48:01.330 CET
INFO:agent:Completed deployment of flow run bcc6921d-5843-4160-b92c-ccc2cc0c4178
2021-11-08 11:58:01.035 CET
INFO:agent:Deploying flow run 194f7f51-377e-490a-adad-ecbdcbe4d456 to execution environment...
2021-11-08 11:58:01.275 CET
INFO:agent:Completed deployment of flow run 194f7f51-377e-490a-adad-ecbdcbe4d456
2021-11-08 12:20:58.798 CET
INFO:agent:Deploying flow run 9a42edd9-340c-4b02-8a4c-d6058ad6f488 to execution environment...
2021-11-08 12:20:59.103 CET
INFO:agent:Completed deployment of flow run 9a42edd9-340c-4b02-8a4c-d6058ad6f488
2021-11-08 14:02:52.629 CET
INFO:agent:Deploying flow run f5ed2a41-1db9-484f-a657-51bb198b6d68 to execution environment...
2021-11-08 14:02:52.933 CET
INFO:agent:Completed deployment of flow run f5ed2a41-1db9-484f-a657-51bb198b6d68
```
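One quick way to read the agent log above is to measure the gap between consecutive "Deploying flow run" lines. This sketch uses only the standard library. It assumes the log alternates a timestamp line with the message it belongs to, exactly as pasted. Because "Completed" lines are skipped, the full log works as input too.

```python
from datetime import datetime

# "Deploying" entries copied from the agent log above (timestamps are CET).
LOG = """\
2021-11-08 11:38:01.041 CET
INFO:agent:Deploying flow run f48fbae8-ad13-453d-9438-d1e2295a20a8 to execution environment...
2021-11-08 11:48:01.037 CET
INFO:agent:Deploying flow run bcc6921d-5843-4160-b92c-ccc2cc0c4178 to execution environment...
2021-11-08 11:58:01.035 CET
INFO:agent:Deploying flow run 194f7f51-377e-490a-adad-ecbdcbe4d456 to execution environment...
2021-11-08 12:20:58.798 CET
INFO:agent:Deploying flow run 9a42edd9-340c-4b02-8a4c-d6058ad6f488 to execution environment...
2021-11-08 14:02:52.629 CET
INFO:agent:Deploying flow run f5ed2a41-1db9-484f-a657-51bb198b6d68 to execution environment...
"""


def deploy_times(log):
    """Return timestamps of 'Deploying flow run' messages, assuming timestamp/message line pairs."""
    lines = [line.strip() for line in log.strip().splitlines()]
    times = []
    for stamp, message in zip(lines[::2], lines[1::2]):
        if "Deploying flow run" in message:
            # Drop the trailing " CET" zone label before parsing.
            times.append(datetime.strptime(stamp.rsplit(" ", 1)[0], "%Y-%m-%d %H:%M:%S.%f"))
    return times


times = deploy_times(LOG)
gaps = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
print([round(g, 1) for g in gaps])  # [10.0, 10.0, 23.0, 101.9]
```

The steady 10-minute cadence breaks after 11:58. That may help narrow down when scheduled runs stopped being picked up.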
a
thx, this looks good
f
@Zach Angell can I get some help here? 🙂
Nothing that runs on any kind of schedule starts for us; everything gets marked as "late" once it's time for it to run. But creating the runs manually works with no issues.
@Andreea Taylor
z
Could you share one of the problematic flow run ids? Feel free to send via DM if it's easier
a
> No labels that block, universal runner too.
@Filip Lindvall can you share your run configuration? Have you tried using `KubernetesRun` instead of `UniversalRun`? Both should work, but perhaps you need some specific configuration.
f
This same setup has been working for months
We just started getting issues today, four hours ago, without really changing anything.
a
Got it. Could you still share the run config, just to cross-check?
f
Will do
```json
{
  "env": null,
  "type": "UniversalRun",
  "labels": [],
  "__version__": "0.15.4"
}
```
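To sanity-check the labels in this run config, the sketch below parses it and applies a label-matching rule. The rule is that an agent can pick up a run only if it carries every label on the run. That rule is an assumption for illustration. Here both the run and the agent have no labels, so under that rule labels should not be what blocks dispatch.

```python
import json

RUN_CONFIG = """
{
  "env": null,
  "type": "UniversalRun",
  "labels": [],
  "__version__": "0.15.4"
}
"""


def agent_can_pick_up(run_labels, agent_labels) -> bool:
    # Assumed rule: every label on the run must also be present on the agent.
    return set(run_labels) <= set(agent_labels)


config = json.loads(RUN_CONFIG)
print(config["type"], config["labels"])                       # UniversalRun []
print(agent_can_pick_up(config["labels"], agent_labels=[]))   # True
print(agent_can_pick_up(["gpu"], agent_labels=[]))            # False
```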
a
And how is your agent started? Do you run your KubernetesAgent in-cluster or as a local process (prefect agent kubernetes start)?
f
Sent via DM; I'm not sure whether the token ID or agent ID is sensitive in any way.
a
Thx. So it has no labels on it. Do you happen to have any other Prefect agents connected to your Prefect Cloud account?
f
That's the only one
And looking at the agent view, we see that Prefect Cloud thinks the runs should be scheduled on that agent too.
a
@Filip Lindvall did you find out the reason for your late runs in the end? I am experiencing the same issue and am looking at this thread for inspiration. Regarding @Anna Geller's suggestion to inspect the logs on the agent pod, how can I do that? Through the Prefect Cloud UI or the CLI? I'm also wondering whether elevated admin access in the Team is required first.
a
Your kubeconfig needs to point to the cluster your agent is running on. Then you can list the pods using `kubectl get pods`, and to see the logs, run `kubectl logs pod_name`.
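The same steps can be scripted. This sketch assumes `kubectl` is on the PATH and the kubeconfig points at the agent's cluster. The pod name `prefect-agent-abc123` and namespace `prefect` are hypothetical placeholders for whatever `kubectl get pods` shows. `--namespace` and `--since` are standard `kubectl logs` flags, and `--since` limits output to a recent window. The example only builds and prints the command; `fetch_logs` actually runs it.

```python
import shlex
import subprocess
from typing import List, Optional


def logs_command(pod: str, namespace: Optional[str] = None, since: Optional[str] = None) -> List[str]:
    """Build a `kubectl logs` invocation for one pod."""
    cmd = ["kubectl", "logs", pod]
    if namespace:
        cmd += ["--namespace", namespace]
    if since:
        cmd += [f"--since={since}"]  # e.g. "3h" to cover the window of late runs
    return cmd


def fetch_logs(pod: str, **kwargs) -> str:
    # Requires kubectl on PATH and a kubeconfig pointing at the agent's cluster.
    result = subprocess.run(logs_command(pod, **kwargs), capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical pod name and namespace, for illustration only.
cmd = logs_command("prefect-agent-abc123", namespace="prefect", since="3h")
print(shlex.join(cmd))  # kubectl logs prefect-agent-abc123 --namespace prefect --since=3h
```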