# ask-community
l
hey all, i'm looking at the server logs and i'm getting a ton of warnings:
2023-12-07T15:19:54+02:00 13:19:54.822 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 333.405324 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T15:24:52+02:00 13:24:52.771 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 297.947375 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T15:29:54+02:00 13:29:54.353 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 301.581939 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T15:35:23+02:00 13:35:23.989 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 329.635069 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T15:41:26+02:00 13:41:26.275 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 362.284819 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T15:46:40+02:00 13:46:40.591 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 314.314349 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T15:52:41+02:00 13:52:41.746 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 361.154281 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T15:59:18+02:00 13:59:18.214 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 396.467137 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T16:04:05+02:00 14:04:05.897 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 287.68302 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T16:09:54+02:00 14:09:54.378 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 348.479282 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T16:16:08+02:00 14:16:08.516 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 374.137448 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T16:22:20+02:00 14:22:20.136 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 371.61939 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T16:27:01+02:00 14:27:01.589 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 281.451967 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T16:32:15+02:00 14:32:15.239 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 313.649418 seconds to run, which is longer than its loop interval of 20.0 seconds.
2023-12-07T16:37:58+02:00 14:37:58.483 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 343.243348 seconds to run, which is longer than its loop interval of 20.0 seconds.
The servers are under stress, sure, but this seems a little excessive. I'm self-hosting on k8s, and these warnings happen with either a local or a remote (AWS RDS) DB. Should I be worried?
j
Hey, nothing to worry about. Many loop services run as part of Prefect Server, each on an interval, say every 15 seconds. If the actual work takes longer than that interval, you'll get this warning. You can change the interval for this particular service with PREFECT_API_SERVICES_CANCELLATION_CLEANUP_LOOP_SECONDS. If you have a lot of data in your DB, you'll likely run into this, and it's appropriate to increase the interval.
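Before picking a new interval, it can help to see how far the runs actually overshoot it. The following is a minimal sketch that parses warnings in the format shown above and summarizes them per service. The function name summarize_overruns and the regular expression are illustrative assumptions, not part of Prefect; the two sample lines are copied from the log in the question.

```python
import re
from statistics import mean

# Matches the message part of the warning lines shown in the question.
# The pattern is an assumption based on those lines, not a Prefect API.
WARNING_RE = re.compile(
    r"(?P<service>\w+) took (?P<duration>[\d.]+) seconds to run, "
    r"which is longer than its loop interval of (?P<interval>[\d.]+) seconds"
)


def summarize_overruns(lines):
    """Group overrun warnings by service and report duration statistics."""
    durations = {}
    intervals = {}
    for line in lines:
        match = WARNING_RE.search(line)
        if match is None:
            continue  # not an overrun warning
        service = match["service"]
        durations.setdefault(service, []).append(float(match["duration"]))
        intervals[service] = float(match["interval"])
    return {
        service: {
            "count": len(values),
            "mean_seconds": mean(values),
            "max_seconds": max(values),
            "interval_seconds": intervals[service],
        }
        for service, values in durations.items()
    }


sample = [
    "2023-12-07T15:19:54+02:00 13:19:54.822 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 333.405324 seconds to run, which is longer than its loop interval of 20.0 seconds.",
    "2023-12-07T15:24:52+02:00 13:24:52.771 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 297.947375 seconds to run, which is longer than its loop interval of 20.0 seconds.",
]
summary = summarize_overruns(sample)
print(summary)
```

In practice you would feed it the full server log rather than a hard-coded sample, for example by passing an open file object.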
l
Thanks, but what does this loop service (CancellationCleanup) do? I'm not really cancelling anything. Also, if I'm running multiple servers to handle load, should I raise the loop seconds on all of these loop services?
j
It helps cancel tasks and subflows when a flow is cancelled. If you're seeing the warning for other services, you can raise their intervals to avoid the message, but the defaults are configured for the best out-of-the-box experience. If the interval for some loop services gets too large, things like scheduling may not happen as quickly.
You can also try disabling the loop services on your web servers and running them as separate deployments
Usually, if you're seeing these warnings, it points to the volume of flow runs in your DB. You might want to schedule a flow that cleans up older flow runs.
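The selection logic of such a cleanup flow could look roughly like the sketch below. It deliberately avoids any Prefect calls: FlowRunRecord, prune_old_runs, and the delete_run callback are hypothetical stand-ins, and in a real flow you would fetch the runs and delete them through whatever client calls your Prefect version provides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional


@dataclass
class FlowRunRecord:
    """Hypothetical stand-in for a flow run row; not a Prefect class."""
    id: str
    end_time: Optional[datetime]  # None means the run has not finished


def prune_old_runs(
    runs: Iterable[FlowRunRecord],
    delete_run: Callable[[str], None],
    retention: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Delete finished runs that ended before the retention cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    deleted = []
    for run in runs:
        # Skip unfinished runs and anything still inside the retention window.
        if run.end_time is not None and run.end_time < cutoff:
            delete_run(run.id)
            deleted.append(run.id)
    return deleted


# Example with an in-memory list standing in for the database.
now = datetime(2023, 12, 7, tzinfo=timezone.utc)
runs = [
    FlowRunRecord("old", now - timedelta(days=30)),
    FlowRunRecord("recent", now - timedelta(days=1)),
    FlowRunRecord("running", None),
]
removed = []
deleted_ids = prune_old_runs(runs, removed.append, timedelta(days=7), now=now)
print(deleted_ids)
```

Passing the delete step in as a callback keeps the retention rule easy to test without a running server.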
l
I am cleaning up old flow runs. My current situation is about 5000 flow runs a day; each of those has 10 subflows, and each subflow has 10 tasks (rough numbers). It seems the servers are not really handling this stress well.
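For a sense of scale, here is the arithmetic behind those rough numbers as a small sketch. The function name daily_volume is illustrative, and it assumes every parent flow run really does spawn 10 subflows with 10 tasks each, spread evenly over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume(parent_runs: int, subflows_per_run: int, tasks_per_subflow: int) -> dict:
    """Estimate how many runs the server records per day."""
    subflow_runs = parent_runs * subflows_per_run
    task_runs = subflow_runs * tasks_per_subflow
    return {
        "parent_flow_runs": parent_runs,
        "subflow_runs": subflow_runs,
        # Subflows are flow runs too, so they add to the flow run total.
        "total_flow_runs": parent_runs + subflow_runs,
        "task_runs": task_runs,
        "task_runs_per_second": task_runs / SECONDS_PER_DAY,
    }


volume = daily_volume(parent_runs=5_000, subflows_per_run=10, tasks_per_subflow=10)
print(volume)
```

Under these assumptions that is 55,000 flow runs and 500,000 task runs per day, a little under 6 task runs per second on average.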
j
Do you have a more specific idea of what that means? Is it too many requests? Is the database slow at running queries? Do things have enough resources? Without more to go on, the next thing I'd try for better performance is running the loop services individually, separate from the web server. Scaling the OSS server can definitely depend a lot on your workload.
l
Hmm, I'm not sure what that means just yet. Queries are slow for sure, and I see all these warnings, and even timeout errors on queries, in the server logs. I'll investigate a bit more. Thanks!