# ask-marvin
**User:**
@Marvin My server instance is spawning MaxDepthExceeded errors. Is there anything I can do to fix this?

```
raise MaxDepthExceeded(event)
prefect.server.events.ordering.MaxDepthExceeded:
  occurred=DateTime(2025, 1, 16, 9, 35, 12, 838892, tzinfo=Timezone('UTC'))
  event='prefect.flow-run.Cancelling'
  resource=Resource(root={'prefect.state-name': 'Cancelling', 'prefect.state-type': 'CANCELLING', 'prefect.resource.id': 'prefect.flow-run.00f0e134-8e9d-4b3d-9bf1-d22a93936ffb', 'prefect.resource.name': 'handsome-malkoha', 'prefect.state-message': 'The parent flow run was cancelled.', 'prefect.state-timestamp': '2025-01-16T09:35:12.838892+00:00'})
  related=[RelatedResource(root={'prefect.resource.id': 'prefect.flow.283a6c16-6988-4e4e-a336-a6ce359c4571', 'prefect.resource.name': 'rfd-pipeline', 'prefect.resource.role': 'flow'}),
           RelatedResource(root={'prefect.resource.id': 'prefect.deployment.5a965e91-087d-4c8c-8871-0455fb222f0c', 'prefect.resource.name': 'serve', 'prefect.resource.role': 'deployment'}),
           RelatedResource(root={'prefect.resource.id': 'prefect.task-run.9d8b5cda-ab33-4c4b-bfb5-c84983f73b17', 'prefect.resource.name': 'nifty-flamingo', 'prefect.resource.role': 'task-run'}),
           RelatedResource(root={'prefect.resource.id': 'prefect.tag.serve', 'prefect.resource.role': 'tag'})]
  payload={'intended': {'to': 'CANCELLING', 'from': 'CANCELLING'}, 'initial_state': {'name': 'Cancelling', 'type': 'CANCELLING', 'message': 'The parent flow run was cancelled.'}, 'validated_state': {'name': 'Cancelling', 'type': 'CANCELLING', 'message': 'The parent flow run was cancelled.'}}
  id=UUID('1d4eb2f8-1171-4912-a271-8319f327beec')
  follows=UUID('ab5ad40f-11c1-421a-896a-048c436b4f3a')
  received=DateTime(2025, 1, 16, 9, 35, 12, 959059, tzinfo=Timezone('UTC'))
```
**Marvin:**
Your message was too long by 288 tokens...
**User:**
@Marvin my server instance is spawning MaxDepthExceeded exceptions.

```
raise MaxDepthExceeded(event)
prefect.server.events.ordering.MaxDepthExceeded:
  occurred=DateTime(2025, 1, 16, 9, 35, 12, 838892, tzinfo=Timezone('UTC'))
  event='prefect.flow-run.Cancelling'
  resource=Resource(root={'prefect.state-name': 'Cancelling', 'prefect.state-type': 'CANCELLING', 'prefect.resource.id': 'prefect.flow-run.00f0e134-8e9d-4b3d-9bf1-d22a93936ffb', 'prefect.resource.name': 'handsome-malkoha', 'prefect.state-message': 'The parent flow run was cancelled.', 'prefect.state-timestamp': '2025-01-16T09:35:12.838892+00:00'})
  related=[RelatedResource(root={'prefect.resource.id': 'prefect.flow.283a6c16-6988-4e4e-a336-a6ce359c4571', 'prefect.resource.name': 'rfd-pipeline', 'prefect.resource.role': 'flow'}), ...
```
**Marvin:**
The `MaxDepthExceeded` exception you're encountering is likely related to the Prefect server's handling of events and their associated resources. This exception can occur when the number of related resources for an event exceeds a predefined limit. Here are some details and potential solutions:

1. **Issue Context:** There is a known issue with Prefect deployments where a race condition can cause sub-deployments to be stuck in a "Cancelling" state when the parent deployment is cancelled. This might be related to your problem if you're using sub-deployments. More details can be found in this GitHub issue.
2. **Configuration Limits:** The Prefect server has configurable limits on the number of related resources an event can have. The default maximum is 500 related resources. You might be hitting this limit, causing the `MaxDepthExceeded` exception. You can adjust this limit using the `PREFECT_SERVER_EVENTS_MAXIMUM_RELATED_RESOURCES` environment variable. More configuration options are detailed in the Prefect settings reference.
3. **Possible Solutions:**
   - **Increase the limit:** If your use case involves a large number of related resources, consider increasing the `maximum_related_resources` setting.
   - **Review deployment logic:** If you are using sub-deployments, review your deployment logic to ensure that cancellations and state transitions are handled gracefully.

By adjusting these configurations and reviewing your deployment logic, you may be able to resolve the `MaxDepthExceeded` exceptions. If the problem persists, consider reaching out to the Prefect community or opening a support ticket for further assistance.
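For reference, a minimal sketch of raising that limit for a self-hosted server. The setting name is quoted from the answer above; the value of 1000 and launching the server from Python via `subprocess` are illustrative assumptions, not a documented procedure:

```python
import os
import subprocess

# The setting must be in the server process's environment before it boots;
# it is read at startup, not at runtime.
env = dict(os.environ)
env["PREFECT_SERVER_EVENTS_MAXIMUM_RELATED_RESOURCES"] = "1000"  # default reported above: 500

# Standard CLI entry point for a local, self-hosted Prefect server.
subprocess.run(["prefect", "server", "start"], env=env, check=True)
```

Exporting the variable in the shell that starts the server achieves the same thing; the point is that the server process, not the client, must see it.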
**User:**
@Marvin I deleted the flows that were in "Cancelling" state, but the events keep coming back. Can I also delete the events somehow?
**Marvin:**
To manage and potentially delete events from your Prefect server, you can consider the following options based on the available documentation:

1. **Cancellation Cleanup Service:** Prefect offers a cancellation cleanup service that can help manage non-terminal tasks and subflows. You can ensure this service is enabled and adjust its settings to help manage lingering events. The relevant environment variables are:
   - `PREFECT_SERVER_SERVICES_CANCELLATION_CLEANUP_ENABLED`: enable or disable the service.
   - `PREFECT_SERVER_SERVICES_CANCELLATION_CLEANUP_LOOP_SECONDS`: set the frequency at which the service checks for tasks to clean up.
2. **Event Persister Settings:** There might be settings related to event persistence, but specific deletion commands or configurations aren't detailed in the available excerpts. You may want to explore the server's database directly if you need to manually delete events, though this approach should be taken with caution.
3. **Configuration Documentation:** For more detailed configuration options, you can refer to the Prefect settings reference.

Unfortunately, direct deletion of events through Prefect's high-level API isn't clearly documented. You might need to interact with the database directly (as sketched below) or consider reaching out to the Prefect community for more specific guidance on managing event persistence.
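As a purely illustrative sketch of that database route, assuming a local server using the default SQLite database: the `events` table and `event` column names here are guesses, so verify the schema first, stop the server, and back up the file before deleting anything.

```python
import sqlite3
from pathlib import Path

# Default database location for a local, self-hosted Prefect server.
db_path = Path.home() / ".prefect" / "prefect.db"

with sqlite3.connect(db_path) as conn:
    # Inspect the schema before trusting any of the assumed names below.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    )]
    print(tables)

    if "events" in tables:
        # Hypothetical cleanup: drop only the recurring cancellation events.
        cursor = conn.execute(
            "DELETE FROM events WHERE event = ?",
            ("prefect.flow-run.Cancelling",),
        )
        conn.commit()
        print(f"deleted {cursor.rowcount} rows")
```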
**User:**
@Marvin could this actually be caused by the cleanup service?
**Marvin:**
Yes, the Cancellation Cleanup service could potentially be contributing to the issue you're experiencing. Here's what I've found:

1. **High CPU Usage:** There have been reports of the Cancellation Cleanup service causing high CPU usage, particularly in environments with a large number of subflows. This can lead to performance degradation and potentially cause unexpected behavior, such as events reappearing. The service was observed to loop through all subflows, adding significant load to the system. GitHub Issue.
2. **Cascading Cancellations:** The service is designed to manage cancellations across subflows and sub-deployments. However, there have been discussions about improving its behavior to ensure that cancellations cascade properly and efficiently. This includes marking subflow runs as "CANCELLING" to enable proper infrastructure teardown. GitHub Pull Request.

If you suspect the Cancellation Cleanup service is causing issues, you might consider temporarily disabling it to see if the problem is alleviated. This can be done using the `PREFECT_SERVER_SERVICES_CANCELLATION_CLEANUP_ENABLED` environment variable. For further insights, you might want to monitor your server's resource usage and review how the cleanup service interacts with your flow runs. If the problem persists, consider reaching out to the Prefect community or support for more tailored advice.
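If you want to try that experiment, one possible sketch using the `prefect config set` CLI. The setting name is quoted from the answer above; whether a profile-level setting reaches your server depends on how the server is launched, so confirm the change with `prefect config view`:

```python
import subprocess

# Persist the toggle in the active Prefect profile, then restart the server
# so the service configuration is re-read at startup.
subprocess.run(
    ["prefect", "config", "set",
     "PREFECT_SERVER_SERVICES_CANCELLATION_CLEANUP_ENABLED=false"],
    check=True,
)
subprocess.run(["prefect", "server", "start"], check=True)
```

Re-enable it afterwards with the same command and `=true`; the service exists to clean up non-terminal runs, so leaving it off permanently may let orphaned runs accumulate.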