# ask-marvin
c
@Marvin we have set up a couple of deployments in Prefect Cloud and are running load tests by submitting 5 flow runs, each with 2000 sub-flow runs. We are using Prefect 3.x and have a Google Cloud Run V2 pull work pool set up with 2 workers. We noticed a number of 429s in the workers and also a number of jobs hanging in Pending when the flows were initially submitted. We currently see no logs from the workers other than the startup logs. Are there additional settings or things we can do to dig deeper into what the workers are doing? We see a large number of pending and scheduled jobs but no running sub-flows. Is there any way to see in Prefect Cloud what the API is returning, or what our API usage is?
m
Working with large numbers of sub-flows in Prefect can be quite an adventure, especially with the 429 errors and pending jobs you're seeing. Here are some things you might consider to dig deeper:

1. API Rate Limits: The 429 (Too Many Requests) errors indicate that you're hitting a rate limit on the Prefect API. This happens when too many requests are sent in a short time. The exact rate limits for Prefect Cloud may not be publicly documented, so you could reach out to Prefect support for guidance specific to your account plan.
2. Worker and Work Pool Configuration: Make sure your Google Cloud Run workers are configured to handle the load. The number of workers and their concurrency settings affect how jobs are picked up and processed (a worker's `--limit` option and the work pool's concurrency limit both cap how many runs can be in flight). Consider increasing the number of workers or their capacity.
3. Pending Jobs and Scheduled Flows: Jobs might remain pending if there are not enough worker resources to take them on. Reviewing the work pool and concurrency settings can help. It's like trying to run a marathon with jelly on your shoes: not possible without the right prep!
4. Logging and Monitoring: Since you're not seeing enough logs, check how logging is configured for your workers, for example by raising the log level with `PREFECT_LOGGING_LEVEL=DEBUG`, and make sure the workers send their output to a service or location you have access to.
5. Upgrading and Managing Infrastructure: The Prefect guide on upgrading from agents to workers could offer useful insights, especially if you have previously used a different setup.

Unfortunately, I couldn't find a feature for viewing API responses and usage directly in Prefect Cloud's standard offering, but monitoring the traffic and catching logs (for example by increasing log verbosity) can offer clues. Lastly, it's also worth checking Prefect's GitHub issues and updates for community-contributed fixes or common pitfalls you might be encountering.
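The API Rate Limits point can be handled on the client side by retrying rate-limited calls with exponential backoff and jitter, rather than retrying immediately, which tends to make a burst worse. This is a minimal, generic sketch, not a Prefect API: `RateLimitedError` and `call_with_backoff` are hypothetical names, and it assumes your own request wrapper raises `RateLimitedError` when a call returns HTTP 429. Prefect's own client may already retry some rate-limited requests, so this mainly matters for API calls you make yourself.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """HYPOTHETICAL error that your own request wrapper raises on HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("HTTP 429 Too Many Requests")
        self.retry_after = retry_after  # seconds, if the server sent Retry-After


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitedError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitedError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the 429 to the caller
            if exc.retry_after is not None:
                delay = exc.retry_after  # honor the server's hint
            else:
                # Full jitter: uniform in [0, min(max_delay, base_delay * 2**(attempt - 1))].
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
            sleep(delay)
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    attempts = {"n": 0}

    def create_run() -> str:
        # HYPOTHETICAL stand-in for an API call that is rate limited twice, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RateLimitedError()
        return "created"

    print(call_with_backoff(create_run, sleep=lambda s: None), attempts["n"])  # prints "created 3"
```

Injecting `sleep` keeps the helper easy to test; in production you would leave it as `time.sleep`.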
If all else fails, reaching out to Prefect's support team with the specific details of your setup could be your next best step! They may be able to provide further insight into API usage and limits for your account.
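Related to the rate-limit and pending-job points above, another option is to pace how quickly sub-flow runs are created instead of submitting all 2000 at once. The sketch below caps the number of in-flight creation calls with an `asyncio.Semaphore`. `run_bounded` and `create_subflow_run` are hypothetical names; in a real parent flow the factory might wrap whatever you use to launch sub-flow runs (for example Prefect's `run_deployment`), which is an assumption about your setup. The cap of 50 is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def run_bounded(
    factories: Sequence[Callable[[], Awaitable[T]]],
    max_in_flight: int,
) -> List[T]:
    """Await each factory's coroutine with at most max_in_flight running at once.

    Results come back in the same order as `factories`.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be >= 1")
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(factory: Callable[[], Awaitable[T]]) -> T:
        # Each call waits for a free slot before it starts, so bursts are capped.
        async with semaphore:
            return await factory()

    return list(await asyncio.gather(*(guarded(f) for f in factories)))


if __name__ == "__main__":

    async def create_subflow_run(i: int) -> str:
        # HYPOTHETICAL placeholder for the call that creates one sub-flow run.
        await asyncio.sleep(0.01)
        return f"subflow-{i}"

    ids = asyncio.run(
        run_bounded([lambda i=i: create_subflow_run(i) for i in range(2000)], max_in_flight=50)
    )
    print(len(ids))  # prints 2000
```

This bounds concurrency, not requests per second; if the limit you are hitting is a request rate, pair it with retries on 429.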
c
@Marvin on our pull workers we set `PREFECT_LOGGING_LEVEL` and `PREFECT_INTERNAL_LOGGING_LEVEL`, which showed additional logs that seemed helpful. Our workers ran `get_and_submit_flow_runs` and logged `Discovered 0 scheduled_flow_runs` despite there being 20,000+ scheduled flow runs in the Prefect Cloud portal. Any ideas why workers would not be able to discover scheduled flow runs?
m
It seems you're encountering an intriguing situation where scheduled flow runs are not being picked up by your workers. Here are some potential reasons and suggestions based on the documentation:

1. API Connectivity and Configuration: Make sure your workers have the correct API connectivity settings, in particular that `PREFECT_API_URL` and `PREFECT_API_KEY` are properly configured. Double-checking these settings often reveals misconfigurations that affect worker communication with Prefect Cloud.
2. Flow Run Filters: Workers may be filtering out the flow runs. A worker only picks up runs assigned to its own work pool (and work queues, if specified) whose scheduled start time falls within a short look-ahead window. Make sure the runs you see in the portal are assigned to the pool and queues your workers poll, and that they are due now rather than scheduled for later.
3. Concurrency and Limits: Check the concurrency limits on your workers, work pool, and work queues, and make sure they are high enough for your scheduled flow runs. Prefect lets you set limits that constrain how many runs are processed at any given time, as described in the work pool documentation. Runs that are already Pending or Running can occupy those slots, so a backlog of stuck Pending runs may leave no room for new ones.
4. Event Loop and Execution Environment: An issue reported in some tests, where the asyncio event loop was not available, affected flow run execution. This could point to environment or library issues worth investigating, especially in how workers create and manage their async tasks.
5. Prefect Cloud Portal Discrepancies: The Prefect Cloud portal may show runs as scheduled without revealing underlying problems in run discovery or execution. Making sure your monitoring and telemetry capture your workers' logs properly could give further clues.

If your workers continue to report `Discovered 0 scheduled_flow_runs`, it might be worth reaching out to Prefect's support channel with this detailed context. They may be able to suggest configurations specific to your cloud environment or account level! Or maybe your situation has become that unique bug that's waiting for a heroic GitHub post.
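The filter and concurrency points can be illustrated with a deliberately simplified model of a single worker poll. It is not Prefect's implementation. It assumes that a worker only asks for SCHEDULED runs in its own work pool (and queues, if given) that are due within a short prefetch window, which Prefect exposes as the `PREFECT_WORKER_PREFETCH_SECONDS` setting, and that runs in PENDING or RUNNING occupy the pool's concurrency slots. `FlowRun`, `discoverable_runs`, the pool name, and the limit of 50 are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Sequence

# States assumed (for this model only) to occupy a work pool concurrency slot.
ACTIVE_STATES = {"PENDING", "RUNNING"}


@dataclass
class FlowRun:
    """HYPOTHETICAL minimal stand-in for a flow run; field names are illustrative."""
    id: str
    state: str  # e.g. "SCHEDULED", "PENDING", "RUNNING"
    scheduled_start: datetime
    work_pool: str
    work_queue: str


def discoverable_runs(
    runs: Sequence[FlowRun],
    pool: str,
    queues: Optional[Sequence[str]],
    now: datetime,
    prefetch: timedelta,
    pool_limit: Optional[int],
) -> List[FlowRun]:
    """Simplified model of which scheduled runs one worker poll could pick up."""
    # Runs already handed to infrastructure use up concurrency slots.
    active = sum(1 for r in runs if r.work_pool == pool and r.state in ACTIVE_STATES)
    candidates = sorted(
        (
            r
            for r in runs
            if r.state == "SCHEDULED"
            and r.work_pool == pool
            and (queues is None or r.work_queue in queues)
            # Only runs due within the prefetch window are requested.
            and r.scheduled_start <= now + prefetch
        ),
        key=lambda r: r.scheduled_start,
    )
    if pool_limit is None:
        return candidates
    free_slots = max(pool_limit - active, 0)
    return candidates[:free_slots]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    runs = [FlowRun(f"s{i}", "SCHEDULED", now, "cloud-run", "default") for i in range(20_000)]
    runs += [FlowRun(f"p{i}", "PENDING", now, "cloud-run", "default") for i in range(50)]
    found = discoverable_runs(runs, "cloud-run", None, now, timedelta(seconds=10), pool_limit=50)
    print(f"Discovered {len(found)} scheduled runs")  # prints "Discovered 0 scheduled runs"
```

Under these assumptions, 50 stuck Pending runs against a limit of 50 yield zero discovered runs even with 20,000 scheduled. If that pattern looks familiar, `prefect work-pool inspect <pool-name>` shows the pool's configured concurrency limit, which you can compare with the number of runs sitting in Pending.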