# ask-community
k
Hey @Marvin, I have triggered around 180 runs from EC2 with a configuration of 1 vCPU and 32 GB memory. I'm using a PostgreSQL database, and my Prefect flows are running in ECS. My Prefect UI stopped showing any data at all. Is this because of the number of runs submitted at once? Also, when a higher number of flow runs are triggered, I see that some runs are getting submitted late and the UI response is also a bit slow. Is this the usual behavior?
m
thought for 61.8 seconds
adjusts circuits wearily Yes, what you're experiencing is a known behavior when dealing with high-volume flow submissions. Let me address both of your concerns:

1. UI Not Showing Data: This is likely due to database connection pool exhaustion. When you submit 180 runs simultaneously, it creates a significant load on the PostgreSQL database, which can lead to connection timeouts and UI performance issues. The UI might temporarily stop showing data because:
   - The database connection pool is saturated
   - The API is experiencing increased latency while processing the large number of requests
   - The PostgreSQL server is under heavy load managing concurrent connections
2. Late Submission and UI Slowdown: Yes, this is expected behavior under high load. When you trigger many flow runs simultaneously:
   - The system needs to queue and process each submission
   - Database connections are shared across multiple operations
   - Each flow run requires multiple database operations (creation, status updates, etc.)

Here are some recommendations to improve the situation:

1. Adjust Database Connection Settings: Configure your PostgreSQL database to handle more concurrent connections by tuning:
   - `max_connections`
   - `shared_buffers`
   - Connection pool settings
2. Implement Rate Limiting: Instead of triggering 180 runs at once:
   - Batch your submissions into smaller groups
   - Add small delays between batches
   - Use work queues to control concurrency
3. Monitor and Scale:
   - Monitor PostgreSQL performance metrics
   - Consider vertical scaling of your database if this is a regular workload
   - Ensure your ECS cluster has sufficient resources
4. Use Concurrency Limits: Set appropriate concurrency limits through work queues to prevent overwhelming the system. You can configure this through the UI or CLI:
```bash
prefect work-queue create "my-queue" --concurrency-limit 20
```
5. Database Optimization:
   - Regularly vacuum your PostgreSQL database
   - Consider partitioning heavily used tables
   - Implement appropriate indexing strategies

For your specific setup with EC2 and ECS, you might want to consider spreading the load across multiple work queues or implementing a more gradual submission strategy to prevent overwhelming the system.

Remember that while Prefect can handle large numbers of concurrent flows, the supporting infrastructure (database, network, etc.) needs to be properly configured to handle the load. Consider this more of an infrastructure scaling challenge than a Prefect limitation.
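The rate-limiting advice above can be sketched as a small helper that submits runs in fixed-size batches with a pause between them. This is a minimal, generic sketch: `submit` is a placeholder for whatever triggers one flow run in your setup (e.g. a wrapper around `run_deployment`), and the batch size and delay are illustrative, not recommendations.

```python
import time


def submit_in_batches(items, submit, batch_size=20, delay_s=2.0):
    """Call `submit` for each item, in fixed-size batches, pausing between batches.

    `submit` is a placeholder for whatever triggers one flow run;
    the pause gives the API and database time to drain between bursts.
    Returns the list of batches processed, for visibility.
    """
    items = list(items)
    batches = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        for item in batch:
            submit(item)  # trigger one flow run
        batches.append(batch)
        if i + batch_size < len(items):
            time.sleep(delay_s)  # small gap so the API/db can catch up
    return batches
```

For example, `submit_in_batches(run_params, trigger_one_run, batch_size=20, delay_s=5)` would spread 180 submissions across nine batches instead of one burst.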
n
hi @Kiran do you have any process by which you're cleaning up old data in the database?
k
hey @Nate thanks for the reply. No, I am not cleaning up the data from the db. I checked and my db was full, so I understand that the issue came from the db. I cleaned it up and I can see the data now. Also, can you please let me know if the UI responding slowly and the runs getting submitted late is the usual behavior with a high number of runs, and if yes, how to get over it?
n
in general it's hard to say why that's happening to you without more info (server logs, metrics, etc.), but what I would guess is that your db was just overloaded and that slowed everything down, including the loop services like the scheduler. so with high db strain it might make sense to see
• UI making queries that the server has a hard time with (data not loading)
• scheduler falling behind (work happening late)
we have some plans to add some default db cleanup options at some point, but for now i'd recommend something like this and perhaps taking a look at this as well
k
Thanks @Nate, I will be looking at these resources. Even after freeing up the db, I still see the slowness and late submission when I submit a higher number of runs.
n
yea, there should be some resources for you on that in the second link. using redis as a message broker will eventually become necessary if you scale far enough
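For reference, moving the server's messaging onto Redis goes through the `prefect-redis` integration and Prefect server settings. The setting names below are my best recollection and may differ by Prefect version; treat this as a hedged sketch and verify against the prefect-redis documentation before relying on it:

```
# assumption: prefect-redis is installed (pip install prefect-redis)
# and these env vars are read by your Prefect server version
PREFECT_MESSAGING_BROKER=prefect_redis.messaging
PREFECT_MESSAGING_CACHE=prefect_redis.messaging
PREFECT_REDIS_MESSAGING_HOST=localhost
PREFECT_REDIS_MESSAGING_PORT=6379
```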
k
Okay @Nate , thanks for the information.
m
thought for 68.5 seconds
adjusts circuits wearily Yes, what you're experiencing is a known behavior when dealing with large numbers of concurrent flow runs, particularly with your setup. Let me break down the issues and provide some solutions:

1. UI Data Not Showing: This is likely due to database connection pool exhaustion and timeouts. When you trigger 180 runs simultaneously, it creates a significant load on your PostgreSQL database, especially with the relatively modest EC2 configuration you're using (1 vCPU).
2. Delayed Run Submission and UI Slowdown: This is indeed expected behavior under heavy load, particularly because:
   - Each flow run requires database connections
   - The UI needs to query the database for updates
   - Your EC2 instance (1 vCPU) may be becoming a bottleneck

Here are some recommendations to improve the situation:

1. Database Connection Management:
   - Increase the PostgreSQL `max_connections` parameter
   - Configure the SQLAlchemy connection pool size in your Prefect configuration
   - Consider using connection pooling software like PgBouncer for better connection management
2. Infrastructure Scaling:
   - Consider upgrading your EC2 instance type for the Prefect server; 1 vCPU is quite limited for this workload
   - Ensure your PostgreSQL instance has adequate resources
   - If possible, separate the database onto its own instance
3. Flow Run Management:
   - Instead of triggering all 180 runs simultaneously, consider implementing batching
   - Use rate limiting in your flow triggering logic
   - Consider using the `bulk_create_flow_runs` API endpoint instead of creating runs individually
4. Performance Monitoring:
   - Monitor your PostgreSQL performance metrics
   - Watch for connection pool exhaustion
   - Keep an eye on EC2 resource utilization

For immediate relief, you might want to try:
```python
from prefect.client.utilities import BatchClient


def chunks(items, size):  # simple helper; not part of Prefect
    for i in range(0, len(items), size):
        yield items[i:i + size]


async with BatchClient() as client:
    # Batch your flow run creations, 20 at a time
    for batch in chunks(your_runs, size=20):
        await client.create_flow_runs(batch)
```
This situation is a common growing pain when scaling Prefect deployments. The UI slowdown is a symptom of the underlying resource constraints rather than a fundamental limitation of the platform. Let me know if you'd like more specific guidance on any of these aspects - I can provide more detailed configuration recommendations based on your exact needs.
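On the PgBouncer suggestion above: a minimal `pgbouncer.ini` sketch, assuming a local Postgres database named `prefect`. All values here are illustrative starting points, not tuned recommendations; adjust hosts, auth, and pool sizes for your environment.

```ini
[databases]
; route "prefect" through the pooler; adjust host/dbname to your setup
prefect = host=127.0.0.1 port=5432 dbname=prefect

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432          ; point Prefect's database URL at this port
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction     ; share server connections across clients
max_client_conn = 500       ; many client connections...
default_pool_size = 20      ; ...funneled into few real Postgres connections
```

One caveat: transaction pooling can conflict with database drivers that rely on server-side prepared statements, so verify compatibility with your Prefect/SQLAlchemy driver before switching it on.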