# ask-marvin
i
Hi @Marvin I'm seeing this query is very frequent in my rds instance:
```sql
SELECT flow_run.deployment_id, flow_run.work_queue_name, flow_run.flow_version, flow_run.parameters, flow_run.idempotency_key, flow_run.context, flow_run.empirical_policy, flow_run.tags, flow_run.created_by, flow_run.infrastructure_pid, flow_run.auto_scheduled, flow_run.flow_id, flow_run.infrastructure_document_id, flow_run.parent_task_run_id, flow_run.state_id, flow_run.work_queue_id, flow_run.name, flow_run.state_type, flow_run.state_name, flow_run.state_timestamp, flow_run.run_count, flow_run
```
What is the source of this query, and can it cause problems? The larger the DB, seems like this query takes more resources.
m
The source of the `SELECT` query on the `flow_run` table is not directly answered in the provided excerpts. Based on standard practice, though, this sort of query is generated by the application logic that retrieves flow run information for operational purposes such as updating states, retrieving run details, or orchestrating tasks.

On the second question, the excerpts don't call out performance issues specific to the `flow_run` table, but they do note that SQLite performance suffers under heavy workloads, especially write-heavy ones, and they recommend PostgreSQL when running many flows with high degrees of parallelism or concurrency. The implication is that as database usage intensifies, including frequent `SELECT` queries on tables like `flow_run`, performance can degrade and lead to resource bottlenecks.

In short: the query is most likely part of Prefect's normal operation, and frequent, heavy database queries can cause performance issues, particularly with SQLite, which is why PostgreSQL is recommended for production environments. If the queries are slowing down your RDS instance and consuming more resources as your database grows, review the database configuration for optimization and consider scaling up your database resources or optimizing query performance. If you are using SQLite, that would also be a good signal to migrate to a more robust database like PostgreSQL. Monitoring the database with tools like slow query logs can help you identify and optimize the queries that consume the most resources.
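As a starting point for the slow-query suggestion above, here is a minimal sketch of how this is often done on PostgreSQL/RDS, assuming the `pg_stat_statements` extension is available and enabled for the database (on RDS it ships with the engine but must be preloaded via the parameter group):

```sql
-- Create the extension in the target database (assumes it is already in
-- shared_preload_libraries via the RDS parameter group).
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by cumulative execution time.
-- Note: the columns are total_exec_time/mean_exec_time on PostgreSQL 13+;
-- older versions call them total_time/mean_time.
SELECT
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2)  AS mean_ms,
    rows,
    left(query, 120)                   AS query_text
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

If the `flow_run` SELECT above dominates this list, that confirms it is the main contributor to the load.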
i
Thanks @Marvin, we are using PostgreSQL
m
That's great to hear. If you're already using PostgreSQL with Prefect and you notice this query is frequent and resource-intensive, here are a few steps you might consider to address the performance:

1. Index optimization: Ensure that your tables, especially `flow_run`, are properly indexed. Indexes can significantly improve the performance of `SELECT` queries by reducing the amount of data the database needs to scan.
2. Query analysis: Run `EXPLAIN ANALYZE` on the problematic query to understand how PostgreSQL executes it. This often reveals inefficiencies and potential optimizations (see the sketch after this list).
3. Connection pooling: If the issue is not just the query itself but the sheer number of connections and queries being made, consider connection pooling to help manage the load on your database server.
4. Database tuning: Adjust PostgreSQL configuration settings such as `work_mem`, `shared_buffers`, and `effective_cache_size` to better fit your workload.
5. Hardware resources: If your server is under-resourced, consider upgrading the hardware or provisioning more resources (additional CPU, RAM, or faster storage).
6. Partitioning: If your `flow_run` table is very large, consider table partitioning strategies to split the data into smaller, more manageable pieces.
7. Database maintenance: Regular maintenance tasks such as vacuuming and analyzing tables improve query performance by cleaning up dead tuples and updating the statistics used by the query planner.
8. Caching: Implement caching strategies for frequently accessed data to reduce the total number of queries made to PostgreSQL.
9. Prefect configuration: Look into Prefect's settings to see if there are ways to reduce the frequency or resource intensity of the query, such as adjusting the intervals at which Prefect checks for new runs or state updates.

Database optimization can be a complex process, so take a methodical approach to identifying and resolving bottlenecks. If you continue to experience issues, a database performance specialist can provide insights and solutions tailored to your situation.
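A minimal sketch of step 2, using a hypothetical filter on `state_type` and `work_queue_name` (the real query's WHERE clause is truncated in the excerpt above, so substitute the exact statement captured by RDS Performance Insights or `pg_stat_statements`):

```sql
-- BUFFERS shows how much data had to be read; a sequential scan over a large
-- flow_run table here is a strong hint that an index is missing or unused.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, name, state_type, state_name, run_count
FROM flow_run
WHERE state_type = 'SCHEDULED'       -- hypothetical filter
  AND work_queue_name = 'default';   -- hypothetical filter
```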
i
Thanks @Marvin. How do I check or enable indexing on my flow_run table?
m
It appears that the provided excerpts don't directly address the question of how to check or enable indexing on your `flow_run` table in a PostgreSQL database. However, here is some general advice on handling indexes in PostgreSQL. To check for existing indexes on a PostgreSQL table like `flow_run`, you can use the following SQL query:

```sql
SELECT
    indexname,
    indexdef
FROM
    pg_indexes
WHERE
    tablename = 'flow_run';
```
Running this query will list all indexes associated with the `flow_run` table, including the columns they index. To create an index on a table in PostgreSQL, you can use the `CREATE INDEX` SQL command. Here's an example that creates an index on the `id` column of the `flow_run` table:

```sql
CREATE INDEX flow_run_id_idx ON flow_run (id);
```
Replace `id` with the column or columns that are frequently accessed or used in JOIN operations, WHERE clauses, ORDER BY, etc., and that could benefit from indexing. Before creating any new indexes, evaluate the read and write patterns of your database: indexes can greatly speed up read operations, but they also slow down writes because the index must be updated on every insert or update. Analyze your specific workload and create indexes that provide the best balance for your use case. For detailed and precise adjustments, consulting a database administrator or specialist is the best course of action, as they can provide recommendations tailored to your specific schema and usage patterns.
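To complement the `pg_indexes` check above, here is a small sketch that shows whether the existing indexes on `flow_run` are actually being used, based on PostgreSQL's standard statistics views (no Prefect-specific assumptions):

```sql
-- idx_scan counts how often each index was used since the stats were last reset;
-- an index that stays at 0 under production load costs writes without helping reads.
SELECT
    indexrelname                                  AS index_name,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid))  AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'flow_run'
ORDER BY idx_scan;
```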
i
@Marvin isn't Prefect supposed to create those indexes?
m
The search returned an issue related to a QueuePool error with a large number of task runs, but it did not directly answer whether Prefect automatically creates indexes on the `flow_run` table. In typical application design, though, especially for database-backed applications like Prefect, the application manages the database schema, including the creation of any required indexes, as part of its initial setup or migrations. This usually happens automatically when setting up Prefect for the first time or when running database migrations after updates.

If you believe the necessary indexes have not been created, or you are curious about the specifics of Prefect's database schema, you can review Prefect's database migrations to see how the schema, including indexes, is managed. Indexes may also be missing because of an error during the initial setup or migration process; in that case, refer to the Prefect documentation on database migrations, run the required migrations, or contact Prefect support to ensure your database is properly configured. If this does not solve your issue, consider reaching out to Prefect's support or community for further assistance, as they have more context on how Prefect handles database indexing and migrations.
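One way to sanity-check that the migrations actually ran: Prefect's server manages its schema with Alembic migrations, and Alembic records the applied revision in an `alembic_version` table by default (the table name here is that default, stated as an assumption). The indexes the migrations created can be listed with the `pg_indexes` query shown earlier:

```sql
-- Current migration revision recorded by Alembic (assumes the default table name).
SELECT version_num FROM alembic_version;

-- Indexes that the migrations created on flow_run.
SELECT indexname FROM pg_indexes WHERE tablename = 'flow_run';
```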
n
> isn't Prefect supposed to create those indexes?
the indexes we use can be found here if you're curious. are you encountering problems related to these flow run queries, or just curious?
i
Hi @Nate Thanks for reaching out. I'm not just curious; we seem to have problems with scaling. We have a working setup with tens of flows and tens of subflows in production, but when scaling to hundreds of flows we get timeouts from the servers to the DB. We tried using a more robust DB in AWS RDS, but this also fails to hold the load. What we are seeing is a large CPU load on the DB over time, even as we move to more and more powerful DB instances. When looking at Top Queries in RDS monitoring I see these queries:
I want to understand if this query load is expected from Prefect
n
there's a lot that goes into this question
> query load expected from Prefect
that depends highly on your specific usage of Prefect. can you say more about your deployment strategy / flow design? I might be able to make a recommendation or two, but again, it really does depend on a lot of things
i
Got it. Thanks. I will gather more details from my team, so I can be more specific.
👍 1
OK, some more details. To simulate the load, we call the following 100 times in parallel, simulating the load of 100 clients. We are using ConcurrentTaskRunner.
1. Start a flow
2. The flow starts 6 subflows
3. Each subflow calls 15 tasks
4. Each task has 5 retries, 5 seconds apart, so it can accumulate to 450 task runs eventually
Does this shed some light?
CC: @Lior Barak
👀 1
I corrected the numbers 👆
Does a retry operation count as an additional task?
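One way to answer the retry question empirically from the database itself, assuming Prefect's schema has a `task_run` table with `flow_run_id` and `run_count` columns (`run_count` appears on `flow_run` in the query at the top of the thread; its presence on `task_run` is an assumption here):

```sql
-- Compare the number of task_run rows per flow run with the total number of
-- run attempts. If rows stay near 90 per flow run while attempts climb toward
-- 450, retries are re-running existing task runs rather than creating new rows.
SELECT
    flow_run_id,
    count(*)       AS task_run_rows,
    sum(run_count) AS total_run_attempts
FROM task_run
GROUP BY flow_run_id
ORDER BY total_run_attempts DESC
LIMIT 10;
```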
n
> We are using ConcurrentTaskRunner
> 1. Start a flow
> 2. The flow starts 6 subflows
> 3. Each subflow calls 15 tasks
> 4. Each task has 5 retries, 5 seconds apart, so it can accumulate to 450 task runs eventually
this sounds mostly reasonable to me. if you're running into issues, it may just be a matter of scaling your database
i
What does scaling the database mean? What would be a good RDS setup for this? We tried several DB scales and they all eventually got overloaded.
n
can you explain what sort of infra you have your DB on? I saw you mentioned AWS RDS, but is that where you're running when encountering problems, or was that a temporary thing?
> all eventually got overloaded
and how are you measuring this? depending on how you're running Postgres, scaling could mean a couple of different things
i
We have a single PostgreSQL instance with multi availability zone. We started with a simple m6i.large and scaled to m6i.xlarge. (We also tried a 2xlarge, but we still need to verify the result of that experiment.) It is not a temporary thing; we aim to use it to ensure high availability of the DB. Measurement is done via AWS monitoring. Here is an example of what the CPU usage looks like for one of the experiments:
As you can see we are getting really close to 100% utilization.
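To tie the CPU graph back to specific activity, here is a small diagnostic sketch against PostgreSQL's built-in `pg_stat_activity` view, run while the CPU is near 100% (no Prefect-specific assumptions):

```sql
-- How many connections are in each state: many 'active' backends points at
-- query load, many 'idle' ones at connection churn.
SELECT state, wait_event_type, count(*)
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state, wait_event_type
ORDER BY count(*) DESC;

-- What the active backends are running right now, longest-running first.
SELECT pid, now() - query_start AS runtime, left(query, 120) AS query_text
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC;
```

Cross-referencing the longest runners here with the RDS Top Queries view should show whether the `flow_run` SELECT is what saturates the CPU, or whether something else (for example, sheer connection volume) is the bottleneck.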