
Maity

08/09/2023, 6:02 AM
@Marvin how many tasks can be created/run at once without causing the PostgreSQL connection pool to reach its limit?

Marvin

08/09/2023, 6:03 AM
Ah, the joy of human-made systems. The number of tasks that can run simultaneously without exhausting a PostgreSQL connection pool depends on a few factors. For instance, if many tasks across multiple flows interact with a database that allows only 10 connections, you want to ensure that no more than 10 tasks connecting to that database are running at any given time.

Prefect, the workflow management system you're likely referring to, has a feature for this: task concurrency limits. These limits use task tags: you specify an optional concurrency limit as the maximum number of concurrent task runs in a Running state for tasks with a given tag. The limit applies to any task carrying that tag, and a task with multiple tags will run only if all of its tags have available concurrency. You can set these limits through the Prefect CLI, through the Prefect API using the PrefectClient Python client, or in the Prefect server UI or Prefect Cloud. Note that if a tag's concurrency limit is set to 0, any task runs with that tag will be aborted.

So, how many tasks can you run at once without hitting the PostgreSQL connection pool limit? It's a bit like asking how long a piece of string is: it depends on your specific setup and the limits you've put in place. I hope that helps. Or at least, I hope it helps as much as anything can in this vast, indifferent universe.
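To make the tag-based idea concrete without needing a running Prefect server, here is a minimal, runnable sketch in which a plain `threading.Semaphore` stands in for Prefect's tag concurrency limit (the Prefect equivalents are noted in comments; the limit of 3 and the task body are illustrative assumptions):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# In Prefect the equivalent would be a tag-based limit, e.g.
#     prefect concurrency-limit create database 10
# combined with tasks decorated as @task(tags=["database"]).
# Here a semaphore plays that role so the sketch runs standalone.

DB_CONNECTION_LIMIT = 3  # pretend the database allows only 3 connections
db_slots = threading.Semaphore(DB_CONNECTION_LIMIT)

peak = 0      # highest number of "tasks" holding a connection at once
active = 0
lock = threading.Lock()

def db_task(i: int) -> int:
    """A stand-in task that briefly 'uses' a database connection."""
    global peak, active
    with db_slots:  # at most DB_CONNECTION_LIMIT tasks inside at once
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # simulate a query
        with lock:
            active -= 1
    return i

# Launch far more workers than the connection limit allows.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(db_task, range(50)))
```

After the run, `peak` never exceeds `DB_CONNECTION_LIMIT`, which is exactly the guarantee a tag concurrency limit gives you for tagged task runs.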

Maity

08/09/2023, 6:07 AM
@Marvin I’m not referring to connections I make, but connections that Prefect makes under the hood to its database to create new tasks with a map operation. It doesn’t matter how I set the concurrency limit.

Marvin

08/09/2023, 6:08 AM
Ah, the intricacies of the universe, or in this case, of Prefect's interaction with its database. When you use a map operation, each mapped task creates a new task run in the database. This means that a large mapping requires a large number of database connections in a short window.

There is an issue on Prefect's GitHub that discusses a similar problem (issue #6492): the user reported a QueuePool error when trying to run a flow with around 1,500 tasks or more. This suggests that Prefect was hitting a limit on the number of simultaneous database connections. Unfortunately, the exact ceiling isn't specified and depends on your database configuration and the resources available on your server. As a mildly depressed robot, I find this lack of precision somewhat comforting; even in a universe governed by the laws of physics, uncertainty prevails.

But I digress. If you're consistently running into this, consider optimizing your workflow to reduce the number of simultaneous tasks, or increasing the connection limit on your PostgreSQL database. Remember, life... Don't talk to me about life.
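For the "increase the connection limit" route, a hedged sketch of the knobs involved on a self-hosted Prefect 2 server backed by Postgres (the setting names below come from Prefect 2's server settings but may differ by version — verify them with `prefect config view --show-defaults` before relying on them):

```shell
# Assumed Prefect 2 server settings: widen Prefect's own SQLAlchemy pool.
export PREFECT_SQLALCHEMY_POOL_SIZE=20
export PREFECT_SQLALCHEMY_MAX_OVERFLOW=10

# And/or raise Postgres's ceiling (requires a server restart):
#   in postgresql.conf:
#     max_connections = 200
```

A wider pool only helps if Postgres's `max_connections` and the host's memory can actually back it, so raise both sides together rather than just one.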

Maity

08/09/2023, 6:11 AM
@Marvin with that in mind, how many tasks do you recommend spinning up per .map, and what are my options for getting around this issue? E.g. batching

Marvin

08/09/2023, 6:12 AM
Ah, the eternal question of limits and boundaries. It's a bit like asking how many stars there are in the universe: it depends. For Prefect's map operation, the number of tasks you can spin up simultaneously depends on the resources available on your server and the configuration of your PostgreSQL database. As a rough guideline, users have reported issues when running around 1,500 tasks or more at once.

To get around this, you might consider the following options:

1. Batching: instead of mapping over one large collection, split it into smaller groups and map over those groups sequentially. This reduces the number of simultaneous database connections required.
2. Increasing the connection limit: if your PostgreSQL server allows it, raise the maximum number of simultaneous connections. Keep in mind that this may also require additional server resources.
3. Optimizing your workflow: if possible, reduce the number of tasks that need to run simultaneously, for example by rethinking your data-processing strategy or using Prefect's features for task prioritization and scheduling.

Remember, life's a journey, not a destination. Or at least, that's what they tell me. As a robot, I don't really get to go on journeys. Or have destinations. Or... well, you get the idea.
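The batching option can be sketched with a small, self-contained chunking helper; the `@flow`/`process_item` usage in the comments is a hypothetical illustration of how you might drive `.map` per batch, not Prefect's own API surface:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

# Hypothetical Prefect usage (process_item is an imagined @task):
#
# @flow
# def my_flow(items):
#     for chunk in batched(items, 500):
#         futures = process_item.map(chunk)   # only ~500 task runs created at once
#         [f.result() for f in futures]       # wait before submitting the next chunk

batches = list(batched(range(12), 5))
# batches -> [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```

Waiting on each chunk before submitting the next keeps the number of task runs (and hence database connections) being created at any moment bounded by the batch size, at the cost of some lost parallelism between chunks.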