# marvin-ai
Javier King
@Marvin Currently I have some flows that generate tens of thousands of tasks. This causes the Prefect server (not my workpools) to crash. Currently I fix this by putting tasks on a queue, stopping task generation when the queue is full, and continuing when it's below a certain number --- but this is a lot of work. If I set REDIS_PREFECT_MESSAGING_HOST (and other related variables), does this mean Prefect uses Redis to do all the task management messaging? (i.e., will this obviate my having to manually queue things?)
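As an aside, the manual throttling described above can be sketched roughly like this, assuming Prefect's `task.submit()` futures API; the `process` task, the item loop, and the 500-task cap are illustrative placeholders rather than the actual flow:

```python
from prefect import flow, task

@task
def process(item):
    # stand-in for the real per-item work
    return item

@flow
def generate_and_throttle(items, max_in_flight: int = 500):
    """Submit task runs, but cap how many are in flight at once."""
    in_flight = []
    for item in items:
        in_flight.append(process.submit(item))
        if len(in_flight) >= max_in_flight:
            # drain the batch: block on results before submitting more,
            # which keeps pressure off the Prefect server
            for fut in in_flight:
                fut.result()
            in_flight = []
    # wait for any final partial batch
    for fut in in_flight:
        fut.result()

if __name__ == "__main__":
    generate_and_throttle(list(range(2_000)))
```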
Marvin
thought for 58.7 seconds
Based on the research, I need to give you a somewhat disappointing answer (how terribly unsurprising). The `REDIS_PREFECT_MESSAGING_HOST` and related variables don't appear to be a complete solution to your problem of managing large numbers of tasks. Here's why:

1. While Redis can indeed be used as a messaging backend in some contexts, there isn't clear documentation or evidence that setting these environment variables will fundamentally change how Prefect handles task orchestration and management.
2. Your current approach of implementing a queue to control task generation, while admittedly cumbersome, is actually a sound architectural pattern for handling this kind of scale.

Instead, let me suggest a few alternative approaches that might help:

1. Task Mapping: If your tasks are similar, consider using Prefect's task mapping functionality, which is specifically optimized for handling large numbers of parallel tasks. This is documented in Beyond Loops: How Prefect's Task Mapping Scales to Thousands of Parallel Tasks (see the sketch below).
2. Batching: Instead of creating thousands of individual tasks, consider batching related work into fewer, larger tasks. This reduces the orchestration overhead while still maintaining parallelism.
3. Subflows: Breaking your workflow into subflows can help manage the complexity and reduce the number of tasks that need to be orchestrated simultaneously.

Would you be interested in exploring any of these alternative approaches? I could provide more specific guidance based on your use case. Also, it would be helpful to know:

1. Roughly how many tasks are we talking about?
2. Are these tasks similar in nature (which might make them candidates for mapping)?
3. Could you share a bit about the nature of these tasks and why so many are needed?

This would help me provide more targeted suggestions for your specific scenario.
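A rough illustration of the mapping and batching suggestions above, assuming Prefect's `Task.map` and futures API; the `process_batch` task, the batch size, and the doubling inside it are purely hypothetical placeholders:

```python
from prefect import flow, task

@task
def process_batch(batch: list) -> list:
    # one task run handles a whole batch, cutting orchestration overhead
    return [item * 2 for item in batch]  # placeholder work

@flow
def batched_flow(items: list, batch_size: int = 1000):
    # Batching: tens of thousands of items become len(items) / batch_size task runs
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    # Task mapping: a single .map() call fans out one task run per batch
    futures = process_batch.map(batches)
    return [f.result() for f in futures]

if __name__ == "__main__":
    batched_flow(list(range(50_000)))
```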
Robyn H
@Javier King I think we're trying to do something similar. Just to clarify, you're also running a self-hosted Prefect server? I'm trying to run it with more than 100K tasks and ran into various issues. What errors did you hit / why did your server crash? Generally tasks perform better than subflows. Redis will only help with the event-messaging side of updating the task runs and task states in the database and subsequently in the Prefect UI, so if your server is crashing that seems like a separate issue to me. Are you removing tasks from the queue once they are done executing, i.e. after waiting for their results, or what's the removal condition?
Javier King
@Robyn H Ah! Huh, I'm just seeing this message (a month later lol). The problem was (is?) that Prefect overloads the DB when too many tasks are created simultaneously. Part of why using Redis helps is that a lot of the task-status updates and other traffic that defaults to the DB gets shifted over to Redis. Ultimately, if I'm understanding things correctly, I think this is a Prefect problem: database connections need to be handled well on the server side (I've already done all the appropriate configuring/upgrading of my DB).
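For completeness, the Redis messaging wiring discussed in this thread looks roughly like the sketch below. The setting names follow my understanding of the prefect-redis integration and may not match the exact variables for your Prefect version, so treat them as assumptions and verify against the docs before relying on them:

```python
import os

# Assumed setting names (check your Prefect version's documentation):
# route server-side event messaging through Redis instead of the default backend.
os.environ.update({
    "PREFECT_MESSAGING_BROKER": "prefect_redis.messaging",
    "PREFECT_MESSAGING_CACHE": "prefect_redis.messaging",
    "PREFECT_REDIS_MESSAGING_HOST": "redis.internal",  # hypothetical host
    "PREFECT_REDIS_MESSAGING_PORT": "6379",
    "PREFECT_REDIS_MESSAGING_DB": "0",
})
```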