Preston Marshall

03/31/2020, 2:28 PM
What is an unreasonable number of tasks to be mapping over? If I have a table with millions of rows and need to transform each one of them (for example), can I generate a task for each row? Or would I need to export "pages" and process each of these in a task?

Alex Cano

03/31/2020, 2:46 PM
I think this part is both art and science. From what I’ve seen while using Prefect, you want to do things in a way that leverages the caching and task states effectively. My suggestion is that if you’re able to do it in one go, do it. For example, if you’re pulling the full table out of a database and looking for new records based on a timestamp column, you should do that in a batch instead of mapping over the result set and doing a simple filter. Also, there is an overhead for submitting a bunch of tasks, especially once you leave the low-thousands range. I don’t know the exact number where it starts to become unreasonable, but my guess is it’s related to how much processing power you give the scheduler. Just my 2 cents, and I’m sure others have more details!
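A rough sketch of the "pages" idea from the question, in plain Python with no Prefect calls: split the rows into fixed-size pages so that each page, rather than each row, becomes the unit of work you would map over. The names `paginate` and `transform_page`, the doubling transform, and the page size of 10,000 are all made up for illustration, not anything Prefect provides or recommends.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(rows: Iterable[T], page_size: int) -> Iterator[List[T]]:
    """Yield successive lists holding at most `page_size` rows each."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    it = iter(rows)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


def transform_page(page: List[int]) -> List[int]:
    # Hypothetical per-row transform (doubling), applied to a whole page at once.
    return [row * 2 for row in page]


# Stand-in for a table with a million rows.
rows = range(1_000_000)

# Assumed page size; tune it so the number of pages stays well below
# the range where task submission overhead starts to hurt.
pages = list(paginate(rows, 10_000))
print(len(pages))  # 100 units of work instead of 1,000,000
```

With this shape, a million rows turn into 100 units of work, which sits comfortably under the low-thousands range mentioned above. The trade-off is that retries and caching then apply per page rather than per row.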

Preston Marshall

03/31/2020, 4:32 PM
Sounds right, thanks