Miecio

08/21/2020, 1:13 PM
Hello everyone! I'm completely new to Prefect (in fact, I discovered it a couple of days ago) and I wonder whether my use case fits what Prefect can deliver. I need to run a reporting pipeline that works similarly to an ETL pattern: I need to extract ~1M records from PostgreSQL. For each of these records I then need to query a Redis database, run some processing, and save the results to the DB. I managed to build a POC of it as a Prefect flow, but I have some questions. Does Prefect support some form of task concurrency similar to MapReduce? I can imagine that the data extracted from psql by the extraction task could be split and processed separately on multiple agents.
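The pipeline described above can be sketched as plain functions to make the per-record steps explicit. This is the single-worker, loop-based baseline. The in-memory dicts `FAKE_PG_ROWS` and `FAKE_REDIS` and all function names are hypothetical stand-ins for the real PostgreSQL and Redis calls, not part of the original flow.

```python
# Minimal sketch of the extract -> lookup -> process -> save pipeline.
# FAKE_PG_ROWS and FAKE_REDIS are hypothetical in-memory stand-ins for PostgreSQL and Redis.
from typing import Dict, List

FAKE_PG_ROWS: List[Dict] = [{"id": i, "value": i * 10} for i in range(5)]
FAKE_REDIS: Dict[int, int] = {i: i + 1 for i in range(5)}


def extract() -> List[Dict]:
    """Return all source records (a real version would query PostgreSQL)."""
    return list(FAKE_PG_ROWS)


def enrich(record: Dict) -> Dict:
    """Attach the per-record lookup value (a real version would query Redis)."""
    return {**record, "extra": FAKE_REDIS[record["id"]]}


def process(record: Dict) -> Dict:
    """Hypothetical processing step: combine the two values."""
    return {"id": record["id"], "result": record["value"] + record["extra"]}


def save(results: List[Dict], sink: List[Dict]) -> None:
    """Append results to a sink list (a real version would write to the database)."""
    sink.extend(results)


sink: List[Dict] = []
# Single worker, single loop over every record.
save([process(enrich(r)) for r in extract()], sink)
print(sink[:2])  # [{'id': 0, 'result': 1}, {'id': 1, 'result': 12}]
```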

itay livni

08/21/2020, 1:25 PM
Hi @Miecio, take a look at mapping and some of the utilities associated with `map`: `apply_map` and `flatten`. The simplest example is found here: https://docs.prefect.io/core/concepts/mapping.html#reduce
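In Prefect 0.x, calling `.map(...)` on a task creates one task run per input element. `flatten` un-nests a list of lists so the next map runs once per inner element. A downstream task that is not mapped receives the whole list of results, which is the "reduce" step in the linked docs. The sketch below only mirrors those data shapes with the built-in `map`, `itertools.chain` and `sum`; it does not use Prefect, and `split_into_ids` and `double` are hypothetical.

```python
# Plain-Python illustration of the data shapes in a map -> flatten -> map -> reduce flow.
# This is NOT Prefect's API; it only mirrors what mapped results look like.
from itertools import chain
from typing import List


def split_into_ids(batch_no: int) -> List[int]:
    """Hypothetical fan-out step: each input produces a list of record ids."""
    return [batch_no * 10 + i for i in range(3)]


def double(x: int) -> int:
    """Hypothetical per-element step."""
    return 2 * x


batches = [0, 1]
nested = list(map(split_into_ids, batches))  # "map": one result per input -> list of lists
flat = list(chain.from_iterable(nested))     # "flatten": un-nest one level before the next map
doubled = list(map(double, flat))            # second "map", now over individual elements
total = sum(doubled)                         # "reduce": one step sees the whole list
print(total)  # 72
```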

Miecio

08/21/2020, 1:29 PM
Hmm, I have seen and tested this solution, but in that example I would end up with 1M tasks. In my local POC the execution took much longer than a single worker running a single task with a loop inside. But if this is how you are supposed to run such flows, I suppose I'll need to give it another try.
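A rough cost model can explain why 1M tiny tasks may be slower than one looping task. It is an assumption, not something measured in this thread. Assume every task run pays a fixed orchestration overhead $o$ on top of the per-record work $w$. Then, for $N$ records processed in batches of size $B$ on a single worker:

```latex
\begin{aligned}
T_{\text{per-record}} &\approx N\,(o + w) \\
T_{\text{batched}}    &\approx \tfrac{N}{B}\,o + N\,w \\
N = 10^{6},\ B = 10^{3} &\;\Rightarrow\; \tfrac{N}{B} = 10^{3} \text{ task runs instead of } 10^{6}
\end{aligned}
```

Adding parallel workers divides both expressions by roughly the worker count. Batching mainly shrinks the overhead term by a factor of $B$.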

nicholas

08/21/2020, 1:35 PM
Hi @Miecio, Prefect can certainly parallelize tasks! We use Dask to achieve parallelism at whatever scale your infrastructure can support. I'd recommend taking a look at the Parallelism Within a Flow idiom, and at Dask distributed for the specifics of creating and maintaining a Dask cluster.
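Prefect hands independent task runs to an executor. In Prefect 0.x, a Dask-backed executor can be passed when running the flow. The sketch below does not use Prefect or Dask. Instead, it substitutes the standard library's `ThreadPoolExecutor` to show the same idea: independent batches are processed concurrently and the partial results are combined afterwards. `process_batch` and the sample batches are hypothetical.

```python
# Stand-in for parallel execution of independent mapped work.
# ThreadPoolExecutor plays the role Dask plays for Prefect here; illustration only.
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_batch(batch: List[int]) -> int:
    """Hypothetical per-batch work: sum the squares of the records in the batch."""
    return sum(x * x for x in batch)


batches = [[1, 2], [3, 4], [5, 6]]
with ThreadPoolExecutor(max_workers=3) as pool:
    # pool.map returns results in input order, like the results of a mapped task.
    partials = list(pool.map(process_batch, batches))
total = sum(partials)  # the "reduce" step combines the partial results
print(partials, total)  # [5, 25, 61] 91
```

Threads mainly help with I/O-bound work such as the per-record Redis lookups. CPU-bound Python processing is limited by the GIL, which is one reason heavier work is spread across separate worker processes, for example in a Dask cluster or a process pool.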

Miecio

08/21/2020, 1:49 PM
Thanks a lot, I'll take a look at Dask. What is the Prefect way of handling tasks? Should I use map and write a task that processes a single record from my database (giving 1M tasks for each step of my flow)? Or should I write a task that works on multiple records with a `for` loop inside the task (like here)?

nicholas

08/21/2020, 1:52 PM
I think batching records will be your best bet. Processing them in batches of 1,000 will probably make your mapped tasks more manageable.
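The batching suggested above can be sketched with a small helper that slices the extracted records into chunks of 1,000. Each mapped task run then loops over a batch rather than handling a single record. `chunked` and `process_batch` are hypothetical names, not Prefect APIs. The sketch runs in plain Python; in a Prefect 0.x flow, `process_batch` would be decorated with `@task` and called roughly as `process_batch.map(batches)`.

```python
# Sketch: split extracted records into fixed-size batches so each task run
# handles up to 1,000 records instead of one. `chunked` is a hypothetical helper.
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_batch(batch: List[int]) -> List[int]:
    """Hypothetical per-batch task: loop over the records inside a single task run."""
    return [record * 2 for record in batch]


records = list(range(2500))
batches = list(chunked(records, 1000))
# One call per batch; with Prefect this loop would become a mapped task.
results = [process_batch(b) for b in batches]
print([len(b) for b in batches])  # [1000, 1000, 500]
```

With the ~1M records from the original question and a batch size of 1,000, this yields 1,000 batches. That means 1,000 mapped task runs per step instead of 1M.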

Miecio

08/21/2020, 1:58 PM
Cool, thanks for your help 🙂 I'll try to rewrite my flow and we'll see! BTW: I read on the webpage that a k8s deployment of the prefect-server stack is a work in progress. Do you know where I can get more info?

nicholas

08/21/2020, 2:13 PM
I'd watch the Prefect Server repo 🙂