Gary07/16/2020, 3:24 AM
Is it okay for Prefect to generate about 10,000~50,000 mapped tasks within the above flow without any problem? Another scenario is to generate about 1 million tasks within a flow. In this scenario, we query a user id list from our database and perform some calculations for user behavior analysis (one user is mapped to one mapped task). Is it okay? Or is there a better way to do this?
```python
@task
def get_stock_id_list():
    # Query the stock id list from the database.
    stock_id_list = query_stock_id_list_from_db()
    return stock_id_list

@task
def crawl_stock_data_from_external_website(stock_id):
    # Crawler-related work.
    return crawl_stock_data(stock_id)

@task
def perform_calculation(crawled_data):
    # Some calculation with Pandas.
    perform_some_calculation(crawled_data)

with Flow('Example flow') as flow:
    # The number of stock ids in the list is about ten thousand.
    stock_id_list = get_stock_id_list()
    crawled_data_list = crawl_stock_data_from_external_website.map(stock_id_list)
    perform_calculation.map(crawled_data_list)
```
Chris White07/16/2020, 3:33 AM
Jackson Maxfield Brown07/16/2020, 3:47 AM
Gary07/16/2020, 4:12 AM
"we usually recommend people limit the number of mapped tasks at any given level to ~10,000, both for performance reasons and also at that scale it's not clear that prefect is providing much visibility."

Regarding "at any given level to ~10,000": say I have a flow as follows.
Did you mean the sum of all mapped tasks within this flow should be 10k (1k for each job function)? Or 10k for each job function (100k mapped tasks total for this flow)?
```python
with Flow("another flow") as flow:
    ids = get_stock_id_list()
    job_1_result = job_1.map(ids)
    job_2_result = job_2.map(ids)
    # ...
    job_10_result = job_10.map(ids)
```
"the open source prefect server would fall over at this scale, so if you want the UI I would highly recommend using prefect cloud for this scale"

Yes, we want the UI. Could you provide more info about why the open source prefect server would fall over at this scale?
nicholas07/16/2020, 4:27 AM
Jackson Maxfield Brown07/16/2020, 4:32 AM
… are done one by one now instead of simply passing all inputs to … under the hood, which is what causes a lot of problems
Gary07/16/2020, 4:43 AM
"Could you elaborate on why you want this many mapped tasks? While prefect can support incredibly granular tasks, it's usually worthwhile to take a step back and ask whether this level of granularity is actually providing you value, vs. batching users into fewer groups (on the order of thousands)"

For the first scenario, currently we want to process U.S. stock data with Prefect. The number of U.S. stocks we have to deal with is fewer than 8,000, so the recommended mapped-task number is okay for us. However, in the future we will not only focus on the U.S. stock market; we will eventually process stocks in different countries, e.g. Europe, China, Hong Kong, etc. Moreover, tasks for a stock can unexpectedly fail sometimes due to many kinds of issues (e.g., the crawler fails because the target webpage throttles us, or because of stability issues with our proxy service). Therefore, if Prefect can work with many flows, each with a few tens of thousands of mapped tasks, that will be really helpful for us. For the latter scenario, batch-processing user data is okay for us; just asking out of curiosity. 🙂
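As a sketch of the batching approach suggested above: one way to keep each mapping level under ~10,000 tasks is to split the user id list into batches and map over the batches instead of the individual ids. The helper below (`chunk_ids`) and the surrounding names are illustrative, not part of Prefect's API; in a flow like the ones in this thread, one task would return the ids, another would batch them, and a `process_user_batch.map(batches)` call (hypothetical task name) would run the per-batch analysis.

```python
def chunk_ids(ids, batch_size):
    """Split a flat id list into consecutive batches of at most batch_size."""
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

# Example: 1,000,000 user ids batched into groups of 1,000 yields 1,000 batches,
# so mapping over `batches` creates ~1,000 mapped tasks instead of ~1,000,000.
batches = chunk_ids(list(range(1_000_000)), 1_000)
print(len(batches), len(batches[0]))
```

Each mapped task then loops over its batch in plain Python, trading per-user visibility in the UI for a task count the scheduler handles comfortably.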
Chris White07/16/2020, 4:06 PM
…, so your overall flow may have many more tasks than that. Also, we recently refactored our mapping pipeline, so it's likely much more efficient now. Lastly:

"if Prefect can work under many flows, for each flow with a few tens of thousand mapped tasks"

Prefect Cloud can absolutely handle this load, so no worries there!
Gary07/19/2020, 2:46 AM