# ask-community
y
Hi 👋 A question about the “philosophy” of Prefect. I’m really trying to understand the way of doing things here regarding the number of tasks you should have in a flow. For context: your mission is to write a job that downloads about 300,000 files from S3 and does some processing on each file. This should be parallelized to some degree so it finishes in a reasonable time.

Reading about Airflow and the best practices accumulated around the web, Airflow should not do the work itself; it should tell others what to do and monitor them. So you would have an Airflow job with a single task that runs a single Python script on some remote compute (let’s say ECS). The Python script would use techniques like asyncio.gather() to handle the parallelism and process the files quickly.

Would you say the same philosophy applies to Prefect? Or, since Prefect is more “advanced” and “capable”, could I actually have thousands of tasks running concurrently to achieve the parallelism, instead of “hiding” it in a single script behind something like asyncio.gather()? Thanks!
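(A minimal sketch of the single-script approach described above, for reference. `download_and_process` is a hypothetical placeholder; real code would use something like aioboto3, and the concurrency limit is arbitrary.)

```python
import asyncio

# Hypothetical placeholder -- real code would use aioboto3 (or run boto3 calls
# in a thread) to download the S3 object and then process it.
async def download_and_process(key: str) -> None:
    ...

async def main(keys: list[str]) -> None:
    # Cap concurrency so we don't open hundreds of thousands of connections at once.
    semaphore = asyncio.Semaphore(50)

    async def bounded(key: str) -> None:
        async with semaphore:
            await download_and_process(key)

    # gather() drives all the downloads concurrently on a single event loop.
    await asyncio.gather(*(bounded(k) for k in keys))

if __name__ == "__main__":
    asyncio.run(main([f"data/file-{i:06d}.bin" for i in range(1_000)]))
```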
m
Yes, you could theoretically have thousands of tasks running in parallel, assuming your compute supports it. With Prefect this is done by selecting your desired task runner. I'd suggest reading our Task Runner Docs. The task runner API varies slightly between Prefect versions, so make sure you're on the right version of the documentation.
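(A minimal sketch of what that looks like, assuming Prefect 2.x; in 3.x the equivalent class is named ThreadPoolTaskRunner, and `process_file` / `process_bucket` are placeholder names.)

```python
from prefect import flow, task
from prefect.task_runners import ConcurrentTaskRunner  # Prefect 2.x; 3.x uses ThreadPoolTaskRunner

# Hypothetical task -- the body would download one S3 object and process it.
@task
def process_file(key: str) -> None:
    ...

# The task runner is what provides the concurrency: .submit() hands each task
# to the runner instead of executing it inline, so many process_file runs overlap.
@flow(task_runner=ConcurrentTaskRunner())
def process_bucket(keys: list[str]) -> None:
    futures = [process_file.submit(key) for key in keys]
    for future in futures:
        future.result()  # block until every task has finished

if __name__ == "__main__":
    process_bucket([f"data/file-{i:06d}.bin" for i in range(1_000)])
```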