Hi 👋
A question about the “philosophy” of Prefect.
I’m really trying to understand the recommended way of doing things here, specifically regarding how many Tasks you should have in a flow.
So for context: your mission is to write a job that downloads about 300,000 files from S3 and does some processing on each file. This should be parallelized to some degree so it finishes in a reasonable time.
From reading about Airflow and the best practices accumulated around the web: Airflow should not do the work itself. Instead, it should tell others what to do and monitor them. So you would have an Airflow job with a single Task that runs a single Python script on some remote compute (say, ECS). The Python script would use techniques like asyncio.gather() to handle the parallelism and process the files quickly.
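Just to make the single-script approach concrete, here is a minimal sketch of what I mean (the download/processing function is a placeholder, not a real S3 call, and the semaphore cap of 100 is an arbitrary number I picked):

```python
import asyncio

async def process_file(key: str) -> str:
    # Placeholder: in the real job this would download the object
    # from S3 and do the per-file processing.
    await asyncio.sleep(0)  # simulate an I/O wait
    return f"processed {key}"

async def main(keys: list[str], concurrency: int = 100) -> list[str]:
    # A semaphore caps the number of in-flight downloads, so 300k
    # coroutines don't all hit S3 at once.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(key: str) -> str:
        async with sem:
            return await process_file(key)

    # gather() runs all the bounded coroutines concurrently and
    # returns their results in input order.
    return await asyncio.gather(*(bounded(k) for k in keys))

results = asyncio.run(main([f"file-{i}" for i in range(1_000)]))
```

So from the orchestrator's point of view, all of this concurrency is invisible: it's one Task that happens to fan out internally.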
Would you say the same philosophy applies to Prefect?
Or, since Prefect is more “advanced” and “capable”, could I actually have thousands of Tasks running concurrently to achieve the parallelism, instead of “hiding” it inside a single script with something like asyncio.gather()?
Thanks!