@Benoit Chabord Some general thoughts about the described workflow, not specific to Prefect.
The architecture you described is somewhat inefficient, as you’re required to continually poll the server. To overcome this, you could switch to event-based processing, where your workflow is only triggered once you know the required conditions have been met (in this case, that there are files available for download).
Depending on how the files are generated and how they have to be processed, you could make some of the following changes:
• If the files don’t all need to be processed together, use S3 event notifications with a Lambda. The Lambda can then invoke your Prefect job with parameters derived from the event.
• If all the files must be processed together, you need a way of knowing when all the files are present. This can take a few forms:
◦ Publish a “done” marker, such as a `_SUCCESS` file, and have that object trigger the event notification. This is the convention used within the Hadoop ecosystem to signal completion of a multi-part write.
◦ (Less efficient) have your Lambda trigger on every file upload, and use logic to determine whether the run condition has been met. This is still better than polling: if there are large gaps between uploads, you aren’t wasting resources checking every 5 minutes.
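As a rough sketch of the first option, the Lambda side might look like the following. The event shape follows S3’s standard notification format; `handler` here just collects the flow parameters rather than calling a real Prefect API, since how you actually kick off the run (Prefect client, REST call, etc.) depends on your setup:

```python
def parse_s3_event(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    return [
        (rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"])
        for rec in event.get("Records", [])
    ]


def handler(event, context):
    """Lambda entry point: turn each uploaded object into flow parameters.

    In a real deployment you would pass each parameter dict to Prefect
    (e.g. via its API/client) instead of returning it.
    """
    return [{"bucket": b, "key": k} for b, k in parse_s3_event(event)]
```

This keeps the event parsing separate from the (environment-specific) trigger call, which also makes the Lambda easy to unit test.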
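For the “all files together” case, the run-condition check inside the Lambda could be as simple as either of these two helpers — one for the marker-file convention and one for a known expected file count. Both are illustrative sketches operating on a list of object keys (e.g. from an S3 list call); the function names are my own:

```python
def should_run(keys: list[str], marker: str = "_SUCCESS") -> bool:
    """Marker-based: run once the producer has published its completion file."""
    return any(key.rsplit("/", 1)[-1] == marker for key in keys)


def all_parts_present(keys: list[str], expected_count: int) -> bool:
    """Count-based: run once the expected number of objects has arrived.

    Only works when the number of files per batch is known up front.
    """
    objects = [key for key in keys if not key.endswith("/")]  # skip "folder" keys
    return len(objects) >= expected_count
```

The marker approach is generally more robust, since it doesn’t require the producer and consumer to agree on a file count ahead of time.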