Marvin: If I am going to process every file in a bucket, which approach is better? (Prefect Community #ask-marvin)


dan

11/17/2023, 2:40 AM

@Marvin If I am going to process every file in a bucket, which approach is better?
• We define a flow (e.g., process_bucket()) for the whole bucket, and a task (e.g., process(object)) for each file.
• We define a parent flow that iterates over all objects in the bucket, and a subflow to process each file.

Marvin

11/17/2023, 2:44 AM

Both approaches have their merits, and the right one depends on the requirements of your use case.

1. A flow for the whole bucket and a task for each file: This approach is simpler to implement. Each task run is independent, and tasks submitted with .submit() or .map() can run concurrently on the flow's task runner, which can significantly reduce processing time when there are many files. However, it is less suitable if processing one file depends on the results of another.

2. A parent flow that iterates over all objects in the bucket and a subflow for each file: This approach gives more flexibility and control over execution. Each file gets its own flow run, so you can manage dependencies between files, handle errors per file, and control the order in which files are processed. However, it is more complex and adds overhead, because every subflow call creates a separate flow run. Note also that subflows called from a synchronous parent flow run one after another, so files are not processed in parallel by default.

In Prefect 2, subflows are simply flows called from within another flow: you call the flow function directly instead of using my_flow.run() as in Prefect 1. This makes the second approach easier to set up in Prefect 2.

Ultimately, choose based on your requirements. If you need more control and per-file visibility, go with the second approach. If you want simplicity and speed, go with the first.

