Anders Segerberg
04/07/2022, 5:12 PM

Kevin Kho
04/07/2022, 5:25 PM
`map` in batches. As you said though, the work is independent, so I would consider batching the 10k into batches of 1,000, for example, and using task looping to run one batch at a time. The loop can keep track of the total failures; the problem here, though, is that execution becomes sequential when using looping.
3. So I think what we are left with is some combination of subflows. Can we use subflows to fire off the batches sequentially, with the DaskExecutor on the subflows and the LocalExecutor on the main flow? Then you can use task looping in the main flow to submit the batches and keep track of the overall run. I think something like this might work.
4. But all of this kind of loses caching if we use looping. If we use mapping, each task can be cached individually, but with looping it can't. One of the current flaws of Prefect 1 is that if you have a chain of tasks A -> B -> C, and A and B succeed but C fails, you can't retry A and B. This is one of the reasons Orion (Prefect 2.0) has no DAG. So you may need to compress tasks together to use the same cache. When you do that, you can use Prefect caching or targets to avoid re-running the same code.

Anna Geller
04/07/2022, 5:27 PM

Anders Segerberg
04/07/2022, 5:39 PM

Kevin Kho
04/07/2022, 5:41 PM

Anders Segerberg
04/07/2022, 5:42 PM

Kevin Kho
04/07/2022, 5:43 PM

Anders Segerberg
04/07/2022, 5:44 PM

Kevin Kho
04/07/2022, 5:46 PM
`target`
which is file-based persistence: just re-run the subflow, and the new run will skip the already-completed work because of the caching mechanism. Or you can restart the subflow. This is worth a read.

Anna Geller
04/07/2022, 5:46 PM

Anders Segerberg
04/07/2022, 5:47 PM
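Kevin's batching-and-looping suggestion earlier in the thread (split the 10k items into batches of 1,000 and run one batch per loop iteration, carrying the failure count forward) can be sketched in plain Python. This is a minimal stand-in for the idea, not Prefect's `LOOP` signal API; `process_item` and `run_in_batches` are hypothetical names for illustration:

```python
# Plain-Python sketch of the batch-and-loop idea from the thread.
# `process_item` and `run_in_batches` are illustrative, not Prefect APIs.

def chunk(items, size):
    """Split `items` into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def process_item(item):
    """Hypothetical unit of work; returns True on success."""
    return item % 7 != 0  # pretend every 7th item fails

def run_in_batches(items, batch_size=1000):
    """Run one batch at a time (sequential, like task looping),
    accumulating the total failure count across iterations."""
    total_failures = 0
    for batch in chunk(items, batch_size):
        total_failures += sum(1 for item in batch if not process_item(item))
    return total_failures
```

Each pass of the `for` loop corresponds to one loop iteration in the Prefect 1 pattern, with the failure count playing the role of the loop payload. The trade-off Kevin mentions is visible here: the batches run strictly one after another.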
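Kevin's `target` suggestion (file-based persistence) can also be sketched in plain Python, assuming a simple one-file-per-task layout. `expensive_compute`, `run_with_target`, and the path format are illustrative assumptions, not Prefect's actual implementation:

```python
# Sketch of target-style file-based caching: if the output file already
# exists, skip the work and read the cached result instead.
# The function names and path layout are illustrative assumptions.
import os

def expensive_compute(x):
    """Hypothetical expensive task body."""
    return x * x

def run_with_target(x, target_dir):
    """Run the task only if its target file is absent; otherwise read
    the cached result from disk. Returns (result, was_cached)."""
    path = os.path.join(target_dir, f"result-{x}.txt")
    if os.path.exists(path):
        with open(path) as f:
            return int(f.read()), True  # cache hit: task is skipped
    result = expensive_compute(x)
    with open(path, "w") as f:
        f.write(str(result))
    return result, False
```

This is why re-running the subflow is enough: any task whose target file already exists is skipped on the second run, so only the previously failed work executes again.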