Xing Zeng
03/06/2024, 9:21 PM
Xing Zeng
03/06/2024, 9:21 PM
from prefect import task

@task(
    persist_result=True,
    retries=2,
    retry_delay_seconds=10,
    timeout_seconds=60 * 30,
)
def process_batch():
    # business logic
    ...
But we found that this approach still doesn't address our issue: the task can still run for up to a couple of hours without trying to fail on its own.
We did observe that when the pod gets removed, we can still manually pause the job and then resume it, which brings in a new flow pod that continues the run from the last unfinished task.
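A minimal sketch of the kind of external check this seems to call for, assuming (not confirmed here) that `timeout_seconds` is enforced inside the process running the task, so it cannot fire once the pod itself is gone. `RunHeartbeat` and `find_stale_runs` are hypothetical names for illustration, not Prefect APIs, and where the heartbeats come from is deliberately left out.

```python
import time
from dataclasses import dataclass


@dataclass
class RunHeartbeat:
    """Last time a (hypothetical) flow run reported that it was alive."""
    run_id: str
    last_seen: float  # seconds since the epoch


def find_stale_runs(heartbeats, now, max_silence_seconds):
    """Return the ids of runs whose last heartbeat is older than the threshold."""
    return [
        hb.run_id
        for hb in heartbeats
        if now - hb.last_seen > max_silence_seconds
    ]


if __name__ == "__main__":
    now = time.time()
    heartbeats = [
        RunHeartbeat("run-a", now - 30),        # still reporting
        RunHeartbeat("run-b", now - 2 * 3600),  # silent for two hours
    ]
    # Runs silent for more than 10 minutes are candidates for the
    # pause/resume step that is currently done by hand.
    print(find_stale_runs(heartbeats, now, max_silence_seconds=600))
```

The point of running such a check outside the flow pod is that it keeps working when the pod is removed; what to do with a stale run (pause and resume, reschedule, or mark as crashed) is a separate decision.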
Do you have any suggestions on steps we could take to enable the flow run to recover automatically upon flow pod removal?
Xing Zeng
03/06/2024, 9:24 PM
Max Eggers
03/07/2024, 5:09 PM