# prefect-community
t
Hi. How does the logic around retries and resubmitting of flows work? I have a flow with a short retry delay (30 secs). This flow stays running and waits for the retry. I have another flow where the retry delay is longer (15 min), and there the flow gets resubmitted. What is the difference here? It causes some issues, as there is no need for storing results in the first scenario (they stay in memory), while in the second scenario they must be stored so they can be used when the flow run is re-run.
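For reference, a minimal sketch of the two setups being compared, assuming Prefect 1.x (the flow and task names are illustrative):

```python
from datetime import timedelta

from prefect import Flow, task

# Flow A: short retry delay; the flow run process stays alive and waits in memory.
@task(max_retries=3, retry_delay=timedelta(seconds=30))
def short_retry_task():
    ...

# Flow B: long retry delay; the flow run gets resubmitted, so any upstream
# results it needs must be persisted somewhere the new run can read them.
@task(max_retries=3, retry_delay=timedelta(minutes=15))
def long_retry_task():
    ...

with Flow("flow-a") as flow_a:
    short_retry_task()

with Flow("flow-b") as flow_b:
    long_retry_task()
```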
k
You have retry delays on Flows? or are we talking about tasks?
t
On tasks. Both flows contain a task with a retry_delay. Flow A has a task with retry_delay 30 sec, and Flow B has a task with retry_delay 15 min.
Found this, which states that long retries and short retries are handled differently: https://github.com/PrefectHQ/prefect/issues/3990
k
Are you running into the Lazarus process maybe? Lazarus resubmits flow runs that have no running or submitted tasks. If you think it's this, you could turn it off
t
Nope, that's not what I was thinking of. If you have the following flow, register it in Cloud, and run it with a Kubernetes agent, you get the expected result: task2 will fail, the flow will keep running, and then it will retry task2 and succeed.
from prefect import Flow, task
import prefect
from datetime import timedelta

retry_delay = timedelta(seconds=2)

@task
def task1():
    return 1

@task(max_retries=3, retry_delay=retry_delay)
def task2(input):
    logger = prefect.context.get("logger")
    retry_count = prefect.context.get("task_run_count")
    
    logger.info(f"Input: {input}")
    logger.info(f"Retry count: {retry_count}")
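    # Fail on the first run so the retry path is exercised; later attempts succeed.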
    if retry_count < 2:
        raise Exception
    
with Flow("myflow") as flow:
    result1 = task1()
    task2(result1)
If you use the following retry delay instead, task2 will fail on the first run and then be put in a Pending state, and the flow run will be resubmitted to run later.
retry_delay = timedelta(minutes=15)
If you check the logs in Cloud you can see that a new Kubernetes job is created.
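A minimal sketch of one way to handle that second scenario, assuming Prefect 1.x (the LocalResult directory is illustrative): configure a flow-level result so task outputs are persisted and can be reloaded when the resubmitted flow run, i.e. the new Kubernetes job, picks the flow back up.

```python
from datetime import timedelta

from prefect import Flow, task
from prefect.engine.results import LocalResult

@task
def task1():
    return 1

@task(max_retries=3, retry_delay=timedelta(minutes=15))
def task2(input):
    ...

# Flow-level result: task outputs are written under the given directory,
# so a resubmitted flow run can rehydrate task2's input instead of
# relying on it still being in memory.
with Flow("myflow", result=LocalResult(dir="/tmp/prefect-results")) as flow:
    task2(task1())
```
With a Kubernetes agent, a shared store such as S3Result or GCSResult would be the more realistic choice, since a new job will not see another pod's local disk.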
k
Ah ok, I can try this, but it looks exactly like the issue you posted, though that one is a bit old. I suggest you open a new issue and link that one, or I can link it for you?
I don't immediately know the difference, because at the TaskRunner level it just waits here
t
k
Thanks!