# ask-community
y
Question about retries: if I have a sequence of independent tasks, is it possible to run them and then rerun only the ones that failed, without pausing to wait for a failed task to retry? In other words, can I come back to the failed tasks only at the end, after all the tasks have run?
k
From my understanding, you want to defer retries to the end. What is your use case for retrying? To answer the question, though: this can't easily be done at the moment, and I'm not sure in which case it makes sense. The closest approximation is to configure no retries, and then restart the flow from failure.
y
I have a flow that runs several independent ETLs, collecting and inserting data into feature-store tables in Hadoop Impala. Some ETLs run much faster than others, so if there is an issue with one of the longer ones, I would prefer that the other ETLs not have to wait for it to retry. Another situation is that one of the ETLs hits an error, and again I would prefer not to retry it several times before running the other ETLs. I do not want to run all the ETLs in parallel, since I do not want to overwhelm our cluster.
Somewhat related to this: is it possible to rerun a flow if it has been running for more than some amount of time? This is for dealing with jobs that hang. It happens to us occasionally with the Oozie ETLs; our ops typically kill the hung job and allow the next one to run.
k
Both of these are features only available in Cloud, where you get flow concurrency limits. For example, you can set a limit of 2 concurrent runs for flows with the label "x"; if 2 flows with label "x" are already running, the next flow run with that label is queued until the previous ones finish. This can be done at the task level too. The second one is also possible with Cloud, where you can set automations that change the state of your flow after it has been running for a certain amount of time (mark it as Failed, or schedule it again). In your case, I think the best thing you can do is control the concurrent flow runs using the `wait_for_flow_run` task, so that nothing starts until the previous run has finished. I think you can also set upstream dependencies to prioritize the shorter flows.
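A minimal sketch of that pattern in Prefect 1.x, assuming hypothetical child flows named "etl-a" and "etl-b" registered in a hypothetical project called "feature-store":
```python
from prefect import Flow
from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

with Flow("etl-orchestrator") as flow:
    # kick off the first child flow and block until it reaches a final state
    run_a = create_flow_run(flow_name="etl-a", project_name="feature-store")
    done_a = wait_for_flow_run(run_a)

    # only start the second child flow after the first one has finished,
    # so the two ETLs never hit the cluster at the same time
    run_b = create_flow_run(flow_name="etl-b", project_name="feature-store")
    run_b.set_upstream(done_a)
```
Chaining the ETLs this way serializes them without putting everything in one monolithic flow, which matches the "don't overwhelm the cluster" constraint above.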
y
What happens if a flow hangs and the next scheduled execution time arrives? Will it run another instance of the flow, or wait for the existing flow to end?
k
In Prefect Cloud this is configurable, since you can set a flow as Failed after it has been running for a set amount of time. The next execution will start a new flow run unless you tell it not to by enforcing flow run concurrency limits. You can also tell the next execution not to start at all (cancel it if it is late by X amount of time).
y
I am looking at this
https://code-maven.com/python-timeout
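For context, the approach on that page is based on `signal.alarm`. A minimal sketch, assuming the code runs in the main thread of a Unix process; note that CPython only delivers signal handlers in the main thread, which is one way the raised exception can appear to be ignored when the flow body runs in a worker thread or subprocess:
```python
import signal

def _on_timeout(signum, frame):
    # raised inside whatever the main thread is executing when the alarm fires
    raise TimeoutError("flow exceeded its time limit")

signal.signal(signal.SIGALRM, _on_timeout)
signal.alarm(60)        # arm a 60-second alarm
try:
    run_flow()          # hypothetical long-running call
finally:
    signal.alarm(0)     # disarm so a stale alarm can't fire later
```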
k
You know Prefect has a timeout at the task level too, right?
y
I am trying an imperfect way to terminate a flow after some amount of time: https://github.com/youdar/process_interupt_timer
Though I am not sure it is working... it somehow ignores the exception.
k
Why can't you do it with `@task(timeout=…)`?
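For reference, a minimal sketch of a task-level timeout in Prefect 1.x; the timeout value is in seconds, and the retry settings are illustrative rather than from the thread:
```python
from datetime import timedelta
from prefect import task, Flow

# fail the task if it runs longer than an hour, then retry once after 10 minutes
@task(timeout=3600, max_retries=1, retry_delay=timedelta(minutes=10))
def run_etl():
    ...  # hypothetical long-running ETL body

with Flow("etl-with-timeout") as flow:
    run_etl()
```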
y
I am not using Cloud... is it supposed to work for tasks when running Prefect on a VM?
You are correct, it does work with Prefect on a VM. Thanks!
k
I think the confusion was that things can be timed out from inside a task on both Server and Cloud, but only Cloud can stop something that is already hanging (e.g. a worker died and the task will never return on its own).