# prefect-server
a
Question. If a task were to fail due to insufficient memory/cpu requirements, would it be possible to restart the task or workflow and request more resources?
k
Kind of. If you use `create_flow_run`, you can pass in a RunConfig, and the RunConfig can be bumped up if you are using something like k8s. So you need an outer flow to manage that and read the error. Not 100% sure it'll work though. The easier approach is to read something like a file size and then stick the right values in the RunConfig.
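The "read a file size, then size the RunConfig" idea can be sketched like this. This is a minimal, stdlib-only sketch: the 4x multiplier and the 512Mi floor are arbitrary assumptions, and the Prefect 1.x calls are shown only in comments.

```python
def memory_request_for(input_size_bytes):
    """Pick a k8s-style memory request string from an input's size.

    Assumption: the job needs roughly 4x the input size in memory,
    with a 512Mi floor. Tune both numbers for your workload.
    """
    mib = max(512, (input_size_bytes * 4) // (1024 * 1024))
    return f"{mib}Mi"

# In Prefect 1.x the value would then go into the run config of the
# child flow run, e.g.:
#   run_config = KubernetesRun(memory_request=memory_request_for(size))
#   create_flow_run(flow_name="process", run_config=run_config)
```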
a
Okay. To elaborate, I could be processing a series of tasks that can vary in the required amount of resources and I would prefer not to request resources that will not be utilized. This is a feature that I am utilizing from the Nextflow manager and AWS Batch. Although I am not too familiar with K8s or some other cloud computing systems, would it be alright if I look into adding the functionality to Prefect?
k
Ohhh, you said task; I thought you meant Flow. So you want a retry on AWS Batch but with upped resources, right?
a
Apologies on semantics. I know that the resources (cpu/memory) are controlled for a Prefect workflow through the run config. I was just wondering whether, regardless of the cloud environment (ECS/K8s/Vertex), I could restart a workflow and assign additional memory/cpu if a program fails for certain tasks.
I know Prefect supports AWS Batch as a function to call within a Prefect Task. Perhaps this is the 'better' alternative.
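For the Batch-inside-a-task route, the knob for resources is the job's container overrides. A small sketch of building that payload (the helper name is made up; only the `resourceRequirements` shape matches AWS Batch's `SubmitJob` API):

```python
def batch_overrides(vcpus, memory_mib):
    """Build the containerOverrides payload for AWS Batch's submit_job.

    Uses resourceRequirements, the current (non-deprecated) way to set
    vCPU and memory per job submission.
    """
    return {
        "resourceRequirements": [
            {"type": "VCPU", "value": str(vcpus)},
            {"type": "MEMORY", "value": str(memory_mib)},
        ]
    }

# With boto3 this would be passed as:
#   batch = boto3.client("batch")
#   batch.submit_job(jobName=..., jobQueue=..., jobDefinition=...,
#                    containerOverrides=batch_overrides(2, 4096))
```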
k
In a call but will respond in a bit
a
Thanks for the update. And it is fine, I am only evaluating Prefect and seeing what capabilities exist.
k
Yes, you can up the Batch resources on retries (not directly, because task inputs can't be changed), but you can do things like keeping the last attempted values in the KV Store. This is at the task level (not for the Flow).
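The "remember the last attempted value in the KV Store" pattern, sketched with a plain dict standing in for Prefect's KV Store so the snippet is self-contained (in Prefect 1.x the reads/writes would be `get_key_value` / `set_key_value`). The start value, growth factor, and cap are arbitrary assumptions:

```python
# Stand-in for the KV Store; keyed by task, holds the last attempted memory.
KV = {}

def next_memory(task_key, start_mib=1024, factor=2, cap_mib=16384):
    """Read the last attempted memory for this task, bump it, store it back.

    First attempt gets start_mib; each retry multiplies by factor, up to cap_mib.
    """
    last = KV.get(task_key)
    mem = start_mib if last is None else min(last * factor, cap_mib)
    KV[task_key] = mem
    return mem
```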
If you need to restart at the Flow level, you need an outer flow that controls increasing the RunConfig limits, again maybe with the use of the KV Store or maybe not. You would create new flow runs, see if they fail, and then retry with an increased amount.
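The outer-flow loop described here might look like the following. `create_run` is a hypothetical stand-in for "create a child flow run with this memory setting, wait for it, report success"; the start value, factor, and attempt cap are assumptions:

```python
def run_with_escalation(create_run, start_mib=2048, factor=2, max_attempts=3):
    """Create a flow run; if it fails, retry with more memory.

    create_run(memory_mib) should submit and wait on a child flow run,
    returning True on success. Returns the memory that finally worked.
    """
    mem = start_mib
    for _ in range(max_attempts):
        if create_run(mem):
            return mem
        mem *= factor  # escalate before the next attempt
    raise RuntimeError("flow still failing after %d attempts" % max_attempts)
```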
Does that all make sense? I think these are inherent limitations that can’t be worked around at the moment. Not meaning to discourage, if you have sound proposals, I’m sure we’d be happy to hear 🙂 and we can work on getting in a PR for that
a
This all makes sense, and manipulating the resources this way should not be too challenging. I hope to try it out soon.
The last question I have is regarding a 'failed' workflow. If I have ten 'tasks' in parallel and one were to fail, would the entire workflow fail? Or would the other tasks complete and I could retry the failed 'tasks' once all other 'tasks' are completed?
k
If they're independent and there are no dependencies between them, yes, the other ones will still fire, and then you can click the restart button.
a
Understood. Thank you for all of the information!