# prefect-server
a
Question. If a task were to fail due to insufficient memory/cpu requirements, would it be possible to restart the task or workflow and request more resources?
k
Kind of. If you use `create_flow_run`, you can pass in a RunConfig, and the RunConfig can be bumped up if you are using something like k8s. So you need an outer flow to manage that and read the error. Not 100% sure it'll work though. The easier approach is to read something like a file size and then stick the right values in the RunConfig.
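The "read a file size, then size the RunConfig" idea can be sketched like this. This is a minimal, stdlib-only sketch: the 4x multiplier and the 512Mi floor are arbitrary assumptions, and the Prefect 1.x calls are shown only in comments.

```python
def memory_request_for(input_size_bytes):
    """Pick a k8s-style memory request string from an input's size.

    Assumption: the job needs roughly 4x the input size in memory,
    with a 512Mi floor. Tune both numbers for your workload.
    """
    mib = max(512, (input_size_bytes * 4) // (1024 * 1024))
    return f"{mib}Mi"

# In Prefect 1.x the value would then go into the run config of the
# child flow run, e.g.:
#   run_config = KubernetesRun(memory_request=memory_request_for(size))
#   create_flow_run(flow_name="process", run_config=run_config)
```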
a
Okay. To elaborate, I could be processing a series of tasks that can vary in the required amount of resources and I would prefer not to request resources that will not be utilized. This is a feature that I am utilizing from the Nextflow manager and AWS Batch. Although I am not too familiar with K8s or some other cloud computing systems, would it be alright if I look into adding the functionality to Prefect?
k
Ohhh, you said task; I thought you meant Flow. So you want a retry on AWS Batch but with upped resources, right?
a
Apologies on semantics. I know that the resources (cpu/memory) are controlled for a Prefect workflow through the run config. I was just wondering whether, regardless of the cloud environment (ECS/K8s/Vertex), I could restart a workflow and assign additional memory/cpu if a program fails for certain tasks.
I know Prefect supports AWS Batch as a function to call within a Prefect Task. Perhaps this is the 'better' alternative.
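For the Batch-inside-a-task route, the knob for resources is the job's container overrides. A small sketch of building that payload (the helper name is made up; only the `resourceRequirements` shape matches AWS Batch's `SubmitJob` API):

```python
def batch_overrides(vcpus, memory_mib):
    """Build the containerOverrides payload for AWS Batch's submit_job.

    Uses resourceRequirements, the current (non-deprecated) way to set
    vCPU and memory per job submission.
    """
    return {
        "resourceRequirements": [
            {"type": "VCPU", "value": str(vcpus)},
            {"type": "MEMORY", "value": str(memory_mib)},
        ]
    }

# With boto3 this would be passed as:
#   batch = boto3.client("batch")
#   batch.submit_job(jobName=..., jobQueue=..., jobDefinition=...,
#                    containerOverrides=batch_overrides(2, 4096))
```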
k
In a call but will respond in a bit
a
Thanks for the update. And it is fine, I am only evaluating Prefect and seeing what capabilities exist.
k
Yes, you can up the Batch resources on retries (not directly, because task inputs can't be changed), but you can do things like keeping the last attempted values in the KV Store. This is at the task level (not for the Flow).
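The "remember the last attempted value in the KV Store" pattern, sketched with a plain dict standing in for Prefect's KV Store so the snippet is self-contained (in Prefect 1.x the reads/writes would be `get_key_value` / `set_key_value`). The start value, growth factor, and cap are arbitrary assumptions:

```python
# Stand-in for the KV Store; keyed by task, holds the last attempted memory.
KV = {}

def next_memory(task_key, start_mib=1024, factor=2, cap_mib=16384):
    """Read the last attempted memory for this task, bump it, store it back.

    First attempt gets start_mib; each retry multiplies by factor, up to cap_mib.
    """
    last = KV.get(task_key)
    mem = start_mib if last is None else min(last * factor, cap_mib)
    KV[task_key] = mem
    return mem
```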
If you need to restart at the Flow level, you need an outer flow that controls increasing the RunConfig limits, again maybe with the use of the KV Store or maybe not. You would create new flow runs, see if they fail, and then retry with an increased amount.
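The outer-flow loop described here might look like the following. `create_run` is a hypothetical stand-in for "create a child flow run with this memory setting, wait for it, report success"; the start value, factor, and attempt cap are assumptions:

```python
def run_with_escalation(create_run, start_mib=2048, factor=2, max_attempts=3):
    """Create a flow run; if it fails, retry with more memory.

    create_run(memory_mib) should submit and wait on a child flow run,
    returning True on success. Returns the memory that finally worked.
    """
    mem = start_mib
    for _ in range(max_attempts):
        if create_run(mem):
            return mem
        mem *= factor  # escalate before the next attempt
    raise RuntimeError("flow still failing after %d attempts" % max_attempts)
```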
Does that all make sense? I think these are inherent limitations that can’t be worked around at the moment. Not meaning to discourage, if you have sound proposals, I’m sure we’d be happy to hear 🙂 and we can work on getting in a PR for that
a
This all makes sense, and manipulating the resources this way should not be too challenging. I hope to try it out soon.
The last question I have is regarding a 'failed' workflow. If I have ten 'tasks' in parallel and one were to fail, would the entire workflow fail? Or would the other tasks complete and I could retry the failed 'tasks' once all other 'tasks' are completed?
k
If they're independent and there are no dependencies between them, yes, the other ones will still fire, and then you can click the restart button.
a
Understood. Thank you for all of the information!