hello community hello Prefect team i am using the Ray task r Prefect Community #ask-community

hello community, hello Prefect team, i am using th...

Justin Trautmann

02/16/2023, 9:46 AM

hello community, hello Prefect team, i am using the Ray task runner with a cluster that consists mainly of AWS spot instances. i would love to just restart tasks when the underlying spot instance is unexpectedly terminated. this functionality is already supported by Ray but Prefect's

PreventRedundantTransitions

orchestration policy prevents a repeated task execution as the task remains in status

Running

when the instance is terminated and a new replacement instance will try to start the task from scratch, including proposing the state

Running

again which leads to:

Copy code

Task run <task_run_id> received abort during orchestration: This run cannot transition to the RUNNING state from the RUNNING state. Task run is in RUNNING state

I feel like this policy has a legitimate reason of preventing multiple agents from attempting to orchestrate the same run but also catches the case where the same agent tries to execute the same task multiple times which should be allowed imo. Anyone having experience with Prefect + Ray on AWS spot and ideas on how to handle this case? PS: @Tim Galvin not sure about Dask and SLURM but maybe your issue is somewhat related

Yaron Levi

02/16/2023, 11:55 AM

@Justin Trautmann Hi, we are also seeing this issue but in a different configuration (running on Render, with continuous loop). You can see all the details here, in the comment I’ve made: https://github.com/PrefectHQ/prefect/issues/7116

Yaron Levi

02/16/2023, 11:57 AM

Also, if you search for “This run cannot transition to the RUNNING state from the RUNNING state” in prefect’s GitHub repo, you’ll find around 4-5 issues with that problem.

2 Views

Open in Slack

Previous Next