# ask-community
s
Hi all! Is anyone aware of a way of timing out a flow itself and not just its tasks? I.e. Task has `timeout`, which we can pass in, but I'm currently experiencing some odd issues where tasks seem to get lost in Dask somewhere (they are submitted to Dask but never come back and never time out), and this means my flows never end. Ideally I'll dig into our environment, Dask and Prefect and figure out what is causing this silent, untracked failure, but as an interim solution, does anyone know of a way I can say “Cancel the flow and all its tasks if it's been an hour since it started”?
For some context, this is what a flow in that situation typically looks like: it started many hours ago, it is still in the process of being cancelled manually, and even the past tasks that come from a parametrised run get sent for execution but are never picked up.
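As an interim workaround, the "give up after an hour" logic can be sketched in plain Python. This is a minimal, framework-agnostic sketch and not a Prefect or Dask API: `run_with_deadline` and its parameters are hypothetical names. It waits up to `deadline_s` seconds, then cancels every task that has not started yet; tasks that are already running cannot be killed from a thread pool and keep going in the background, which mirrors the shared-cluster limitation discussed in this thread.

```python
import concurrent.futures as cf
import time
from typing import Any, Callable, Dict, List


def run_with_deadline(
    tasks: List[Callable[[], Any]],
    deadline_s: float,
    max_workers: int = 2,
) -> Dict[str, List[int]]:
    """Run callables concurrently and give up once deadline_s seconds have passed.

    Returns task indices grouped by outcome:
      finished  - completed before the deadline
      cancelled - never started; dropped from the queue at the deadline
      overran   - already running at the deadline; left to finish in the background
    """
    pool = cf.ThreadPoolExecutor(max_workers=max_workers)
    futures = {pool.submit(fn): i for i, fn in enumerate(tasks)}
    done, not_done = cf.wait(futures, timeout=deadline_s)
    # Cancel everything still queued (cancel_futures needs Python 3.9+).
    # Running threads cannot be killed, so we simply stop waiting for them.
    pool.shutdown(wait=False, cancel_futures=True)
    return {
        "finished": sorted(futures[f] for f in done),
        "cancelled": sorted(futures[f] for f in not_done if f.cancelled()),
        "overran": sorted(futures[f] for f in not_done if not f.cancelled()),
    }


if __name__ == "__main__":
    jobs = [lambda: "quick", lambda: time.sleep(2), lambda: "never started"]
    print(run_with_deadline(jobs, deadline_s=0.5, max_workers=1))
```

The design choice here is that cancelling queued work is always safe, while stopping work that is already running requires the process hosting it to go away.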
a
@Samuel Hinton if you are on Prefect Cloud, you can use Automations to set an SLA on a flow. Here is what it looks like:
s
Alas, we're on Server. But we're considering Cloud once Orion is out and stable.
a
What is also relevant:
• When cancelling a flow run, any actively running tasks can’t be hard-stopped when using a shared Dask cluster. Instead, the flow runner will stop submitting tasks but will let all active tasks run to completion.
• With a temporary cluster, the cluster can be shut down to force-stop any active tasks, speeding up cancellation.
So if you’re currently using an always-on Dask cluster, you can experiment with a temporary Dask cluster instead to help mitigate such issues.
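A local analogy in plain Python (no Dask involved, and not how Prefect implements cancellation): a thread that is already running can't be forcibly stopped, but a separate OS process can be terminated from outside. That is roughly why shutting down a temporary cluster, whose worker processes you own, force-stops active tasks. `hard_stop_demo` is a hypothetical helper that starts a long `sleep` in a child interpreter and terminates it after a short grace period.

```python
import subprocess
import sys
import time
from typing import Tuple


def hard_stop_demo(run_for_s: float = 30.0, grace_s: float = 0.2) -> Tuple[bool, float]:
    """Start a long-running child process, then force-stop it.

    Returns (stopped, elapsed_s): whether the child has exited, and how long
    the whole demo took (far less than run_for_s if the kill worked).
    """
    start = time.monotonic()
    # The "task": a separate Python interpreter that would sleep for run_for_s.
    child = subprocess.Popen([sys.executable, "-c", f"import time; time.sleep({run_for_s})"])
    time.sleep(grace_s)    # give the task time to start running
    child.terminate()      # SIGTERM on POSIX, TerminateProcess on Windows
    child.wait(timeout=5)  # reap the child so returncode is set
    elapsed = time.monotonic() - start
    return child.returncode is not None, elapsed


if __name__ == "__main__":
    print(hard_stop_demo())
```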
s
Will look into it, cheers
Also, are those SLAs exportable to code? We’re moving to infrastructure as code, and having a third-party service that requires significant manual setup would be a pain point at sign-off. So fingers crossed: is infra-as-code available now or coming soon?
a
Well, technically speaking it’s not infrastructure. But you can create Automation actions via the GraphQL API instead of using the UI. Example:
```graphql
mutation {
  create_action(input: {config: {create_flow_run: {flow_group_id: "b097b505-f4b3-401d-9e69-87eccbcc0794"}}}) {
    id
  }
}
```
For you, it would probably be something like this to cancel a flow run based on an SLA:
```graphql
mutation {
  create_action(
    input: {config: {cancel_flow_run: {message: "Flow run cancelled because it ran longer than the duration specified in the SLA"}}}
  ) {
    id
  }
}
```
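To keep such actions under version control, one option is to store the mutation text in your repo and send it over HTTP from a deploy script. The sketch below uses only the standard library and the usual GraphQL-over-HTTP convention (POST a JSON body with a `query` field, read `data`/`errors` back). The endpoint URL and the Bearer-token header are assumptions to check against the Prefect Cloud API docs for your account, and `build_request`/`send_mutation` are hypothetical helper names. It only creates the action; attaching it to an SLA is not shown.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: Prefect Cloud (1.x) GraphQL endpoint and Bearer API-key auth.
API_URL = "https://api.prefect.io"

# The mutation from the thread, kept as text so it can live in version control.
CANCEL_ON_SLA = """
mutation {
  create_action(
    input: {config: {cancel_flow_run: {message: "Flow run cancelled because it ran longer than the duration specified in the SLA"}}}
  ) {
    id
  }
}
"""


def build_request(query: str, api_key: str, url: str = API_URL) -> urllib.request.Request:
    """Wrap a GraphQL document in a JSON POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
    )


def send_mutation(query: str, api_key: str, url: str = API_URL) -> Dict[str, Any]:
    """POST the mutation and return the `data` object, raising on GraphQL errors."""
    with urllib.request.urlopen(build_request(query, api_key, url)) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

For example, `send_mutation(CANCEL_ON_SLA, api_key)["create_action"]["id"]` would return the new action's id, which could then be recorded alongside the rest of your infrastructure config.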
s
Great, I'll include that in my write-up. Thanks for all the help @Anna Geller 🙂