Adi Gandra

03/02/2022, 4:16 PM
Hey, I have a task that has run successfully in the past, but now it gets stuck on "Task 'X': Starting task run...". All other tasks run fine, and if I manually kill the job and restart it, it runs fine again. I'm running on EKS, so it's spinning up a pod to run this task. Any ideas for debugging? The task just doesn't seem to start.

Anna Geller

03/02/2022, 4:23 PM
Can you share a bit more? What is this task doing? Could it be an issue with unclosed files or DB connections? If you could share either your flow or a minimal reproducible example, I can try to reproduce it and see what the issue might be. Is this a memory-intensive or long-running task?

Adi Gandra

03/02/2022, 4:28 PM
It connects to an FTP server and downloads a file. It could be related to the connection, but it normally takes only 2 seconds to run, and I haven't had this problem on restarts.
Is there a way to set a maximum execution time for a task, so that if it runs longer than that, it's canceled and retried?
I could set that to 1 minute for this task; it should never take longer than that.
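If the stall happens inside the FTP call itself (not confirmed in this thread; the log suggests the task may not even be starting), a socket-level timeout makes a silent server raise an error instead of hanging forever. The sketch below uses only Python's standard `ftplib`; the helper name `download_ftp_file` and its default credentials are hypothetical, not taken from Adi's flow.

```python
import ftplib
import io


def download_ftp_file(host: str, remote_path: str, port: int = 21,
                      user: str = "anonymous", passwd: str = "",
                      timeout_s: float = 30.0) -> bytes:
    """Download remote_path from an FTP server into memory (hypothetical helper).

    timeout_s applies to the TCP connect and to every later blocking socket
    operation, so a server that goes silent raises socket.timeout instead of
    hanging the task indefinitely.
    """
    buf = io.BytesIO()
    # The context manager attempts a QUIT on exit and always closes the socket,
    # so a failed attempt does not leave an open connection behind.
    with ftplib.FTP(timeout=timeout_s) as ftp:
        ftp.connect(host, port)
        ftp.login(user, passwd)
        # The passive data connection also inherits timeout_s.
        ftp.retrbinary(f"RETR {remote_path}", buf.write)
    return buf.getvalue()
```

Note that this limits each individual blocking operation, not the total duration: a server that keeps trickling bytes slowly can still exceed a minute, which is why a task-level time limit is still useful on top of it.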

Anna Geller

03/02/2022, 4:34 PM
There are two options for how you could approach it: 1) You could set a timeout in seconds:
@task(timeout=120)
2) A bit more effort, but more reliable: you could move this "problematic" task into a separate flow and call it from the "parent" flow using the create_flow_run task. This way, you could set a time-based SLA Automation on the flow that does the FTP processing. SLA Automations are very reliable, so you could ensure that any time this flow runs longer than, say, 3 minutes, the flow run is canceled.
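To make the "cancel after N seconds, then retry" behavior from the question concrete, here is a minimal stdlib-only sketch. It is not Prefect code and does not show how Prefect implements `timeout`; the names `AttemptTimedOut`, `run_with_timeout`, `run_with_retries` and `flaky_download` are hypothetical, and the 0.5-second budget stands in for the 1-minute limit Adi mentions.

```python
import threading
import time
from typing import Any, Callable


class AttemptTimedOut(Exception):
    """Raised when one attempt exceeds its time budget (hypothetical helper)."""


def run_with_timeout(fn: Callable[[], Any], timeout_s: float) -> Any:
    """Run fn in a daemon thread and wait at most timeout_s seconds for it."""
    outcome: dict = {}

    def target() -> None:
        try:
            outcome["value"] = fn()
        except Exception as exc:  # hand the worker's error back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The stuck thread is abandoned, not killed: Python cannot stop a
        # thread from the outside, so a truly hung call lingers until exit.
        raise AttemptTimedOut(f"attempt did not finish within {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def run_with_retries(fn: Callable[[], Any], timeout_s: float,
                     max_retries: int, retry_delay_s: float = 0.0) -> Any:
    """Retry fn only after a timeout, up to max_retries extra attempts."""
    for attempt in range(max_retries + 1):
        try:
            return run_with_timeout(fn, timeout_s)
        except AttemptTimedOut:
            if attempt == max_retries:
                raise  # out of retries: surface the timeout
            time.sleep(retry_delay_s)


# Usage: a simulated download that stalls on its first call only.
calls = {"count": 0}


def flaky_download() -> bytes:
    calls["count"] += 1
    if calls["count"] == 1:
        time.sleep(3)  # simulate a stalled FTP transfer
    return b"file contents"


result = run_with_retries(flaky_download, timeout_s=0.5, max_retries=2)
print(result, calls["count"])  # prints: b'file contents' 2
```

The caveat in `run_with_timeout` is the main design trade-off of an in-process timeout: the caller stops waiting, but the hung work itself is not forcibly terminated.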

Kevin Kho

03/02/2022, 4:35 PM
Do the pod logs show anything?

Anna Geller

03/02/2022, 4:36 PM
SLA automation: and here is an example of the syntax for a subflow: https://discourse.prefect.io/t/how-can-i-create-a-subflow-and-block-until-it-s-completed/94/2

Adi Gandra

03/02/2022, 4:38 PM
Awesome, thanks! I'll read up on this. Checking the pod logs is a good idea; unfortunately, I deleted the pods as part of the shutdown process, since otherwise the run would have just stayed stalled for hours. I'll check the pod logs next time I run into this.