Flows being stuck in running state
# prefect-community
a
Flows being stuck in running state
Hello everyone! I'm running into some problems; can you help me? I'm using Prefect 1.0. I built the flows so that more than 50 of them are submitted simultaneously. Sometimes all 50 flows run through to the Success state, but sometimes about 2 out of 50 get stuck in the Running state (whether all of their tasks have finished or some are still left). They stay "*running*" forever until I cancel them. I don't know where to look to fix this, since they keep turning into zombies like that. Where might the problem be (my code, my Prefect agent, conflicts, ...?) so that I can check it and try to fix it? Thank you so much!
m
Hey @An Ninh Vũ, this can be a difficult issue to track down, since it could be related to the flow itself, the agent, or the execution environment. This is often the result of some sort of crash or error within the infrastructure running your flow, so I'd start by checking any logs you have from the execution environment, including (but not limited to) your agent logs. The removal of zombie task runs is also largely tied to the heartbeat process; this Discourse article discusses some of the common issues around it and also describes what it's doing: https://discourse.prefect.io/t/flow-is-failing-with-an-error-message-no-heartbeat-detected-from-the-remote-task/79
TL;DR: this is often caused by some form of error/crash in your execution environment, so checking the logs there is a good place to start.
❤️ 1
👀 1
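To make the heartbeat idea from the reply above more concrete, here is a minimal sketch of the general mechanism, not Prefect's actual implementation: the class `HeartbeatMonitor` and its methods are hypothetical names. A run that is still marked as running but has not sent a heartbeat within a timeout is treated as a zombie. If the process executing a flow crashes or is killed (for example, because the machine runs out of memory), its heartbeats stop, which is one way a crash in the execution environment can surface as a run that never leaves the Running state.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: a run counts as a 'zombie' if it is still
    tracked as running but its last heartbeat is older than `timeout`."""

    timeout: float
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    last_beat: Dict[str, float] = field(default_factory=dict)

    def beat(self, run_id: str) -> None:
        # Called periodically by the process executing the run.
        self.last_beat[run_id] = self.clock()

    def finish(self, run_id: str) -> None:
        # A finished run no longer needs heartbeat tracking.
        self.last_beat.pop(run_id, None)

    def zombies(self) -> List[str]:
        # Runs whose most recent heartbeat is older than the timeout.
        now = self.clock()
        return sorted(
            run_id
            for run_id, t in self.last_beat.items()
            if now - t > self.timeout
        )
```

For example, with `timeout=60`, a run that last beat at t=0 is reported at t=100, while one that beat again at t=50 is not.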
a
Sorry, I asked about the Timeout solution, but it no longer seems sufficient for us. This morning I ran my flows and they froze once again. All of them stopped logging, and they were at different stages of their tasks, so I don't think it's the code that made them stop (my peers in another country can run this fine, although they still get 1-2 freezes). Where can I check for the error, and where might it be? I checked the logs and they show nothing after the point where the runs stopped. I also checked the agent, and its last message is "INFO - agent | Completed deployment of flow run". I could use a timeout here, but I don't know exactly when our task will finish, since the running time varies between accounts. Can you suggest where I should look to fix this? Thank you so much!
My Prefect version is 0.14.22
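Since a fixed timeout is hard to choose when the running time varies per account, one alternative is an inactivity (idle) timeout: react only when the job stops making progress, however long it runs in total. Below is a minimal standard-library sketch; `InactivityWatchdog`, `progress()` and `on_stall` are hypothetical names, not part of Prefect, and it assumes the job has natural points (e.g. after each account is processed) where it can report progress.

```python
import threading
from typing import Callable


class InactivityWatchdog:
    """Hypothetical watchdog: calls `on_stall` once if `progress()` has not
    been called for `idle_timeout` seconds, regardless of total runtime."""

    def __init__(self, idle_timeout: float, on_stall: Callable[[], None]):
        self.idle_timeout = idle_timeout
        self.on_stall = on_stall
        self._progress = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def __enter__(self) -> "InactivityWatchdog":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        self._progress.set()  # wake the watcher so it can exit promptly
        self._thread.join()

    def progress(self) -> None:
        # Call whenever the job makes progress; this restarts the idle window.
        self._progress.set()

    def _watch(self) -> None:
        while not self._stop.is_set():
            # Event.wait returns False if the timeout elapsed with no progress.
            if not self._progress.wait(self.idle_timeout):
                self.on_stall()
                return
            self._progress.clear()
```

`on_stall` runs in the watcher thread and cannot forcibly stop the main thread, so what it does is a design choice: log a warning, send an alert, or end the process so the run ends in a visible failure instead of hanging silently.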
m
This sounds like an infrastructure issue, i.e., it sounds like something is crashing in your execution environment, especially since the point where this fails seems to change each time it runs. It could be a memory issue, but it's hard to say outright. That's probably where I'd start troubleshooting, though.
🙏 1
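One way to check the memory hypothesis is to measure roughly how much memory the work of a single flow allocates when run on its own, then compare that against how many runs execute at once. A minimal sketch using the standard library's `tracemalloc`; `run_with_peak_memory` is a hypothetical helper, and simply watching the machine's RAM (Task Manager, `top`, etc.) while the flows run is an equally valid check.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def run_with_peak_memory(
    fn: Callable[..., Any], *args: Any, **kwargs: Any
) -> Tuple[Any, int]:
    """Run fn and return (result, peak bytes traced during the call).

    tracemalloc only counts allocations made through Python's memory
    allocators; memory allocated directly by some C libraries is not
    counted, so treat the number as a lower-bound estimate. Tracing is
    stopped afterwards.
    """
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak
```

Each flow run also carries its own interpreter overhead when it runs in a separate process, so the per-run total will be higher than the traced peak.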
a
Thank you! I checked, and it seems that the flows getting stuck in the Running state on my laptop is caused by a lack of memory: when I deploy these flows, RAM usage goes straight to 100%. I think the problem on my friends' computers might be the same. Now I'm looking for a way to manage the memory usage of the flows (to prevent them from using too much memory), but I have searched Discourse, GitHub and Google and couldn't figure out how 😞 Can you help me? Thank you!
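If RAM hits 100% because all 50+ flows start at once, one general approach is to cap how many run concurrently and let the rest wait their turn. A minimal standard-library sketch; `run_with_limit` is a hypothetical helper, not a Prefect API, and it assumes `fn` (e.g. a hypothetical `run_one_account(account)`) starts one flow run and blocks until that run has finished, since the cap only bounds memory if each call really waits for its run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_limit(
    fn: Callable[[T], R], items: Iterable[T], max_concurrent: int
) -> List[R]:
    """Call fn on every item with at most `max_concurrent` calls in flight.

    Extra items wait in the executor's queue until a worker is free, so
    only `max_concurrent` jobs need memory at the same time.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # Executor.map returns results in the same order as `items`.
        return list(pool.map(fn, items))
```

A suitable `max_concurrent` depends on how much memory one run needs and how much RAM the machine has; the trade-off is a longer total wall-clock time in exchange for not exhausting memory.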