# ask-community
c
Trying my luck again. Hi everyone, I've recently been facing a strange problem. I have a flow (which is a subflow of another one) that runs multiple tasks. Once in a while the tasks do not even start (they get stuck in the pending state), and the log of each task says "Crash detected! Execution was cancelled by the runtime environment." The main flow is stuck in the running state and its log says: "Flow run '...' - Crash detected! Request to http://.../api/task_runs failed: PoolTimeout: ." The main problem is that the tasks and subflow get stuck, and even a timeout (the timeout_seconds param) does not release the freeze (the flow stays in the running state). I am running the flows as a k8s job on OpenShift, and all the flows and tasks are async. Is this a known problem? Could calling an async function (as a subflow or task) be problematic? 😵‍💫
d
Just to clarify: do the processes themselves freeze where they are running or do they only show as running in the prefect UI?
c
@Dev Dabke I checked again and it seems that the process enters an infinite loop with an asyncio exception:
asyncio - Exception in callback SubprocessStreamProtocol.pipe_data_received(2, ...)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.8/asyncio/subprocess.py", line 73, in pipe_data_received
    reader.feed_data(data)
  File "/usr/local/lib/python3.8/asyncio/streams.py", line 472, in feed_data
    assert not self._eof, 'feed_data after feed_eof'
AssertionError: feed_data after feed_eof
It's important to say that I'm currently running in debug mode to investigate the problem, so this new output might be caused by the debugging itself 🤔
d
One q: are you main-guarding your flow?
if __name__ == "__main__":
  flow_fn()
c
I am not
d
Okay, I would try that. async is super broken without the main guard.
(That's a general python problem, not a prefect-specific problem.)
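For reference, a minimal sketch of a main-guarded async entry point. It uses plain asyncio rather than Prefect, and `flow_fn` and `main` are hypothetical stand-ins for the real flow code:

```python
import asyncio


async def flow_fn() -> str:
    # Hypothetical stand-in for the real async flow.
    await asyncio.sleep(0)
    return "done"


def main() -> str:
    # asyncio.run creates a fresh event loop, runs the coroutine to
    # completion, and closes the loop afterwards.
    return asyncio.run(flow_fn())


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when the module
    # is imported (for example by a child process that re-imports it).
    print(main())
```

Without the guard, anything that imports the module would also start the event loop as a side effect of the import.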
c
Yeah it's a great idea
@Dev Dabke I'm already deploying the flows with flow.from_source(...).deploy() inside an `if __name__ == "__main__":` block (just in another file), so I'm not sure whether that's the problem or how to make it more robust. Any other ideas?
d
How are you launching the flow? From the UI?
c
Yes
d
Okay, then I suspect it's not the main guarding that's at issue.
Out of curiosity, what happens if you run the deployment by calling it locally?
Do you get the same error?
Also, in your flow/task code, are you `await`ing all of the `async` functions?
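To illustrate why that question matters, here is a small sketch in plain asyncio (not Prefect; `fetch_value` and `caller` are made-up names). Calling an async function without `await` only builds a coroutine object, and nothing runs until it is awaited:

```python
import asyncio
import inspect


async def fetch_value() -> int:
    # Hypothetical async task body.
    await asyncio.sleep(0)
    return 42


async def caller() -> tuple:
    forgotten = fetch_value()  # no await: only a coroutine object, nothing has run yet
    was_coroutine = inspect.iscoroutine(forgotten)
    result = await forgotten   # awaiting is what actually executes the body
    return was_coroutine, result


outcome = asyncio.run(caller())
print(outcome)
```

A coroutine that is created and never awaited does no work at all; Python only emits a "coroutine ... was never awaited" RuntimeWarning when the object is garbage collected.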
One other thing that I also see: Python 3.8 and 3.10+ handle `async` functions differently. I haven't used 3.8 in a bit, but I would take a look at: https://stackoverflow.com/questions/73599594/asyncio-works-in-python-3-10-but-not-in-python-3-8
c
I haven't tried calling it locally. The problem only happens when my flow runs for a long time (a few hours). In my flow I await the subflow, and in each subflow I await a gather of all the tasks.
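A rough sketch of that await-a-gather pattern with an outer deadline, in plain asyncio. This is not Prefect's `timeout_seconds` mechanism, and `quick`, `hangs`, and `run_subflow` are hypothetical names:

```python
import asyncio


async def quick(n: int) -> int:
    await asyncio.sleep(0.01)
    return n


async def hangs() -> None:
    # Simulates a task that never finishes but still yields to the event loop.
    await asyncio.Event().wait()


async def run_subflow(coros, timeout: float):
    # wait_for cancels the gather (and therefore its child tasks)
    # once the deadline passes.
    try:
        return await asyncio.wait_for(asyncio.gather(*coros), timeout=timeout)
    except asyncio.TimeoutError:
        return None


ok = asyncio.run(run_subflow([quick(1), quick(2)], timeout=1.0))
stuck = asyncio.run(run_subflow([quick(1), hangs()], timeout=0.1))
print(ok, stuck)
```

This kind of timeout only works when the hung task is waiting at an `await`. A blocking call that never yields to the event loop cannot be cancelled this way, which is one possible reason a timeout fails to release a freeze.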
d
Ah gotcha, okay, so it seems like there are two issues:
1. The underlying error: it looks like an I/O issue (is the network getting interrupted? is the computer going to sleep?)
2. The fact that the flow isn't being marked as crashed. Is it possible that you're catching the error with `except` and not re-raising it? Without seeing the code, it's a bit hard to guess why the error is not causing the processes to crash.
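A small sketch of the difference between swallowing and re-raising, in plain asyncio with made-up function names (not taken from the flow in question):

```python
import asyncio


async def failing_task() -> None:
    raise RuntimeError("I/O error")


async def swallows() -> str:
    try:
        await failing_task()
    except Exception:
        # The error is dropped here, so the caller sees a normal return.
        return "looks fine"
    return "no error"


async def reraises() -> str:
    try:
        await failing_task()
    except Exception:
        # Do any logging or cleanup here, then let the error propagate.
        raise
    return "no error"


swallowed_result = asyncio.run(swallows())

try:
    asyncio.run(reraises())
    propagated = False
except RuntimeError:
    propagated = True

print(swallowed_result, propagated)
```

With the first pattern nothing upstream ever learns that the task failed, so an orchestrator has no reason to mark the run as crashed.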
The one thing that's tripping me up: from your initial message, it seems like the process is actually crashing but your second message seems to imply that the process is stuck?
c
I don't think the network is the problem; in my checks it looked fine and nothing seemed abnormal. About the crashed/stuck question: I tried to make the main flow crash when the subflow is stuck, but it didn't work 100% of the time. So to sum it up:
Main flow - sometimes crashes and sometimes gets stuck (I actually got it to crash when I successfully caught the subflow's exceptions and tried to continue to the next subflow)
Subflows - only get stuck in the "running" state
Tasks - get stuck in either the "pending" or the "running" state
I know, it's confusing as hell
d
One thing to distinguish: is the underlying process stuck or is it reflecting as "stuck" in the prefect UI?
When you say sub flows get stuck in a "running" state, it seems like you're talking about the UI?
But does the actual process itself crash?
Or are the processes also stuck?
c
The processes are stuck in the UI, and the only logs I get in the k8s job that runs the flow are about the asyncio problem and the APILogWorkerThread.
a
Hi, i'm seeing something similar. did you ever solve this?
c
Yeah, it was a memory leak in an old Prefect version. Try updating Prefect first.