Hi I think Ive found a major bug related to subflo...
# ask-community
d
Hi, I think I've found a major bug related to subflows. I have no idea how to replicate it. Will continue in the thread.
I am using a self-deployed Kubernetes server and agent with a Helm deployment [Prefect v2.10.7]. I was looking into the pods that were not "Completed" and noticed one that has been running for 7 days.
So I filtered "Running" flows in the UI and found out that a few flows had been running. Snap below. **The snap isn't of the flow that's been running for 7 days.
So the next logical step would be to check the parent flow.
And the parent flow has completed successfully.
The sub flow "merry-moth" is linked to parent flow "naughty-porpoise". The sub flow is running but the parent flow is complete with success. I looked for the "merry-moth" flow run in the "sub flow runs" tab and "merry-moth" is not in the subflow list.
v
Hello, I am just getting started with Prefect and I experienced the same issue, also on kubernetes (GKE) with self-hosted prefect-server
d
Looking for Feedback from Prefect team.
j
Hi there! Can I clarify what you'd expect vs what you're seeing? It sounds like your expectation was that the parent flow run should not be completed until the subflow run is done? Is that correct or was there something else that didn't meet your expectations? And can you give a bit more info about how you started the subflow runs? Or the code (or an MRE) from the flows (my main thing to check there is are you returning anything from your parent flow?).
d
Hi @Jenny, I'll have more details tomorrow.
j
Thank you!
d
Hi @Jenny, yup, I expect the parent flow to fail if sub flows fail. I wouldn't call it an MRE because this does not ensure that the issue is always present, but this is how I call subflows.
from prefect import flow


@flow
def sub_flow(param: str):
    print(param)


@flow
def main_flow():
    alphabets = ["a", "b", "c"]
    for _a in alphabets:
        sub_flow.with_options(name=f"flow_for_{_a}")(param=_a, return_state=True)
j
Thank you! I also wasn't able to reproduce with that. Would you mind opening an issue so we can track and see if we can add more info as we get it?
d
Yes. I could open a ticket for it later. Thanks @Jenny. But is my assumption correct that direct invocation of a subflow should block the main flow?
j
Yes. Subflows should block execution of the parent flow until completion. However, asynchronous subflows can be run in parallel. There's also a few situations where you may have a running subflow run and a completed parent run e.g. if a subflow run is retried. If you can open a ticket that would be great and we can hopefully find a way to get more info.
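To illustrate the difference between a blocking call and parallel asynchronous calls, here is a minimal sketch using only `asyncio`. It is an analogy, not Prefect code: `sub_flow`, `main_sequential` and `main_parallel` are hypothetical plain coroutines, and with Prefect the corresponding functions would be decorated with `@flow`.

```python
import asyncio
import time


async def sub_flow(param: str) -> str:
    # Stand-in for a subflow that spends some time on I/O.
    await asyncio.sleep(0.1)
    return param


async def main_sequential(params):
    # Each await blocks the parent until that child has finished.
    return [await sub_flow(p) for p in params]


async def main_parallel(params):
    # gather() starts all children and waits for them together.
    return await asyncio.gather(*(sub_flow(p) for p in params))


def timed(coro_fn, params):
    # Run one parent to completion and measure wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(params))
    return result, time.perf_counter() - start


seq_result, seq_elapsed = timed(main_sequential, ["a", "b", "c"])
par_result, par_elapsed = timed(main_parallel, ["a", "b", "c"])
print(f"sequential: {seq_elapsed:.2f}s, parallel: {par_elapsed:.2f}s")
```

In both variants the parent only returns after every child has returned, so neither should leave a child running after the parent has finished.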
@Deceivious - wanted to do a quick update that we've got some work in place on subflow run cancellation in both cloud and the OSS repo. Again, please let us know if you see it again or get more info.
d
I do not think the issue I had is related to this improvement though.
j
Oh sorry, you are correct. Your run wasn't cancelled, it was completed. I'm going for coffee.
🙌 1
d
Glad to know that my issue has been pinned tho 😄
@Jenny I managed to replicate it :v Run a flow with multiple sub flows such that at least one of the sub flows fails. Rerun the failed main flow in a way that all sub flows complete. In the flow run list, the sub flow that failed will still be around, but when you check the parent, it will be completed.
j
Thanks for the update/circle back! When you say "still be around" you mean it's still in a running state? And to confirm I understand your description: 1. You have a parent flow with a few subflow runs. 2. One of your subflow runs fails so your parent fails. 3. You re-run the parent flow run (did you do this from the UI?). 4. The subflow runs complete? 5. The parent flow run completes? I'm not certain I've got points 4 & 5 correct there. Do all the subflow runs then complete for you? Or is one stuck?
d
3. yes from the UI 4. the rerun sub flow completes. 5. The rerun main flow completes.
What happens to a running sub flow if the server is hard shut down? The sub flow will be stuck in the Running state, right?
I might have been wrong about this. https://prefect-community.slack.com/archives/CL09KU1K7/p1689107936392039?thread_ts=1685540059.047499&cid=CL09KU1K7 This doesn't result in a Running state but in Failed.
j
Ah right. Yeah I was checking my thinking there. The original issue was that it was stuck in running right?
d
yes
Subflow in Running state with no running pods. But the parent was completed.
Hi @Jenny - has there been any update on this? Any tickets made? I have been seeing this often. Main calling flow in Success. Sub flow in Failed with error log:
raise UnfinishedRun(
prefect.exceptions.UnfinishedRun: Run is in RUNNING state, its result is not available.
Task run in Running state. No pods attached.
It seems that when I retry the main flow, the sub flow that failed is unbound from the main flow and a new sub flow is spawned in its place. The new one is successful, which changes the main flow state to Completed. The older sub flow that failed is not viewable in the main flow's subflow list, but the failed sub flow's own page still links back to the main flow. The main issue though is that the task run that is "running" holds its concurrency tag limit.
raise UnfinishedRun(
prefect.exceptions.UnfinishedRun: Run is in RUNNING state, its result is not available.
Unsure how to replicate this. But I've seen other users complaining about this as well.
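The concurrency concern above can be sketched with a toy slot counter. This is only an illustration under stated assumptions: `TagLimit` and the run IDs are hypothetical, and the class is not Prefect's implementation. It assumes a slot is freed only when a run reports a terminal state.

```python
class TagLimit:
    """Toy model of a per-tag concurrency limit (hypothetical, not Prefect's code)."""

    def __init__(self, tag: str, limit: int):
        self.tag = tag
        self.limit = limit
        self.active = set()  # IDs of runs currently holding a slot

    def try_acquire(self, run_id: str) -> bool:
        # A run may start only while a slot is free.
        if len(self.active) >= self.limit:
            return False
        self.active.add(run_id)
        return True

    def release(self, run_id: str) -> None:
        # Assumed to be called when a run reaches a terminal state.
        self.active.discard(run_id)


limit = TagLimit("db", limit=1)
started = limit.try_acquire("stuck-task-run")

# The pod disappears without reporting a terminal state, so release()
# is never called and the slot stays occupied.
blocked = limit.try_acquire("new-task-run")

# Manually clearing the stale slot lets new work through again.
limit.release("stuck-task-run")
unblocked = limit.try_acquire("new-task-run")
print(started, blocked, unblocked)
```

Under that assumption, a run that stays "running" with no pod behind it keeps blocking other runs with the same tag until the slot is cleared.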
Verified it, that is the case. Replication code:
from prefect import flow
from prefect.deployments import Deployment

@flow
def sub_flow():
    raise Exception("Error") #COMMENT THIS CODE LATER
    return

@flow
def main_flow():
    sub_flow()

if __name__ == "__main__":
    Deployment.build_from_flow(name="name",flow=main_flow,apply=True)
1. Run the code.
2. Run the flow from the deployments page on an agent.
3. Comment out the line that has the "#COMMENT THIS CODE LATER" text.
4. Retry the main flow.
There will be one main flow with 2 sub flows (1 failed and 1 passed).
👀 1
j
Thanks for investigating and for following up. Running through your MRE, I'm seeing what I would expect. The main flow runs and retries. If the subflow run completes, the parent run completes; if it fails, the parent fails. As Will explained in a separate thread, it's currently expected (though admittedly not intuitive) behavior that a retried parent/main flow run would kick off a new subflow run. I think we could do with some better documentation to set that out clearly and we might (but I want to check internal opinions here!) want to see if we can return all subflow runs as children of the main/parent run rather than just the latest one. More unexpected, but slightly lost in the other questions, is that you have a task that gets stuck in running? I didn't see that when I ran your MRE. Can you reliably reproduce that?
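The retry behavior described here can be pictured with a small toy model. Everything in it (`Run`, `ToyServer`, `listed_children`, `linked_children`) is hypothetical and only mirrors what was observed in this thread; it is not how the Prefect server stores runs.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional

_ids = count(1)


@dataclass
class Run:
    name: str
    state: str = "RUNNING"
    parent_id: Optional[int] = None
    id: int = field(default_factory=lambda: next(_ids))


class ToyServer:
    """Hypothetical bookkeeping that mimics the behavior seen in the thread."""

    def __init__(self) -> None:
        self.runs: Dict[int, Run] = {}
        self.latest_child: Dict[int, int] = {}  # parent id -> child id shown in its list

    def start_parent(self, name: str) -> Run:
        parent = Run(name)
        self.runs[parent.id] = parent
        return parent

    def run_subflow(self, parent: Run, name: str, succeed: bool) -> Run:
        # Every attempt of the parent creates a brand new child run.
        child = Run(name, state="COMPLETED" if succeed else "FAILED", parent_id=parent.id)
        self.runs[child.id] = child
        self.latest_child[parent.id] = child.id  # replaces the earlier attempt
        parent.state = child.state  # parent follows its latest child
        return child

    def listed_children(self, parent: Run) -> List[Run]:
        # What the parent's subflow list shows: only the latest child.
        return [self.runs[self.latest_child[parent.id]]]

    def linked_children(self, parent: Run) -> List[Run]:
        # Every run that still points back at the parent.
        return [r for r in self.runs.values() if r.parent_id == parent.id]


server = ToyServer()
parent = server.start_parent("main_flow")
first = server.run_subflow(parent, "sub_flow", succeed=False)  # first attempt fails
second = server.run_subflow(parent, "sub_flow", succeed=True)  # manual retry succeeds

print(parent.state)
print([r.state for r in server.listed_children(parent)])
print([r.state for r in server.linked_children(parent)])
```

In this model the parent ends up completed and lists only the successful child, while the earlier failed child still points back at the parent, which matches the stray run seen in the flow run list.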
d
@Jenny thanks, the stray failed sub flow is ok imo as well. This is fine. My initial question on this thread has been solved, the origin being me manually retrying flows. Thanks @Jenny. I'll start a new thread for the specific issue I am facing now.
👍 1
s
@Deceivious looks like there’s no answer to what’s going on with UnfinishedRun?
d
Yes.
s
Ok, thanks
d
Have you been seeing those issues as well? There is a ticket on GitHub related to this.
s
Yes. Where is that issue?
d
I'm away from my PC. Maybe try searching for "pending state result not available" in the issue search bar.