hi, i’m trying to restart a failed flow of flows, ...
# ask-community
m
hi, i’m trying to restart a failed flow of flows, but restarting it always results in a not-very-helpful error. [log in thread] i’ve tried restarting the flow several times (and usually have to hit the restart button twice before it actually submits the flow for execution) and i always get the same error. the logs for the first dependent flow (person-build) don’t pick up anything regarding the restart of the parent flow. this is essentially the setup i have for the parent flow: [code in thread]
k
Hey @Martim Lobao, regarding the delay when hitting the restart button: I chatted with the team about it. The “delay” may come from the API request plus waiting for the agent to pick the run up. The agent polls in a 10 second loop, so it may just be taking a bit of time to pick it up. Is it one of the StartFlowRun tasks that fails? Are there more logs on the page of the sub flow run?
Also if you get the chance, could you move either the traceback or flow code to the thread to free up space in the main channel?
m
prefect log:
INFO
    martim_peopledatalabs_com restarted this flow run
INFO
    martim_peopledatalabs_com restarted this flow run
INFO agent
    Submitted for execution: Task arn:aws:ecs:us-west-2:5567...
INFO GitHub
    Downloading flow from GitHub storage - repo: 'peopledatalabs/prefect', path: 'src/pdlapps/orchestration/flows/build_then_release.py', ref: 'prefect-testing'
INFO GitHub
    Flow successfully downloaded. Using commit: bac24...
INFO CloudFlowRunner
    Beginning Flow run for 'build-then-release'
INFO CloudTaskRunner
    Task 'Flow person-build': Starting task run...
INFO Flow person-build
    Flow Run: <https://cloud.prefect.io/pdl/flow-run/5133d6be>...
INFO CloudTaskRunner
    FAIL signal raised: FAIL('5133d6be... finished in state <Failed: "Some reference tasks failed.">')
flow setup:
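from prefect import Flow
from prefect.engine.results import PrefectResult
from prefect.executors import LocalDaskExecutor
from prefect.tasks.prefect import StartFlowRun

# get_stage, slack_notifier and terminate_on_cancel are defined or imported elsewhere (not shown)
person_build_flow = StartFlowRun(flow_name="person-build", project_name=get_stage(), wait=True)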
release_flow = StartFlowRun(flow_name="release", project_name=get_stage(), wait=True)

with Flow(
    "build-then-release",
    executor=LocalDaskExecutor(num_workers=8),
    result=PrefectResult(),
    state_handlers=[slack_notifier, terminate_on_cancel],
) as flow:
    release_flow(upstream_tasks=[person_build_flow])
k
Thanks for moving! Yeah I think the failed logs and error would appear in the subflow logs
m
hey @Kevin Kho, thanks for the reply! I’ve noticed that when i first click the restart button, I often get an error message at the bottom of the page along the lines of “unfortunately, the job was not able to restart”. I’ll copy the message next time it pops up, but it’s been happening pretty consistently. the `StartFlowRun`s function normally when first called. this is a case where the job failed for whatever reason and needs to be restarted. I try to restart the parent “build-then-release” flow, but it appears to get an error when trying to restart the child flows. here, it seems to try to start the failed “person-build” flow run but fails because “some reference tasks failed”. The person-build run that was started by the parent flow doesn’t contain any logs after it first failed.
k
Ohh I know what you are saying. Restarting a main flow just returns the result of the sub flows. Here is why: `StartFlowRun` has an idempotency key by default. Restarting the main flow tries to start a new sub flow run, but because a run with the same idempotency key already exists, it won’t start that sub flow again. The sub flow needs to be restarted individually.
So what needs to happen is that you need to cache the results of the subflow and then kick off a new flow run from the main flow by supplying a different idempotency key.
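To make the idempotency behaviour concrete, here is a toy model in plain Python. It is only a sketch of the general idea, not Prefect's implementation: `ToyBackend`, `start_run`, and the key strings are made-up names for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyBackend:
    """Hypothetical in-memory stand-in for a backend that deduplicates runs by key."""

    runs_by_key: Dict[str, str] = field(default_factory=dict)

    def start_run(self, flow_name: str, idempotency_key: str) -> str:
        # A key that was seen before returns the existing run id
        # instead of creating a new run.
        if idempotency_key in self.runs_by_key:
            return self.runs_by_key[idempotency_key]
        run_id = f"{flow_name}-{len(self.runs_by_key) + 1}"
        self.runs_by_key[idempotency_key] = run_id
        return run_id


backend = ToyBackend()

# First attempt: the parent task starts the sub flow with its default key.
first = backend.start_run("person-build", idempotency_key="parent-task-run")

# Restarting the parent reuses the same key, so the same (failed) run comes back.
restarted = backend.start_run("person-build", idempotency_key="parent-task-run")

# Supplying a different key creates a genuinely new sub flow run.
fresh = backend.start_run("person-build", idempotency_key="parent-task-run-retry-1")

print(first, restarted, fresh)
```

In this model the restart hands back the original run, which is why the parent only reports the old failure. Only a changed key leads to a new sub flow run.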
m
hmm, is that intended? so the restart button in a flow of flows will never work and i have to create a new run each time?