    Martim Lobao

    1 year ago
    hi, i’m trying to restart a failed flow of flows, but restarting it always results in a not-very-helpful error. [log in thread] i’ve tried restarting the flow several times (and usually have to hit the restart button twice before it actually submits the flow for execution) and i always get the same error. the logs for the first dependent flow (person-build) don’t pick up anything regarding the restart of the parent flow. this is essentially the setup i have for the parent flow: [code in thread]
    Kevin Kho

    1 year ago
    Hey @Martim Lobao, regarding the delay when hitting the restart button: I chatted with the team about it. The “delay” may come from the API request plus waiting for the agent to pick up the run. The agent polls in a 10-second loop, so it may just be taking a bit of time to pick it up. Is it one of the StartFlowRun tasks that fails? Are there more logs on the page of the sub flow run?
    Also if you get the chance, could you move either the traceback or flow code to the thread to free up space in the main channel?
    Martim Lobao

    1 year ago
    prefect log:
    INFO
        martim_peopledatalabs_com restarted this flow run
    INFO
        martim_peopledatalabs_com restarted this flow run
    INFO agent
        Submitted for execution: Task arn:aws:ecs:us-west-2:5567...
    INFO GitHub
        Downloading flow from GitHub storage - repo: 'peopledatalabs/prefect', path: 'src/pdlapps/orchestration/flows/build_then_release.py', ref: 'prefect-testing'
    INFO GitHub
        Flow successfully downloaded. Using commit: bac24...
    INFO CloudFlowRunner
        Beginning Flow run for 'build-then-release'
    INFO CloudTaskRunner
        Task 'Flow person-build': Starting task run...
    INFO Flow person-build
        Flow Run: <https://cloud.prefect.io/pdl/flow-run/5133d6be>...
    INFO CloudTaskRunner
        FAIL signal raised: FAIL('5133d6be... finished in state <Failed: "Some reference tasks failed.">')
    flow setup:
    person_build_flow = StartFlowRun(flow_name="person-build", project_name=get_stage(), wait=True)
    release_flow = StartFlowRun(flow_name="release", project_name=get_stage(), wait=True)
    
    with Flow(
        "build-then-release",
        executor=LocalDaskExecutor(num_workers=8),
        result=PrefectResult(),
        state_handlers=[slack_notifier, terminate_on_cancel],
    ) as flow:
        release_flow(upstream_tasks=[person_build_flow])
    Kevin Kho

    1 year ago
    Thanks for moving! Yeah I think the failed logs and error would appear in the subflow logs
    Martim Lobao

    1 year ago
    hey @Kevin Kho, thanks for the reply! I’ve noticed that when i first click the retry button, I often get an error at the bottom of the page with a message along the lines of “unfortunately, the job was not able to restart”. I’ll copy the message next time it pops up, but it’s been happening pretty consistently. the StartFlowRun tasks function normally when first called; this is a case where the job failed for whatever reason and needs to be restarted. in this case, I try to restart the parent “build-then-release” flow, but it appears to get an error when trying to restart the child flows. here, it seems to try to start the failed “person-build” flow run but fails because “some reference tasks failed”. The person-build run that was started by the parent flow doesn’t contain any logs after it first failed.
    Kevin Kho

    1 year ago
    Ohh I know what you are saying. Restarting a main flow just returns the result of the sub flows. Here is why:
    StartFlowRun has an idempotency key by default. When the main flow is restarted, the task tries to start a new flow run, but because a run with the same idempotency key already exists, it won’t restart that sub flow. The subflow needs to be individually restarted.
    So what needs to happen is that you need to cache the results of the subflow and then kick off a new flow run from the main flow by supplying a different idempotency key.
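To make the idempotency behaviour described above concrete, here is a minimal toy model in plain Python. It is not Prefect's implementation: `ToyBackend`, `create_flow_run`, and the run IDs are all hypothetical names for illustration. It assumes the default idempotency key is derived from the parent flow run, so a restarted parent sends the same key again.

```python
import uuid


class ToyBackend:
    """Toy stand-in for a scheduler API (hypothetical, not Prefect's code)."""

    def __init__(self):
        self._runs_by_key = {}  # (flow_name, idempotency_key) -> run id
        self.states = {}        # run id -> state name

    def create_flow_run(self, flow_name, idempotency_key):
        key = (flow_name, idempotency_key)
        # Same key as before: hand back the existing run, create nothing new.
        if key in self._runs_by_key:
            return self._runs_by_key[key]
        run_id = f"{flow_name}-{uuid.uuid4().hex[:8]}"
        self._runs_by_key[key] = run_id
        self.states[run_id] = "Scheduled"
        return run_id


backend = ToyBackend()
parent_run_id = "parent-run-1"  # hypothetical ID of the build-then-release run

# First execution of the parent: a person-build run is created, then it fails.
first = backend.create_flow_run("person-build", idempotency_key=parent_run_id)
backend.states[first] = "Failed"

# Restarting the parent reuses the same key, so the old failed run comes back.
second = backend.create_flow_run("person-build", idempotency_key=parent_run_id)

# Supplying a different key yields a genuinely new sub flow run.
third = backend.create_flow_run(
    "person-build", idempotency_key=f"{parent_run_id}-retry-1"
)

print(second == first, backend.states[second])
print(third != first, backend.states[third])
```

In this model the restarted parent immediately sees the old `Failed` state, which matches the FAIL signal in the log above, while the new key produces a fresh run.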
    Martim Lobao

    1 year ago
    hmm, is that intended? so the restart button in a flow of flows will never work and i have to create a new run each time?