I'm confused about <https://docs.prefect.io/latest/concepts/fl… | Prefect Community #ask-community

from typing import Dict, Optional
import time
from prefect import task

from hs_prefect_utils import flow

from hs_de_workflows.flows.config import (
    DATABRICKS_TOKEN,
    SLACK_TOKEN,
)


@task
def ssm_task(name):
    time.sleep(2)
    return name


@task(task_run_name="dbt-test")
def dbt_test(
    a,
    b,
):
    """This always fails but should be ignored and not fail the flow."""
    time.sleep(2)
    raise ValueError("should not fail flow")
    return


@task(task_run_name="dbt-docs generate")
def dbt_docs():
    time.sleep(2)
    return


@task(task_run_name="send_edr_report")
def edr():
    """When this fails, it should fail the flow."""
    time.sleep(2)
    raise ValueError("should fail flow")


@task(task_run_name="upload_to_s3")
def upload_docs_to_s3():
    time.sleep(2)


@flow(name="tmp-cs")
def post_dbt_transform(
    select: Optional[str] = "tag:canary",  # XXX temporary debug
    exclude: Optional[str] = "elementary",
    variables: Optional[Dict[str, str]] = None,
):
    databricks_token = ssm_task.submit(DATABRICKS_TOKEN)
    slack_token = ssm_task.submit(SLACK_TOKEN)

    dbttest = dbt_test.submit(a=databricks_token, b=slack_token).result(
        # so that any error raised doesn't get caught by the Flow
        # and we can continue executing downstream tasks
        raise_on_failure=False
    )
    edr_report = edr.submit(
        wait_for=[dbttest],
        return_state=True,
    )
    dbtdocs = dbt_docs.submit(
        return_state=True,
    )
    upload_docs_to_s3.submit(
        wait_for=[dbtdocs],
        return_state=True,
    )
    # this allows us to ignore failed dbt_test since, if a test fails, dbt returns a non-zero exit code
    # <https://docs.prefect.io/latest/concepts/flows/#final-state-determination>
    return dbtdocs, edr_report, upload_docs_to_s3


post_dbt_transform()

When I run it (see screenshot) I'm seeing two issues:
• I expect the final state of the flow to be FAILED
• The task "dbt-docs generate" should have executed in parallel with the "ssm_task(s)"

Nate

08/28/2024, 3:21 PM

hi @Constantino Schillebeeckx - are you using prefect 2 or 3?

Constantino Schillebeeckx

08/28/2024, 3:22 PM

Nate

08/28/2024, 3:27 PM

gotcha i think the behavior is expected as described by these two bullets from the docs you linked:
• If the flow does not return a value (or returns None), its state is determined by the states of all of the tasks and subflows within it.
• If the flow run returns any other object, then it is marked as completed.
so since you're not allowing those exceptions to raise, and you have a non-None return value, it's Completed

Nate

08/28/2024, 3:27 PM

are you noticing a change in behavior, or this is just not what you expect?

Constantino Schillebeeckx

08/28/2024, 3:28 PM

what about:

If a flow returns a mix of futures and states, the final state is determined by resolving all futures to states, then determining if any of the states are not COMPLETED.

Constantino Schillebeeckx

08/28/2024, 3:28 PM

In my case, I'm returning multiple states, I guess the above does not apply?

Nate

08/28/2024, 3:30 PM

in your example, you seem to be returning tuple([PrefectFuture, PrefectFuture, Task]), so yeah it doesn't apply. perhaps you intended to return the future resulting from submitting upload_docs_to_s3
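The rules quoted so far can be tried out in a toy model. This is only a sketch of the rules as they are worded in this thread, not Prefect's implementation: `ToyState` and `toy_final_state` are hypothetical names, and futures are left out entirely (the docs say they are resolved to states first).

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class ToyState:
    """Hypothetical stand-in for a task state; not a Prefect class."""

    name: str  # "COMPLETED" or "FAILED"


def toy_final_state(return_value: Any, task_states: Sequence[ToyState]) -> str:
    """Toy version of the final-state rules quoted in this thread."""
    if return_value is None:
        # No return value: every task state in the flow counts.
        states = list(task_states)
    elif (
        isinstance(return_value, (list, tuple))
        and len(return_value) > 0
        and all(isinstance(item, ToyState) for item in return_value)
    ):
        # Only states were returned: just those states count.
        states = list(return_value)
    else:
        # Any other object, including a tuple that holds a non-state item.
        return "COMPLETED"
    return "COMPLETED" if all(s.name == "COMPLETED" for s in states) else "FAILED"


ok, bad = ToyState("COMPLETED"), ToyState("FAILED")
not_a_state = object()  # plays the role of the task object in the tuple

print(toy_final_state((ok, bad, not_a_state), [ok, bad]))  # COMPLETED
print(toy_final_state((ok, ok, bad), [ok, bad]))  # FAILED
print(toy_final_state(None, [ok, bad]))  # FAILED
```

The first call mirrors the flow in the question: one non-state item in the tuple is enough for the whole return value to be treated as "any other object".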

Constantino Schillebeeckx

08/28/2024, 3:32 PM

In my mind, if a flow returned:

    SUCCESS, SUCCESS, FAILED

it would be marked as FAILED. If instead it returned

    SUCCESS, SUCCESS, SUCCESS

it would be marked as SUCCESS.

In my case, am I not returning the following?

    PrefectFuture, PrefectFuture, PrefectFuture

Nate

08/28/2024, 3:33 PM

    upload_docs_to_s3.submit(
        wait_for=[dbtdocs],
        return_state=True,
    )

    return dbtdocs, edr_report, upload_docs_to_s3

upload_docs_to_s3 is a task object no?

Constantino Schillebeeckx

08/28/2024, 3:35 PM

ah you're right! 🥹
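The mix-up is easy to reproduce outside Prefect. Below is a plain-Python analogy that assumes `concurrent.futures` as a stand-in for the task runner (it is not Prefect's API), and the two `returns_*` helpers are hypothetical names. The point is only that the callable you submit and the handle that `submit()` returns are different objects.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def upload_docs_to_s3() -> str:
    return "uploaded"


def returns_the_callable(pool: ThreadPoolExecutor):
    pool.submit(upload_docs_to_s3)  # the handle is thrown away
    return upload_docs_to_s3  # this is the function itself


def returns_the_handle(pool: ThreadPoolExecutor) -> Future:
    upload = pool.submit(upload_docs_to_s3)  # keep what submit() gives back
    return upload


with ThreadPoolExecutor(max_workers=1) as pool:
    wrong = returns_the_callable(pool)
    right = returns_the_handle(pool)

print(isinstance(wrong, Future))  # False
print(isinstance(right, Future))  # True
print(right.result())  # uploaded
```

In the flow from the question, the analogous change would be to assign the result of `upload_docs_to_s3.submit(...)` to a variable and return that variable instead of the task name, as suggested earlier in the thread.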

Constantino Schillebeeckx

08/28/2024, 3:35 PM

that resolves final state determination; do you have any thoughts on why "dbt-docs generate" isn't executing at the very start of the flow (parallel to SSM)?

Nate

08/28/2024, 3:43 PM

👍 hmm what happens if you move

    dbtdocs = dbt_docs.submit(
        return_state=True,
    )

up before

    dbttest = dbt_test.submit(a=databricks_token, b=slack_token).result(
        # so that any error raised doesn't get caught by the Flow
        # and we can continue executing downstream tasks
        raise_on_failure=False
    )

where .result() is blocking, waiting for the future from dbt_test

Constantino Schillebeeckx

08/28/2024, 3:48 PM

hmmm that works ....

Constantino Schillebeeckx

08/28/2024, 3:48 PM

this feels like a bug?

Nate

08/28/2024, 3:48 PM

i don't think so! i think it's just because .result() is blocking, so if you submit the other work before you start blocking, you give it time to do its thing in another thread
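The ordering effect described here is not specific to Prefect. The sketch below uses `concurrent.futures` from the standard library as an assumed stand-in for the task runner; `work`, `block_then_submit`, and `submit_then_block` are hypothetical names. It records when each piece of work starts and ends, depending on whether the blocking `.result()` call comes before or after the other submission.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def work(name: str, log: List[str], seconds: float = 0.2) -> str:
    log.append(f"{name} start")
    time.sleep(seconds)
    log.append(f"{name} end")
    return name


def block_then_submit() -> List[str]:
    """Block on "test" first; "docs" is not even submitted until it finishes."""
    log: List[str] = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pool.submit(work, "test", log).result()  # blocks here
        pool.submit(work, "docs", log).result()
    return log


def submit_then_block() -> List[str]:
    """Submit "docs" first, so it runs in another thread while we block on "test"."""
    log: List[str] = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        docs = pool.submit(work, "docs", log)
        pool.submit(work, "test", log).result()  # blocks, but "docs" is already running
        docs.result()
    return log


print(block_then_submit())  # ['test start', 'test end', 'docs start', 'docs end']
print(submit_then_block())  # "docs start" appears before "test end"
```

In the first variant the second submission cannot happen until the blocking call returns, which matches what was observed for "dbt-docs generate" in the original flow.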

Constantino Schillebeeckx

08/28/2024, 3:49 PM

ah! my bad

Constantino Schillebeeckx

08/28/2024, 3:49 PM

ok that makes sense; although it doesn't feel very intuitive; the flow is just a dag, it shouldn't care where i define a task ...

Nate

08/28/2024, 3:50 PM

"the flow is just a dag"

🙂 this was true in prefect 1, but is not in prefect 2+. we discover the graph at runtime based on your python control flow


Constantino Schillebeeckx

08/28/2024, 3:58 PM

thanks for all the help @Nate
