Prefect Community #ask-community

Josh

04/23/2021, 7:31 AM

I am getting an unexpected error when trying to use `apply_map` in conjunction with skipped tasks.

At least one upstream state has an unmappable result.

👀 1

Josh

04/23/2021, 7:36 AM

Minimal flow to reproduce

from typing import Any

import pendulum
from pendulum import DateTime
from prefect import Flow, task, case, apply_map, Task


class PrettyPrint(Task):
    def run(
        self,
        key: str = None,
        value: Any = None,
    ) -> None:
        self.logger.info(f"{key}: {value}")


pretty_print = PrettyPrint()
pretty_print_always_run = PrettyPrint(skip_on_upstream_skip=False)


@task
def day_of_week(simulated_day: str = None):
    if simulated_day:
        day = pendulum.parse(simulated_day)
        assert isinstance(day, DateTime)
        return day.day_of_week
    else:
        return pendulum.today().day_of_week


def backcast_on(date):
    with case(day_of_week(date), 6):
        saturday_result = pretty_print(key="This day is Saturday", value=date)
    pretty_print_always_run(
        key="Always Running Backcast on", value=date, upstream_tasks=[saturday_result]
    )


@task
def true_task():
    return True


with Flow(name="Backcast Flow") as backcast_flow:
    backcast_range = ["2021-04-03", "2021-04-04"]

    with case(true_task(), True):
        apply_map(backcast_on, backcast_range)

    with case(true_task(), False):
        apply_map(backcast_on, backcast_range)


if __name__ == "__main__":
    backcast_flow.register(project_name="sandbox")

Josh

04/23/2021, 7:37 AM

Is there a proper way to conditionally skip everything in the apply_map based on the case?

Kevin Kho

04/23/2021, 1:52 PM

Hi @Josh! I think the issue here is the dependency on an upstream task that’s in a case statement so it might not get executed. `pretty_print_always_run` sets `saturday_result` as an upstream task. Is that intended?

Kevin Kho

04/23/2021, 1:55 PM

Case is also a task used for flow definitions so I think you want to use Python `if` and `else` in the `backcast_on` function.

Kevin Kho

04/23/2021, 1:58 PM

def backcast_on(date):
    if day_of_week(date) == 6:
        saturday_result = pretty_print(key="This day is Saturday", value=date)
        pretty_print_always_run(
            key="Always Running Backcast on", value=date, upstream_tasks=[saturday_result]
        )
    else:
        pretty_print_always_run(
            key="Always Running Backcast on", value=date,
        )

Kevin Kho

04/23/2021, 1:58 PM

Maybe do something like this?

Kevin Kho

04/23/2021, 2:03 PM

To your last question, have you seen SKIP signals before? Use them like:

from prefect.engine import signals

if day_of_week(date) == 6:
    raise signals.SKIP()
else:
    ...  # do other stuff

Kevin Kho

04/23/2021, 2:04 PM

`SKIP` will skip all downstream tasks and is treated as a `SUCCESS`.
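As a rough illustration of the behavior described here, the sketch below models skip propagation in plain Python. `ToyTask` and `resolve_states` are hypothetical names, not Prefect classes. The only Prefect concept borrowed is the `skip_on_upstream_skip` flag from the flow earlier in the thread, and the model assumes a skipped upstream causes a downstream skip unless that flag is `False`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task; not a Prefect class."""
    name: str
    upstream: list = field(default_factory=list)
    skip_on_upstream_skip: bool = True
    raises_skip: bool = False


def resolve_states(tasks):
    """Assign a state to each task; tasks must already be in dependency order."""
    states = {}
    for t in tasks:
        upstream_skipped = any(states[u.name] == "Skipped" for u in t.upstream)
        if t.raises_skip or (upstream_skipped and t.skip_on_upstream_skip):
            states[t.name] = "Skipped"
        else:
            states[t.name] = "Success"
    return states


# "train" skips itself; "report" follows the default; "forecast" opts out.
train = ToyTask("train", raises_skip=True)
report = ToyTask("report", upstream=[train])
forecast = ToyTask("forecast", upstream=[train], skip_on_upstream_skip=False)

states = resolve_states([train, report, forecast])
print(states)
```

In this toy model only `forecast` still runs, which mirrors the intent of `pretty_print_always_run` in the original flow.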

Josh

04/23/2021, 3:50 PM

Thanks for looking into this. I want to retrain a model every Saturday and make daily forecasts based on that model. So on Saturday, I want to make sure that the forecasts run after the model is re-trained. For the rest of the week, I can skip the model training and just do forecasting.

Josh

04/23/2021, 3:51 PM

On that note, I also want to know if it’s possible to order the task runs or somehow make them dependent on each other. If I backcast a forecast over a 1 week period, how can I guarantee that the forecasting for April 4, 2021 runs after the training and forecasting for April 3rd (Saturday)?

Kevin Kho

04/23/2021, 3:56 PM

So for the retrain every Saturday, you can make a task to check for Saturday and return True/False. Then use the `case` statement to run everything below or not inside the Flow definition.
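A minimal sketch of the Saturday check suggested here, using only the standard library rather than pendulum. `is_saturday` is a hypothetical helper name; in a flow it would presumably be wrapped as a task so its True/False result can feed `case`. Weekday numbering differs between libraries, so the constant is worth double-checking against whichever one is in use.

```python
from datetime import date


def is_saturday(day: str) -> bool:
    """Return True when an ISO date string (YYYY-MM-DD) falls on a Saturday."""
    # isoweekday() numbers Monday as 1 through Sunday as 7, so Saturday is 6.
    return date.fromisoformat(day).isoweekday() == 6


for day in ["2021-04-03", "2021-04-04"]:
    print(day, is_saturday(day))
```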

Kevin Kho

04/23/2021, 3:57 PM

I suspect you want a Flow-of-flows type script where you kick off a defined Flow and pass in the date. That way, you can pass the dates sequentially.

Kevin Kho

04/23/2021, 3:59 PM

start = StartFlowRun(project_name="testing-result", wait = True)
with Flow('master-flow') as flow:

    # True or False
    skipped = check_skip()

    # Insert other flows here
    with case(skipped, False):
        for date in ["2021-04-03", "2021-04-04"]:
            start(flow_name="flow", parameters={'date': date})

Kevin Kho

04/23/2021, 3:59 PM

Does this make sense?

Josh

04/23/2021, 8:18 PM

By using the for loop to kick off flows, will they be run in order? Does this mean it’s a sequential for loop?

Kevin Kho

04/23/2021, 8:21 PM

Yes, sequential loops are possible. Order is preserved. Use `wait=True` on `StartFlowRun` to ensure that each flow ends before calling the next.

Josh

04/27/2021, 3:41 AM

@Kevin Kho I just tried it with a toy example, and the order of the flows is not preserved

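from time import sleep

from prefect import Flow, Parameter, Task
from prefect.tasks.prefect import StartFlowRun

dates = range(1, 10)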

date_param = Parameter("date")


class DoSomething(Task):
    def run(self, value):
        sleep(1)
        self.logger.warning(value)


do_something = DoSomething()

with Flow("sub-flow") as sub_flow:
    do_something(date_param)

sub_flow_task = StartFlowRun(project_name="sandbox", flow_name="sub-flow", wait=True)

with Flow("schedule flow") as schedule_flow:
    for date in dates:
        sub_flow_task(parameters={"date": date})

Josh

04/27/2021, 3:43 AM

Am I missing something? Also, if we need concurrent flows with `wait=True`, we’ll have to upgrade to the standard plan, right?

Kevin Kho

04/27/2021, 3:53 AM

I guess I was wrong. Sorry about that! Maybe you can try this to set the upstream tasks on the sub_flow_tasks so they run sequentially

Kevin Kho

04/27/2021, 3:54 AM

with Flow("schedule flow") as schedule_flow:
    tasks = [
        sub_flow_task(parameters={"date": date})
        for date in dates
    ]
    for i in range(1, len(tasks)):
        tasks[i].set_upstream(tasks[i - 1])

Kevin Kho

04/27/2021, 3:55 AM

About the concurrent flows, do you know if you’re on a usage based plan or legacy plan?

Kevin Kho

04/27/2021, 3:58 AM

I confirmed this code works for me.
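To see why chaining each task to its predecessor forces a single execution order, here is a small standard-library sketch. It does not use Prefect at all: the task names are hypothetical strings, and `graphlib.TopologicalSorter` (Python 3.9+) stands in for a scheduler resolving dependencies.

```python
from graphlib import TopologicalSorter  # Python 3.9+

dates = ["2021-04-03", "2021-04-04", "2021-04-05"]
# Hypothetical task names, one per date.
tasks = [f"run-{d}" for d in dates]

# Map each task to the set of tasks it depends on: task i waits for task i - 1.
graph = {tasks[0]: set()}
for i in range(1, len(tasks)):
    graph[tasks[i]] = {tasks[i - 1]}

# A linear chain admits exactly one valid ordering.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Without those pairwise edges every task would have an empty dependency set, and any ordering would be valid, which is consistent with the unordered runs reported for the plain `for` loop.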

Josh

04/27/2021, 2:13 PM

I’m on legacy.

Kevin Kho

04/27/2021, 2:14 PM

Will DM you

Josh

04/27/2021, 2:40 PM

This only works if dates is a pre-defined iterable right? If it’s a Parameter or the result of a Task, it won’t be able to create the tasks. It can only create the tasks at the Flow creation time?

Josh

04/27/2021, 2:43 PM

Is there a way to map over a set of Parameters dynamically? Ideally the user should be able to define `dates` when starting the schedule flow.

Kevin Kho

04/27/2021, 2:45 PM

For looping over dynamic lengths, we have Task looping. Have you seen this before?

Josh

04/27/2021, 4:32 PM

Is there a way to have a Dynamic DAG of Task Looping over `StartFlowRun`?

Kevin Kho

04/27/2021, 4:33 PM

This sounds a bit tricky. I’ll reevaluate your use case and try to make an example of this if it still makes sense.

Josh

04/27/2021, 4:36 PM

I want a set of tasks to execute iteratively in a specific order. It’s easier to have a Flow represent a set of tasks than a task of tasks. I want to have them iterate in a specific order, and Looping with Dynamic DAGs seems like a good solution for that.

Kevin Kho

04/27/2021, 4:37 PM

Gotcha. Yeah I think what it’ll look like is you pass a parameter in and have a Task get the length of it and then loop over that length value. Will try it out myself and get back. Sorry this has taken a bunch of detours.

Josh

04/27/2021, 4:45 PM

I tried subclassing the `StartFlowRun` task but ran into a Pickling context object error. You can take a look at my code here. https://gist.github.com/wangjoshuah/6987991be27c98c14c3eca3e561c3b9f

Josh

04/27/2021, 4:45 PM

It works when I comment out the section around `idempotency_key`, and I don’t understand what that is for 🤷‍♂️

Kevin Kho

04/27/2021, 4:53 PM

The idempotency_key is for situations where you’re not sure a request will go through. Like for example, you hit the GraphQL API 3-4 times but you really only need the flow to run once. The key will make sure that the flow will only have run once in 24 hours.

Kevin Kho

04/27/2021, 4:54 PM

An example is having a place with unstable internet do the request
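The idea behind an idempotency key can be sketched with a toy in-memory service. `ToyRunService` and `create_run` are hypothetical names and this is not how Prefect's backend is implemented; in particular, the 24-hour window mentioned above is not modeled. The sketch only shows that repeating a request with the same key hands back the existing run instead of creating a new one.

```python
import uuid


class ToyRunService:
    """Hypothetical in-memory stand-in for a run-creation API."""

    def __init__(self):
        self._runs_by_key = {}

    def create_run(self, flow_name, idempotency_key=None):
        # A repeated key returns the run that already exists.
        if idempotency_key is not None and idempotency_key in self._runs_by_key:
            return self._runs_by_key[idempotency_key]
        run_id = f"{flow_name}-{uuid.uuid4().hex}"
        if idempotency_key is not None:
            self._runs_by_key[idempotency_key] = run_id
        return run_id


service = ToyRunService()
first = service.create_run("sub-flow", idempotency_key="retry-me")
retry = service.create_run("sub-flow", idempotency_key="retry-me")
other = service.create_run("sub-flow", idempotency_key="different-key")

print(first == retry)  # same key, same run
print(first == other)  # different key, new run
```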

Kevin Kho

04/27/2021, 4:55 PM

I haven’t tried the looping over StartFlowRun myself so I’ll need to really try it in a bit.

Kevin Kho

04/27/2021, 8:36 PM

I finally have a working example for you. This is `StartFlowRun` with Task LOOP.

from prefect import Parameter, Flow, task, Task
import prefect
from prefect.tasks.prefect import StartFlowRun
from time import sleep
from prefect.engine.signals import LOOP, SUCCESS

class DoSomething(Task):
    def run(self, value):
        sleep(1)
        self.logger.info(value)

do_something = DoSomething()

with Flow("sub-flow") as sub_flow:
    one_date = Parameter("one_date")
    do_something(one_date)

sub_flow.register("aws")

sub_flow_task = StartFlowRun(project_name="aws", flow_name="sub-flow", wait=True)

@task()
def loop_over_dates(dates):
    # Starting state
    loop_payload = prefect.context.get("task_loop_result", {"dates": dates})
    dates = loop_payload.get("dates", [])

    logger = prefect.context.get("logger")
    logger.info(dates)

    one_date = dates[0]

    logger.info(f"Checking {one_date}")
    
    try:
        sub_flow_task.run(parameters={"one_date": one_date})
    except SUCCESS:
        # Don't exit the loop on Flow Run success
        pass

    # Drop the first date
    dates.pop(0)

    if len(dates) == 0:
        return  # return statements end the loop
    raise LOOP(message=f"Processing {dates[0]}", result=dict(dates = dates))

with Flow("schedule flow") as schedule_flow:
    date_param = Parameter("dates", default=[1,2,3,4,5])
    loop_over_dates(date_param)

schedule_flow.register("aws")

Kevin Kho

04/27/2021, 8:37 PM

Note the try-except. There is an issue with `StartFlowRun` exiting the loop when it succeeds. This will be fixed soon. The try-except will make this work for now. This preserves your sequential dependency as the loop gets processed in order.
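The control flow of the looping pattern can be modeled without Prefect. In the sketch below, `ToyLoop` and `run_with_looping` are hypothetical stand-ins for the `LOOP` signal and the engine that re-runs the task, so this shows the shape of the pattern under those assumptions and not Prefect's actual implementation.

```python
class ToyLoop(Exception):
    """Hypothetical stand-in for a loop signal; carries the next payload."""

    def __init__(self, result):
        super().__init__("loop again")
        self.result = result


processed = []


def loop_over_dates(payload):
    dates = list(payload["dates"])
    processed.append(dates.pop(0))  # handle the first date, then drop it
    if not dates:
        return  # returning normally ends the loop
    raise ToyLoop(result={"dates": dates})


def run_with_looping(fn, payload):
    """Re-invoke fn with the payload from each ToyLoop until it returns."""
    while True:
        try:
            return fn(payload)
        except ToyLoop as signal:
            payload = signal.result


run_with_looping(loop_over_dates, {"dates": [1, 2, 3, 4, 5]})
print(processed)
```

Because each iteration only starts after the previous one has raised or returned, the items are handled strictly one after another.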

Josh

04/27/2021, 9:53 PM

I tried this and it loops over the dates in the `loop_over_dates` task, but only one run of the `sub-flow` Flow is ever created.

Josh

04/27/2021, 10:28 PM

I was able to get it working without the idempotency_key here. https://gist.github.com/wangjoshuah/0b84bce74253133193dc38a74a6c0e9f

Kevin Kho

04/27/2021, 10:32 PM

Oh I see what you mean from earlier. My bad. The example will work if you pass a unique idempotency key to the `StartFlowRun.run()` call. Do you still want me to continue with the example?

Josh

04/27/2021, 10:35 PM

I have never used idempotency keys before, so if you know how to do that, it would be amazing. If you think this is a good working example, I’ll submit it to the project as well to make it available for others to use.

Kevin Kho

04/27/2021, 10:37 PM

Ok I’ll continue on with that. I think it’s just a one line change

👍 1

Kevin Kho

04/27/2021, 10:43 PM

I confirmed this is working now. The subflows are being started. The idempotent run creation docs are very short if you’re interested.

from prefect import Parameter, Flow, task, Task
import prefect
from prefect.tasks.prefect import StartFlowRun
from time import sleep
from prefect.engine.signals import LOOP, SUCCESS
import datetime

class DoSomething(Task):
    def run(self, value):
        sleep(1)
        self.logger.info(value)

do_something = DoSomething()

with Flow("sub-flow") as sub_flow:
    one_date = Parameter("one_date")
    do_something(one_date)

sub_flow.register("aws")

sub_flow_task = StartFlowRun(project_name="aws", flow_name="sub-flow", wait=True)

@task()
def loop_over_dates(dates):
    # Starting state
    loop_payload = prefect.context.get("task_loop_result", {"dates": dates})
    dates = loop_payload.get("dates", [])

    logger = prefect.context.get("logger")
    logger.info(dates)

    one_date = dates[0]

    logger.info(f"Checking {one_date}")
    
    try:
        sub_flow_task.run(
            parameters={"one_date": one_date},
            idempotency_key=datetime.datetime.now().strftime("%m/%d/%Y, %H:%M:%S"),
        )
    except SUCCESS:
        # Don't exit the loop on Flow Run success
        pass

    # Drop the first date
    dates.pop(0)

    if len(dates) == 0:
        return  # return statements end the loop
    raise LOOP(message=f"Processing {dates[0]}", result=dict(dates = dates))

with Flow("schedule flow") as schedule_flow:
    date_param = Parameter("dates", default=[1,2,3,4,5])
    loop_over_dates(date_param)

schedule_flow.register("aws")

Kevin Kho

04/27/2021, 10:44 PM

Thanks for your patience!
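One thing to keep in mind with the timestamp key in the example above is that its format has one-second resolution, so two calls within the same second would produce the same key. The sketch below demonstrates that with fixed timestamps and shows one possible alternative. `timestamp_key` and `per_date_key` are hypothetical helpers, and the random suffix is an assumption about what "unique per iteration" could look like, not something taken from the Prefect docs.

```python
import datetime
import uuid


def timestamp_key(now):
    # Same format as the example above: one-second resolution.
    return now.strftime("%m/%d/%Y, %H:%M:%S")


def per_date_key(one_date):
    # Hypothetical alternative: tie the key to the item plus a random suffix.
    return f"{one_date}-{uuid.uuid4().hex}"


# Two moments half a second apart, within the same wall-clock second.
t0 = datetime.datetime(2021, 4, 27, 22, 43, 0, 100000)
t1 = t0 + datetime.timedelta(milliseconds=500)

same_second_collides = timestamp_key(t0) == timestamp_key(t1)
per_date_collides = per_date_key(1) == per_date_key(2)

print(same_second_collides)
print(per_date_collides)
```

In the looped flow each sub-flow run takes at least a second because of `sleep(1)` and `wait=True`, so the timestamp key is unlikely to collide there. A random suffix also gives up deduplication on retries, so which trade-off is right depends on the use case.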
