Prefect Community #ask-community

An Hoang

07/28/2021, 3:47 PM

Let me know if I should trim more of the code to make it more of a minimal example. To avoid executing an expensive task when its outputs are already persisted on disk, I'm trying to decide whether I should 1) raise the `SKIP` signal inside the task or 2) use the conditional task `case`. I have provided the two scenarios in this gist, along with the flow visualizations: https://gist.github.com/hoangthienan95/e0d8c3d73cb25f90f0d427c689ea80d8/revisions?diff=unified. In the gist, the initial version uses the `SKIP` signal, and the second version contains the modifications to make it use `case`. I have tested both flows; they work and both satisfy my requirements so far. Are there any caveats I haven't thought of, or best practices I should consider when choosing one over the other?
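A plain-Python sketch may help frame the two options. This is not Prefect code: `Skip` is a hypothetical stand-in for Prefect's `SKIP` signal (in Prefect 1.x that signal is `prefect.engine.signals.SKIP`), and `expensive_task`, `compute`, `run_task_level` and `run_flow_level` are made-up names. The only difference between the two variants is where the "is the output already on disk?" check lives.

```python
import tempfile
from pathlib import Path


class Skip(Exception):
    """Hypothetical stand-in for Prefect's SKIP signal."""


def expensive_task(out: Path) -> str:
    # Option 1 (task level): the task checks for its own output and skips itself
    if out.exists():
        raise Skip(f"{out} already exists")
    out.write_text("computed")
    return "ran"


def run_task_level(out: Path) -> str:
    # A runner that records a raised Skip as "skipped" rather than as a failure
    try:
        return expensive_task(out)
    except Skip:
        return "skipped"


def compute(out: Path) -> str:
    # Option 2 (flow level): the task does no checking of its own
    out.write_text("computed")
    return "ran"


def run_flow_level(out: Path) -> str:
    # The flow evaluates the condition first, roughly what `case(cond, False)` expresses
    cond = out.exists()
    if not cond:
        return compute(out)
    return "skipped"


with tempfile.TemporaryDirectory() as d:
    out = Path(d) / "result.txt"
    print(run_task_level(out), run_task_level(out))  # ran skipped
```

Either way the expensive work runs at most once; what changes is whether the decision is visible as a separate step in the flow graph (flow level) or hidden inside the task (task level).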

An Hoang

07/28/2021, 3:49 PM

I have also included the flow visualizations here:

flow using conditional task.pdf
flow using skip signal.pdf

Kevin Kho

07/28/2021, 3:53 PM

Hey @An Hoang, so with `target`, the task will not run if the file already exists; it will load the result from that file instead. If you want to check manually and decide to skip, `raise SKIP` inside a task is better. `case` would do it at the Flow level. Both will work. I think it’s just a matter of where you want that logic to take place (task or flow level). I opened the links. It’s a bit hard to follow.
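The `target` behaviour described here amounts to "if the output file exists, load it instead of running the task". The sketch below models that idea in plain Python with a hypothetical `cached_to` decorator and JSON files. It is not how Prefect implements it: in Prefect 1.x this is configured on the task (a `target` filename together with checkpointing and a `Result` such as `LocalResult`), and the exact settings are worth checking against the Prefect docs.

```python
import json
import tempfile
from functools import wraps
from pathlib import Path


def cached_to(path: Path):
    """Hypothetical decorator: reuse the file at `path` if it exists, else run and save."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if path.exists():
                # Output already persisted: skip the work and load the stored result
                return json.loads(path.read_text())
            result = fn(*args, **kwargs)
            path.write_text(json.dumps(result))
            return result
        return wrapper
    return decorator


calls = []

with tempfile.TemporaryDirectory() as d:
    @cached_to(Path(d) / "task_a.json")
    def task_a():
        calls.append("task_a")  # record each real execution of the task body
        return {"rows": 3}

    first = task_a()   # runs the body and writes task_a.json
    second = task_a()  # loads task_a.json; the body is not executed

print(first == second, calls)  # True ['task_a']
```

Unlike `raise SKIP`, this pattern still hands a result to downstream tasks, so nothing downstream has to deal with a skipped state.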

An Hoang

07/28/2021, 4:05 PM

Sorry for the confusion, Kevin. I guess you got the gist of my question: should the logic take place at the task or the flow level? These two are equivalent in every way, right? I can see an argument for choosing one over the other if it simplifies the flow graph for the end user. Anything else? Also, I needed to run task A -> task B -> task C if `case=False` (no result has been output yet), and task B -> task C if `case=True`. Is the following code an anti-pattern of some sort?

from prefect import Flow, case
from prefect.tasks.control_flow import merge

# task_A, task_B, task_C and cond are assumed to be defined as in the gist
with Flow("conditional-example") as flow:
    with case(cond, False):
        task_A_result = task_A()
        task_B_result_false = task_B()  # task_B loads a partial result from the file task_A outputted, but is not passed `task_A_result`
        task_C_result_false = task_C(task_B_result_false)

    with case(cond, True):  # file exists, skip A
        task_B_result_true = task_B()
        task_C_result_true = task_C(task_B_result_true)

    task_C_result = merge(task_C_result_false, task_C_result_true)
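The `merge` at the end works because only one branch actually produces a value. In Prefect 1.x, `merge` returns the first upstream result that is not `None`, and a skipped task passes on no result. A plain-Python model of that rule, where `merge_model` and `run_flow` are hypothetical names and the strings stand in for real task outputs:

```python
from typing import Any, Optional

SKIPPED = None  # in this model, a skipped branch contributes no result


def merge_model(*branch_results: Optional[Any]) -> Optional[Any]:
    """Return the first branch result that is not None (the branch that ran)."""
    return next((r for r in branch_results if r is not None), None)


def run_flow(cond: bool) -> Optional[str]:
    # Exactly one of the two case branches produces a value; the other is skipped
    task_c_result_false = "C via A->B->C" if cond is False else SKIPPED
    task_c_result_true = "C via B->C" if cond is True else SKIPPED
    return merge_model(task_c_result_false, task_c_result_true)


print(run_flow(False))  # C via A->B->C
print(run_flow(True))   # C via B->C
```

One consequence of this rule: a branch that legitimately returns `None` is indistinguishable from a skipped one.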

Kevin Kho

07/28/2021, 4:09 PM

The output would be equivalent, yeah. It’s just that SKIP propagates to downstream tasks as well, so you need to handle that, but I think I saw you did. I don’t think this is an anti-pattern, but from what I’m seeing, only `task_A()` needs to be inside the case, right? `task_B()` and `task_C()` will run regardless, so they don’t need to be part of the case statement?

An Hoang

07/28/2021, 4:28 PM

This was due to my unfamiliarity with Prefect. I put them in both places hoping to set `task_A` upstream of `task_B` and `task_C`, but I forgot I could do something like this:

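from prefect import Flow, case

# task_A, task_B, task_C and cond are assumed to be defined as in the gist
with Flow("conditional-example") as flow:
    with case(cond, False):
        task_A_result = task_A()

    task_B_result = task_B()
    # task_A only runs when cond is False, but it is always a node in the graph,
    # so it can be set as an explicit upstream dependency of task_B
    task_B_result.set_upstream(task_A_result)
    task_C_result = task_C(task_B_result)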

This is allowed even when the `case` might evaluate to `True`. In the `if/else` equivalent, Python would complain that the `task_A_result` variable is not defined, but here we are building a DAG that will be evaluated later 😄 This is one of the "gotchas" that would be a helpful addition to the beginner's tutorial or the `case` documentation 🙂
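The reason `task_A_result` is still defined after the `case` block is plain Python: a `with` block does not create a new scope, and its body always executes while the flow is being built, whereas `case` only records a condition that is checked later at run time. An `if` whose branch is not taken never binds the name. A minimal stdlib illustration, with `contextlib.nullcontext` standing in for `case` and hypothetical function names:

```python
from contextlib import nullcontext


def build_with_block():
    # The body of a `with` always runs, so the name is always bound
    with nullcontext():
        task_a_result = "placeholder for task_A()"
    return task_a_result


def build_with_if(cond: bool):
    # When the branch is not taken, the name is never bound
    if cond is False:
        task_a_result = "placeholder for task_A()"
    return task_a_result  # raises UnboundLocalError when cond is True


print(build_with_block())
try:
    build_with_if(True)
except UnboundLocalError as exc:
    print("if/else version fails:", type(exc).__name__)
```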

An Hoang

07/28/2021, 4:28 PM

Thanks @Kevin Kho for the help!

Kevin Kho

07/28/2021, 4:32 PM

Oh, I didn’t think about setting upstream. That’s pretty cool if that works.
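Whether the `set_upstream` version works hinges on skip propagation. In Prefect 1.x, tasks default to `skip_on_upstream_skip=True`, so when `cond` is `True` and `task_A` is skipped, the explicit dependency should make `task_B` (and then `task_C`) skip as well. Defining `task_B` with `@task(skip_on_upstream_skip=False)` should avoid that, since a skipped upstream still satisfies the default `all_successful` trigger (Prefect 1.x's `Skipped` state is a subclass of `Success`). The toy runner below only models that rule in plain Python; `Node`, `run` and `build` are hypothetical names, not Prefect's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

SUCCESS, SKIPPED = "Success", "Skipped"


@dataclass
class Node:
    """Hypothetical toy task node; not Prefect's Task class."""
    name: str
    upstream: List["Node"] = field(default_factory=list)
    skip_on_upstream_skip: bool = True  # mirrors the Prefect 1.x default
    run_if: Callable[[], bool] = lambda: True  # models membership in a `case` block


def run(nodes: List[Node]) -> Dict[str, str]:
    """Visit nodes in the given (topological) order and record each final state."""
    states: Dict[str, str] = {}
    for node in nodes:
        upstream_skipped = any(states[u.name] == SKIPPED for u in node.upstream)
        if (upstream_skipped and node.skip_on_upstream_skip) or not node.run_if():
            states[node.name] = SKIPPED
        else:
            states[node.name] = SUCCESS
    return states


def build(cond: bool, b_skip_on_upstream_skip: bool) -> List[Node]:
    # task_A sits inside `with case(cond, False)`; task_B has task_A set upstream
    a = Node("task_A", run_if=lambda: cond is False)
    b = Node("task_B", upstream=[a], skip_on_upstream_skip=b_skip_on_upstream_skip)
    c = Node("task_C", upstream=[b])
    return [a, b, c]


print(run(build(cond=True, b_skip_on_upstream_skip=True)))
# {'task_A': 'Skipped', 'task_B': 'Skipped', 'task_C': 'Skipped'}
print(run(build(cond=True, b_skip_on_upstream_skip=False)))
# {'task_A': 'Skipped', 'task_B': 'Success', 'task_C': 'Success'}
```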
