# ask-community
a
Let me know if I should trim more of the code to make it a more minimal example. To avoid executing an expensive task when its outputs are already persisted on disk, I'm trying to decide whether I should 1) raise the `SKIP` signal inside the task or 2) use the conditional task `case`. I have provided the two scenarios in this gist https://gist.github.com/hoangthienan95/e0d8c3d73cb25f90f0d427c689ea80d8/revisions?diff=unified, along with the flow visualizations. In the gist, the initial version uses the `SKIP` signal, and the second version contains the modifications that make it use `case`. I have tested both flows; they work and both satisfy my requirements so far. Are there any caveats I haven't thought of, or best practices I should consider when choosing one over the other?
I have also included the flow visualizations here
k
Hey @An Hoang, so the `target` will not run the task if the file already exists and will load the result from the file instead. If you want to manually check and decide to skip, `raise SKIP` is better inside a task. `case` would do it on the Flow level. Both will work. I think it's just a matter of where you want that logic to take place (task or flow level). I opened the links. It's a bit hard to follow.
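To make the task-level versus flow-level distinction concrete, here is a minimal stdlib-only sketch. It is not Prefect code: the `Skip` exception, `expensive_step`, `run_task_level`, and `run_flow_level` are hypothetical names invented for illustration, and the file write stands in for the real expensive work.

```python
import tempfile
from pathlib import Path


class Skip(Exception):
    """Hypothetical stand-in for a skip signal (not Prefect's SKIP class)."""


def expensive_step(out_path: Path) -> str:
    """Task-level check: the step itself decides whether to skip."""
    if out_path.exists():
        raise Skip(f"{out_path.name} is already on disk")
    out_path.write_text("expensive result")  # placeholder for the real work
    return "computed"


def run_task_level(out_path: Path) -> str:
    # The caller only reacts to the signal; the decision lives in the step.
    try:
        return expensive_step(out_path)
    except Skip:
        return "skipped"


def run_flow_level(out_path: Path) -> str:
    # Flow-level check: the caller branches before the step is ever invoked.
    if out_path.exists():
        return "skipped"
    out_path.write_text("expensive result")  # placeholder for the real work
    return "computed"


with tempfile.TemporaryDirectory() as tmp:
    # Run each variant twice: the first run computes, the second finds the file.
    task_level = [run_task_level(Path(tmp) / "a.txt") for _ in range(2)]
    flow_level = [run_flow_level(Path(tmp) / "b.txt") for _ in range(2)]
```

Both variants end up with the same outcome on each run. The difference is only where the existence check lives, which matches the point that the choice is about where the logic should sit.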
a
Sorry for the confusion, Kevin. I guess you got the gist of my question: should the logic take place at the task or flow level? These two are equivalent in every way, right? I can see an argument for choosing one over the other if it simplifies the flow graph for the end user. Anything else? Also, I needed to do task A -> task B -> task C if `case=False` (no result outputted), and task B -> task C if `case=True`. Is the following code an anti-pattern of some sort?
with case(cond, False):
    task_A_result = task_A()
    # task_B loads a partial result from the file task_A outputted, but is not passed `task_A_result`
    task_B_result_false = task_B()
    task_C_result_false = task_C(task_B_result_false)

with case(cond, True):  # file exists, skip A
    task_B_result_true = task_B()
    task_C_result_true = task_C(task_B_result_true)

task_C_result = merge(task_C_result_false, task_C_result_true)
k
The output would be equivalent, yeah. Just note that SKIP also propagates to downstream tasks, so you need to handle that, but I think I saw you did. I don't think this is an anti-pattern, but from what I'm seeing, only `task_A()` needs to be inside the case, right? `task_B()` and `task_C()` will run regardless, so they don't need to be part of the case statement?
a
This was due to my unfamiliarity with Prefect. I put them in both places hoping to set `task A` upstream of `task B` and `task C`, but I forgot I could do it like below:
with case(cond, False): 
    task_A_result = task_A()

task_B_result = task_B()
task_B_result.set_upstream(task_A_result)
task_C_result = task_C(task_B_result)
even when the `case` might evaluate to `True`. In an `if/else` equivalent, it would complain about the `task_A_result` variable not being declared, but we are building a DAG that will be evaluated later 😄 This is one of the "gotchas" that would be a helpful addition to the beginner's tutorial or the `case` documentation 🙂
Thanks @Kevin Kho for the help!
k
Oh I didn’t think about setting upstream. That’s pretty cool if that works.
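One thing worth checking with the `set_upstream` version is Kevin's earlier point that SKIP propagates to downstream tasks: if `task_A` is skipped and `task_B` is downstream of it, `task_B` may be skipped too unless it opts out. The toy model below illustrates that propagation in plain Python. It is not Prefect code, and the `run` and `build` helpers are hypothetical. The flag name `skip_on_upstream_skip` mirrors what I believe the Prefect 1.x task option is called, so treat that as an assumption and confirm it against the docs.

```python
# Toy model of skip propagation; stdlib only, not Prefect code.
SKIPPED = "SKIPPED"


def run(tasks, order):
    """Run tasks in order. tasks maps name -> (func, upstream, skip_on_upstream_skip)."""
    states = {}
    for name in order:
        func, upstream, skip_on_upstream_skip = tasks[name]
        # A task is skipped when it honors upstream skips and any upstream was skipped.
        if skip_on_upstream_skip and any(states[u] == SKIPPED for u in upstream):
            states[name] = SKIPPED
        else:
            states[name] = func()
    return states


def build(file_exists, b_skips_with_upstream):
    """Hypothetical three-task chain: task_A -> task_B -> task_C."""
    return {
        # task_A being skipped models the case branch that was not taken.
        "task_A": (lambda: SKIPPED if file_exists else "A ran", [], True),
        "task_B": (lambda: "B ran", ["task_A"], b_skips_with_upstream),
        "task_C": (lambda: "C ran", ["task_B"], True),
    }


order = ["task_A", "task_B", "task_C"]
default = run(build(file_exists=True, b_skips_with_upstream=True), order)
opted_out = run(build(file_exists=True, b_skips_with_upstream=False), order)
```

In this model, `default` skips all three tasks, while `opted_out` skips only `task_A` and still runs `task_B` and `task_C`. If the real flow behaves the same way, `task_B` would need to opt out of upstream skips for the file-exists path to do any work.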