Hi folks I m converting my code base to use orion instead of Prefect Community #ask-community

Hi folks! I'm converting my code base to use orion...

Trevor Campbell

11/25/2022, 8:11 PM

Hi folks! I'm converting my code base to use orion instead of v1, and I noticed that the ability to raise a SKIP signal (or something like that) is missing. In my original code, I have a sequence of tasks

A -> B -> C -> D

. Any of them can raise a SKIP signal, and if that happens, I want to skip all downstream tasks. It isn't a failure or a success, it's more of a "I'm not ready to run this yet, so don't run anything that depends on my output yet" Is that possible to do in Orion? I saw one earlier thread here about it, but the outcome was inconclusive... • one option is just to return a cancelled state, but that seems to suggest failure (which in my case would prompt a message to the admin, which I definitely don't want to happen for SKIPs. SKIPs happen very often in my particular case -- far more common than any other outcome) • another is to return a completed state, but then I need annoying

if

statements everywhere checking the outcome of previous tasks (skip vs. was actually run successfully). Actually, the whole reason I started using Prefect in the first place was for its ability to easily control flows where things get skipped 😉

✅ 1

Taylor Curran

11/28/2022, 2:21 PM

Hi Trevor, in this blog post, Anna provides simple examples for thinking about this kind of logic: https://medium.com/the-prefect-blog/modular-data-stack-build-a-data-platform-with-prefect-dbt-and-snowflake-part-4-21a7d0a5b2c3#:~:text=[…]%20Python

Taylor Curran

11/28/2022, 2:25 PM

Also in Part 5 of the blog series, Anna shows a screenshot of a partially successful flow with many tasks finishing with a

NotReady

state, which might be the state you are looking for?

Anna Geller

11/28/2022, 5:44 PM

a good way to think about it is: how would you build it in plain Python without Prefect? in v2, you can write if/else so there is no need for SKIP signals. It was necessary in v1 because such if/else wasn't possible

Trevor Campbell

11/28/2022, 5:58 PM

@Taylor Curran thanks for the link to the blog post! I read through the two pages you sent, and did indeed see the

NotReady

state. I found that

NotReady

is a

PENDING

state in

src/prefect/orion/schemas/states.py

Copy code

class StateType(AutoEnum):
    """Enumeration of state types."""

    SCHEDULED = AutoEnum.auto()
    PENDING = AutoEnum.auto()
    RUNNING = AutoEnum.auto()
    COMPLETED = AutoEnum.auto()
    FAILED = AutoEnum.auto()
    CANCELLED = AutoEnum.auto()
    CRASHED = AutoEnum.auto()
    PAUSED = AutoEnum.auto()


TERMINAL_STATES = {
    StateType.COMPLETED,
    StateType.CANCELLED,
    StateType.FAILED,
    StateType.CRASHED,
}

It looks like I shouldn't be using

PENDING

explicitly, since I imagine that will muck with orchestration, since it isn't a terminal state? I would think the closest to what I need is

CANCELLED

, but I think that counts as a failure...perhaps I need to test that though. @Anna Geller thanks for your response! My problem is that my code will start to look quite messy if I use the usual

if/else

stuff:

Copy code

A()
if A didn't skip:
   B()
   if B didn't skip:
      C()
      if C didn't skip:
          D()

(not to mention that this

A->B->C->D

thing is very simplified compared to what I'm actually doing -- my actual code will be a disaster if I try to if/else everything)

Trevor Campbell

11/28/2022, 5:59 PM

being able to explicitly say SKIP and then all tasks downstream auto-SKIP as well was a very useful feature. kind of hoping to replicate that somehow. My "bandaid" for now was to include a boolean in the output object:

Copy code

a = A()
b = B(a)
c = C(b)
d = D(c)

where the objects

a, b, c, d

have a

.skip

boolean attribute, and then the functions

A,B,C,D

set

(--).skip = true

internally. Then at the beginning of each task, it just checks if its own input has the

.skip

flag set, and if so, returns immediately with the

.skip=true

on the output too. Bit of a hack but at least it avoids the crazy amount of if statements (my dependency graph of tasks isn't a chain, it's more of a tree...)

Anna Geller

11/28/2022, 6:07 PM

this looks quite complicated, why do you need this skipping? can you explain the problem you are trying to solve? there is certainly easier way

Anna Geller

11/28/2022, 6:07 PM

is this some event-driven use case?

Trevor Campbell

11/28/2022, 6:07 PM

probably is an easier way 😄 let me try to explain

Anna Geller

11/28/2022, 6:07 PM

"I'm not ready to run this yet" sounds like the sensor pattern I wrote about here https://medium.com/the-prefect-blog/scheduled-vs-event-driven-data-pipelines-orchestrate-anything-with-prefect-b915e6adc3ba

Trevor Campbell

11/28/2022, 6:14 PM

the sensor pattern looks super useful. definitely applicable to my case, i'll have to give that a read today. here's what I'm doing (a slight simplification, but this should give you the gist): I'm using prefect to run orchestration in my large undergraduate class. Basically, I have a flow that runs once per day: • loops over all assignments • checks whether the deadline has passed & student submission snapshot exists for that assignment • collects assignments (jupyter notebooks) and does some basic cleaning on them • generates solutions and distributes them to students • runs the autograder • notifies TAs about manual grading tasks; waits for the TAs to finish their grading • generates feedback forms • uploads grades to our LMS system • returns the feedback forms to students All of these things are probably best thought of as event-driven -- we should only collect assignments once the snapshot exist, we only autograde once the submissions are collected, etc etc. We do this currently by just running the flow once per day and each task checks whether it's ready to go. Otherwise it skips, and then all downstream stuff should skip too. Note that some of these "events" are from an online learning management system, some are from the local filesystem, some are on a remote filesystem, etc. I have a plan to make my tasks into

make

-like tasks that check some notion of "timestamp" of input/output and re-run tasks when the output is older than the input, but that's something for later...

Anna Geller

11/28/2022, 7:47 PM

thanks, this is extremely useful context and there are so many ways to approach it I believe given this requirement: "notifies TAs about manual grading tasks; waits for the TAs to finish their grading" you may benefit from the pause() and resume functionality we are currently building. We plan to release it this Thursday and it will roughly look like this:

Copy code

@task
def my_task():
  ...

@flow
def my_flow():
  ...
	prefect.utilities.pause(timeout: int = N)
	data = my_task()
	...

👍 1

Anna Geller

11/28/2022, 7:48 PM

and then once the TA finishes grading, they can click "Resume" button from the UI to continue processing the workflow

Anna Geller

11/28/2022, 7:48 PM

it seems that no SKIP signal is needed but it requires some thought to design it, I believe waiting for the Thursday release would be prudent since it will vastly simplify the use case for you

panda dancing 1

P 1

2 Views

Open in Slack

Previous Next