Christoph Deil

    8 months ago
    I’ve started a Prefect tutorial to introduce it to my colleagues tomorrow: https://github.com/cdeil/prefect-tutorial It’s only half finished or less, but I thought I’d mention it in case someone is interested or would even like to collaborate on it. Specifically, I’d like to know whether there’s already a diagram or docs explaining what happens, in terms of FlowRunner, TaskRunner etc., when flow.run() is called with the default serial executor, or where the algorithm lives that linearises the task graph to decide which task runs after which. I’d also like to know how schedulers work, i.e. whether at the very core there is some polling for-loop in-process, or whether the scheduler runs in another thread or process. For now I’m not planning to look into Dask, just trying to understand how the serial execution works under the hood. Does this information exist in some tutorial form? Or alternatively, could you please point me to the few relevant parts of the code or tests to quickly understand how that works?
    Kevin Kho

    8 months ago
    I would like to share @davzucky’s similar presentation here
    I honestly think FlowRunner and TaskRunner are classes that users should be concerned about. The FlowRunner is responsible for submitting tasks to the Executor.
    By scheduler, do you mean the thing that submits tasks to the executor? Or the thing that schedules Flow runs? The DAG will be traversed, but there is no specific ordering other than upstream tasks going before downstream ones.
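One way to picture "upstream before downstream" is a topological sort. The following is a minimal sketch using Python's standard-library graphlib (Python 3.9+); it is not Prefect's own code, and the task names and the run_order helper are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of its upstream tasks.
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def run_order(graph):
    """Return one valid serial order in which every task follows its upstreams."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(upstream)
print(order)
```

Here "transform" and "validate" may come out in either order, since neither depends on the other. That matches the point above: the only guarantee is that a task never runs before its upstream tasks.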
    Christoph Deil

    8 months ago
    Did you mean “should be concerned about” or “should NOT be concerned about”?
    Kevin Kho

    8 months ago
    Oh sorry. Should not *
    This is the image you would be more concerned about. It’s in davzucky’s notebook or here. This is an official Prefect image.
    Christoph Deil

    8 months ago
    Concerning your question about what I mean by scheduler, the answer is “I don’t know”. That’s exactly what I’m looking for: trying to understand which classes and method calls are involved, i.e. to get an understanding of how Prefect works under the hood. I tried running simple examples through the debugger in PyCharm, but it’s very difficult to follow the execution and understand what’s going on (maybe just for me). I’m off now actually, but I’ll study https://github.com/davzucky/prefect_presentation and also continue reading the Prefect code and docs tomorrow. Thanks!
    Kevin Kho

    8 months ago
    Of course, just ping tomorrow and someone will answer you 🙂
    Last bit, this community answer is very good
    Gus Cavanaugh

    8 months ago
    Thanks for the Coiled shout-out on your compute bullet @Christoph Deil!
    davzucky

    8 months ago
    @Christoph Deil This is really complete. Who is your audience? What do they already know about Prefect? 90% of users only have to worry about how to build the DAG (Flow) out of Tasks.
    • Orchestration, e.g. using cron and bash to start your flow, or server vs. cloud, is an implementation detail
    • The same goes for the task runner, e.g. local vs. Dask
    Most of my team members don't know about the complexity that happens behind the scenes. This is what I like about it. Some of the points to highlight are:
    • The code looks very similar to procedural code. Just decorate your task with @task
    • Developers focus on the business logic
    • Simple to scale out with Dask
    • Getting insight from Prefect Server/Cloud
    • It takes care of result persistence
    Christoph Deil

    8 months ago
    I was actually mainly trying to understand how Prefect works under the hood: what exactly happens in the with Flow block and during flow.run() execution, and how the DAG is created and executed. So my target audience was just the curious Python dev who likes to fully understand the tools they use. I know the strength of Prefect is that you don’t have to understand what’s happening under the hood; you can just learn the nice API to make tasks and flows. But for that part there’s already https://docs.prefect.io/ and other great content, so I don’t think it would be useful for me to write a similar tutorial on that. I gave the tutorial today to my colleagues, who were mostly seasoned Python developers but Prefect newbies like myself, and it went well. We learned a bunch of stuff together, even if understanding what happens under the hood is still mostly left for a later time. We’ll sleep on it, but the majority opinion was that we prefer Prefect Orion, and for our limited use of existing flows (just running on a schedule, no server or cloud) we’d already try migrating to it, even if it’s alpha. 🙂
    Anna Geller

    8 months ago
    When you register your flow, Prefect constructs the DAG and stores metadata about your flow, incl. your tasks and edges, i.e. dependencies between tasks. When you run your flow, Prefect then walks the computational graph and executes those tasks in the order you specified while making sure that task dependencies (and other execution conditions such as retries, caching, results) are met. Prefect then manages the state transitions and stores the task run and flow run states in the backend. You don’t need to know FlowRunner, but you need to understand the difference between what happens at registration time (i.e. build time) and what happens at runtime. And also understand that flow and task are templates for flow runs and task runs.
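To make the build-time vs. run-time split concrete, here is a toy model in plain Python. It is only a sketch and not how Prefect is implemented: ToyFlow, Node and toy_task are hypothetical names invented for this example. Calling a task inside the with block only records a node and its upstream edges; the wrapped functions execute later, when run() is called.

```python
class Node:
    """One task call inside a flow; holds the function, not its result."""

    def __init__(self, fn):
        self.fn = fn


class ToyFlow:
    """Toy stand-in for a flow: records nodes and edges, runs them later."""

    _active = None  # flow currently open in a `with` block, if any

    def __init__(self, name):
        self.name = name
        self.upstream = {}  # node -> list of its upstream nodes

    def __enter__(self):
        ToyFlow._active = self
        return self

    def __exit__(self, *exc):
        ToyFlow._active = None

    def run(self):
        """Run time: execute each node after its upstreams; return all results."""
        results = {}

        def execute(node):
            if node not in results:
                args = [execute(up) for up in self.upstream[node]]
                results[node] = node.fn(*args)
            return results[node]

        for node in self.upstream:
            execute(node)
        return results


def toy_task(fn):
    """Decorator: calling the task inside `with ToyFlow(...)` only registers it."""

    def register(*upstream_nodes):
        node = Node(fn)
        # Assumes a flow is open; a real library would check and raise a clear error.
        ToyFlow._active.upstream[node] = list(upstream_nodes)
        return node

    return register


calls = []  # records which functions actually executed, in order


@toy_task
def one():
    calls.append("one")
    return 1


@toy_task
def add_one(x):
    calls.append("add_one")
    return x + 1


with ToyFlow("demo") as flow:  # build time: nodes and edges are recorded
    a = one()
    b = add_one(a)

built_without_running = calls == []  # nothing has executed yet
results = flow.run()  # run time: the functions execute now
print(results[b])
```

In this toy, the flow and its nodes play the role of templates, and each call to run() produces a fresh set of results, loosely mirroring the flow vs. flow run distinction described above.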
    Christoph Deil

    8 months ago
    There was one gotcha that we ran into during the tutorial. I filed an issue: https://github.com/PrefectHQ/prefect/issues/5301 Thank you all for the tips!
    Anna Geller

    8 months ago
    You’re very welcome! I answered on GitHub how you can solve this issue.
    btw, you can do something like this to preview the next schedules to avoid any surprises:
    from datetime import timedelta

    import pendulum
    from prefect import task, Flow
    from prefect.schedules import IntervalSchedule


    @task
    def say_hello():
        print("Hello, world!")


    # Run every minute, starting now in the Europe/Berlin time zone.
    schedule = IntervalSchedule(
        start_date=pendulum.now(tz="Europe/Berlin"), interval=timedelta(minutes=1),
    )


    # Preview the next 20 scheduled run times without running any flow.
    for sched in schedule.next(20):
        print(sched)