Prefect Community #show-and-tell

Christoph Deil

01/04/2022, 9:15 PM

I’ve started a Prefect tutorial to introduce it to my colleagues tomorrow: https://github.com/cdeil/prefect-tutorial It’s only half finished or less, but I thought I’d mention it in case someone is interested or would even like to collaborate on it. Specifically, I’d be interested whether there’s already a diagram or docs that explain, in terms of FlowRunner, TaskRunner, etc., what happens when flow.run() is called with the default serial executor, or where the algorithm is that linearises the task graph to decide who runs after whom. And also how schedulers work, i.e. whether they have at their very core some polling for-loop in-process, or whether the scheduler runs in some other thread or process. For now I’m not planning to look into Dask, just trying to understand how the serial execution works under the hood. Does this information exist in some tutorial form? Or alternatively, could you please point me to the few relevant parts in the code or tests to quickly understand how that works?

Kevin Kho

01/04/2022, 9:16 PM

I would like to share @davzucky’s similar presentation here

Kevin Kho

01/04/2022, 9:18 PM

I honestly think FlowRunner and TaskRunner are classes that users should be concerned about. The FlowRunner is responsible for submitting tasks to the Executor

Kevin Kho

01/04/2022, 9:20 PM

By scheduler, do you mean the thing that submits tasks to the executor? Or the thing that schedules Flow runs? The DAG will be traversed but there is no specific ordering other than upstream going before downstream.
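
The "upstream before downstream" constraint describes a topological ordering of the DAG. The following is a minimal sketch of one standard way to produce such an ordering (Kahn's algorithm) in plain Python. It is not Prefect's actual implementation; the function name `topological_order` and the example task names are hypothetical and only serve as illustration.

```python
from collections import deque


def topological_order(upstream):
    """Return task names so that each task comes after all of its upstream tasks.

    `upstream` maps every task name to the set of task names it depends on.
    Every dependency must itself be a key of `upstream`.
    """
    # Number of not-yet-finished dependencies per task.
    remaining = {task: len(deps) for task, deps in upstream.items()}

    # Reverse mapping: for each task, the tasks that wait on it.
    downstream = {task: [] for task in upstream}
    for task, deps in upstream.items():
        for dep in deps:
            downstream[dep].append(task)

    # Tasks without dependencies can run right away (sorted for determinism).
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in sorted(downstream[task]):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(upstream):
        raise ValueError("cycle detected: the graph is not a DAG")
    return order


# Hypothetical example graph: task name -> upstream task names.
graph = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}
order = topological_order(graph)
print(order)
```

Note that "transform" and "validate" have no dependency on each other, so either could legitimately run first; the sketch only sorts names alphabetically to make the output reproducible.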

Christoph Deil

01/04/2022, 9:23 PM

Did you mean “should be concerned about” or “should NOT be concerned about”?

Kevin Kho

01/04/2022, 9:23 PM

Oh sorry. Should not *

Kevin Kho

01/04/2022, 9:25 PM

This is the image you would be more concerned about. It’s in the notebook of davzucky or here. This is an official Prefect image.

Christoph Deil

01/04/2022, 9:27 PM

Concerning your question about what I mean by scheduler, the answer is “I don’t know”. That’s exactly what I’m looking for: trying to understand which classes and method calls are involved, i.e. to get an understanding of how Prefect works under the hood. I tried running simple examples through the debugger in PyCharm, but it’s very difficult to follow the execution and understand what’s going on (maybe just for me). I’m off now actually, but I’ll study https://github.com/davzucky/prefect_presentation and also continue reading the Prefect code and docs tomorrow. Thanks!

Kevin Kho

01/04/2022, 9:27 PM

Of course, just ping tomorrow and someone will answer you 🙂

Kevin Kho

01/04/2022, 9:28 PM

Last bit, this community answer is very good

Gus Cavanaugh

01/04/2022, 9:50 PM

Thanks for the Coiled shout-out on your compute bullet @Christoph Deil!

davzucky

01/05/2022, 12:26 AM

@Christoph Deil This is really complete. Who is your audience? What knowledge do they have of Prefect? 90% of users only have to worry about how to build the DAG (Flow) using Tasks.
• Orchestration, like using cron and bash to start your flow, or Server vs. Cloud, is an implementation detail.
• The task runner, like using local vs. Dask, is the same.
Most of my team members don't know the complexity that happens behind the scenes. This is what I like about it. Some of the points to highlight are:
• The code is really similar to procedural code. Just decorate your task with @task
• Developers focus on the business logic
• Simple to scale out with Dask
• Getting insight from Prefect Server/Cloud
• It takes care of result persistence

Christoph Deil

01/05/2022, 2:42 PM

I was actually mainly trying to understand how Prefect works under the hood: what exactly happens in the with Flow block and in the flow.run() execution, and how the DAG is created and executed. So my target audience was just the curious Python dev that likes to fully understand the tools he uses. I know the strength of Prefect is that you don’t have to understand what’s happening under the hood; you can just learn the nice API to make tasks and flows. But for that part there’s already https://docs.prefect.io/ and other great content, so I don’t think it would be useful if I wrote a similar tutorial on that. So I gave the tutorial today for my colleagues, who were mostly seasoned Python developers but Prefect newbies like myself, and it went well. We learned a bunch of stuff together, even if the part about what happens under the hood is still mostly to be learned at a later time. We’ll sleep on it, but the prevailing opinion was that we prefer Prefect Orion, and for our limited use of existing flows (just running on a schedule, no server or cloud) we’d already try migrating to it, even if it’s alpha. 🙂

Anna Geller

01/05/2022, 3:23 PM

When you register your flow, Prefect constructs the DAG and stores metadata about your flow, incl. your tasks and edges, i.e. dependencies between tasks. When you run your flow, Prefect then walks the computational graph and executes those tasks in the order you specified while making sure that task dependencies (and other execution conditions such as retries, caching, results) are met. Prefect then manages the state transitions and stores the task run and flow run states in the backend. You don’t need to know FlowRunner, but you need to understand the difference between what happens at registration time (i.e. build time) and what happens at runtime. And also understand that flow and task are templates for flow runs and task runs.
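
To make the build-time vs. runtime distinction concrete, here is a toy sketch in plain Python. It is not Prefect's code and does not use the Prefect API: `ToyFlow`, `toy_task`, and `Placeholder` are hypothetical names invented for illustration. The assumption it demonstrates is only the general idea that calling a task inside the `with` block records a node and its upstream dependencies instead of executing the function, and that a later `run()` call walks the recorded graph.

```python
class Placeholder:
    """Stands in for a task's future result while the flow is being built."""

    def __init__(self, call_id):
        self.call_id = call_id


class ToyFlow:
    """Toy flow: records task calls at build time and executes them in run()."""

    _active = None  # the flow whose `with` block is currently open, if any

    def __init__(self, name):
        self.name = name
        # Each entry: (call_id, function, list of upstream call_ids).
        self.calls = []

    def __enter__(self):
        ToyFlow._active = self
        return self

    def __exit__(self, *exc_info):
        ToyFlow._active = None
        return False

    def run(self):
        # A call can only reference placeholders created earlier, so the
        # declaration order is already a valid upstream-first order.
        results = {}
        for call_id, fn, upstream_ids in self.calls:
            results[call_id] = fn(*(results[u] for u in upstream_ids))
        return results


def toy_task(fn):
    """Decorator: inside a ToyFlow block, record the call instead of running it."""

    def wrapper(*args):
        flow = ToyFlow._active
        if flow is None:
            return fn(*args)  # outside a flow, behave like a normal function
        call_id = len(flow.calls)
        # Simplification: inside a flow, all arguments must be Placeholders.
        flow.calls.append((call_id, fn, [a.call_id for a in args]))
        return Placeholder(call_id)

    return wrapper


@toy_task
def extract():
    return [1, 2, 3]


@toy_task
def double(xs):
    return [2 * x for x in xs]


@toy_task
def total(xs):
    return sum(xs)


# Build time: the graph is recorded, no task body runs.
with ToyFlow("demo") as flow:
    data = extract()
    doubled = double(data)
    result = total(doubled)

# Runtime: the recorded calls are executed in order.
outputs = flow.run()
print(outputs[result.call_id])
```

After the `with` block, `result` is only a placeholder and `flow.calls` holds three recorded calls; the actual values exist only in the dictionary returned by `run()`. In the same spirit, one flow definition acts as a template that can be run many times.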

Christoph Deil

01/05/2022, 4:04 PM

There was one gotcha that we ran into during the tutorial. I filed an issue: https://github.com/PrefectHQ/prefect/issues/5301 Thank you all for the tips!

Anna Geller

01/05/2022, 4:27 PM

You’re very welcome! I answered on GitHub how you can solve this issue.

Anna Geller

01/05/2022, 4:29 PM

btw, you can do something like this to preview the next schedules to avoid any surprises:

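from datetime import timedelta
import pendulum
from prefect import task, Flow
from prefect.schedules import IntervalSchedule


@task
def say_hello():
    print("Hello, world!")


# Schedule that fires every minute, starting now in the Europe/Berlin timezone.
schedule = IntervalSchedule(
    start_date=pendulum.now(tz="Europe/Berlin"), interval=timedelta(minutes=1),
)


# Preview the next 20 scheduled run times without registering or running a flow.
for sched in schedule.next(20):
    print(sched)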
