Does anybody have "How Prefect Works (TM)" diagram...
# ask-community
n
Does anybody have "How Prefect Works (TM)" diagrams (or videos or blogs)? I'm thinking something like https://docs.prefect.io/orchestration/server/architecture.html but more granular -- I'm trying to understand what's actually happening when a Flow is scheduled to run in Server, including how the Agent communicates with the Executor. Something like "A Flow/Task, from Start to Finish" would be mega helpful. I'm piecing things together from the docs and code, but if there's a hand-holding walkthrough or diagram available, I'm all ears.
j
Hi Nathan, We don't (currently) have anything like this in our docs, but it's been on the top of my todo list. We're hoping to have a docs sprint sometime in Q2, so hopefully soon-ish.
I'm happy to answer any specific questions you have in the meantime.
n
Ah, OK, well thank you for being available to answer questions! My first question (please correct me if I'm misunderstanding something): Let's say I'm running Server locally and have a local Docker Agent running (local Docker, local Repository, all the locals). When the Agent gets word that it should run MyLovelyFlow, it gets the appropriate MyLovelyFlow image from the Repository and spins up a Docker container to run the Flow. Eventually, something calls MyLovelyFlow.run(...) in the container and magic happens. What happens between container startup and MyLovelyFlow.run(...)?
j
The container executes
prefect execute flow-run
, this will fetch your flow from storage (e.g. pull from S3, github, load from within the docker image, etc...). The flow then runs until completion, and the container shutsdown,
n
Excellent! Thank you. This is the short version of what I think happens next: During Flow.run(...), Flow._run(...) creates a while not flow_state.is_finished() loop of the following. The Flow has a FlowRunner (eventually) run all the Tasks with FlowRunner.run(...). This eventually calls the Executor to start with a executor.start(...) context manager. The FlowRunner then loops once through each Task in order using FlowRunner.flow.sorted_tasks(), doing the following. For each Task, the Executor submits the Task with the run_task function using executor.submit(run_task, ...). This starts the TaskRunner, which eventually has the Executor run the task with prefect.utilities.executors.run_task_with_timeout(...) that eventually calls task.run(...), which is the thing I defined by putting the @task decorator over a function definition in my original .py file. First, does that seem right? Second, does the loop for self.flow.sorted_tasks() just kick off a Task in the Executor one after another (returning state = Running or something), or does it wait for each one to finish? If it only starts the Task, is this the purpose of the loop waiting on flow_state.is_finished() -- to just keep looping until all the Tasks are finished?
j
You're getting a bit in the weeds here, I'd define most of the above as implementation details (we might change them any day, and you as a user shouldn't need to care about them). But yes, that all sounds about right. Tasks are submitted to the executor but not awaited until we need the concrete results.
Is there a particular reason you're digging this deeply into the code? For a user all you should need to know is that your tasks are executed by the executor (so e.g. if running on a
DaskExecutor
, the tasks execute in the worker's environments).
n
I think I got so far into the weeds because I'm trying to understand how to best use the Imperative API. Specifically, I'm trying to understand how Prefect thinks about upstream/downstream Tasks at runtime. I had an issue where I was misusing the upstream_task variable in the Functional API and wanted to understand what I was doing wrong and why things seemed to work with the Imperative API. In any case, thank you for your help and time here. I'm really enjoying learning about Prefect.