
Miroslav Rác

01/28/2022, 1:35 PM
Hi. I have a little problem I cannot wrap my head around. I created a flow, nothing complicated, just the "hello world" from the getting started docs. I deployed it to Prefect Cloud, then started a local agent and tried to "quick run" the flow. Everything works, which is great. But then I started a Docker agent on my VPS (not locally on the machine where the flow was created) and tried to run the flow. It wasn't working. The agent picked up the flow, but an error was raised immediately:
Failed to load and execute Flow's environment: ModuleNotFoundError("No module named '/Users/miro/'")
… Obviously, when I created my flow, my local path was probably pickled along with it, so it cannot run on another machine. Is this expected behavior? How can I run flows created on a different machine?

Anna Geller

01/28/2022, 1:40 PM
This is indeed the intended behavior, and to prevent such scenarios you can leverage other storage classes, such as:
• Git storage classes (GitHub, GitLab, Git, Bitbucket, CodeCommit)
• Cloud storage classes (S3, GCS, Azure)
• or Docker storage, which packages your flow with all other dependencies into a Docker image that gets pushed to a container registry of your choice during flow registration
To give you a bit longer explanation: when a flow is registered, Prefect stores its location in Storage (GitHub, S3, Docker, etc.). During a flow run execution, Prefect pulls the flow from the storage location and runs it. If users don't specify any storage, it defaults to Local storage, which is a serialized version of the flow stored in the ~/.prefect/flows folder. At runtime, the flow is retrieved from this file. The error you see happens when you use the default Local storage during registration and then run the flow on a different machine (or a container) that doesn't have the flow file (exactly as you noticed yourself).
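To make that concrete, here is a rough sketch (the repo, file path, and project name below are just placeholders) of registering a flow with GitHub storage instead of the default Local storage, so that any agent can pull the code at runtime:

```python
from prefect import Flow, task
from prefect.storage import GitHub

@task
def say_hello():
    print("hello world")

with Flow("hello-flow") as flow:
    say_hello()

# Pull the flow file from a Git repo at runtime instead of ~/.prefect/flows,
# so an agent on another machine (e.g. the Docker agent on the VPS) can load it.
flow.storage = GitHub(
    repo="my-org/my-flows",      # placeholder repository
    path="flows/hello_flow.py",  # path to this file within the repo
    ref="main",
)

flow.register(project_name="my-project")  # placeholder project name
```

With Git or cloud storage, only the flow's location is stored at registration time; the agent fetches the actual code when the flow run starts.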

Miroslav Rác

01/28/2022, 2:08 PM
Thank you, this helps. I will try a different storage. Please let me ask you another question: I was also trying Airflow (I am now deciding between the two), and it has a feature where each task can run on a different machine. I haven't dug that deep, but it seems like Prefect is lacking that feature, i.e. the whole flow must be run on the same machine (or agent).

Anna Geller

01/28/2022, 4:05 PM
You mean the Celery executor? It's true that we don't have a load balancer or a queue for local agents, but we do have it for other agents. With the KubernetesAgent, ECSAgent or VertexAgent, your agent runs within one container/service, and flow runs may be deployed on completely different nodes within a cluster. Similarly, if you use Prefect with Dask, you can scale your tasks across an entire Dask cluster, which could be deployed as a standalone cluster, within a Kubernetes cluster, or using Dask as a service such as Coiled or Saturn Cloud.
This page explains it a bit more
The closest feature to this would be concurrency limits for flow runs and task runs. These can ensure that, e.g., no more than 10 flows or tasks with a specific label/tag are running at the same time. If the maximum capacity is reached, Prefect will queue the runs, as described here.
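For instance (the tag name here is hypothetical, and the limit itself is configured in Prefect Cloud rather than in code), task-run concurrency limits key off the tags attached to a task:

```python
from prefect import task

# Runs of any task carrying this tag count against the "augmentation"
# concurrency limit configured in Prefect Cloud; once the limit is reached,
# additional task runs are queued until a slot frees up.
@task(tags=["augmentation"])
def augment(data):
    ...
```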

Miroslav Rác

01/28/2022, 7:40 PM
@Anna Geller I'm not sure I understand, but I probably haven't explained myself correctly. Please let me try one more time. Let's say we have a flow for an ETL job, where the T part is augmentation, and we have two kinds of augmentation. So we have these tasks:
1. extract
2. transform
3. augment A
4. augment B (can run in parallel with A)
5. load
But we have one machine which is optimized for augmentation of type A (task 3) and another machine which is optimized for augmentation of type B (task 4). So I need the flow to be executed on three different machines, where the third machine handles the non-augmentation tasks (1, 2, 5). How would you do it with Prefect?
Maybe something like labels on the task level instead of the flow level?

Anna Geller

01/28/2022, 7:46 PM
You could build a flow-of-flows (i.e. a parent flow triggering child flows) and let each child flow run on the machine it should by assigning the proper labels on its run config. If you wanna learn more about this pattern, here are some resources you may check:
• https://docs.prefect.io/core/idioms/flow-to-flow.html
• https://www.prefect.io/blog/flow-of-flows-orchestrating-elt-with-prefect-and-dbt/
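A rough sketch of what that could look like for your ETL example (flow names, the project name, and labels are placeholders; each child flow would be registered separately with a run config whose labels match the agent on its target machine):

```python
from prefect import Flow
from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

PROJECT = "my-project"  # placeholder project name

with Flow("etl-parent") as parent:
    # 1 + 2: extract/transform child flow, picked up by the "general" machine's agent
    et = create_flow_run(flow_name="extract-transform", project_name=PROJECT)
    et_done = wait_for_flow_run(et, raise_final_state=True)

    # 3: child flow registered with labels matching the agent on machine A
    aug_a = create_flow_run(
        flow_name="augment-a", project_name=PROJECT, upstream_tasks=[et_done]
    )
    # 4: child flow registered with labels matching the agent on machine B
    aug_b = create_flow_run(
        flow_name="augment-b", project_name=PROJECT, upstream_tasks=[et_done]
    )

    # 5: load runs only after both augmentations have finished
    create_flow_run(
        flow_name="load",
        project_name=PROJECT,
        upstream_tasks=[
            wait_for_flow_run(aug_a, raise_final_state=True),
            wait_for_flow_run(aug_b, raise_final_state=True),
        ],
    )
```

Each child flow would then set something like flow.run_config = DockerRun(labels=["machine-a"]) (label name is just an example) so that only the agent started with that label on the right machine picks it up.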

Miroslav Rác

01/28/2022, 7:52 PM
Thanks. So I can create a flow where tasks 3 and 4 will be create_flow_run. That looks like a good solution, thank you very much. Looks like we will opt for Prefect, I have a better feeling about it. Btw, the blog article you linked has the gist embeds blocked, but I think I get the point. Thank you, I appreciate your help.

Anna Geller

01/28/2022, 7:57 PM
Do you happen to use Firefox? On Chrome the gists should work 100% 😄 sorry about that

Miroslav Rác

01/28/2022, 7:57 PM
I use Chrome

Anna Geller

And great to hear you have a good feeling about Prefect! Using a flow of flows for this type of use case is quite common, so you are definitely not alone. We even have a special name for it: "the orchestrator pattern". It describes a pattern where a parent flow triggers child flow runs via API calls, allowing each flow run to be executed on entirely different infrastructure.