When building a pipeline with multiple but different steps | Prefect Community #ask-community

Bryan Whiting

01/14/2020, 9:12 PM

When building a pipeline with multiple different steps (ETL, modeling, scoring), is it recommended to have one long flow or multiple smaller flows? If the latter, is it common practice to import flows from other files into a “main.py” where I’d run all my flows at once? Some flows are long, some are short. Curious how you think about this. Maybe I’m just not coding it well: if I understood Prefect better, I’d be able to leverage the features that make it easy to run my flows from specific starting points.

Jackson Maxfield Brown

01/14/2020, 9:40 PM

Curious about the answer to this as well. Currently, we split large flows up in two cases: when parts of the flow have different computation requirements (e.g. GPU vs CPU), and when a part is a different unit of computation. The second one is a bit vague; it's just our way of saying "this function works and feels really nice at processing this level of granularity and we shouldn't try to change it anymore".

Jackson Maxfield Brown

01/14/2020, 9:41 PM

But input from others and discussion in general on this topic would be awesome

josh

01/14/2020, 10:27 PM

Honestly I don’t think there is a recommended side on the large vs small flow debate (and I welcome others to chime in). It really comes down to your personal preference! On the notion of running multiple flows from a main.py: you could definitely do that if you want that to be your setup. I would like to add that if your flows are running on schedules, then each flow should be run in its own process. Alternatively, you could check out Prefect Cloud if you want a way to manage multiple flows and deploy them asynchronously onto a platform such as Kubernetes, Fargate, etc.
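
For anyone wondering what "each flow in its own process" could look like from a main.py, here is a minimal sketch using only the Python standard library. It is not a Prefect API: the `FLOW_COMMANDS` mapping and the `launch_flows` helper are hypothetical names, and the inline `-c` snippets are stand-ins for real flow scripts (for example, a file that builds a flow and calls `flow.run()`).

```python
import subprocess
import sys

# Hypothetical stand-ins: in a real project each command would point at a
# script such as "etl_flow.py" that defines a flow and runs it.
FLOW_COMMANDS = {
    "etl": [sys.executable, "-c", "print('etl finished')"],
    "modeling": [sys.executable, "-c", "print('modeling finished')"],
    "scoring": [sys.executable, "-c", "print('scoring finished')"],
}


def launch_flows(commands):
    """Start every command in its own OS process, then wait for all of them.

    Returns a dict mapping each name to (exit code, captured stdout).
    """
    # Start all processes first so they run concurrently.
    procs = {
        name: subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for name, cmd in commands.items()
    }
    results = {}
    for name, proc in procs.items():
        out, _ = proc.communicate()  # blocks until this process exits
        results[name] = (proc.returncode, out.strip())
    return results


if __name__ == "__main__":
    for name, (code, output) in launch_flows(FLOW_COMMANDS).items():
        print(f"{name}: exit={code} output={output!r}")
```

Because each flow runs in a separate process, a long-running or scheduled flow does not block the others, and a crash in one shows up as a non-zero exit code rather than taking down main.py.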

Bryan Whiting

01/15/2020, 6:29 PM

Cool, thanks for the input! Just running things locally for now
