Hey all, I'm starting what's essentially a data warehouse at my company, and I'm thinking of using Prefect to schedule and orchestrate all the ETL. We're going to have a number of third-party data sources to pull from on various schedules, and we'll likely want to schedule some transformations after certain combinations of tables have finished loading each day. I'm trying to get the project architecture off on the right foot and have been trying out Prefect for a couple of days, and I'm wondering how I should think about organizing my Flows.
Specifically, should I have one flow per data source, or one flow for the whole pipeline?
My dilemma is that each data source is going to have its own schedule, which pushes me toward one Flow per source. But if I want to trigger transformations based on the completion of table loads, the flows end up depending on each other's completion, which makes it feel like they'd be better off as one flow.
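To make the dependency concrete, here's a rough plain-Python sketch of the "trigger a transform once a combination of tables has loaded" logic I have in mind. It deliberately uses no Prefect API, and the table names, transform names, and the `LoadTracker` helper are all made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical transforms and the tables each one needs (all names are made up).
TRANSFORM_DEPS = {
    "daily_revenue": {"orders", "payments"},
    "customer_360": {"orders", "crm_contacts"},
}


@dataclass
class LoadTracker:
    """Tracks which tables have loaded today and which transforms already fired."""

    loaded: set = field(default_factory=set)
    triggered: set = field(default_factory=set)

    def mark_loaded(self, table):
        """Record a finished table load and return transforms that just became ready."""
        self.loaded.add(table)
        ready = sorted(
            name
            for name, deps in TRANSFORM_DEPS.items()
            # Ready only when every required table is loaded and it has not fired yet.
            if deps <= self.loaded and name not in self.triggered
        )
        self.triggered.update(ready)
        return ready
```

Each source would call `mark_loaded` on its own schedule when it finishes, and a transform fires only after the last of its required tables arrives. What I can't decide is whether this bookkeeping belongs inside one big flow or spread across per-source flows.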
Thoughts? Any examples out there of similar projects?