
Ron Van Buskirk

11/20/2019, 3:43 PM
Prefect is such an elegant framework -- it's amazing! I've created small pipelines but would appreciate any suggestions on designing a larger one. It builds 150+ Postgres tables in 20-30 hours. There are multiple layers of tables, and each table in one layer depends on one or more tables from the previous layer. I'd like to create a Flow where the DAG specifies the table dependencies and a table is built only if (1) it doesn't exist or (2) it is older than its parent table(s). I thought of different solutions:
* having separate tasks for building each table (450+ tasks!):
 Check_table1 -> ifelse -> Build_table1 |-> Check_table2 -> ifelse -> Build_table2 ...
                                        |-> Check_table3 -> ifelse -> Build_table3 ...
                                        |-> Check_table4 -> ifelse -> Build_table4 ...
							  
* subclassing the Postgres execute task to create a check-and-build task:
 Check_and_build_table1  |-> Check_and_build_table2 ...
                         |-> Check_and_build_table3 ...
                         |-> Check_and_build_table4 ...
  
* having a small number of tasks (check timestamp and existence, conditional, build) and using the map function to iterate the building of the required tables:
 Check_table1 -> ifelse -> Build_table1 |-> Check_table.map(x) -> ifelse -> Build_table(x) ...
                                        ...
                                        ...
Still really new to Prefect... are any of these any good? Are there any other best practices I should consider?
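
For reference, the rebuild rule described above (build a table if it doesn't exist or is older than any parent) can be sketched in plain Python, independent of Prefect. This is only an illustration: `needs_build`, the `parents` mapping, and the `built_at` timestamp dictionary are hypothetical names, and in a real pipeline the timestamps would come from wherever build times are recorded.

```python
from datetime import datetime
from typing import Dict, List, Optional


def needs_build(
    table: str,
    parents: Dict[str, List[str]],
    built_at: Dict[str, Optional[datetime]],
) -> bool:
    """Return True if `table` is missing or older than any of its parents."""
    own = built_at.get(table)
    if own is None:
        # Rule (1): the table does not exist yet.
        return True
    for parent in parents.get(table, []):
        parent_time = built_at.get(parent)
        # Rule (2): a parent is newer than this table.
        # A missing parent has to be built first, so the child counts as stale too.
        if parent_time is None or parent_time > own:
            return True
    return False


# Hypothetical example data: table3 was built before its parent was refreshed.
parents = {"table1": [], "table2": ["table1"], "table3": ["table1"]}
built_at = {
    "table1": datetime(2019, 11, 20, 12, 0),
    "table2": datetime(2019, 11, 20, 13, 0),
    "table3": datetime(2019, 11, 19, 9, 0),
}

stale = [t for t in parents if needs_build(t, parents, built_at)]
print(stale)
```

With this data only `table3` is flagged. Note that the check relies on current timestamps, so a parent's entry in `built_at` would need updating after it is rebuilt and before its children are checked.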

Zachary Hughes

11/20/2019, 4:03 PM
Hi @Ron Van Buskirk, glad to hear you're enjoying Prefect so much! Honestly, each of the solutions you floated is solid and should work, but my inclination is to recommend the mapped task approach as the most intuitive choice.
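
One way to prepare the input for a mapped approach is to group the tables into layers, so that every table in a layer depends only on tables from earlier layers. The sketch below is plain Python using the standard library's `graphlib` (Python 3.9+), not Prefect code. The `build_layers` helper and the example `parents` mapping are hypothetical, and how each layer is then fed to a mapped task is left open.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def build_layers(parents: Dict[str, List[str]]) -> List[List[str]]:
    """Group tables into layers; each layer depends only on earlier layers."""
    sorter = TopologicalSorter(parents)  # maps each table to its parent tables
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    layers = []
    while sorter.is_active():
        # Every table whose parents are all done can be built in the same layer.
        ready = sorted(sorter.get_ready())
        layers.append(ready)
        sorter.done(*ready)
    return layers


# Hypothetical dependency map: table4 depends on two tables from the layer above.
parents = {
    "table1": [],
    "table2": ["table1"],
    "table3": ["table1"],
    "table4": ["table2", "table3"],
}

layers = build_layers(parents)
for depth, tables in enumerate(layers):
    print(depth, tables)
```

Each inner list could then be passed to a mapped check-and-build step, one layer after another, so that parents are always handled before their children.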

Ron Van Buskirk

11/20/2019, 6:33 PM
Thanks, @Zachary Hughes! I'll go with that one then and see what happens 👍