Feliks Krawczyk
08/27/2019, 5:26 AM
Jeremiah
prefect.Parameter
). This may keep management sane.
Feliks Krawczyk
08/28/2019, 12:18 AM
“We might suggest that instead of having thousands of near-identical flows, you have a single flow with a parameterized input (have a look at prefect.Parameter). This may keep management sane.”
I’m not quite sure how this would work exactly. Each DAG I create has its own schedule, and we heavily utilise Airflow parameters. Although the DAGs themselves are “almost” identical in flow, the metadata within them is completely different (i.e. schedules, number of steps, etc.). We also heavily utilise the “clear” functionality in Airflow to re-run days which fail due to upstream issues.
For more context, what my actual service does is materialise people’s SQL into tables within our Datalake. So instead of people querying massive raw tables for their reports (which isn’t scalable), we ask them to submit SQL that extracts a delta (usually daily) and appends it to their own tables, which contain only the subset of data that they actually want. They then query these smaller tables.
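To make the quoted suggestion a bit more concrete, here is a minimal sketch in plain Python (standard library only, deliberately not the Prefect or Airflow API) of one shared routine driven by per-job parameters, rather than one near-identical definition per user. The `Job` fields, the table names, and the `:day` bind parameter are all hypothetical, and an in-memory SQLite database stands in for the Datalake. The delete-before-insert step is an assumption about how a re-run of a failed day could be made safe; the thread does not say how the real service handles it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical per-user metadata; the logic that runs a job is shared."""
    name: str
    target_table: str
    delta_sql: str  # must select the target table's columns, filtered by :day


def materialise(conn: sqlite3.Connection, job: Job, day: str) -> int:
    """Append one day's delta to the job's table and return the rows added."""
    # Assumption: remove any rows already stored for this day so that
    # re-running a failed day does not duplicate data.
    conn.execute(f"DELETE FROM {job.target_table} WHERE day = :day", {"day": day})
    cur = conn.execute(
        f"INSERT INTO {job.target_table} {job.delta_sql}", {"day": day}
    )
    conn.commit()
    return cur.rowcount


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw (day TEXT, team TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO raw VALUES (?, ?, ?)",
        [("2019-08-27", "a", 1), ("2019-08-28", "a", 2), ("2019-08-28", "b", 3)],
    )
    conn.execute("CREATE TABLE team_a (day TEXT, amount INTEGER)")
    job = Job(
        name="team_a_daily",
        target_table="team_a",
        delta_sql="SELECT day, amount FROM raw WHERE team = 'a' AND day = :day",
    )
    materialise(conn, job, "2019-08-27")
    materialise(conn, job, "2019-08-28")
    materialise(conn, job, "2019-08-28")  # simulated re-run of the same day
    return conn.execute("SELECT day, amount FROM team_a ORDER BY day").fetchall()
```

In this sketch each user's differences live in the `Job` record, so adding a user means adding data rather than another copy of the pipeline. Per-job schedules and step counts are not modelled here.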
Chris White
Feliks Krawczyk
08/28/2019, 1:20 AM
select * from data where day = {%Y-%m-%d}
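One way a template like the one above could be filled in, as a rough sketch only: it assumes the braces wrap a `strftime` pattern that is replaced with the run date. The thread does not show how the real service performs this substitution, and `render_sql` is a hypothetical helper name.

```python
import re
from datetime import date

# Matches a brace-wrapped strftime pattern such as {%Y-%m-%d}.
_DATE_PLACEHOLDER = re.compile(r"\{(%[^{}]*)\}")


def render_sql(template: str, run_date: date) -> str:
    """Replace each {strftime-pattern} placeholder with the formatted run date."""
    return _DATE_PLACEHOLDER.sub(
        lambda m: run_date.strftime(m.group(1)), template
    )


sql = render_sql("select * from data where day = {%Y-%m-%d}", date(2019, 8, 27))
```

Because the run date is the only input, re-running a given day just means calling the same function again with that date.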
- Ease of re-running failures (clearing tasks and things kick off again)
I think you’ve given me enough to at least try a Proof of Concept.
Chris White