Feliks Krawczyk
08/27/2019, 5:26 AM
Jeremiah
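08/27/2019, 1:22 PM
We might suggest that instead of having thousands of near-identical flows, you have a single flow with a parameterized input (have a look at prefect.Parameter). This may keep management sane.
Feliks Krawczyk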
08/28/2019, 12:18 AM
“We might suggest that instead of having thousands of near-identical flows, you have a single flow with a parameterized input (have a look at prefect.Parameter). This may keep management sane.”
I’m not quite sure how this would work exactly. Each DAG I create has its own schedule, and we heavily utilise Airflow parameters. Although the DAGs themselves are “almost” identical in flow, the metadata within them is completely different (i.e. schedules, number of steps, etc.). We also heavily utilise the “clear” functionality in Airflow to re-run days which fail due to upstream issues.
For more context, what my actual service does is materialise people’s SQL into tables within our data lake. So instead of people querying massive raw tables for their reports (which isn’t scalable), we ask them to submit SQL that extracts a delta (usually daily) and appends it to their own tables, which contain only the subset of data that they actually want. They then query these smaller tables.
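As a rough illustration of the suggestion quoted above, the sketch below drives one generic routine from per-job metadata instead of defining a separate pipeline per job. It is plain Python and deliberately does not use Prefect's or Airflow's API. `JobSpec`, `run_job`, `JOBS`, and the two example jobs are hypothetical names invented for this sketch, and the steps only record what they would do rather than executing SQL.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical per-job metadata: everything that differs between jobs."""
    name: str
    schedule: str      # cron expression, kept as data only in this sketch
    steps: List[str]   # ordered step names; the count varies per job


def run_job(spec: JobSpec, day: date) -> List[str]:
    """Run every step of one job for one day and return a log of what ran."""
    log = []
    for step in spec.steps:
        # A real step would execute the user's SQL here; this only records it.
        log.append(f"{spec.name}:{step}:{day.isoformat()}")
    return log


# Two made-up jobs with different schedules and different numbers of steps.
JOBS = {
    "sales_daily": JobSpec("sales_daily", "0 2 * * *", ["extract", "append"]),
    "churn_weekly": JobSpec(
        "churn_weekly", "0 3 * * 1", ["extract", "dedupe", "append"]
    ),
}

if __name__ == "__main__":
    for line in run_job(JOBS["churn_weekly"], date(2019, 8, 27)):
        print(line)
```

The point of the design is that schedules and step lists become data passed into a single routine, so adding a job means adding a `JobSpec` entry rather than another near-identical pipeline definition.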
Chris White
08/28/2019, 1:12 AM
Feliks Krawczyk
08/28/2019, 1:20 AM
select * from data where day = {%Y-%m-%d}
- Ease of re-running failures (clearing tasks and things kick off again)
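A minimal sketch of how a date-templated query like the one above might be rendered for a given day, and re-rendered for days that failed. It assumes the text inside `{...}` is a `strftime` pattern for the run date; that reading, and the helper names `render_sql` and `rerun_failed`, are assumptions made for illustration and are not Airflow or Prefect features.

```python
import re
from datetime import date

# Assumption: anything inside {...} is a strftime pattern for the run date.
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render_sql(template: str, day: date) -> str:
    """Replace each {pattern} in the template with day.strftime(pattern)."""
    return _PLACEHOLDER.sub(lambda m: day.strftime(m.group(1)), template)


def rerun_failed(template: str, failed_days):
    """Re-render the query for each failed day, oldest first."""
    return [render_sql(template, d) for d in sorted(failed_days)]


if __name__ == "__main__":
    template = "select * from data where day = {%Y-%m-%d}"
    print(render_sql(template, date(2019, 8, 27)))
    # prints: select * from data where day = 2019-08-27
```

Because the day is an explicit argument, re-running a failed day amounts to calling the same function again with that date. The rendered date is left unquoted because the template itself has no quotes.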
I think you’ve given me enough to at least try a Proof of Concept.
Chris White
08/28/2019, 1:25 AM