Aaron Ash
03/18/2022, 7:59 AM

Anna Geller
```python
with Flow(
    FLOW_NAME,
    executor=LocalDaskExecutor(),
    storage=set_storage(FLOW_NAME),
    run_config=set_run_config(local=True),
) as flow:
```
For the executor, this is trickier since, as I said before, it's retrieved from storage, but you can try doing something like:
```python
with Flow(
    FLOW_NAME,
    executor=LocalDaskExecutor(),
    storage=set_storage(FLOW_NAME),
    run_config=set_run_config(),
) as flow:
    datasets = ["raw_customers", "raw_orders", "raw_payments"]
    dataframes = extract_and_load.map(datasets)

if __name__ == '__main__':
    # register for prod
    flow.register("prod_project")
    # register for dev
    flow.executor = LocalExecutor()
    flow.run_config = set_run_config(local=True)
    flow.register("dev_project")
```
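For context, the `set_storage` and `set_run_config` helpers referenced in these snippets are not shown in the thread; a hypothetical sketch of what they might look like in Prefect 1.x (bucket name, labels, and image are placeholder assumptions):

```python
# Hypothetical helpers assumed by the snippets above -- not from the thread.
from prefect.run_configs import KubernetesRun, LocalRun
from prefect.storage import S3


def set_storage(flow_name):
    # Store the serialized flow in S3 under a per-flow key
    # (bucket name is an example)
    return S3(bucket="my-flow-bucket", key=f"flows/{flow_name}.py")


def set_run_config(local=False):
    # Local process runs for dev, Kubernetes for prod
    # (labels and image are examples)
    if local:
        return LocalRun(labels=["dev"])
    return KubernetesRun(image="my-prod-image", labels=["prod"])
```

The exact storage and run-config classes depend on your infrastructure; any of Prefect 1.x's storage (S3, GitHub, Local, ...) and run-config (LocalRun, KubernetesRun, DockerRun, ...) options could be swapped in here.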
But I believe that in the above, the executor won't be respected, since `__main__` is not evaluated at flow runtime.
So probably your best bet is to define your main flow in one Python file, say `aaron_flow.py`. This defines your flow structure without setting a run config or executor:
```python
with Flow(
    "FLOW_NAME",
    storage=S3(),  # just an example
) as flow:
    datasets = ["raw_customers", "raw_orders"]
    dataframes = extract_and_load.map(datasets)
```
Then, you can have a file called `aaron_flow_dev.py`:
```python
from aaron_flow import flow

flow.executor = LocalExecutor()
flow.run_config = KubernetesRun(image="some_dev_image")
```
and `aaron_flow_prod.py`:
```python
from aaron_flow import flow

flow.executor = LocalDaskExecutor()
flow.run_config = KubernetesRun(image="some_prod_image")
```
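The CLI registration step for the two environment files might look like this (project names are the examples used earlier in the thread; this is a sketch of the Prefect 1.x `prefect register` command, not a verified invocation):

```shell
# Register the dev variant of the flow into the dev project
prefect register --project dev_project -p aaron_flow_dev.py

# Register the prod variant into the prod project
prefect register --project prod_project -p aaron_flow_prod.py
```

Because each file imports the shared flow and then overrides its executor and run config before registration, the registered flow in each project carries the right environment-specific settings.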
and then you can register using the CLI without worrying about the run config and executor.

Aaron Ash
03/21/2022, 2:20 AM

The `*_dev.py` modules approach looks like it's perfect for me.