Josh Greenhalgh
01/27/2021, 5:24 PM
p1_res = extract_raw_from_api.map(
    asset_location=asset_locations,
    extract_date=unmapped(extract_date),
    extract_period_days=unmapped(extract_period_days),
)
p2_res = process_2.map(p1_res)
p3_res = process_3.map(p2_res)
Then as soon as task 1 from p1_res is done, the corresponding task should start in process_2?
As it stands, no process_2 tasks start until many of the extract_raw_from_api tasks are already complete...
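For context, a self-contained version of the snippet above would look roughly like the following; the @task bodies, the flow name, and the asset_locations values are placeholders rather than anything from this thread (Prefect 0.14-era API):

from prefect import Flow, task, unmapped

# Placeholder task definitions; the real implementations aren't shown in the thread.
@task
def extract_raw_from_api(asset_location, extract_date, extract_period_days):
    ...

@task
def process_2(raw):
    ...

@task
def process_3(processed):
    ...

with Flow("asset-pipeline") as flow:
    asset_locations = ["loc-a", "loc-b", "loc-c"]  # hypothetical inputs
    p1_res = extract_raw_from_api.map(
        asset_location=asset_locations,
        extract_date=unmapped("2021-01-27"),
        extract_period_days=unmapped(7),
    )
    p2_res = process_2.map(p1_res)
    p3_res = process_3.map(p2_res)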
Jim Crist-Harif
01/27/2021, 5:35 PM
You don't need a distributed Dask cluster (the DaskExecutor) to get parallelization, but you do need to use something other than the default LocalExecutor. If your tasks are all IO bound (for example, starting and monitoring external k8s jobs), you might find the LocalDaskExecutor sufficient. This uses a local thread pool to run tasks in parallel.
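A minimal sketch of what Jim is describing, assuming the Prefect 0.14-era API used elsewhere in this thread (the flow name is a placeholder):

from prefect import Flow
from prefect.executors import LocalDaskExecutor

# Thread-pool executor: mapped tasks run in parallel within one process,
# which is typically enough when the tasks are IO bound.
with Flow("asset-pipeline", executor=LocalDaskExecutor(scheduler="threads")) as flow:
    ...  # same mapped pipeline as above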
Josh Greenhalgh
01/27/2021, 5:43 PM
with Flow(name=name, storage=storage, run_config=run_config, schedule=schedule, executor=DaskExecutor()) as flow:
And now my mapped tasks don't even start; I get a pickling error about max recursion depth 😞
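For reference, DaskExecutor() with no arguments stands up a temporary local Dask cluster and ships task runs to its worker processes, which is where pickling enters the picture; the same applies when pointing it at an existing cluster. A sketch, with the scheduler address being a made-up placeholder:

from prefect.executors import DaskExecutor

# No arguments: a temporary local Dask cluster is created for the flow run.
executor = DaskExecutor()

# Or connect to an existing Dask scheduler (placeholder address).
executor = DaskExecutor(address="tcp://dask-scheduler:8786")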
Jim Crist-Harif
01/27/2021, 6:10 PM
(Note that with executor=LocalDaskExecutor() you won't get runtime pickling, since all your tasks run in the same process. This shouldn't be required though, the error you got above isn't expected.)
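Concretely, the switch Jim mentions is a one-argument change in the Flow constructor Josh posted earlier, roughly:

from prefect.executors import LocalDaskExecutor

with Flow(
    name=name,
    storage=storage,
    run_config=run_config,
    schedule=schedule,
    executor=LocalDaskExecutor(),  # tasks run in-process (thread pool), so no runtime pickling
) as flow:
    ...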