# ask-community
j
What's the easiest way to specify running N simultaneous extracts with Dask? E.g., each extract looks like this:
```python
df_2015 = extract_past(connection, start_date="2015-01-01", end_date="2015-12-31", task_args={"name": "Extract 2015"})
```
k
You might be able to map? Just the connection needs to go inside the task rather than being passed in. Are these CSV files or Parquet?
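A minimal sketch of moving the connection inside a task, assuming the Prefect 1.x API that `task_args` and mapping suggest; `get_connection` and `run_extract_query` are hypothetical stand-ins for the warehouse access code:
```python
from prefect import task

@task
def extract_window(start_date: str, end_date: str):
    # Open the connection inside the task so each mapped run gets its
    # own connection on whatever Dask worker it lands on, instead of
    # trying to serialize a shared connection object.
    connection = get_connection()  # hypothetical helper
    try:
        # hypothetical stand-in for the actual extract query
        return run_extract_query(connection, start_date=start_date, end_date=end_date)
    finally:
        connection.close()
```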
j
reading out of a data warehouse
The full extract is too slow, so I'm splitting it apart, running some transforms in Python, then running a loop for the load to incrementally load 1/10th of the total data at a time.
But I'd like to speed up that extract component if possible.
k
Ah, then the best bet is just to map and create the connection inside the task. Map over a list of start dates and end dates. The limit would be the number of concurrent connections your warehouse allows.
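A sketch of that mapped flow under the same Prefect 1.x assumption, reusing the `extract_window` task above; the yearly windows and worker count are illustrative:
```python
from prefect import Flow
from prefect.executors import DaskExecutor

# Illustrative date windows; split however matches your 1/10th chunks.
years = range(2015, 2025)
starts = [f"{y}-01-01" for y in years]
ends = [f"{y}-12-31" for y in years]

with Flow("parallel-extract") as flow:
    # Each (start, end) pair becomes its own mapped task run,
    # executed in parallel by Dask workers.
    dfs = extract_window.map(start_date=starts, end_date=ends)

# n_workers is illustrative; effective concurrency is still capped by
# the number of simultaneous connections the warehouse allows.
flow.run(executor=DaskExecutor(cluster_kwargs={"n_workers": 4}))
```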
j
Gotcha, that makes sense, thank you!