Hi, I want to post here in case someone has a similar issue with Prefect
task.map() and random selection.
I have a task that does random sampling (example below), and the random seed is set so the results can be reproduced. If the task is
mapped over multiple config_dict values, the result is not consistent: each run generates different results for the same config, so the results are not reproducible. If I run the task
in a loop over the same config_dict values, the result is consistent on every run, as expected.
I suspect that task.map() starts multiple threads and that np.random.seed may not be thread-safe. What is the underlying reason, and what is the best practice in this case?
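import numpy as np
from prefect import task

@task
def down_sample(config_dict):
    # f_vector, w_vector and self.sample_size come from the surrounding code (not shown)
    for k in range(5):
        # Reseed NumPy's global random state before each draw
        np.random.seed(seed=k)
        scaling_factors = np.random.choice(
            f_vector,
            p=w_vector,
            size=self.sample_size * self.sample_size,
            replace=True,
        )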
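
For comparison, here is a minimal sketch of one common workaround, assuming the inconsistency comes from mapped tasks running concurrently and sharing NumPy's global random state. Each draw uses its own np.random.default_rng(seed) generator instead of np.random.seed. The config keys "f_vector", "w_vector" and "sample_size" are hypothetical stand-ins for values not shown in the snippet above, and a plain thread pool stands in for task.map(), so this is not Prefect's actual execution model.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def down_sample(config_dict):
    """Draw 5 seeded samples, each from a private generator (no global state)."""
    # Hypothetical config keys, used here only for illustration.
    f_vector = config_dict["f_vector"]
    w_vector = config_dict["w_vector"]
    n = config_dict["sample_size"]

    samples = []
    for k in range(5):
        # A local Generator is not shared with other threads, so another
        # task reseeding at the same time cannot affect this draw.
        rng = np.random.default_rng(seed=k)
        samples.append(rng.choice(f_vector, p=w_vector, size=n * n, replace=True))
    return np.stack(samples)


configs = [
    {"f_vector": [0.5, 1.0, 2.0], "w_vector": [0.2, 0.5, 0.3], "sample_size": 4},
    {"f_vector": [1.0, 3.0], "w_vector": [0.6, 0.4], "sample_size": 3},
]

# Run once sequentially and once in a thread pool, then compare.
sequential = [down_sample(c) for c in configs]
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(down_sample, configs))

same = all(np.array_equal(s, t) for s, t in zip(sequential, threaded))
print("threaded results match sequential results:", same)
```

Note that default_rng uses a different bit generator than the legacy np.random.seed / np.random.choice pair, so the sampled values will differ from the original ones. If the exact legacy values matter, a per-call np.random.RandomState(k) instance is the closest equivalent that still avoids the shared global state.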