Tom Klein
07/13/2022, 10:38 AM
Anna Geller
07/13/2022, 10:58 AM
Tom Klein
07/13/2022, 11:03 AM
500K -> 50 batches of 10K -> 10 batches of 1K
so that the batches of 1K are “sub-tasks” of each “task” of 10K
the “task” of 10K finishes when all its “sub-tasks” of 1K finish (and are combined, and the combined file is uploaded to S3)
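A minimal plain-Python sketch of the layout described above, only to make the two levels concrete. It does not use any Prefect API, and `hierarchical_ranges` is a hypothetical helper name, not something from the thread.

```python
def hierarchical_ranges(total_rows, task_size, subtask_size):
    """Return one list per task; each list holds the (start, stop) row ranges of its sub-tasks."""
    tasks = []
    for task_start in range(0, total_rows, task_size):
        task_stop = min(task_start + task_size, total_rows)
        # Sub-task ranges never cross the boundary of their parent task.
        subtasks = [
            (start, min(start + subtask_size, task_stop))
            for start in range(task_start, task_stop, subtask_size)
        ]
        tasks.append(subtasks)
    return tasks


# 500K rows -> tasks of 10K rows -> sub-tasks of 1K rows
layout = hierarchical_ranges(500_000, 10_000, 1_000)
print(len(layout), len(layout[0]))
```

Each inner list is the set of sub-tasks that has to finish before its parent task can combine and upload.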
is there a way to apply this kind of hierarchical mapping in Prefect?
Anna Geller
07/13/2022, 11:20 AM
Tom Klein
07/13/2022, 11:21 AM
Anna Geller
07/13/2022, 11:41 AM
Tom Klein
07/13/2022, 11:55 AM
wrapped_enrich_shell_task that was applied on it, and each such task yielded an output of size 1K rows
then, each output (of size 1K) had the task wrapped_s3_upload_task applied on it
in Prefect it looked like:
saved = save_as_csv(MAIN_INPUT_FILE_PATH, accounts_data)
chunks = split_csv(file=saved, rowsize=1000)
run_outputs = wrapped_enrich_shell_task.map(chunks)
my_uploads = wrapped_s3_upload_task.map(run_outputs)
now i changed the code so that it combines the outputs into a single CSV, but i prefer it to happen in batches of 5K or 10K, NOT to wait for the entire 58 batches to finish
saved = save_as_csv(MAIN_INPUT_FILE_PATH, accounts_data)
chunks = split_csv(file=saved, rowsize=1000)
run_outputs = wrapped_enrich_shell_task.map(chunks)
combined = combine_csvs(run_outputs)
my_uploads = wrapped_s3_upload_task(combined)
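One way to picture the batch-wise combine is to group the 1K chunks before they are processed, so that each group is enriched and combined on its own and no group waits for the others. The sketch below is plain Python with `concurrent.futures`, not Prefect code; `group_chunks`, `enrich`, `combine` and `process_group` are hypothetical stand-ins, and the tiny chunk size is only there to keep the example fast.

```python
from concurrent.futures import ThreadPoolExecutor


def group_chunks(chunks, group_size):
    """Split a flat list of chunks into consecutive groups of at most group_size."""
    return [chunks[i:i + group_size] for i in range(0, len(chunks), group_size)]


def enrich(chunk):
    # Hypothetical stand-in for wrapped_enrich_shell_task: tags every row.
    return [f"enriched:{row}" for row in chunk]


def combine(outputs):
    # Hypothetical stand-in for combine_csvs: concatenates rows of several outputs.
    return [row for output in outputs for row in output]


def process_group(group):
    # Enrich every chunk in this group, then combine only this group's outputs.
    return combine([enrich(chunk) for chunk in group])


# 58 chunks of 2 rows each (assumed sizes, just for the demo).
chunks = [[f"row{2 * i}", f"row{2 * i + 1}"] for i in range(58)]
groups = group_chunks(chunks, 10)  # 5 groups of 10 chunks plus 1 group of 8

with ThreadPoolExecutor(max_workers=4) as pool:
    # Each group's combined result is ready as soon as that group finishes;
    # list() only collects them all at the end for this demo.
    combined_per_group = list(pool.map(process_group, groups))

print(len(groups), len(combined_per_group))
```

The same shape might carry over to a flow by mapping a single per-group task over `groups` and mapping the upload over its results, but that is an assumption here, not something confirmed in the thread.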
Anna Geller
07/13/2022, 11:56 AM
Tom Klein
07/13/2022, 12:00 PM
1 -> N -> M ->…
and so on
where M is larger than or equal to N
if i have 1 -> N -> M
where M is LESS than N (i.e. - a reduce) i don’t see how it’s possible? how would prefect know how to reduce only “some” of the above layer?
Anna Geller
07/13/2022, 12:10 PM