Aaron Gonzalez
02/16/2023, 10:53 PM
gsutil rsync s3://some-key/dt=yyyy-mm-dd/ gs://some-key/dt=yyyy-mm-dd/
😢
I am going to give prefect-shell a try for the first time. Do people here have much experience with it?
For my use case I have about 12K different rsyncs to run, and I don't know which of these patterns is preferable:
from prefect_shell import ShellOperation

# Pattern 1: one ShellOperation per source
for src in s3_sources_12k:
    dest = f"gs://some-dest/{src}"
    ShellOperation(
        commands=[f"gsutil rsync -r {src} {dest}"],
        env=env_var_map,
    ).run()
or
# Pattern 2: a single ShellOperation holding every command
with ShellOperation(
    commands=[
        "gsutil rsync -r src1 dest1",
        "gsutil rsync -r src2 dest2",
        "gsutil rsync -r src3 dest3",
        ...
        "gsutil rsync -r src12k dest12k",
    ],
    env=env_var_map,
) as shell_operation:
    shell_process = shell_operation.trigger()
    shell_process.wait_for_completion()
    shell_output = shell_process.fetch_result()
"For long-lasting operations, use the trigger method and utilize the block as a context manager for automatic closure of processes when context is exited."
But I don't know whether "long-lasting" refers to the total number of iterations I need to make or to how long each one takes (probably not that long, because the data is pretty small).
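One middle ground between the two patterns above is to build all the command strings up front and split them into fixed-size batches, so that each batch can be handed to one `ShellOperation` as its `commands` list. The sketch below is plain Python and does not call Prefect at all. The helper names (`to_gs_dest`, `build_rsync_commands`, `batched`) are hypothetical, and the destination layout (same key prefix under a different bucket) is an assumption based on the `gsutil rsync` example at the top of the thread.

```python
import shlex
from itertools import islice
from typing import Iterable, Iterator, List


def to_gs_dest(src: str, dest_bucket: str) -> str:
    """Map s3://<bucket>/<key> to gs://<dest_bucket>/<key> (assumed layout)."""
    if not src.startswith("s3://"):
        raise ValueError(f"expected an s3:// URL, got {src!r}")
    # Drop the scheme and the source bucket name, keep only the key prefix.
    _, _, key = src[len("s3://"):].partition("/")
    return f"gs://{dest_bucket}/{key}"


def build_rsync_commands(sources: Iterable[str], dest_bucket: str) -> List[str]:
    """Build one shell-quoted 'gsutil rsync -r' command string per source."""
    return [
        f"gsutil rsync -r {shlex.quote(src)} "
        f"{shlex.quote(to_gs_dest(src, dest_bucket))}"
        for src in sources
    ]


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch
```

Each list yielded by `batched(build_rsync_commands(s3_sources_12k, "some-dest"), 100)` could then be passed as `commands=` to a `ShellOperation`. The batch size of 100 is arbitrary. Note that stripping the scheme avoids destinations of the form `gs://some-dest/s3://...`, which the f-string in the first pattern would produce if each `src` is a full `s3://` URL.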
Andrew Huang
02/16/2023, 11:55 PM
from prefect import task
from prefect_shell import ShellOperation

@task
def sync_src(src):
    dest = f"gs://some-dest/{src}"
    ShellOperation(
        commands=[f"gsutil rsync -r {src} {dest}"],
        env=env_var_map,
    ).run()

for src in s3_sources_12k:
    sync_src(src)
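The main thing a per-source unit of work buys is isolation: one failed sync does not hide or abort the others, and each source gets its own result. The sketch below illustrates that idea with the standard library only (`subprocess` plus a thread pool). It is a stand-in for the task-based approach, not Prefect code. The names `run_command` and `sync_all`, the `dest_for` callback, and the default of 8 workers are all assumptions made for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def run_command(command: List[str]) -> int:
    """Run one command without a shell and return its exit code."""
    return subprocess.run(command, check=False).returncode


def sync_all(
    sources: Iterable[str],
    dest_for: Callable[[str], str],
    runner: Callable[[List[str]], int] = run_command,
    max_workers: int = 8,
) -> Dict[str, int]:
    """Run one 'gsutil rsync -r' per source and return {source: exit_code}."""
    source_list = list(sources)
    commands = [
        ["gsutil", "rsync", "-r", src, dest_for(src)] for src in source_list
    ]
    # pool.map preserves input order, so exit codes line up with sources.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        exit_codes = list(pool.map(runner, commands))
    return dict(zip(source_list, exit_codes))
```

Sources with a nonzero exit code can be collected from the returned mapping and retried on their own. The `runner` parameter exists so the function can be exercised with a fake runner, without `gsutil` installed.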
Aaron Gonzalez
02/17/2023, 2:50 PM
Andrew Huang
02/17/2023, 8:16 PM
Aaron Gonzalez
02/17/2023, 8:27 PM