Aaron Gonzalez
02/16/2023, 10:53 PM
gsutil rsync s3://some-key/dt=yyyy-mm-dd/ gs://some-key/dt=yyyy-mm-dd/ 😢
I am going to give prefect-shell a try for the first time, and I'd like to know whether people have much experience with it.
For my use case I have about 12K different rsyncs to run, and I don't know which of these patterns is preferable:
from prefect_shell import ShellOperation

# One ShellOperation, run to completion, per source.
for src in s3_sources_12k:
    dest = f"gs://some-dest/{src}"
    ShellOperation(
        commands=[f"gsutil rsync -r {src} {dest}"],
        env=env_var_map,
    ).run()
or
with ShellOperation(
    commands=[
        "gsutil rsync -r src1 dest1",
        "gsutil rsync -r src2 dest2",
        "gsutil rsync -r src3 dest3",
        ...
        "gsutil rsync -r src12k dest12k",
    ],
    env=env_var_map,
) as shell_operation:
    shell_process = shell_operation.trigger()
    shell_process.wait_for_completion()
    shell_output = shell_process.fetch_result()
Aaron Gonzalez
02/16/2023, 10:55 PM
"For long-lasting operations, use the trigger method and utilize the block as a context manager for automatic closure of processes when context is exited"
But I don't know whether "long-lasting" refers to the total number of iterations I need to run or to how long each one takes (probably not long, because the data is pretty small).
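A possible middle ground between the two patterns above is to split the 12K sources into batches and hand each batch to one shell invocation. The sketch below is plain Python and does not call prefect-shell; the helper names `build_rsync_commands` and `batched`, the destination layout, the example dates, and the batch size are all illustrative assumptions, not anything confirmed in this thread.

```python
from typing import Iterator, List, Sequence


def build_rsync_commands(sources: Sequence[str], dest_root: str) -> List[str]:
    """Build one `gsutil rsync -r` command string per source prefix."""
    root = dest_root.rstrip("/")
    commands = []
    for src in sources:
        # Assumed layout: mirror the source path (minus its scheme) under dest_root.
        path = src.split("://", 1)[-1]
        commands.append(f"gsutil rsync -r {src} {root}/{path}")
    return commands


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Example data only: five made-up date partitions, batches of two.
sources = [f"s3://some-key/dt=2023-02-{day:02d}/" for day in range(1, 6)]
commands = build_rsync_commands(sources, "gs://some-dest")
batches = list(batched(commands, 2))
```

Each element of `batches` could then be passed as the `commands` list of a single `ShellOperation`, as in the second pattern, so the number of shell invocations and the size of each one can be tuned independently.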
Andrew Huang
02/16/2023, 11:55 PM
from prefect import task
from prefect_shell import ShellOperation

# One task run per source.
@task
def sync_src(src):
    dest = f"gs://some-dest/{src}"
    ShellOperation(
        commands=[f"gsutil rsync -r {src} {dest}"],
        env=env_var_map,
    ).run()

for src in s3_sources_12k:
    sync_src(src)
Andrew Huang
02/16/2023, 11:55 PM
Aaron Gonzalez
02/17/2023, 2:50 PM
Andrew Huang
02/17/2023, 8:16 PM
Aaron Gonzalez
02/17/2023, 8:27 PM