Thread
#prefect-community
    Marc Lipoff

    1 year ago
    Hi everyone. Does anyone have experience/examples of getting a large response from an HTTP GET call and writing it to S3? Prefect has the map functionality, which seems like a good way to do this. If I were not using Prefect, I'd do something like this:
    import boto3
    import requests

    s3_client = boto3.client('s3')

    with requests.request("GET", url, stream=True) as r:
        r.raise_for_status()

        # Each chunk needs its own key; reusing one key would just
        # overwrite the same S3 object on every put_object call.
        for i, chunk in enumerate(r.iter_content(chunk_size=8 * 1024 * 1024)):
            s3_client.put_object(Body=chunk,
                                 Bucket='xxx',
                                 Key=f'xxx-part-{i}')

    Sébastien

    1 year ago
    If you want to fully parallelize, you'll need to fetch the size beforehand, split it into chunks, and run a map on the chunk parts (create a separate url<>S3 stream for each chunk). requests.request is a single streaming request. That single stream can't be turned into pieces without limiting yourself to I/O throughput on the initial worker (AFAIK). If you want to keep it simpler yet still speed it up, you should be able to use requests-futures to make it async and run on multiple OS threads (which, in turn, would not guarantee chunk ordering, so make sure you reconstruct it properly once the whole object is streamed).
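    The splitting step above can be sketched as a small helper that turns the object size into HTTP Range pairs, each of which becomes one independent fetch. The function name and chunk size are illustrative, not from the thread:

```python
def byte_ranges(total_size, chunk_size):
    # Inclusive (start, end) pairs covering [0, total_size),
    # matching the HTTP Range header convention "bytes=start-end".
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]

# Each pair maps to one independent request, e.g.
#   requests.get(url, headers={"Range": f"bytes={start}-{end}"})
# so the resulting list is what you'd hand to Prefect's map.
```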
    Marc Lipoff

    1 year ago
    ok awesome. what would be the best way to reconstruct it?

    Sébastien

    1 year ago
    Manually, step by step, by first modeling your solution before coding it and making sure it fulfills your needs.
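    One minimal way to model that reconstruction, sketched with a stand-in fetch_chunk in place of the real ranged GET: tag every chunk with its index when it is fetched, then sort before joining, since threads complete in arbitrary order.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(index, data):
    # Hypothetical stand-in for the real ranged GET; the important part
    # is returning (index, bytes) so ordering survives out-of-order completion.
    return index, data

def reconstruct(parts):
    # Threads finish in arbitrary order, so sort by index before joining.
    return b"".join(data for _, data in sorted(parts))

chunks = [b"he", b"llo", b" world"]
with ThreadPoolExecutor(max_workers=3) as pool:
    parts = list(pool.map(fetch_chunk, range(len(chunks)), chunks))
assert reconstruct(parts) == b"hello world"
```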