# prefect-community
m
Hi everyone. Does anyone have experience/examples of getting a large response from an HTTP GET call and writing it to S3? Prefect has the map functionality, which seems like a good way to do this. If I were not using Prefect, I'd do something like this:
import boto3
import requests

s3_client = boto3.client('s3')

with requests.request("GET", url, stream=True) as r:
    r.raise_for_status()

    # put_object with a fixed Key would overwrite on every iteration,
    # so each chunk needs its own key
    for i, chunk in enumerate(r.iter_content(chunk_size=8 * 1024 * 1024)):
        s3_client.put_object(Body=chunk,
                             Bucket='xxx',
                             Key=f'xxx-part-{i}')
s
If you want to fully parallelize, you'll need to fetch the size beforehand, split it into chunks, and run a map over the chunk parts (creating a separate url<>S3 stream for each chunk). requests.request makes a single streaming request, and that single stream can't be split into pieces without limiting yourself to the I/O throughput of the initial worker (AFAIK). If you want to keep it simpler yet still speed it up, you should be able to use requests-futures to make it async and run on multiple OS threads (which, in turn, won't guarantee chunk ordering, so make sure you reconstruct the object properly once the whole thing is streamed).
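The size-then-split approach could be sketched like this (a sketch, not a definitive implementation; the chunk size is arbitrary and the server must support HTTP Range requests):

```python
import requests

def byte_ranges(total_size, chunk_size):
    """Split total_size bytes into (start, end) pairs, end inclusive,
    matching the HTTP Range header convention."""
    return [(start, min(start + chunk_size, total_size) - 1)
            for start in range(0, total_size, chunk_size)]

def fetch_range(url, start, end):
    """Fetch one chunk of the object with an HTTP Range request."""
    r = requests.get(url, headers={"Range": f"bytes={start}-{end}"})
    r.raise_for_status()
    return r.content

# Fetch the total size first, then map over the ranges:
# size = int(requests.head(url).headers["Content-Length"])
# parts = [fetch_range(url, s, e)
#          for s, e in byte_ranges(size, 8 * 1024 * 1024)]
```

In Prefect, fetch_range would be the mapped task, with byte_ranges(...) supplying the list of inputs to map over.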
m
ok awesome. what would be the best way to reconstruct it?
s
Manually, step by step, by first modeling your solution before coding it and making sure it fulfills your needs.
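One concrete way to do the reconstruction (a minimal sketch, not Prefect-specific): tag each chunk with its index when you fetch it, then reassemble by index once everything has arrived. If the destination is S3, the multipart upload API handles this for you — each part carries a part number and S3 concatenates the parts in part-number order when the upload completes.

```python
def reassemble(indexed_chunks):
    """Rebuild the original byte stream from (index, bytes) pairs
    that may arrive in any order."""
    return b"".join(chunk for _, chunk in sorted(indexed_chunks))

# Chunks finishing out of order, e.g. from threads or mapped tasks:
out_of_order = [(2, b"ld"), (0, b"hello "), (1, b"wor")]
# reassemble(out_of_order) → b"hello world"
```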