Pedro Machado

08/09/2022, 2:01 PM
Hi everyone. I need to copy a set of large files from S3 to Azure Blob Storage. It looks like both the S3 and Azure Blob tasks in the task library read the data into memory. I tried rewriting them to stream the data instead. I got it working, but the machine becomes unresponsive when transferring large files. This is running in a container on an AWS Linux instance (DockerRun). Any suggestions on the best way to stream a file this way without reading it into memory? Thanks!
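One way to keep memory flat is a chunked copy between file-like objects, so only one chunk is resident at a time. A minimal sketch (the `stream_copy` helper and all names here are illustrative, not from the thread; in real use `src` would be something like the `Body` of a boto3 `get_object` response, which is a readable file-like stream):

```python
import shutil
from io import BytesIO

def stream_copy(src, dst, chunk_size=8 * 1024 * 1024):
    """Copy between file-like objects in fixed-size chunks.

    Only one chunk is held in memory at a time, so peak memory
    stays near chunk_size regardless of the total file size.
    """
    # shutil.copyfileobj already loops read(chunk_size)/write()
    shutil.copyfileobj(src, dst, length=chunk_size)

# Hypothetical real-world wiring (assumptions, not tested here):
#   src = boto3.client("s3").get_object(Bucket=..., Key=...)["Body"]
#   On the Azure side, azure-storage-blob's BlobClient.upload_blob(data=...)
#   *pulls* from any readable stream, so the S3 StreamingBody can be passed
#   directly as `data` and the SDK reads it in blocks.

if __name__ == "__main__":
    src = BytesIO(b"x" * (1024 * 1024))  # stand-in for an S3 StreamingBody
    dst = BytesIO()
    stream_copy(src, dst, chunk_size=64 * 1024)
    print(dst.getvalue() == src.getvalue())  # True
```

If the machine still becomes unresponsive with this pattern, the chunk size (or the SDK's upload concurrency) is usually the knob to turn down.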

Rob Freedy

08/09/2022, 9:00 PM
For this use case, it may be worth looking into something like azcopy on the Azure side. I do not believe the task libraries in Prefect have a way to stream file contents without buffering them in memory.
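For reference, azcopy (v10 and later) can do a service-to-service copy straight from S3 to Blob Storage, so the data never has to pass through your machine at all. A sketch with placeholder names throughout (bucket, account, container, and the SAS token are all stand-ins):

```shell
# AWS credentials are read from the environment by azcopy.
export AWS_ACCESS_KEY_ID='<your-access-key>'
export AWS_SECRET_ACCESS_KEY='<your-secret-key>'

# Copy a single object; append a SAS token to the destination URL.
azcopy copy \
  'https://s3.amazonaws.com/my-bucket/big-file.parquet' \
  'https://myaccount.blob.core.windows.net/my-container/big-file.parquet?<SAS>'

# Or copy a whole bucket/prefix recursively:
azcopy copy \
  'https://s3.amazonaws.com/my-bucket/prefix/' \
  'https://myaccount.blob.core.windows.net/my-container/prefix/?<SAS>' \
  --recursive
```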

Pedro Machado

08/10/2022, 2:18 AM
Thanks, Rob. I ended up using
to read and a custom Task similar to the task in the Prefect library that accepts a file-like object as the
arg. I plan to try the respective CLIs as they seem faster and more robust.
👍 1
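If the CLI route works out, one simple way to wrap it in a task is a thin subprocess call. A sketch under assumptions (`run_cli` is a hypothetical helper, not a Prefect API; the command you'd actually pass would be e.g. an `azcopy copy` or `aws s3 cp` invocation):

```python
import subprocess

def run_cli(cmd: list[str]) -> str:
    """Run a CLI command, raising on a non-zero exit code.

    Returns captured stdout so the caller (e.g. a task) can log it.
    """
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    # Stand-in command; in a real task this would be the azcopy/aws CLI.
    print(run_cli(["echo", "done"]).strip())  # done
```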