
Pedro Machado

08/09/2022, 2:01 PM
Hi everyone. I need to copy a set of large files from S3 to Azure Blob Storage. It looks like both the S3 and Azure Blob tasks in the library read the data into memory. I tried rewriting them to stream the data instead. I got it to work, but the machine becomes unresponsive when transferring large files. This is running in a container on an AWS Linux instance (DockerRun). Any suggestions on the best way to stream a file this way without reading it into memory? Thanks!

Rob Freedy

08/09/2022, 9:00 PM
For this use case, it may be worth looking into something like azcopy in Azure. I do not believe that the task libraries in Prefect have a way to stream file contents without copying into memory.
https://docs-v1.prefect.io/api/latest/tasks/aws.html#aws-tasks
https://docs-v1.prefect.io/api/latest/tasks/azure.html#azure-tasks
https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-s3?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json
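For reference, azcopy (v10+) can copy directly from S3 to Azure Blob Storage server-to-server, so the data never passes through the local machine's memory. A minimal sketch of the invocation, where the bucket, account, container, and SAS token are all placeholders:

```shell
# azcopy reads AWS credentials from the environment for the S3 source.
export AWS_ACCESS_KEY_ID='<aws-key-id>'
export AWS_SECRET_ACCESS_KEY='<aws-secret>'

# Destination authenticates with a SAS token appended to the URL.
# Add --recursive to copy a whole prefix/virtual directory.
azcopy copy \
  'https://s3.amazonaws.com/<bucket>/<object>' \
  'https://<account>.blob.core.windows.net/<container>/<object>?<SAS>'
```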

Pedro Machado

08/10/2022, 2:18 AM
Thanks, Rob. I ended up using s3fs to read and a custom Task similar to the task in the Prefect library that accepts a file-like object as the data arg. I plan to try the respective CLIs as they seem faster and more robust.
👍 1