Varuna Bamunusinghe

01/20/2022, 6:06 AM
Is there a way to download a file from S3, save it to a location, and skip re-downloading if the file is already saved?

Kevin Kho

01/20/2022, 6:21 AM
Hi @Varuna Bamunusinghe, I think this should be handled in your task logic. How are you downloading the files now?

Varuna Bamunusinghe

01/20/2022, 6:31 AM
I am using the AWS CLI, but I could also download the file using boto3. What I can't find is a way to skip the step if the file has already been downloaded. I can manually check with `os.path.exists`, but I would prefer to use the task decorator's checkpoint if possible.
I just checked the Task classes. I would be able to write a Task class for this.

Kevin Kho

01/20/2022, 6:50 AM
I don’t think checkpointing is intended for this use case. It is meant to persist a task result so that it can be loaded when you need to restart a flow run from the point of failure. Checking for the file with `os.path.exists` would fit better in the task logic.

Varuna Bamunusinghe

01/20/2022, 7:07 AM
I just implemented an S3Downloader with `os.path.exists`. Thanks for the help.
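
For readers who land on this thread, here is a minimal sketch of the "check in the task logic" approach discussed above. It is not the S3Downloader mentioned in the thread, which was not shared. The function name `download_if_missing` and its parameters are hypothetical, and the optional `s3_client` argument is an assumption added so the logic can be exercised without AWS credentials. The only boto3 call used is the S3 client's `download_file(Bucket, Key, Filename)`.

```python
import os


def download_if_missing(bucket, key, dest_path, s3_client=None):
    """Download s3://bucket/key to dest_path unless dest_path already exists.

    Hypothetical helper for illustration; names and parameters are assumptions.
    Returns dest_path in both cases so downstream code can use the file.
    """
    # Skip the download entirely when the file is already on disk.
    if os.path.exists(dest_path):
        return dest_path

    if s3_client is None:
        # Imported lazily so the skip path works even without boto3 installed.
        import boto3

        s3_client = boto3.client("s3")

    # Make sure the target directory exists before writing into it.
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    s3_client.download_file(bucket, key, dest_path)
    return dest_path
```

In a flow, a function like this could presumably be wrapped with the `task` decorator or called from a custom Task class's `run` method. Note that an existence check does not detect a partially written or outdated file; comparing size or ETag would be a further step if that matters.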