Wolfgang Kerzendorf
03/05/2020, 4:03 PM
nicholas
03/05/2020, 4:18 PM
`glob`-like server-side filtering. However, boto3 allows you to pass the `Prefix` argument when inspecting a bucket, which, depending on the structure of your bucket and the nature of the files you're looking for, may serve your use case. Depending on the number of files you're transforming, you may want to do some batch processing here to avoid bottlenecks and to allow your downloads/uploads to share a client. You can map these batches as necessary and use the Prefect Logger to raise any issues that come up when doing the processing.
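A rough sketch of the `Prefix` idea, in case it helps: list only the keys under a prefix, do the glob-style matching client-side, then split the result into batches. The function names, bucket name, prefix, and pattern here are made up for illustration; `client` is assumed to be a boto3 S3 client (`boto3.client("s3")`), and `list_objects_v2` is the paginated listing call that accepts `Prefix`.

```python
from fnmatch import fnmatchcase
from typing import Iterator, List


def list_matching_keys(client, bucket: str, prefix: str, pattern: str) -> List[str]:
    """Return keys under `prefix` whose full key matches a glob-style `pattern`.

    The prefix narrows the listing on the S3 side; the glob match itself
    happens locally, because S3 only filters by prefix.
    """
    paginator = client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when no key matches the prefix.
        for obj in page.get("Contents", []):
            if fnmatchcase(obj["Key"], pattern):
                keys.append(obj["Key"])
    return keys


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical usage (bucket, prefix and pattern are placeholders):
#   import boto3
#   client = boto3.client("s3")  # one client shared by every batch
#   keys = list_matching_keys(client, "my-bucket", "raw/2020/", "*.csv")
#   for batch in batched(keys, 50):
#       ...  # download / transform / upload each batch
```

Each batch could then be handed to a mapped task, so the number of mapped runs follows the number of batches rather than the number of files.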
The Prefect Docs have a bare-bones example of an ETL flow here: https://docs.prefect.io/core/examples/etl.html
For an example of using the boto3 client to upload files, boto3 has an example here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html
I also found a Python library that might help with generating prefixes to pass to the boto3 client, if standard strings won't work for your use case: https://github.com/asciimoo/exrex
Last, if you want to read more about Prefect loggers, you can do so here: https://docs.prefect.io/core/concepts/logging.html#logging-configuration
Wolfgang Kerzendorf
03/05/2020, 4:19 PM
nicholas
03/05/2020, 4:22 PM
`glob` module is fine!
Wolfgang Kerzendorf
03/05/2020, 4:23 PM
nicholas
03/05/2020, 4:28 PM
Wolfgang Kerzendorf
03/05/2020, 4:30 PM
`extract` gives a list and then `transform` works on this list - rather than having 3 "transformers"
nicholas
03/05/2020, 4:36 PM
Wolfgang Kerzendorf
03/05/2020, 4:52 PM
nicholas
03/05/2020, 4:53 PM
Jeremiah
03/05/2020, 6:06 PM
Wolfgang Kerzendorf
03/05/2020, 9:30 PM
Jeremiah
03/05/2020, 10:01 PM
Wolfgang Kerzendorf
03/05/2020, 10:02 PM
Jeremiah
03/05/2020, 10:02 PM
Wolfgang Kerzendorf
03/05/2020, 10:02 PM
Jeremiah
03/05/2020, 10:04 PM
`ifelse` conditional (for a more formal version) or, more simply, bake the conditional directly into a mapped task which checks if that branch should proceed and raises a `SKIP` signal otherwise (you can learn more about signaling here).
`SKIP` to differentiate intentional skips from true failures
Wolfgang Kerzendorf
03/05/2020, 10:05 PM
Jeremiah
03/05/2020, 10:05 PM
Wolfgang Kerzendorf
03/06/2020, 4:20 PM
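For reference, the simpler option Jeremiah describes at 10:04 PM (check the condition inside the mapped task and raise `SKIP` when the branch should not proceed) might look roughly like the sketch below. The function name, the `already_processed` flag, and the item layout are invented for illustration. The import path is the one used by Prefect Core at the time of this thread; the fallback class is only a stand-in so the snippet runs without that version installed.

```python
try:
    # Prefect Core (0.x), the version discussed in this thread.
    from prefect.engine.signals import SKIP
except ImportError:
    # Stand-in only: lets the sketch run where that Prefect version is absent.
    class SKIP(Exception):
        """Placeholder for prefect.engine.signals.SKIP."""


def process_if_new(item: dict) -> str:
    """Hypothetical body of a mapped task.

    Raises SKIP for items that need no work, so an intentional skip
    is not reported as a failure; otherwise does the (toy) processing.
    """
    if item.get("already_processed"):
        raise SKIP(f"{item['name']} was already processed")
    return item["name"].upper()


# In a Prefect 0.x flow this function would presumably be decorated with
# `@task` and mapped over the items, e.g. `process_if_new.map(items)`
# inside a `with Flow(...)` block.
```

With this shape each mapped run decides for itself whether to proceed, so no separate branching task is needed.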