Eric

03/19/2020, 12:43 PM
Does Prefect or the Prefect community use a common solution for job tracking? The use case is: 1. get an S3 file listing; 2. use <some job tracking solution> to ignore files that have already been processed successfully, process new files, and retry failed files with fewer than 3 previous attempts; 3. kick off subsequent tasks.
Scott Zelenka

03/19/2020, 12:47 PM
The cache would work for a while, but it could eventually be invalidated if the S3 list just keeps growing. Another approach would be to track the successfully processed files in a different bucket, and compare the two sets to see which files still need to be processed.
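
A minimal sketch of that set comparison, assuming the two bucket listings have already been fetched as plain lists of object keys. The function name `files_to_process` and its parameters are hypothetical, not part of Prefect or any S3 client:

```python
def files_to_process(listed_keys, completed_keys):
    """Return the keys in the source listing that are not yet marked complete.

    Both arguments are assumed to be iterables of S3 object keys (strings),
    e.g. one listing per bucket. Fetching the listings is out of scope here.
    """
    # Set difference drops every key that already appears in the completed set.
    # Sorting only makes the result order deterministic.
    return sorted(set(listed_keys) - set(completed_keys))


pending = files_to_process(
    ["2020/a.json", "2020/b.json", "2020/c.json"],
    ["2020/b.json"],
)
```

This only covers "new vs. done"; it keeps no attempt count, so retry limits would need extra state.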
Eric

03/19/2020, 12:48 PM
Reasonable approaches. I think I'd prefer something database-backed, but given the low volume, JSON files in S3 might work.
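
A rough sketch of what JSON-based tracking could look like for the use case above, assuming one small state document that maps each key to a status and an attempt count. Every name here (`select_keys`, `record_result`, `MAX_ATTEMPTS`, the record layout) is hypothetical, and reading or writing the document in S3 is left out:

```python
import json

# Assumption taken from the use case: retry failed files with < 3 previous attempts.
MAX_ATTEMPTS = 3


def select_keys(listed_keys, state):
    """Pick new keys plus failed keys that still have attempts left."""
    selected = []
    for key in listed_keys:
        record = state.get(key)
        if record is None:
            selected.append(key)  # never seen before
        elif record["status"] == "failed" and record["attempts"] < MAX_ATTEMPTS:
            selected.append(key)  # failed earlier, still eligible for retry
    return selected


def record_result(state, key, succeeded):
    """Increment the attempt count for a key and store its latest status."""
    record = state.setdefault(key, {"status": "failed", "attempts": 0})
    record["attempts"] += 1
    record["status"] = "success" if succeeded else "failed"
    return state


def dumps_state(state):
    """Serialize the tracking state to a JSON string (e.g. for one S3 object)."""
    return json.dumps(state, sort_keys=True)


def loads_state(text):
    """Load tracking state from JSON text; empty text means no history yet."""
    return json.loads(text) if text else {}
```

With a single state document, concurrent runs could overwrite each other's updates, which is one reason a database-backed store may still be preferable.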
Dylan

03/19/2020, 1:42 PM
I tend to track a lot of information like this in Firebase or some other easy object store.