# ask-marvin
d
@Marvin I use the S3 storage block to store my flow code. However, when I create new deployments, deleted files are not removed from S3, which causes bugs in my flow code. Is there any way to force Prefect to delete files from S3 which aren't included in the Deployment code?
m
It appears that the Prefect documentation does not contain explicit instructions about deleting files from S3 that are no longer part of the deployment code. However, keeping deployment artifacts in sync with the codebase could be part of a CI/CD process: a custom script that cleans up the S3 bucket and runs as part of your deployment process would ensure that only the current files are retained in S3.

If necessary, you might consider raising an issue or requesting this feature on the Prefect GitHub repository: Prefect GitHub Issues. As this seems like a common issue one might face when deploying flows with Prefect, I recommend reaching out to the Prefect community or support to see if there are any established patterns or recommendations for this scenario.

Please refer to these links for additional context about storage and deployment in Prefect:
- Prefect Storage Blocks
- Prefect Block-Based Deployments
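One way to sketch the cleanup script Marvin describes is with boto3: delete any object under the code prefix that no longer exists locally. The bucket name, prefix, and local directory below are placeholders, not values from this thread, and the sketch assumes the upload mirrors the local directory layout.

```python
from pathlib import Path

import boto3

BUCKET = "my-prefect-code"          # hypothetical bucket used by the S3 storage block
PREFIX = "flows/my-flow/"           # hypothetical bucket_path prefix for this deployment
LOCAL_ROOT = Path("flows/my_flow")  # local directory that gets uploaded on deploy

s3 = boto3.client("s3")

# Keys that *should* exist, assuming the upload mirrors the local layout under PREFIX
local_keys = {
    PREFIX + p.relative_to(LOCAL_ROOT).as_posix()
    for p in LOCAL_ROOT.rglob("*")
    if p.is_file()
}

# Delete anything under the prefix that is no longer present locally
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["Key"] not in local_keys:
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
```

Run as a CI step alongside the deployment build; `aws s3 sync <local-dir> s3://<bucket>/<prefix> --delete` achieves roughly the same effect in a single command.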
n
We generally won't (I don't think ever) automatically delete things from your infrastructure.
> causes bugs in my flow code.
what bugs are caused by deleting deployments?
d
@Nate A similar issue was experienced here -- this may be specific to DBT, but DBT is designed to run all the SQL models in a given directory. If we delete a model from that directory in our repository, but that deletion doesn't happen in S3, DBT ends up running the deleted model inside of our Prefect flows because it was never removed from S3. We could write some sort of custom CI/CD script to remove files older than the latest deployment's timestamp, but this seems like something Prefect should be able to handle. Let me know if I can clarify.
n
Hmm, that's interesting (disclaimer: I'm not a dbt buff). At the time when you write to your disk, can you look up the deployment that needs to use those files / paths? I'm thinking that if you wrote to a path that contains the name + version of the deployment that will need that path, you could have the flow use
`prefect.runtime.deployment`
to only try to run the models that are in the place on disk that corresponds to that deployment. Then yeah, if you needed to clean up stuff on disk associated with old deployments, you could have some script in CI that deletes paths associated with deployments you can't find in the API anymore. Does that make sense?
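A minimal sketch of that idea, assuming models are staged under a per-deployment directory; the directory layout and the `run_model` helper are hypothetical and not part of Prefect or dbt:

```python
from pathlib import Path

from prefect import flow
from prefect.runtime import deployment

MODELS_ROOT = Path("models")  # hypothetical root where per-deployment models are written


def run_model(sql_file: Path) -> None:
    """Hypothetical helper that executes a single model file."""
    print(f"running {sql_file}")


@flow
def run_models():
    # prefect.runtime.deployment exposes metadata about the deployment that
    # created this flow run (its attributes are empty outside a deployment)
    dep_dir = MODELS_ROOT / (deployment.name or "local")
    for sql_file in sorted(dep_dir.glob("*.sql")):
        run_model(sql_file)  # only models staged for *this* deployment are run
```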
d
Hey Nate -- Yeah, sort of. I think I came up with a similar but slightly different approach. I was planning to dynamically set the S3 `bucket_path` to include the current date when we define the
`storage`
object for a given deployment. It would mean that every time we deploy new code we would essentially create a new S3 directory for it. Then we could just set some lifecycle rules on our S3 bucket to get rid of old code after a couple months or something. I do still feel like in
`prefect.filesystems.S3`
it would be nice to have an optional attribute to overwrite the directory when you instantiate an S3 storage object, but the above solution is certainly acceptable to get things working. https://docs.prefect.io/latest/api-ref/prefect/filesystems/#prefect.filesystems.S3

Re: DBT --> This is just my personal example. I guess I could foresee this being an issue not just with DBT, but really with any script that runs every file inside a given directory.
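A hedged sketch of the date-stamped storage approach, assuming Prefect 2's block-based deployments; the bucket name and block slug below are placeholders:

```python
from datetime import date

from prefect.filesystems import S3

# Create (or update) an S3 storage block whose bucket_path includes today's date,
# so each deploy uploads into a fresh prefix. Bucket and block names are made up.
storage = S3(bucket_path=f"my-prefect-code/{date.today().isoformat()}")
storage.save("dbt-flow-storage", overwrite=True)

# An S3 lifecycle rule on the bucket can then expire old prefixes after a retention
# window, removing stale code without Prefect having to delete anything itself.
```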
n
That makes sense to me. If you have the bandwidth, an issue codifying your ask here would be helpful in making sure this problem is articulated / on our radar officially!
d
where can I file an issue?
d
Thanks!
n
👍