# ask-community
a
Hi friends, we’re currently switching over from Docker storage to GCS storage. I noticed that every time we call `register` with an `idempotency_key` set, the flow is uploaded to storage despite it not having changed. Is this intended? Command used in the comments
```python
flow.storage = GCS(bucket="abc")
flow.run_config = KubernetesRun(...)
flow.register(
    project_name="abc",
    idempotency_key=flow.serialized_hash(),
)
```
z
Hi @Adam, the idempotency key is used for registration with the Prefect backend; your storage will still be built by default.
This is actually desired behavior for some people, because `serialized_hash` is a hash of the serialized flow, which is just metadata. The code of one of your tasks could change slightly and the hash would not change. In this case, your storage would be updated but your flow version in Cloud would not be.
a
Thanks @Zanie - is there any way to only build storage if the flow has changed? Our CI process loops through all our flows and calls the above code. We currently have about 25 flows and it’s growing quite fast, so I want to avoid re-uploading every flow every time
z
I agree that it'd be useful if we could reduce expensive storage rebuilds using a hash/key check. I'll check with the rest of the team to see if there's a better pattern.
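One workaround, as a sketch: compute an idempotency key from the flow file's raw bytes (so task-code edits do change it, unlike `serialized_hash`) and skip the storage build when it matches the last run. `flow.register(build=False)` is a real Prefect 1.x flag; the cache-file path and helper names here are hypothetical.

```python
import hashlib
from pathlib import Path


def file_hash(path: str) -> str:
    """Hash the flow file's raw bytes, so edits to task code change the key
    (unlike serialized_hash, which only covers flow metadata)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def register_flow(flow, flow_file: str, cache_file: str = ".flow_hash"):
    """Hypothetical CI helper: only rebuild storage when the source changed."""
    cache = Path(cache_file)
    previous = cache.read_text().strip() if cache.exists() else None
    current = file_hash(flow_file)
    # build=False skips the storage build/upload; the flag exists on
    # flow.register() in Prefect 1.x.
    flow.register(
        project_name="abc",
        idempotency_key=current,
        build=(current != previous),
    )
    cache.write_text(current)
```

In a real CI loop you'd keep one cached hash per flow file rather than a single `.flow_hash`.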
a
Thanks!
z
I know a lot of people will use `git` to check if there are changes to any of their flow files
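The git approach could be sketched like this (the `flows/` directory layout and helper names are assumptions): ask git which files changed since a base ref, then re-register only the flows whose files appear in that list.

```python
import subprocess


def changed_files(base_ref: str = "HEAD~1") -> list[str]:
    """Files changed between base_ref and HEAD, via `git diff --name-only`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]


def flows_to_register(changed: list[str], flows_dir: str = "flows/") -> list[str]:
    """Keep only flow files under the (assumed) flows/ directory."""
    return [f for f in changed if f.startswith(flows_dir) and f.endswith(".py")]
```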
a
Indeed. But I’m hoping to avoid that, as things get a bit complicated when everything is already committed to master. Have to start comparing commits, etc.
z
While I wait for someone to get back to me: we don't really think Docker/GCS storage is a great pattern because of the build time here. Generally we'd recommend using a `DockerRun` (or in your case `KubernetesRun`) with a base image that has your shared code, and storing your flows using a lighter storage (i.e. `S3`)
a
Thanks @Zanie - that’s exactly what we’re doing though. A base Docker image with all the deps and shared code + KubernetesRun + GCS for flows (equivalent to S3)
Ah wait, I see the confusion. By GCS I meant Google Cloud Storage (i.e. S3 on google) rather than Google Container Registry 😛
z
Oh I'm sorry! I forget the GCloud acronyms sometimes 🤦‍♂️
Is the upload slow enough to be concerning?
a
It’s okay for now, about 3 seconds per flow. I think I’ll write some code to detect what changed - I’ll need that anyway to conditionally trigger Docker rebuilds.
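That conditional-rebuild check could look like this, as a sketch (the input file names and manifest path are assumptions): hash everything that feeds the Docker image and rebuild only when the combined digest changes.

```python
import hashlib
from pathlib import Path

# Assumed inputs that determine the base image's contents.
IMAGE_INPUTS = ["Dockerfile", "requirements.txt"]


def inputs_digest(paths: list[str]) -> str:
    """One digest over all image-input files, hashed in a fixed order."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(Path(p).read_bytes())
    return h.hexdigest()


def needs_rebuild(manifest: str = ".image_digest",
                  paths: list[str] = IMAGE_INPUTS) -> bool:
    """True when the digest differs from the one recorded at the last build."""
    current = inputs_digest(paths)
    m = Path(manifest)
    previous = m.read_text().strip() if m.exists() else None
    if current != previous:
        m.write_text(current)  # record the digest for the next CI run
        return True
    return False
```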