
Prefect Community

If the flow registration `idempotency_key` matches the one from the previous flow registration, the flow version is not bumped. However, does the serialised flow get uploaded to storage anyway? I'm too new to the codebase (and to GraphQL in particular) to make sense of the details of the flow registration code.

I tested this with local storage, and it appears that the serialised flow does get uploaded to storage even when the flow version is not bumped.

It seems to me that this is a bug since the docs state:
> If you are registering flows using automation, you may want to pass an idempotency_key which will only create a new version when the key changes.
But if the serialised flow is saved to storage anyway, automation that keeps re-registering the flow will eventually fill storage with junk. After all, the flow version is just a number, whereas redundant copies of the flow consume storage space that may have to be paid for.

Yeah, I get you. There's a bit of a dichotomy between registration and storage. The idempotency key, as you note, only applies to registration, which is when the backend is told where the flow is stored. Beyond that, the backend is largely unaware of what's going on in storage. Part of the problem is that the serialized hash only covers what would be sent to the backend; it does not inspect your task code, etc. So essentially the only way the hash changes when your task code changes is if the storage path changes too.

This is something we're working on improving, but for now I think your best bet is to call a `git` command from Python to decide whether the flow should be registered. (Also note that you can control whether the flow's storage is built during registration with the `build: bool` kwarg.)
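A minimal sketch of that `git` check, assuming the flows live in a git repository and registration runs right after a new commit lands. The helper names `changed_files` and `should_register`, and the directory names in the usage note, are hypothetical; only `git diff --name-only` is a real git command.

```python
from __future__ import annotations

import subprocess
from pathlib import PurePosixPath


def changed_files(base: str = "HEAD~1", head: str = "HEAD") -> list[str]:
    """Return the paths that differ between two commits (hypothetical helper).

    Runs `git diff --name-only <base> <head>` in the current directory.
    Raises CalledProcessError if git fails (e.g. HEAD~1 does not exist yet).
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def should_register(changed: list[str], flow_dirs: list[str]) -> bool:
    """True if any changed path is one of, or lies under, the flow's directories."""
    prefixes = [PurePosixPath(d) for d in flow_dirs]
    for path in map(PurePosixPath, changed):
        # Compare whole path components, so "flows/etl" does not match "flows/etlx".
        if any(path == p or p in path.parents for p in prefixes):
            return True
    return False
```

Usage might look like `if should_register(changed_files(), ["flows/etl", "common"]):` followed by the usual `flow.register(...)` call. Matching on path components rather than string prefixes avoids false positives between sibling directories with similar names.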

Ah, what I've done is replace `serialized_hash()` with a combined hash of the dependency management lockfile, the files containing code common to the flows, the files containing that particular flow (or rather project), and the flow's cron string as supplied by an environment variable. This way, the flow version is only bumped when there's a dependency, code, or schedule change. Is this potentially problematic?
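A sketch of such a combined hash, assuming the inputs are plain files or directories on disk plus a cron string. The function names are hypothetical and this is not Prefect's own `serialized_hash()`; it simply feeds every file's path and bytes, then the cron string, into one SHA-256 digest.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Iterable


def _iter_files(paths: Iterable[Path]) -> list[Path]:
    """Expand directories recursively and return all files in a stable order."""
    files: list[Path] = []
    for p in paths:
        if p.is_dir():
            # Note: this picks up every file, including e.g. __pycache__; filter as needed.
            files.extend(f for f in p.rglob("*") if f.is_file())
        else:
            files.append(p)
    return sorted(files)


def combined_hash(paths: Iterable[str | Path], cron: str) -> str:
    """Hash file paths, file contents and the cron string into one hex digest."""
    h = hashlib.sha256()
    for f in _iter_files(Path(p) for p in paths):
        # Include the path so that renaming or moving a file also changes the key.
        h.update(f.as_posix().encode())
        h.update(b"\0")
        h.update(f.read_bytes())
        h.update(b"\0")
    h.update(cron.encode())
    return h.hexdigest()
```

The result could then be passed as the idempotency key, e.g. `flow.register(project_name=..., idempotency_key=combined_hash(["poetry.lock", "common", "flows/etl"], os.environ.get("FLOW_CRON", "")))`, where the lockfile name, directories and `FLOW_CRON` variable are placeholders. Because the paths are hashed as given, relative paths should be resolved from the same working directory on every run.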

I understand that a flow doesn't need to be re-registered if its task code changes, as long as the task DAG remains the same. Is this correct?

Does a flow need to be re-registered if a schedule specified in code (`CronSchedule`) changes? It seems that way to me, because such a schedule is shown as read-only in the web UI, so re-registering seems to be the only way to update that kind of schedule.

I don’t think that’s problematic.

That is correct, as long as the storage is updated with the new task code.

Yeah, I believe re-registration is necessary there, although there will likely be improvements to scheduling this year.

Ah, then that suggests a solution to my storage problem for now: instead of the time-based storage key that Prefect uses by default, I can use a hash-based storage key. Re-registration will still upload the flow to storage, but it will overwrite the existing serialized flow, so no additional storage is used up.
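A minimal sketch of the hash-based key idea, using a local directory as a stand-in for the storage backend. The helpers `hash_based_key` and `store` are hypothetical; the content hash is assumed to come from something like the combined source hash described earlier, since a pickled flow may not be byte-for-byte stable between runs. How the key is handed to a given Prefect storage class is not shown here.

```python
import re
from pathlib import Path


def hash_based_key(flow_name: str, content_hash: str, length: int = 16) -> str:
    """Build a deterministic key '<flow-slug>/<hash-prefix>' (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", flow_name.lower()).strip("-")
    return f"{slug}/{content_hash[:length]}"


def store(root: Path, key: str, payload: bytes) -> Path:
    """Write payload to root/key, overwriting any object already at that key."""
    target = root / key
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target
```

Re-registering unchanged code produces the same key, so the second upload replaces the first instead of adding a new object; a code change produces a new key and therefore one new stored copy per real version.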