I am new to Prefect and working my way through the docs. I noticed something weird in the
assignment in the file-based storage discussion:
flow.storage = GitHub(
    repo="org/repo",  # name of repo
    path="flows/my_flow.py",  # location of flow
)
What I don't understand is why I need to reference the path to the file in which this flow is defined. I expect to have 100s of flows, each with a
file and so on, and every one has to include its own file path location. That makes it very hard to reorganize flow file folders, and it seems like a redundancy that will collect errors.
1 year ago
Hi! We don’t make any assumptions about whether the user is using source control, and even if we did, it’s generally pretty hacky to infer the path of the file from a call like this: we’d need to go up stack frames to inspect where the storage object was instantiated, then figure out where the root of the git repo is, then construct a relative path, all of which is quite brittle. Since we’re not storing any of your code, we need to know where the flow lives, and this information is just put into the flow metadata so we know where to pull the flow from when you want to run it.
I agree the GitHub storage interface is a little unintuitive at the moment — we’re looking into improving it and clarifying the general flow storage pattern.
I did just realize though that if you want to reduce the repetition here, you should easily be able to write a utility function that you pass the flow file's location and that calculates the relative path, since you know what the base path of your git repo is.
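A minimal sketch of such a utility, assuming you know the repo root on disk and pass it in yourself. The name `relative_flow_path` and its parameters are hypothetical and not part of Prefect:

```python
from pathlib import Path


def relative_flow_path(flow_file, repo_root):
    """Return flow_file's path relative to repo_root, with forward slashes.

    Hypothetical helper, not part of Prefect. Both arguments are resolved
    to absolute paths first, so relative inputs and symlinks are handled.
    """
    flow_file = Path(flow_file).resolve()
    repo_root = Path(repo_root).resolve()
    # relative_to raises ValueError if flow_file is not inside repo_root,
    # which surfaces a misconfigured root instead of storing a bad path.
    return flow_file.relative_to(repo_root).as_posix()
```

Inside a flow file this could be used as `path=relative_flow_path(__file__, REPO_ROOT)` in the `GitHub(...)` call, where `REPO_ROOT` is a constant you define once for your project (also an assumption here). Moving a flow file then no longer requires editing the path by hand.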
I see. I just wanted an explanation of why, since these kinds of problems often come from inherent difficulties. Thanks! This stood out to me because so much of Prefect is concise and free of boilerplate code.
Your utility function idea is the right way to go; it's an easy bit of boilerplate that maybe could be folded into the storage object definition someday.