merlin

    1 year ago
    I am new to Prefect and working my way through the docs. I noticed something weird in the flow.storage assignment in the file-based storage discussion:
    flow.storage = GitHub(
        repo="org/repo",                # name of repo
        path="flows/my_flow.py",        # location of flow
    )
    What I don't understand is why there is a need to reference the path to the file in which this flow is defined. I expect to have hundreds of flows, each with its own flows/my_flow001.py file and so on, and every one has to include its own file path. That makes it very hard to reorganize flow folders, and it seems like a redundancy that will collect errors.
    Michael Adkins

    1 year ago
    Hi! We don’t make any assumptions about the user using source control, and even if we did, it’s generally pretty hacky to infer the path of the file from a call like this (we’d need to go up stack frames to inspect where GitHub was instantiated, then figure out where the root of the git repo is, then construct a relative path, all of which is quite brittle). Since we’re not storing any of your code, we need to know where the flow lives, and this information is just put into the flow metadata so we know where to pull the flow from when you want to run it.
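    To make the brittleness concrete, here is a rough sketch of what that kind of inference could look like. This is not Prefect code; find_repo_root and infer_caller_path_in_repo are hypothetical names, and the sketch assumes the repo root is the nearest parent directory containing a .git entry.

```python
import inspect
from pathlib import Path


def find_repo_root(start: Path) -> Path:
    """Walk upward from `start` until a directory containing `.git` is found."""
    for candidate in [start, *start.parents]:
        if (candidate / ".git").exists():
            return candidate
    raise RuntimeError(f"no .git directory found at or above {start}")


def infer_caller_path_in_repo() -> str:
    """Guess the repo-relative path of the file that called this function."""
    # Frame 0 is this function; frame 1 is whoever called it.
    caller_file = Path(inspect.stack()[1].filename).resolve()
    repo_root = find_repo_root(caller_file.parent)
    # POSIX-style separators, matching paths like "flows/my_flow.py".
    return caller_file.relative_to(repo_root).as_posix()
```

    Even this small version has failure modes: the caller may not be a real file (a REPL or notebook), the storage object may be built inside a shared helper module rather than the flow file, and the code may not live in a git checkout at all.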
    I agree the GitHub storage interface is a little unintuitive at the moment — we’re looking into improving it and clarifying the general flow storage pattern.
    I did just realize, though, that if you want to reduce the repetition here, you should easily be able to write a utility get_flow_path_in_repo that you pass __file__ to and that calculates the relative path, since you know what the base path of your git repo is.
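    A minimal sketch of such a utility follows. Only the name get_flow_path_in_repo comes from the suggestion above; the repo_root parameter and the REPO_ROOT constant in the usage comment are assumptions about how you would tell the helper where your checkout lives.

```python
from pathlib import Path
from typing import Union


def get_flow_path_in_repo(
    flow_file: Union[str, Path], repo_root: Union[str, Path]
) -> str:
    """Return the path of `flow_file` relative to `repo_root`, POSIX-style."""
    flow_path = Path(flow_file).resolve()
    root = Path(repo_root).resolve()
    # relative_to raises ValueError if the file is not inside the repo,
    # which surfaces a misconfigured root early instead of at run time.
    return flow_path.relative_to(root).as_posix()


# Hypothetical usage inside flows/my_flow001.py, with REPO_ROOT defined
# once in a shared module:
#
#     flow.storage = GitHub(
#         repo="org/repo",
#         path=get_flow_path_in_repo(__file__, REPO_ROOT),
#     )
```

    With this in place, moving a flow file to another folder changes the computed path automatically, so only the shared root definition has to stay correct.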
    merlin

    1 year ago
    I see. I just wanted an explanation of why, actually; these kinds of problems often come from inherent difficulties. Thanks! This stood out to me because so much of Prefect is concise and free of boilerplate. Your utility function idea is the right way to go: it's an easy bit of boilerplate that could maybe be absorbed into the storage object definition someday.
    Michael Adkins

    1 year ago
    https://github.com/PrefectHQ/prefect/pull/3988 — lots of edge cases to address but it’s not unreasonable