
Walter Cavinaw

04/29/2022, 9:34 PM
Hi. I recently came across Prefect and it looks amazing. I am trying it out with a few of our jobs, but I'm stuck on something! (Sorry for the simple question!) When registering a flow with Bitbucket storage, how do I ensure that shared functions (from a project utilities module) are available to the flow when it runs on an agent (in this case a local agent)? In the project folder, flows are in a flows folder, utilities in another folder, etc. I guess the flow is not being run from within the repo's project directory, because it can't find the utils and libs modules?

Kevin Kho

04/30/2022, 6:37 AM
Hi @Walter Cavinaw, no worries on the question! All the modules need to be installed in the execution environment. Bitbucket storage only keeps the flow definition (the flow file). Only Docker storage keeps all the dependencies together, so the recommendation is to install your modules as a Python package inside a Docker container. This blog post may help with making that container image.
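Since every module the flow imports has to exist wherever the agent runs it, one quick sanity check is to ask the agent's Python interpreter whether it can locate those modules at all. A minimal standard-library sketch; the helper name `check_importable` and the module names `utils.db_helpers` / `utils.data_helpers` are hypothetical placeholders for your own project modules.

```python
import importlib.util
from typing import Dict, Iterable


def check_importable(module_names: Iterable[str]) -> Dict[str, bool]:
    """Report which modules can be located by the current interpreter.

    Run this with the same Python the agent uses; a False entry means the
    module is neither installed nor reachable via sys.path there.
    """
    results: Dict[str, bool] = {}
    for name in module_names:
        try:
            # For dotted names, find_spec imports the parent package first,
            # so a missing parent raises ModuleNotFoundError instead of
            # returning None.
            results[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Hypothetical project module names; replace with your own.
    for mod, ok in check_importable(["utils.db_helpers", "utils.data_helpers"]).items():
        print(f"{mod}: {'found' if ok else 'MISSING'}")
```

If anything comes back missing, that is the situation described above: the package has to be installed into (or otherwise made visible to) that environment.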

Walter Cavinaw

04/30/2022, 4:27 PM
Thanks Kevin. Your posts and video demos have been super helpful so far 👍

Kevin Kho

04/30/2022, 4:48 PM
Oh thank you for watching 🙂

Walter Cavinaw

05/02/2022, 4:16 AM
I read through your writeup and tried it out. That works well for me, thank you for sharing! I have a follow-up question. I get why Bitbucket storage doesn't work (it doesn't have the project modules installed), but does it still clone the whole repo? E.g., if you have another file (config, YAML, CSV, etc.) in the project and want to read it in the flow, could you still reference that file?

Kevin Kho

05/02/2022, 4:31 AM
You can use Git storage with additional files like this, but that is meant for CSV, YAML, or SQL files. Python files need to be installed or added to the Python path.
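For non-Python files like these, the idea is simply to open them by path once the repository has been checked out. A minimal standard-library sketch, assuming the flow file sits one directory below the project root and that `__file__` points at the checked-out flow file at run time (worth verifying for your storage setup). The helper names `project_file` and `read_csv_rows` and the path `data/thresholds.csv` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def project_file(flow_file: str, relative_path: str) -> Path:
    """Resolve relative_path against the project root.

    Assumes the flow file sits one level below the root,
    e.g. project/flows/model_flow_x.py -> project/.
    """
    return Path(flow_file).resolve().parent.parent / relative_path


def read_csv_rows(path: Path) -> List[Dict[str, str]]:
    """Read a CSV file with a header row into a list of dicts."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


# Inside a flow file (data/thresholds.csv is a hypothetical path):
# rows = read_csv_rows(project_file(__file__, "data/thresholds.csv"))
```

Resolving against the flow file rather than the current working directory avoids depending on wherever the agent happens to start the process.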

Walter Cavinaw

05/02/2022, 4:58 AM
OK, I see: Git storage works differently from Bitbucket storage (which uses the API to fetch a single file, I presume). With Git storage, I guess if I wanted to do something really hacky, I could add the project directory to the Python path at the start of each flow file?
sys.path.append(str(Path(__file__).resolve().parent.parent))
Obviously this is not very robust, but I'll try it for this one case...
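Written out in full, the one-liner above needs `sys` and `pathlib` imports. A minimal sketch that wraps it in a function (the name `add_project_root_to_path` is hypothetical) and skips the append when the directory is already on `sys.path`, so re-running a flow in the same process doesn't keep growing the list. Like the original line, it assumes flow files sit exactly one directory below the project root and that `__file__` is defined when the flow file is loaded.

```python
import sys
from pathlib import Path


def add_project_root_to_path(flow_file: str) -> str:
    """Append the directory two levels above flow_file to sys.path.

    For project/flows/model_flow_x.py this adds project/, so that an
    import such as `import utils.db_helpers` can resolve.
    Returns the project root that was (or already was) on sys.path.
    """
    project_root = str(Path(flow_file).resolve().parent.parent)
    if project_root not in sys.path:  # avoid duplicate entries on re-runs
        sys.path.append(project_root)
    return project_root


# At the top of each flow file, before importing project modules:
# add_project_root_to_path(__file__)
# from utils.db_helpers import ...
```

Because the helper has to run before `utils` is importable, it would need to live in each flow file itself (or stay as the one-liner). Also note that `append` puts the project root last, so an installed package that happens to be named `utils` would shadow the project's; `sys.path.insert(0, ...)` avoids that at the cost of letting project modules shadow installed ones.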

Kevin Kho

05/02/2022, 2:22 PM
I don’t think it’ll work? We’ve had people try, but I have yet to see anyone get it working. Python path manipulation is pretty hard.

Walter Cavinaw

05/02/2022, 9:14 PM
Yes, I see, that makes sense. We don't have cross-project dependencies. The local agent runs on an image with all other dependencies (pip/conda) installed, except for the project modules. Using Git storage and adding the line above to each flow file seems to do the trick for us: it adds our utils/libs modules to the path and seems to work (knock on wood). Just letting you know in case others find it helpful as a quick-and-dirty hack. A simplified view of our project structure:

project/
    utils/
        db_helpers.py
        data_helpers.py
    flows/
        model_flow_x.py
        model_flow_y.py
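To check the hack against the layout above without an agent, a sketch like the following scaffolds the same structure in a temporary directory, writes a `flows/model_flow_x.py` that uses the `sys.path` line and then imports from `utils`, and runs it as a separate Python process. The `ping` helper and the file contents are hypothetical stand-ins. This only exercises plain Python imports, not Prefect's Git storage or agent, so it says nothing about how a particular storage class loads the flow.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical flow file: the sys.path line from the thread, then a project import.
FLOW_SOURCE = textwrap.dedent(
    """\
    import sys
    from pathlib import Path

    sys.path.append(str(Path(__file__).resolve().parent.parent))

    from utils.db_helpers import ping  # hypothetical helper

    print(ping())
    """
)


def run_layout_check() -> str:
    """Build project/{utils,flows} in a temp dir, run the flow file, return stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "project"
        (root / "utils").mkdir(parents=True)
        (root / "flows").mkdir()
        (root / "utils" / "__init__.py").write_text("")
        (root / "utils" / "db_helpers.py").write_text("def ping():\n    return 'pong'\n")
        (root / "utils" / "data_helpers.py").write_text("")
        flow_file = root / "flows" / "model_flow_x.py"
        flow_file.write_text(FLOW_SOURCE)
        # Run as a script: Python puts flows/ (the script's own directory) on
        # sys.path, not project/, so `utils` resolves only via the appended line.
        result = subprocess.run(
            [sys.executable, str(flow_file)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_layout_check())  # prints "pong"
```

Removing the `sys.path.append(...)` line from `FLOW_SOURCE` should make the subprocess fail with `ModuleNotFoundError` (unless some other `utils` package is installed), which is a quick way to confirm the line is what makes the import work.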

Kevin Kho

05/02/2022, 10:02 PM
Oh I see. Nice work!