Pedro Machado

09/17/2020, 4:18 PM
Hi there. I am looking for suggestions on the best way to structure a repo to store Prefect flows. We expect to have several flows that use some shared functionality (for example, code to run queries and get data from Presto, or to interact with an API). Could you suggest a folder structure that allows us to create flows and import shared code? Also, I anticipate having several SQL files that would be run by the different flows. This is one of those "I don't know what I don't know yet" questions. I'm just trying to set things up right from the beginning. Although we won't do CI/CD from the start, I'd like to have a setup that will allow us to implement CI/CD in the near future. This client uses GitLab.
A little more background:
• I am planning to use Prefect Cloud.
• We'll probably start with a Docker agent running on an AWS instance, but I am open to suggestions. The workflows will primarily be pulling data, running queries, sending files, etc. (we won't be training ML models).
• The group I am working with has limited devops support, and any additional infrastructure takes a while to request, get approved, and provision.
• Most of the code will be written in Python, but they have some legacy R code that we have been running inside a container.
Thanks!

Dylan

09/17/2020, 5:14 PM
Hi @Pedro Machado! Are you planning on running your flows with Docker / inside a containerized setup?

Pedro Machado

09/17/2020, 5:17 PM
That's what I was thinking, so I would not have to deal with dependency issues. However, I would need a private container registry, which I hope my client can request from their devops team.
Let me know if you have a different suggestion.

Dylan

09/17/2020, 5:22 PM
In that case, if you put shared code into a specific folder, you can include it in your containers by making it a package and installing that package in your containers as part of your build process.
This would allow you to take advantage of a monorepo structure for all of your flows (if you’d like) and would keep a clean code-sharing pattern.
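One possible monorepo layout along these lines can be sketched as a small scaffold script. Every file and folder name here (shared, flows, sql, daily_extract, presto.py, api_client.py) is a hypothetical placeholder rather than anything from the thread or a Prefect convention; the only fixed idea is that the shared code sits in an installable package next to the flows.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical layout: all names below are illustrative assumptions.
LAYOUT = {
    # Makes the shared/ folder installable with "pip install ."
    "setup.py": (
        "from setuptools import setup, find_packages\n"
        "\n"
        "setup(\n"
        "    name='shared',\n"
        "    version='0.1.0',\n"
        "    packages=find_packages(include=['shared', 'shared.*']),\n"
        ")\n"
    ),
    "shared/__init__.py": "",
    "shared/presto.py": "# helpers for running Presto queries\n",
    "shared/api_client.py": "# helpers for calling the external API\n",
    "flows/daily_extract.py": "# a flow module that imports from shared\n",
    "sql/daily_extract.sql": "-- query run by flows/daily_extract.py\n",
}


def scaffold(root: Path) -> List[str]:
    """Create the layout under root and return the relative paths written."""
    created = []
    for rel_path, content in LAYOUT.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        created.append(rel_path)
    return sorted(created)


if __name__ == "__main__":
    # Write into a throwaway directory so running the sketch has no side effects.
    with tempfile.TemporaryDirectory() as tmp:
        for rel_path in scaffold(Path(tmp)):
            print(rel_path)
```

With a layout like this, each flow file imports from the shared package instead of copying helper code, and the same package can be installed locally for development and inside the image at build time.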

Pedro Machado

09/17/2020, 5:40 PM
I have been using myflow.register() to build and publish flow images (during my evaluation, not in this specific setup). How would you change this process to implement your suggestion?
Any code samples you can share?

Dylan

09/17/2020, 5:52 PM
This suggestion would involve a custom Dockerfile that installs your Python package.
I don’t have an example handy, but here’s some info about Python packages
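A minimal sketch of what such a custom Dockerfile might contain, written as a Python helper that renders the file text so the example stays in one language. The base image tag, the /opt/shared install path, and the assumption that setup.py and a shared/ folder sit at the repo root are all illustrative choices, not details from the thread.

```python
def render_dockerfile(
    base_image: str = "prefecthq/prefect:latest",  # assumed base image
    package_dir: str = "/opt/shared",  # hypothetical install location
) -> str:
    """Return Dockerfile text that installs a local 'shared' package."""
    lines = [
        f"FROM {base_image}",
        # Copy only what pip needs to build the shared package.
        f"COPY setup.py {package_dir}/setup.py",
        f"COPY shared {package_dir}/shared",
        # Install the package so flows can import it inside the container.
        f"RUN pip install --no-cache-dir {package_dir}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(), end="")
```

The rendered text could be saved as a Dockerfile at the repo root. Prefect's Docker storage at the time appears to have accepted a path to a custom Dockerfile when building the flow image during registration, but check the storage documentation for your Prefect version before relying on that.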

Pedro Machado

09/17/2020, 8:25 PM
Thanks, Dylan! I'll take a look.