# prefect-community
a
Hi @Jim Crist-Harif and team, read some of your earlier replies, but wanted to ask whether by any chance you have had time to publish a best practice blog post or repo that shows an opinionated way of having a project that:
• Uses a base docker image that contains prefect, all the custom deps and shared code
• Stores the flows on Github / s3 / etc
• Can organise individual flows into groups/folders locally to keep things a bit clean (and play well with pythonpath etc) and ideally have some convention in individual flow files to indicate which project they should be registered with.
• Some type of CI/CD process that does the build of the image (when needed) and stores the flows
• Uses Kubernetes (with or without Dask)
@Chris White @Jeremiah I’ve been using Prefect for about 6 months now and I have many of the above aspects done but it’s all becoming quite a big mess. We’re looking for an opinionated / best practice way to do it now. Based on a lot of the questions above, I think a cookie cutter with sane defaults would really benefit the community. WDYT?
P.S. Would be more than happy to show you the mess we have and get your advice on how things can be improved
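For illustration only (this was not posted in the thread): one possible folder convention for the "which project should this flow be registered with" question is to treat the top-level folder under a flows directory as the project name. The sketch below uses only the standard library; the function name `discover_flows` and the layout `flows/<project>/.../<flow>.py` are assumptions, not a Prefect convention.

```python
from pathlib import Path
from typing import Dict, List


def discover_flows(root: Path) -> Dict[str, List[str]]:
    """Map each project (top-level folder under root) to its flow modules.

    Hypothetical convention: flows/<project>/.../<flow>.py
    """
    projects: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        relative = path.relative_to(root)
        if len(relative.parts) < 2:
            continue  # files directly under root belong to no project
        project = relative.parts[0]
        # Dotted module path, importable when root's parent is on PYTHONPATH
        module = ".".join((root.name, *relative.with_suffix("").parts))
        projects.setdefault(project, []).append(module)
    return projects
```

A CI job could loop over the returned mapping and register each module's flow with the matching project, so flow files would not need to hard-code a project name.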
j
I would love to know what mess you have so I can avoid it!
I completely agree on the need for some kind of opinionated CI/CD process, though. A discussion was started asking people how they currently handle these issues, so it would be great if you added to it: https://github.com/PrefectHQ/prefect/discussions/4042
j
We actually just had an internal meeting about what this would look like yesterday, so hopefully we'll have something in the coming weeks. Lots to do.
> I have many of the above aspects done but it’s all becoming quite a big mess.
I'd be interested in hearing more about what issues you've run into. Could you comment more on this?
a
I wrote up a little bit about what we do in another thread: https://prefect-community.slack.com/archives/CL09KU1K7/p1612958302086500
a
At Cloud Academy we came up with a solution to address CI/CD integration and flows organization into folders, execution on ECS with ECS Agent and Docker storage. It’s not 100% clean, but it works pretty well. If anyone is interested, feel free to DM me 🙂
b
I love to see this as an active topic of discussion, and would be glad to see the community start to align around best practices. I posted in the github discussion linked above, but this thread is making me rethink our use of docker storage. I'd like to know the advantages of choosing docker vs github/s3 storage... it sounds like github/s3/etc might result in much faster build times, since images would not need to be rebuilt on every code change.
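For illustration only (this was not posted in the thread): when flow code lives outside the image, a CI job only has to rebuild when the image's own inputs change. A minimal standard-library sketch, assuming the image is built from files such as a Dockerfile and a requirements file; the names `image_tag` and `needs_rebuild` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Set


def image_tag(build_inputs: Iterable[Path]) -> str:
    """Derive a short, deterministic tag from the files that define the image."""
    digest = hashlib.sha256()
    for path in sorted(build_inputs):
        # Hash the file name and contents, with separators between fields
        digest.update(path.name.encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()[:12]


def needs_rebuild(build_inputs: Iterable[Path], published_tags: Set[str]) -> bool:
    """Return True when no already-published image matches the current inputs."""
    return image_tag(build_inputs) not in published_tags
```

Because flow files are not among the hashed inputs, editing a flow leaves the tag unchanged and the build step can be skipped.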
z
@ale if you're willing to add it to that github discussion we'd appreciate it a lot! @Adam same goes for you. Users (like you) that are pushing the boundaries of CI/CD with prefect have valuable perspectives that are hard to capture here on slack 🙂
j
> this thread is making me rethink our use of docker storage. I'd like to know the advantages of choosing docker vs github/s3 storage... it sounds like github/s3/etc might result in much faster build times, since images would not need to be rebuilt on every code change.
We definitely want to write more docs on this, but I just pushed a few updates to our storage docs that you might find useful: https://docs.prefect.io/orchestration/flow_config/storage.html. See in particular: https://docs.prefect.io/orchestration/flow_config/storage.html#choosing-a-storage-class
b
Thanks @Jim Crist-Harif, that is interesting and helpful. I'm wondering under what circumstances you would recommend docker storage as the best choice?
j
I honestly wouldn't ever recommend it. It might be useful for some use cases where you want the only artifact representing a flow to be the docker image, but even in that case I think `Module` storage + a custom docker image would be cleaner.
Just my opinion though. There's lots of ways to use prefect.
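For illustration only (this was not posted in the thread): a minimal sketch of `Module` storage combined with a custom image, assuming the Prefect 0.14.x / 1.x API that was current at the time. The module path `flows.etl.hello` and the image name are hypothetical placeholders, and the image is assumed to already have that module installed and importable.

```python
from prefect import Flow, task
from prefect.run_configs import DockerRun
from prefect.storage import Module


@task
def say_hello():
    print("hello")


# Module storage records only the import path of this file; the flow code
# itself is expected to already be present inside the custom image.
with Flow(
    "hello-flow",
    storage=Module("flows.etl.hello"),  # hypothetical module path
    run_config=DockerRun(
        image="registry.example.com/team/prefect-base:latest"  # hypothetical image
    ),
) as flow:
    say_hello()
```

With this split, the image would only need rebuilding when dependencies or shared code change, which is the trade-off being discussed above.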
b
Wow! That's extremely helpful to know.
z
For a while I thought it was the only good option 😄 but now that there are run configs I agree that a `DockerRun` feels cleaner. I think `DockerStorage` is a good way for a beginner to create an environment for their flow, especially since you can specify your python requirements and such using documented kwargs. It starts to feel heavy at the project-level though.
a
@Zanie don’t forget about me j/k. We’ll add our setup to that GH discussion, which runs everything via dask-kubernetes on EKS.
z
You too @Alex Papanicolaou! 😄
I actually wrote a `FlowBuilder` at one point while exploring this as well
m
Could you also explain a little bit why we actually need the registration if we use a "stored_as_file" storage? I would like an option that rebuilds the flow at any time and runs it, independent of the structure of the flow.
a
Thanks for all the cool suggestions. I’ve written up a long addition to the discussion on https://github.com/PrefectHQ/prefect/discussions/4042. Happy to chat further!
z
@Michael Hadorn -- we're thinking about making registration more intuitive but I don't think I can say much else right now 🙂
Thanks Adam! I just read through it, lots of good details
m
We've been using Azure Pipelines - still working out kinks, but it works okay!
I think Azure might be particularly good for this? I'm not really on the Ops side, but our k8s cluster is actually on AWS, so I'm guessing there's some advantage there!
Wish I understood it enough to go into more detail - I think we use some kind of template? And each flow has its own folder in the repo, with a spec file that extends the template.
a
We’ve recently improved our implementation. I’ve added another reply to my earlier post on the thread for those that are interested
z
Thanks for contributing Adam!