    Adam

    1 year ago
    Hi @Jim Crist-Harif and team, I read some of your earlier replies, but wanted to ask whether by any chance you have had time to publish a best-practice blog post or repo that shows an opinionated way of structuring a project that:
    • Uses a base Docker image that contains Prefect, all the custom deps and shared code
    • Stores the flows on GitHub / S3 / etc.
    • Can organise individual flows into groups/folders locally to keep things a bit clean (and play well with PYTHONPATH etc.), ideally with some convention in individual flow files to indicate which project they should be registered with
    • Has some type of CI/CD process that builds the image (when needed) and stores the flows
    • Uses Kubernetes (with or without Dask)
    @Chris White @Jeremiah I’ve been using Prefect for about 6 months now and I have many of the above aspects done, but it’s all becoming quite a big mess. We’re looking for an opinionated / best-practice way to do it now. Based on a lot of the questions above, I think a cookiecutter with sane defaults would really benefit the community. WDYT? P.S. Would be more than happy to show you the mess we have and get your advice on how things can be improved.
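The folder-and-convention idea in the list above could be sketched roughly as follows. This is only an illustration, not a Prefect feature: the `PROJECT = "..."` marker, the `discover_flows` helper and the `default` fallback project are all hypothetical names. The sketch walks a flows directory and groups the dotted module path of each flow file by the project it declares, which a CI job could then feed to whatever registration step it uses.

```python
import re
from pathlib import Path
from typing import Dict, List

# Hypothetical convention: a flow file declares its target project with a
# top-level line such as `PROJECT = "marketing"`. Not part of Prefect itself.
PROJECT_PATTERN = re.compile(r'^PROJECT\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def discover_flows(root: Path, default_project: str = "default") -> Dict[str, List[str]]:
    """Map project name -> dotted module paths of the flow files under `root`."""
    projects: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        match = PROJECT_PATTERN.search(path.read_text())
        project = match.group(1) if match else default_project
        # Module path is relative to the parent of `root`, so a file at
        # flows/marketing/daily.py becomes "flows.marketing.daily".
        module = ".".join(path.relative_to(root.parent).with_suffix("").parts)
        projects.setdefault(project, []).append(module)
    return projects
```

Reading the marker with a regular expression avoids importing every flow file just to find its project, at the cost of only supporting a simple literal assignment.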
    Josh Greenhalgh

    1 year ago
    I would love to know what mess you have so I can avoid!
    I completely agree on the need for some kind of opinionated CI/CD process though - there was a discussion started where people were asked how they currently handle these issues so would be great if perhaps you added to that? https://github.com/PrefectHQ/prefect/discussions/4042
    Jim Crist-Harif

    1 year ago
    We actually just had an internal meeting about what this would look like yesterday, so hopefully we'll have something in the coming weeks. Lots to do.
    > I have many of the above aspects done but it’s all becoming quite a big mess.
    I'd be interested in hearing more about what issues you've run into. Could you comment more on this?
    Alex Papanicolaou

    1 year ago
    I wrote up a little bit about what we do in another thread: https://prefect-community.slack.com/archives/CL09KU1K7/p1612958302086500
    ale

    1 year ago
    At Cloud Academy we came up with a solution to address CI/CD integration and flows organization into folders, execution on ECS with ECS Agent and Docker storage. It’s not 100% clean, but it works pretty well. If anyone is interested, feel free to DM me 🙂
    Billy McMonagle

    1 year ago
    I love to see this as an active topic of discussion, and would be glad to see the community start to align around best practices. I posted in the github discussion linked above, but this thread is making me rethink our use of docker storage. I'd like to know the advantages of choosing docker vs github/s3 storage... it sounds like it's possible that github/s3/etc might result in much faster build times, since images would not need to be rebuilt on every code change.
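The build-time point can be made concrete. If flow code lives outside the image, a CI job only needs to rebuild the image when the files that define it change. A minimal sketch of that check, assuming the image is fully determined by a handful of files such as a Dockerfile and a requirements file; the helper names and the stamp-file approach are hypothetical and not part of Prefect:

```python
import hashlib
from pathlib import Path
from typing import Iterable


def dependency_fingerprint(files: Iterable[Path]) -> str:
    """Hash the names and contents of the files that define the image."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.name.encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def image_needs_rebuild(files: Iterable[Path], stamp: Path) -> bool:
    """Return True, and record the new fingerprint, when the inputs changed."""
    current = dependency_fingerprint(files)
    previous = stamp.read_text().strip() if stamp.exists() else ""
    if current == previous:
        return False  # image inputs unchanged: skip the docker build
    stamp.write_text(current)
    return True
```

In CI the stamp file would have to persist between runs (for example in a build cache), and a change to a flow file alone would leave the fingerprint untouched, so only the flow upload step would run.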
    Michael Adkins

    1 year ago
    @ale if you're willing to add it to that github discussion we'd appreciate it a lot! @Adam same goes for you. Users (like you) that are pushing the boundaries of CI/CD with prefect have valuable perspectives that are hard to capture here on slack 🙂
    Jim Crist-Harif

    1 year ago
    > this thread is making me rethink our use of docker storage. I'd like to know the advantages of choosing docker vs github/s3 storage... it sounds like it's possible that github/s3/etc might result in much faster build times, since images would not need to be rebuilt on every code change.
    We definitely want to write more docs on this, but I just pushed a few updates to our storage docs that you might find useful: https://docs.prefect.io/orchestration/flow_config/storage.html. See in particular: https://docs.prefect.io/orchestration/flow_config/storage.html#choosing-a-storage-class
    Billy McMonagle

    1 year ago
    Thanks @Jim Crist-Harif, that is interesting and helpful. I'm wondering under what circumstances you would recommend docker storage as the best choice?
    Jim Crist-Harif

    1 year ago
    I honestly wouldn't ever recommend it. It might be useful for some use cases where you want the only artifact representing a flow to be the docker image, but even in that case I think Module storage + a custom docker image would be cleaner.
    Just my opinion though. There's lots of ways to use prefect.
    Billy McMonagle

    1 year ago
    Wow! That's extremely helpful to know.
    Michael Adkins

    1 year ago
    For a while I thought it was the only good option 😄 but now that there are run configs I agree that a DockerRun feels cleaner. I think DockerStorage is a good way for a beginner to create an environment for their flow, especially since you can specify your python requirements and such using documented kwargs. It starts to feel heavy at the project-level though.
    Alex Papanicolaou

    1 year ago
    @Michael Adkins don’t forget about me j/k. We’ll add our setup to that GH discussion, which runs everything via dask-kubernetes on EKS.
    Michael Adkins

    1 year ago
    You too @Alex Papanicolaou! 😄
    I actually wrote a FlowBuilder at one point while exploring this as well
    Michael Hadorn

    1 year ago
    Could you also explain a little why we actually need registration if we use a "stored_as_file" storage? I would like an option that rebuilds the flow every time and runs it, independent of the structure of the flow.
    Adam

    1 year ago
    Thanks for all the cool suggestions. I’ve written up a long addition to the discussion on https://github.com/PrefectHQ/prefect/discussions/4042. Happy to chat further!
    Michael Adkins

    1 year ago
    @Michael Hadorn -- we're thinking about making registration more intuitive but I don't think I can say much else right now 🙂
    Thanks Adam! I just read through it, lots of good details
    Matthew Alhonte

    1 year ago
    We've been using Azure Pipelines - still working out kinks, but it works okay!
    I think Azure might be particularly good for this? I'm not really on the Ops side, but our k8s cluster is actually on AWS, so I'm guessing there's some advantage there!
    Wish I understood it enough to go into more detail - I think we use some kind of template? And each flow has its own folder in the repo, with a spec file that extends the template.
    Adam

    1 year ago
    We’ve recently improved our implementation. I’ve added another reply to my earlier post on the thread for those that are interested
    Michael Adkins

    1 year ago
    Thanks for contributing Adam!