# prefect-community
d
🧵 for dependency questions from above: Ideally we're looking to keep our flows minimal, so virtual environments would be preferable over full docker images, but that's not a strict requirement. If docker images will get us what we need, we can work with that. Presumably we'd have to create a docker image (based on `prefecthq/prefect`? or would it just need `python:3`?) for each flow? And I guess these docker images would run `pip install -r requirements.txt` as a build layer. But if we can achieve this with a virtual environment instead, I think that would be preferable (I'm thinking in terms of the flow needed for an engineer to try something out by pushing it to the runner from their local machine). (I can see the high-level concept here but I'm struggling to see how it will look in practice for the various use-cases.)
a
I answered some of that in the previous thread, but I believe watching this would answer many of your questions - Laura can explain that better than I can
we always prefer Prefect base images rather than python-based images, really good question!
the choice between conda and docker image depends to some extent on your choice of infrastructure - if you go with EKS, you need Docker images rather than conda
but again, you can totally combine both in a single setup - SubprocessFlowRunner with a conda environment, and DockerFlowRunner with a Docker image
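
For illustration, combining the two runners might look roughly like the sketch below. This assumes the Prefect 2.0 beta `DeploymentSpec` API; the flow paths, deployment names, conda env name, and image tag are made up, and parameter names such as `condaenv` may differ in your Prefect version.

```python
# Sketch: deploying the same flow two ways, assuming the Prefect 2.0 beta API.
from prefect.deployments import DeploymentSpec
from prefect.flow_runners import DockerFlowRunner, SubprocessFlowRunner

# Local/dev style: run the flow as a subprocess inside an existing conda env.
DeploymentSpec(
    name="my-flow-conda",
    flow_location="./flows/my_flow.py",  # illustrative path
    flow_runner=SubprocessFlowRunner(condaenv="my-flow-env"),
)

# Containerised style: run the same flow inside a Docker image that already
# has its dependencies installed (e.g. one built from a Prefect base image
# with a `pip install -r requirements.txt` layer).
DeploymentSpec(
    name="my-flow-docker",
    flow_location="./flows/my_flow.py",
    flow_runner=DockerFlowRunner(image="your-registry/your-flow-image:latest"),
)
```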
d
ok; I'll take some time to go over everything you've said and the resources you linked to, and will get back to you if I have more questions. Thanks!
a
Awesome, keep us posted, I'd be curious to hear what decisions you made in the end
d
OK, so as I understand everything there, we'd have a long-lived, rarely-updating agent per environment (dev/production), each reading a dedicated work queue, then follow this process in our pipeline:
1. Create a docker image with the latest dependencies
2. Push this docker image to AWS Elastic Container Registry
3. Connect to Prefect Cloud and create a deployment for each flow
   - this would need the base image (and possibly tag) to be set dynamically, so I guess we'd make the python script read environment variables or something which are passed in from the pipeline (roughly what the sketch below shows)
   - I guess we can't do something like `--path='**/*'` any more, so this will have to discover all our flows in some hand-cranked way?

Overall that looks like it would fix the issue of long-running tasks being interrupted, as well as the issue of multiple engineers testing changes at the same time, but it also sounds like it will be super slow (perhaps even slower than the replace-the-agent approach) and will need us to build our own tooling to abstract it all away for when somebody wants to just test something from their local machine. Am I still missing something here that can simplify things?
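
A rough sketch of what step 3 could look like, again assuming the Prefect 2.0 beta `DeploymentSpec` API. The environment variable names (`ECR_IMAGE`, `PREFECT_ENV`), the `flows/` directory layout, and the one-flow-per-file assumption are all illustrative, not anything Prefect requires.

```python
# Hypothetical CI helper: one DeploymentSpec per flow file, with the image
# tag injected by the pipeline via environment variables.
import os
from pathlib import Path

from prefect.deployments import DeploymentSpec
from prefect.flow_runners import DockerFlowRunner

# Set by the CI pipeline after the image has been pushed to ECR.
IMAGE = os.environ["ECR_IMAGE"]  # e.g. "<account>.dkr.ecr.<region>.amazonaws.com/flows:<git-sha>"
ENVIRONMENT = os.environ.get("PREFECT_ENV", "dev")

# Hand-cranked flow discovery, standing in for the old --path='**/*' behaviour;
# assumes one deployable flow per file under ./flows/.
for flow_file in sorted(Path("flows").glob("**/*.py")):
    DeploymentSpec(
        name=f"{flow_file.stem}-{ENVIRONMENT}",
        flow_location=str(flow_file),
        tags=[ENVIRONMENT],  # route runs to the dev/production work queue
        flow_runner=DockerFlowRunner(image=IMAGE),
    )
```

The pipeline would then presumably point the deployment-creation CLI at a file like this after pushing the image; the exact invocation depends on the Prefect version in use.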
(thanks for all the help so far btw; things are getting much clearer in my head now)
a
Sure, happy to help! I'm not sure I understood the argument that it would be super slow - do you mean your CI building a Docker image would be slow? It also depends on how frequently your dependencies change.
Btw, we are currently working on making packaging code dependencies easier - would you be open to revisiting this topic together in a couple of weeks? I think, for now, you could address the dependency packaging to a large extent with a CI/CD pipeline and Docker, and many images can potentially be reused across flows that need the same dependencies.
Also, for custom code dependencies (sort of utility functions), you could build your custom Python package and install it in your container/conda env. Here is one example
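
One possible shape for such a utility package, as a sketch; the package name, module layout, and dependencies here are invented for illustration and are not from the thread or its linked example.

```python
# setup.py for a hypothetical internal utilities package; once published to a
# private index (e.g. AWS CodeArtifact), it can be pip-installed into the
# flow's Docker image or conda environment like any other dependency.
from setuptools import find_packages, setup

setup(
    name="acme-flow-utils",            # hypothetical package name
    version="0.1.0",
    packages=find_packages(exclude=["tests"]),
    install_requires=[
        "pandas>=1.3",                 # example shared third-party dependency
    ],
    python_requires=">=3.8",
)
```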
d
do you mean your CI building a Docker image would be slow?
yes, as well as the need for it to load the new (presumably not cached) docker image when running each flow. We had a chat internally and we think the custom python packages approach is a good option - we'll explore using AWS CodeArtifact for that. As for revisiting this later, I think we can, although I personally may well have moved on to an unrelated project by then, but there will still be somebody here interested in this stuff.
a
we had a chat internally and we think the custom python packages is a good option - we'll explore using AWS CodeArtifact for that
Wow, that's so cool! I agree that this is a great idea. I would really appreciate it if you could share more about your progress on that. Not necessarily a blog post if you don't want to, but even if you could share a GitHub Gist or code repository showing how you approached it, that would be fantastic!
As for revisiting this later, I think we can, although I personally may well have moved on to an unrelated project by then, but there will still be somebody here interested in this stuff.
That's cool to hear. You can subscribe 🔔 to this Discourse tag to get an email notification about any new release (including, among other things, that topic)