# prefect-community
d
🧵 for dependency questions from above: Ideally we're looking to keep our flows minimal, so virtual environments would be preferable over full docker images, but that's not a strict requirement. If docker images will get us what we need, we can work with that. Presumably we'd have to create a docker image (based on `prefecthq/prefect`? or would it just need `python:3`?) for each flow? And I guess these docker images would run `pip install -r requirements.txt` as a build layer. But if we can achieve this with a virtual environment instead, I think that would be preferable (I'm thinking in terms of the flow needed for an engineer to try something out by pushing it to the runner from their local machine). (I can see the high-level concept here but I'm struggling to see how it will look in practice for the various use-cases.)
a
I answered some of that in the previous thread, but I believe watching this would answer many of your questions - Laura can explain that better than I can
we always prefer Prefect base images rather than python-based images, really good question!
the choice between conda and docker image depends to some extent on your choice of infrastructure - if you go with EKS, you need Docker images rather than conda
but again, you can totally combine both in a single setup - SubprocessFlowRunner with a conda environment, and DockerFlowRunner with a Docker image
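
For illustration, combining the two runners might look roughly like the sketch below. This assumes the Prefect 2.0 beta `DeploymentSpec` API; the flow paths, deployment names, conda env name, and image tag are made up, and parameter names such as `condaenv` may differ in your Prefect version.

```python
# Sketch: deploying the same flow two ways, assuming the Prefect 2.0 beta API.
from prefect.deployments import DeploymentSpec
from prefect.flow_runners import DockerFlowRunner, SubprocessFlowRunner

# Local/dev style: run the flow as a subprocess inside an existing conda env.
DeploymentSpec(
    name="my-flow-conda",
    flow_location="./flows/my_flow.py",  # illustrative path
    flow_runner=SubprocessFlowRunner(condaenv="my-flow-env"),
)

# Containerised style: run the same flow inside a Docker image that already
# has its dependencies installed (e.g. one built from a Prefect base image
# with a `pip install -r requirements.txt` layer).
DeploymentSpec(
    name="my-flow-docker",
    flow_location="./flows/my_flow.py",
    flow_runner=DockerFlowRunner(image="your-registry/your-flow-image:latest"),
)
```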
d
ok; I'll take some time to go over everything you've said and the resources you linked to, and will get back to you if I have more questions. Thanks!
a
Awesome, keep us posted, I'd be curious to hear what decisions you made in the end
d
OK, so as I understand everything there, we'd have a long-lived, rarely-updating agent per environment (dev/production), each reading a dedicated work queue, then follow this process in our pipeline:
1. Create a docker image with the latest dependencies
2. Push this docker image to AWS Elastic Container Registry
3. Connect to Prefect Cloud and create a deployment for each flow
   - this would need the base image (and possibly tag) to be set dynamically, so I guess we'd make the python script read environment variables or something which are passed in from the pipeline (roughly what the sketch below shows)
   - I guess we can't do something like `--path='**/*'` any more, so this will have to discover all our flows in some hand-cranked way?

Overall that looks like it would fix the issue of long-running tasks being interrupted, as well as the issue of multiple engineers testing changes at the same time, but it also sounds like it will be super slow (perhaps even slower than the replace-the-agent approach) and will need us to build our own tooling to abstract it all away for when somebody wants to just test something from their local machine. Am I still missing something here that can simplify things?
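
A rough sketch of what step 3 could look like, again assuming the Prefect 2.0 beta `DeploymentSpec` API. The environment variable names (`ECR_IMAGE`, `PREFECT_ENV`), the `flows/` directory layout, and the one-flow-per-file assumption are all illustrative, not anything Prefect requires.

```python
# Hypothetical CI helper: one DeploymentSpec per flow file, with the image
# tag injected by the pipeline via environment variables.
import os
from pathlib import Path

from prefect.deployments import DeploymentSpec
from prefect.flow_runners import DockerFlowRunner

# Set by the CI pipeline after the image has been pushed to ECR.
IMAGE = os.environ["ECR_IMAGE"]  # e.g. "<account>.dkr.ecr.<region>.amazonaws.com/flows:<git-sha>"
ENVIRONMENT = os.environ.get("PREFECT_ENV", "dev")

# Hand-cranked flow discovery, standing in for the old --path='**/*' behaviour;
# assumes one deployable flow per file under ./flows/.
for flow_file in sorted(Path("flows").glob("**/*.py")):
    DeploymentSpec(
        name=f"{flow_file.stem}-{ENVIRONMENT}",
        flow_location=str(flow_file),
        tags=[ENVIRONMENT],  # route runs to the dev/production work queue
        flow_runner=DockerFlowRunner(image=IMAGE),
    )
```

The pipeline would then presumably point the deployment-creation CLI at a file like this after pushing the image; the exact invocation depends on the Prefect version in use.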
(thanks for all the help so far btw; things are getting much clearer in my head now)
a
Sure, happy to help! I'm not sure I understood the argument that it would be super slow - do you mean your CI building a Docker image would be slow? It also depends on how frequently your dependencies change.
Btw, we are currently working on making packaging code dependencies easier - would you be open to revisiting this topic together in a couple of weeks? I think, for now, you could address the dependency packaging to a large extent with a CI/CD pipeline and Docker, and many images can potentially be reused across flows that need the same dependencies.
Also, for custom code dependencies (sort of utility functions), you could build your custom Python package and install it in your container/conda env. Here is one example
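
One possible shape for such a utility package, as a sketch; the package name, module layout, and dependencies here are invented for illustration and are not from the thread or its linked example.

```python
# setup.py for a hypothetical internal utilities package; once published to a
# private index (e.g. AWS CodeArtifact), it can be pip-installed into the
# flow's Docker image or conda environment like any other dependency.
from setuptools import find_packages, setup

setup(
    name="acme-flow-utils",            # hypothetical package name
    version="0.1.0",
    packages=find_packages(exclude=["tests"]),
    install_requires=[
        "pandas>=1.3",                 # example shared third-party dependency
    ],
    python_requires=">=3.8",
)
```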
d
do you mean your CI building a Docker image would be slow?
yes, as well as the need for it to load the new (presumably not cached) docker image when running each flow. We had a chat internally and we think the custom python packages approach is a good option - we'll explore using AWS CodeArtifact for that. As for revisiting this later, I think we can, although I personally may well have moved on to an unrelated project by then, but there will still be somebody here interested in this stuff.
a
we had a chat internally and we think the custom python packages is a good option - we'll explore using AWS CodeArtifact for that
Wow, that's so cool! I agree that this is a great idea. I would really appreciate it if you could share more about your progress on that. Not necessarily a blog post if you don't want to, but even if you could share a GitHub Gist or code repository showing how you approached it, that would be fantastic!
As for revisiting this later, I think we can, although I personally may well have moved on to an unrelated project by then, but there will still be somebody here interested in this stuff.
That's cool to hear. You can subscribe 🔔 to this Discourse tag to get an email notification about any new release (including, among other things, that topic)