Ben Ayers-Glassey
05/05/2022, 5:27 AM
`extract_flow_from_file`, `extract_flow_from_module`, and `flow_from_bytes_pickle`.
They all live in `prefect.utilities.storage`: https://github.com/PrefectHQ/prefect/blob/6d6c7bff2b9fc8db7d44ad7f9596fcd67b5b1691/src/prefect/utilities/storage.py
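For context, roughly how those helpers get called (based on my reading of that file; the paths and flow names below are made up):
```
from prefect.utilities.storage import (
    extract_flow_from_file,
    extract_flow_from_module,
)

# file-based storage runs the file's source and pulls a Flow object out of it
flow = extract_flow_from_file(file_path="flows/etl.py", flow_name="etl")

# module-based storage just imports a dotted path, like regular Python code
flow = extract_flow_from_module("mypackage.flows.etl", flow_name="etl")
```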
Now, loading .py files with `eval()` and loading Flow objects using cloudpickle are kind of cool, but they make it difficult to write traditional Python code.
I'm trying to write a bunch of flows, living in separate .py files within the same Git repo, and I find myself leaving big comments all over the place warning not to treat these .py files as modules, because when they get loaded by the Docker storage class, it'll be using cloudpickle, etc etc. Except that when writing unit tests, you of course do treat these files as modules (to import functions out of them to test). So that's already difficult to explain to other devs.
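Just to make that concrete, here's the sort of thing I mean (file names and code are a made-up example):
```
# flows/etl.py: at runtime this file is run / cloudpickled by storage,
# NOT imported as a module
from prefect import Flow, task

@task
def transform(record: dict) -> dict:
    return {**record, "processed": True}

with Flow("etl") as flow:
    transform({"id": 1})
```
```
# tests/test_etl.py: but the unit tests treat that same file as a plain module
from flows.etl import transform

def test_transform():
    # Task.run() calls the wrapped function outside of a flow run
    assert transform.run({"id": 1})["processed"] is True
```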
And in order for these flows to share code, I need to make an entire pip-installable package; and since our code is private, I need to host that package on a private PyPI server; and then I need to convince the `Docker` storage class to install requirements from that server; etc etc.
So it seems like it would be much nicer to use `extract_flow_from_module`, but looking at the source code, only 2 storage classes use it: `Local` and `Module`.
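i.e. with `Module` storage the flow ends up being just an attribute of an importable module, something like this (package name made up):
```
from prefect.storage import Module

# `flow` is defined as usual in mypackage/flows/etl.py and is importable
# like any other module-level attribute
flow.storage = Module("mypackage.flows.etl")
flow.register(project_name="my-project")
```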
So it seems like it would be nice if more storage classes were able to load from modules, maybe with a `load_from_module=True` kwarg or something.
For instance, the `GitHub` storage class originally seemed like the most straightforward; but it can only load flows by calling `eval()` on single .py files from the Git repo.
It would be really nice if it could just clone the whole repo and then behave like the `Module` storage class.
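Something along these lines (to be clear, the `load_from_module` kwarg doesn't exist; it's just the API I'm imagining):
```
from prefect.storage import GitHub

flow.storage = GitHub(
    repo="my-org/my-flows",
    path="mypackage/flows/etl.py",   # today: a single file that gets evaluated
    load_from_module=True,           # hypothetical: clone the repo and import it instead
)
```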
Does that make sense?
At the moment, we're considering moving almost all our code out of the individual flow .py files and into the pip-installable package; so then everything becomes "regular old Python modules", and we would just need one small component which knows how to import Flow objects from that module and register them.
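That small component could be as simple as something like this (module list and project name are made up):
```
# register_flows.py
import importlib

from prefect import Flow

FLOW_MODULES = [
    "mypackage.flows.etl",
    "mypackage.flows.reporting",
]

def register_all(project_name: str) -> None:
    for module_name in FLOW_MODULES:
        module = importlib.import_module(module_name)
        # register every Flow object defined at module level
        for obj in vars(module).values():
            if isinstance(obj, Flow):
                obj.register(project_name=project_name)

if __name__ == "__main__":
    register_all("my-project")
```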
Does this all sound like something people have already grappled with, and come up with good solutions for?
Anna Geller
05/05/2022, 10:03 AM
> they get loaded by the Docker storage class, it'll be using cloudpickle, etc etc
You don't have to use Docker storage - Docker storage is a convenient way of packaging flow code together with its dependencies, but you can totally use a containerized execution platform with your own image build process.
> since our code is private, I need to host that package on a private PyPI server
You don't have to host the package anywhere, you only need to install it in your execution platform. You can install it from setup.py rather than from pip, check this simple example. I'm sure you can find a way to package your code without having to modify the Storage source code.
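e.g. a bare-bones setup.py is enough to `pip install .` the shared code inside your own image build (names here are illustrative):
```
# setup.py for the shared-code package
from setuptools import find_packages, setup

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["prefect"],
)
```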
> we're considering moving almost all our code out of the individual flow .py files and into the pip-installable package
You could do that, but the easiest would be to build a package only for the code that needs to be shared across flows.
> Does this all sound like something people have already grappled with, and come up with good solutions for?
Yes, absolutely. Usually building a package and installing it in the execution environment or Docker image solves the problem.
Ben Ayers-Glassey
05/05/2022, 5:08 PM
> You don't have to host the package anywhere, you only need to install it in your execution platform. You can install it from setup.py rather than from pip, check this simple example
I see. Actually I found this file was the one which showed me what I was confused about, in particular seeing which kwargs of `Docker` it used:
```
docker_storage = Docker(
    image_name="community",
    image_tag="latest",
    registry_url=f"{AWS_ACCOUNT_ID}.dkr.ecr.eu-central-1.amazonaws.com",
    stored_as_script=True,
    path=f"/opt/prefect/flows/{FLOW_NAME}.py",
)
```
...so it's not using the `base_image` and `python_dependencies` kwargs, and furthermore `build=False` is being passed to `flow.register(...)`; so basically instead of having the `Docker` storage class build a Dockerfile behind the scenes, you're just writing your own Dockerfile by hand, giving you full freedom over what goes into it.
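So the registration side ends up looking something like this (project name made up):
```
# with build=False, registration only records the storage metadata and
# never tries to build an image itself
flow.storage = docker_storage
flow.register(project_name="my-project", build=False)
```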
> I'm sure you can find a way to package your code without having to modify the Storage source code
Yes, it seems like it. I still think it would be useful to be able to specify flows using a dotted path instead of a file path when using the `Docker` storage class, but seeing how much easier it is to package up shared code when you write the Dockerfile yourself, I don't think that's a huge issue!
Thanks again!
Anna Geller
05/06/2022, 12:36 PM
> you're just writing your own Dockerfile by hand, giving you full freedom over what goes into it.
Exactly! You're very welcome, LMK if you have any open questions.