# ask-community
j
Hi everyone! First of all, happy new year everybody 🙂 I feel that I don't fully understand Docker Storage. As far as I know, you need to serialize the flow (using cloudpickle), and when the Prefect agent requests a flow run, it downloads the flow and runs it. However, if you are using Docker Storage, I don't understand why cloudpickle is still used, as I expect the whole flow to be pushed as a Docker image to some repository. My question comes from a project where I use Docker storage (the Python 3.8 image `prefecthq/prefect:0.15.11-python3.8`), but I am using Python 3.9 to register the flow with Prefect Cloud. It fails because I am using different Python versions. Could you please clarify how this works? Thank you very much!
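For context, a minimal registration script along these lines reproduces the situation (registry URL and project name are hypothetical):

```python
from prefect import Flow, task
from prefect.storage import Docker

@task
def hello():
    print("Hello from inside the image!")

with Flow("docker-demo") as flow:
    hello()

# Default behavior (stored_as_script=False): the flow object is serialized
# with cloudpickle and the pickle file is baked into the image. Pickles are
# not portable across Python versions, so registering with Python 3.9
# against a python3.8 base image fails at runtime.
flow.storage = Docker(
    registry_url="my-registry.io",  # hypothetical registry
    image_name="docker-demo",
    base_image="prefecthq/prefect:0.15.11-python3.8",
)

flow.register(project_name="demo")  # hypothetical project
```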
a
Happy New Year to you, too! It works exactly as you described: by default, Prefect serializes the flow with cloudpickle and stores it within your Docker image. Docker storage tries to make it easier for users to package their dependencies. But if you are a more advanced user, you can either (both options sketched below):
• use `stored_as_script=True` in your Docker storage - here are 2 examples that show how to use this pattern
• build your own image and pass it to your run config, and then use another type of storage.
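Both options might look roughly like this (registry, repo, and image names are hypothetical):

```python
from prefect.run_configs import DockerRun
from prefect.storage import Docker, GitHub

# Option 1: keep Docker storage, but store the flow as a script, not a pickle
flow.storage = Docker(
    registry_url="my-registry.io",               # hypothetical registry
    image_name="docker-demo",
    files={"/local/path/flow.py": "/flow.py"},   # copy the source file into the image
    stored_as_script=True,
    path="/flow.py",                             # where the script lives inside the image
)

# Option 2: build your own image, reference it in the run config,
# and keep the flow source in another storage type (e.g. GitHub)
flow.storage = GitHub(repo="my-org/my-repo", path="flows/flow.py")  # hypothetical repo
flow.run_config = DockerRun(image="my-registry.io/my-image:latest")
```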
j
Thank you very much! I didn't notice the Prefect Idioms section in the core documentation. `stored_as_script` would work for me, thanks!
Just one more question about this: using `stored_as_script=False` uses cloudpickle to serialize the flow, and therefore the agent just needs to do `flow.run()`. However, if I use `stored_as_script=True`, what does the agent do? `from script import flow` and then `flow.run()`? In particular, I am reading some environment variables at the beginning of the script that are used by the flow and the tasks. If I use `stored_as_script=False`, the flow run works OK, so I suppose the tasks are using the values for env vars declared during the flow registration process (serialized in the cloudpickle object). However, if I use `stored_as_script=True`, I suppose the whole Python file is executed and, for each flow run, the env vars are loaded from the current agent environment. Is that right? Thank you in advance!
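For a concrete picture, here is a hypothetical flow.py of the kind described, with module-level env var loading:

```python
import os

from dotenv import load_dotenv
from prefect import Flow, task

# Module-level code: runs once at registration time when the flow is
# pickled, but on every flow run when the flow is stored as a script
load_dotenv()
DATA_PATH = os.getenv("DATA_PATH")  # hypothetical env var

@task
def process():
    print(f"Processing data from {DATA_PATH}")

with Flow("env-demo") as flow:
    process()
```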
a
It's really quite simple:
• if `stored_as_script=True`, Prefect does nothing with the flow during build and reads it from the script file at runtime - see this line
• if `stored_as_script=False`, Prefect pickles the flow to a file at registration and reads it from that pickle file at runtime - see this line
The env variables should be treated the same in both (a simplified sketch of the two load paths follows).
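Loosely, the runtime load logic looks like this (a simplified sketch; the real pickle path also wraps the bytes with version metadata):

```python
import cloudpickle
from prefect.utilities.storage import extract_flow_from_file

def get_flow(flow_location: str, stored_as_script: bool):
    """Simplified sketch of how the storage loads a flow at runtime."""
    if stored_as_script:
        # Executes the whole .py file and returns the Flow object it defines
        return extract_flow_from_file(file_path=flow_location)
    # Otherwise read back the pickle that was written at registration time
    with open(flow_location, "rb") as f:
        return cloudpickle.loads(f.read())
```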
j
OK, thank you @Anna Geller! I think I understand it now. I am using `dotenv.load_dotenv()` to set up a specific environment from a .env file at the beginning of my flow.py file. If I build the flow and serialize it using cloudpickle, the tasks inside the flow have the correct values for the declared env vars (global variables created outside the flow and tasks). In the links you gave me, I checked that with `stored_as_script=False`, the flow is serialized and loaded when required by the agent reading the binary file. However, with `stored_as_script=True`, the flow is loaded using a function that executes the whole Python file and searches for a `prefect.Flow` object. Therefore the environment variables are loaded at that moment, so I should place the .env file in the Docker image as well. Thanks again!
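One hypothetical way to ship the .env file into the image is the `files` argument of Docker storage (all paths are placeholders; note that `load_dotenv()` may need an explicit path like `load_dotenv("/.env")` if the working directory differs):

```python
from prefect.storage import Docker

flow.storage = Docker(
    registry_url="my-registry.io",  # hypothetical registry
    image_name="env-demo",
    files={
        "/local/project/flow.py": "/flow.py",
        "/local/project/.env": "/.env",  # ship the .env alongside the flow script
    },
    stored_as_script=True,
    path="/flow.py",
)
```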