# ask-community
j
Hi everyone! First of all, happy new year everybody 🙂 I feel that I don't fully understand Docker Storage. As far as I know, you need to serialize the flow (using cloudpickle), and when the Prefect agent requests a flow run, it downloads the flow and runs it. However, if you are using Docker Storage, I don't understand why cloudpickle is still used, as I expect the whole flow to be pushed as a Docker image to some repository. My question comes from a project where I use Docker storage (the Python 3.8 image `prefecthq/prefect:0.15.11-python3.8`), but I am using Python 3.9 to register the flow with Prefect Cloud. It fails because I am using different Python versions. Could you please clarify how this works? Thank you very much!
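For context, a minimal registration script along these lines reproduces the situation (registry URL and project name are hypothetical):

```python
from prefect import Flow, task
from prefect.storage import Docker

@task
def hello():
    print("Hello from inside the image!")

with Flow("docker-demo") as flow:
    hello()

# Default behavior (stored_as_script=False): the flow object is serialized
# with cloudpickle and the pickle file is baked into the image. Pickles are
# not portable across Python versions, so registering with Python 3.9
# against a python3.8 base image fails at runtime.
flow.storage = Docker(
    registry_url="my-registry.io",  # hypothetical registry
    image_name="docker-demo",
    base_image="prefecthq/prefect:0.15.11-python3.8",
)

flow.register(project_name="demo")  # hypothetical project
```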
a
Happy New Year to you, too! It works exactly as you described: by default, Prefect serializes the flow with cloudpickle and stores it within your Docker image. Docker storage tries to make it easier for users to package their dependencies. But if you are a more advanced user, you can either (both options sketched below):
• use `stored_as_script=True` in your Docker storage - here are 2 examples that show how to use this pattern
• build your own image and pass it to your run config, and then use another type of storage.
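Both options might look roughly like this (registry, repo, and image names are hypothetical):

```python
from prefect.run_configs import DockerRun
from prefect.storage import Docker, GitHub

# Option 1: keep Docker storage, but store the flow as a script, not a pickle
flow.storage = Docker(
    registry_url="my-registry.io",               # hypothetical registry
    image_name="docker-demo",
    files={"/local/path/flow.py": "/flow.py"},   # copy the source file into the image
    stored_as_script=True,
    path="/flow.py",                             # where the script lives inside the image
)

# Option 2: build your own image, reference it in the run config,
# and keep the flow source in another storage type (e.g. GitHub)
flow.storage = GitHub(repo="my-org/my-repo", path="flows/flow.py")  # hypothetical repo
flow.run_config = DockerRun(image="my-registry.io/my-image:latest")
```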
j
Thank you very much! I didn't notice the Prefect Idioms section in the core documentation. `stored_as_script` would work for me, thanks!
Just one more question about this: using `stored_as_script=False` uses cloudpickle to serialize the flow, and therefore the agent just needs to do `flow.run()`. However, if I use `stored_as_script=True`, what does the agent do? `from script import flow` and then `flow.run()`? In particular, I am reading some environment variables at the beginning of the script that are used by the flow and the tasks. If I use `stored_as_script=False`, the flow run works OK, so I suppose the tasks are using the values for env vars declared during the flow registration process (serialized in the cloudpickle object). However, if I use `stored_as_script=True`, I suppose the whole Python file is executed and, for each flow run, the env vars are loaded from the current agent environment. Is that right? Thank you in advance!
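For a concrete picture, here is a hypothetical flow.py of the kind described, with module-level env var loading:

```python
import os

from dotenv import load_dotenv
from prefect import Flow, task

# Module-level code: runs once at registration time when the flow is
# pickled, but on every flow run when the flow is stored as a script
load_dotenv()
DATA_PATH = os.getenv("DATA_PATH")  # hypothetical env var

@task
def process():
    print(f"Processing data from {DATA_PATH}")

with Flow("env-demo") as flow:
    process()
```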
a
It's really quite simple:
• if `stored_as_script=True`, Prefect does nothing with the flow during build and reads it from the script file at runtime - see this line
• if `stored_as_script=False`, Prefect pickles the flow to a file at registration and reads it from that pickle file at runtime - see this line
The env variables should be treated the same in both (a simplified sketch of the two load paths follows).
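Loosely, the runtime load logic looks like this (a simplified sketch; the real pickle path also wraps the bytes with version metadata):

```python
import cloudpickle
from prefect.utilities.storage import extract_flow_from_file

def get_flow(flow_location: str, stored_as_script: bool):
    """Simplified sketch of how the storage loads a flow at runtime."""
    if stored_as_script:
        # Executes the whole .py file and returns the Flow object it defines
        return extract_flow_from_file(file_path=flow_location)
    # Otherwise read back the pickle that was written at registration time
    with open(flow_location, "rb") as f:
        return cloudpickle.loads(f.read())
```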
j
OK, thank you @Anna Geller! I think I understand it now. I am using `dotenv.load_dotenv()` to set up a specific environment from a .env file at the beginning of my flow.py file. If I build the flow and serialize it using cloudpickle, the tasks inside the flow have the correct values for the declared env vars (global variables created outside the flow and tasks). In the links you gave me, I checked that with `stored_as_script=False`, the flow is serialized and loaded when required by the agent reading the binary file. However, with `stored_as_script=True`, the flow is loaded using a function that executes the whole Python file and searches for a `prefect.Flow` object. Therefore the environment variables are loaded at that moment, so I should place the .env file in the Docker image as well. Thanks again!
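One hypothetical way to ship the .env file into the image is the `files` argument of Docker storage (all paths are placeholders; note that `load_dotenv()` may need an explicit path like `load_dotenv("/.env")` if the working directory differs):

```python
from prefect.storage import Docker

flow.storage = Docker(
    registry_url="my-registry.io",  # hypothetical registry
    image_name="env-demo",
    files={
        "/local/project/flow.py": "/flow.py",
        "/local/project/.env": "/.env",  # ship the .env alongside the flow script
    },
    stored_as_script=True,
    path="/flow.py",
)
```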