Billy McMonagle
10/13/2021, 11:22 PM
Each of my flows lives in its own directory (e.g. test_flow) containing a Dockerfile, a requirements.txt file, and the flow itself. At the top level is the shared registration code, register.py.

Billy McMonagle
10/13/2021, 11:23 PM
register.py:

from test_flow import myflow

PROJECT_NAME = "my-project"

if __name__ == "__main__":
    print(f"Registering flow {myflow.name}")
    myflow.register(
        project_name=PROJECT_NAME, labels=["my-label"],
    )
test_flow/myflow.py:

from prefect import Flow
from prefect.utilities.tasks import task
import pandas

@task
def my_task(arg):
    # pretend like this does something with pandas
    print(f"my arg is: {arg}")

with Flow("My Test Flow") as flow:
    my_task(arg="foo")
    my_task(arg="bar")

Billy McMonagle
10/13/2021, 11:24 PM
Running register.py yields the following result...
❯ python register.py
Traceback (most recent call last):
  File "register.py", line 1, in <module>
    from test_flow import myflow
  File "[...]/flows/test_flow/myflow.py", line 4, in <module>
    import pandas
ModuleNotFoundError: No module named 'pandas'

Billy McMonagle
10/13/2021, 11:27 PM
Is there a way to register these flows without installing each flow's dependencies in the registration environment, and is that considered good practice?

Kevin Kho
Two options. The first is to put your import statements inside the tasks. This defers the imports to execution time, so you can register without those packages installed. The second option is to store your flow as a script (think S3 or GitHub storage, where there is no serialization). I know you are using Docker storage, but you can still use stored_as_script=True inside a Docker storage, and this might not serialize the Python script, so those dependencies won't be needed during build time.
As to whether it's good practice: a lot of users defer their imports because they have CI/CD pipelines with a specified build image, and that pipeline may not have the requirements while the execution environment does. So some people do indeed do this. Does that help?
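
For reference, a deferred-import version of the test_flow/myflow.py above would look something like this (an illustrative sketch of the first suggestion, not code from the thread):

from prefect import Flow
from prefect.utilities.tasks import task

@task
def my_task(arg):
    # pandas is only imported when the task actually runs, so
    # register.py no longer needs it installed
    import pandas

    print(f"my arg is: {arg}")

with Flow("My Test Flow") as flow:
    my_task(arg="foo")
    my_task(arg="bar")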

Billy McMonagle
10/14/2021, 2:38 PM
Putting the imports inside tasks would work (feels weird, but probably fine).
I don't think stored_as_script=True solves my problem on its own, because I still have to import the flow object, which would obviously execute the import statements in the flow definition file. Maybe you just meant it is a useful option in conjunction with the aforementioned "defer imports" advice.

Billy McMonagle
10/14/2021, 2:46 PM
Looking at the S3 storage option, it seems I could set the key but not set the local_script_path, meaning that I would simply upload the file myself rather than allow the register script to do it (I will not make authenticated aws sdk calls from inside the docker image as it is being built).
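
A sketch of what that might look like (assuming Prefect 1.x S3 storage; the bucket name and key here are made up):

from prefect.storage import S3

flow.storage = S3(
    bucket="my-flow-bucket",            # hypothetical bucket
    key="flows/test_flow/myflow.py",    # fixed key, uploaded out of band
    stored_as_script=True,
    # local_script_path deliberately left unset: the script is uploaded
    # separately, so registration makes no AWS SDK calls
)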

emre
10/14/2021, 6:01 PM
My image has two entrypoints, main.py serialize and main.py register, which call flow.serialize and flow.register; almost nothing in terms of configuration.
It's just a simple way to make the most out of the docker image that I know can set up my flow, without any of the issues you are facing.
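
A minimal sketch of such an entrypoint (the argument handling here is guesswork, not emre's actual code):

# main.py (hypothetical reconstruction)
import sys

from test_flow import myflow

if __name__ == "__main__":
    command = sys.argv[1]
    if command == "serialize":
        # flow.serialize() returns the flow's metadata as a dict
        print(myflow.serialize())
    elif command == "register":
        myflow.register(project_name="my-project")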

Billy McMonagle
10/18/2021, 6:09 PM
Following up: I got this working by registering from inside the Docker build. The relevant Dockerfile lines are:

RUN prefect auth login --key $PREFECT_TOKEN
RUN python register.py --flow $APP_HOME/flows/myflow.py
And my storage looks like this (from register.py):

Docker(
    path=flow_path,
    stored_as_script=True,
    image_name="$accountid.dkr.ecr.$region.amazonaws.com/$image",
    image_tag="$tag",
)

Billy McMonagle
10/18/2021, 6:18 PM
Without image_name and image_tag set explicitly, registration fails with: Docker storage is missing required fields image_name and image_tag

Billy McMonagle
10/18/2021, 6:19 PM
I also have to call flow.storage.add_flow(flow), or else I get this error:
Failed to load and execute Flow's environment: ValueError('Flow is not contained in this Storage')
This is because I pass build=False, since the image is already built.
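
Putting those pieces together, the registration path in register.py presumably ends up looking something like this (a reconstruction from the snippets in this thread; flow_path, the image variables, and the project name are placeholders):

from prefect.storage import Docker

# `flow` is the Flow object loaded from the file passed via --flow
flow.storage = Docker(
    path=flow_path,
    stored_as_script=True,
    image_name="$accountid.dkr.ecr.$region.amazonaws.com/$image",
    image_tag="$tag",
)
# build=False skips building the image (it already exists), but that also
# skips the step that records the flow in the storage, hence add_flow
flow.storage.add_flow(flow)
flow.register(project_name="my-project", build=False)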