Christopher Chong Tau Teng
12/02/2021, 11:42 AM
-flows/
---flow_1.py
---flow_2.py
-src/
---task_flow_1.py
---task_flow_2.py
-Dockerfile
and samples from each file:
flow_1.py
from datetime import timedelta
from prefect import Flow
from prefect.schedules import IntervalSchedule
from prefect.storage import GCS
from prefect.run_configs import DockerRun
import sys
sys.path.append('.../src')
from task_flow_1 import task_test_flow
schedule = IntervalSchedule(interval=timedelta(minutes=1))
with Flow("test-flow-1", schedule) as flow:
    task_test_flow()
flow.storage = GCS(bucket="xxx")
flow.run_config = DockerRun(image="xxx/prefect:v1")
flow.register(project_name='docker-runner-01')
task_flow_1.py
import prefect
from prefect import task
import numpy as np
@task
def task_test_flow():
    logger = prefect.context.get("logger")
    test_arr = np.array([1, 2, 3])
    logger.info(f"{test_arr}")
Dockerfile
FROM prefecthq/prefect:latest
WORKDIR /app
ADD . .
RUN prefect backend server
Now assume I have registered both flows with the server and they are running as expected.
One day flow_1 suddenly breaks, and I need to change task_flow_1.py to fix the bug. I then updated the image to v2 in flow_1.py:
flow.run_config = DockerRun(image="xxx/prefect:v2")
I then built a new Docker image v2 and pushed it to xxx/prefect:v2.
Here’s my question: before I register these 2 flows with the server, do I also need to update the image in flow_2.py to use xxx/prefect:v2, or can it continue to use xxx/prefect:v1?
Anna Geller
12/02/2021, 11:51 AM
flow.register(project_name='docker-runner-01')
With that line in the flow file, every flow run will also try to register the flow at runtime. It would be better to remove that line from the flow file and use the CLI for registration instead:
prefect register --project project-p flow.py
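As a rough illustration of the point above (not Prefect code), the sketch below shows why a module-level register call fires every time the file is executed, and how a `__main__` guard, a common Python alternative to removing the line, keeps it from firing on import. `fake_register`, `UNGUARDED`, `GUARDED`, `load`, and `count_registrations` are hypothetical names made up for this example; `fake_register` only records calls and stands in for `flow.register`.

```python
# Stand-in for flow.register(); NOT the Prefect API, it only records calls.
calls = []


def fake_register(project_name):
    calls.append(project_name)


# A flow file that registers at module level, as in flow_1.py.
UNGUARDED = """
fake_register("docker-runner-01")
"""

# The same file with the call behind a __main__ guard.
GUARDED = """
if __name__ == "__main__":
    fake_register("docker-runner-01")
"""


def load(source, module_name):
    """Execute `source` as if it were a module named `module_name`."""
    namespace = {"__name__": module_name, "fake_register": fake_register}
    exec(source, namespace)


def count_registrations(source, module_name):
    """Return how many times loading `source` triggered a registration."""
    calls.clear()
    load(source, module_name)
    return len(calls)


# Loading the file under a non-main name mimics an import of the flow file.
print(count_registrations(UNGUARDED, "flow_1"))
print(count_registrations(GUARDED, "flow_1"))
print(count_registrations(GUARDED, "__main__"))
```

The unguarded version registers on every load, while the guarded one registers only when the file is run directly as a script. Removing the line and registering from the CLI, as suggested above, avoids the question entirely.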
#2 When you use Docker storage, your custom flow dependencies are packaged in the image, so you don’t need to have those dependencies installed on Server. The image for Server can therefore be a different Docker image from the one used for a flow. But if your tasks are defined outside of the flow file, they won’t exist in the GCS storage. You would then have to rebuild the Docker image any time you change either your tasks or your flow. To make it easier, you could either:
• change the image tag to latest so that it’s easier to update it in your run config
• or switch to Docker storage, which may make more sense in your use case. This way the image would always be built individually for each flow upon registration.
Does it answer your question?
I think the blog post you referenced was addressing the problem of packaging custom module dependencies (i.e. packages), rather than packaging flow and task code. In your use case you would always need to rebuild the image, so Docker storage actually makes more sense than GCS.
Christopher Chong Tau Teng
12/06/2021, 3:06 AM
flow_2.py to
flow.run_config = DockerRun(image="xxx/prefect:v2")
or can it continue to use xxx/prefect:v1?
Kevin Kho
12/06/2021, 3:45 AM
v1 if there is no reason to upgrade and you don’t want to change it. There is also no need to re-register if you just want it to keep using v1.
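A minimal sketch of the reasoning in this answer, assuming each flow's run config pins its own image tag independently of the others. It uses plain dictionaries rather than the Prefect API; `flows_to_reregister` and the flow name "test-flow-2" are hypothetical (the thread only names "test-flow-1").

```python
def flows_to_reregister(registered, desired):
    """Return the names of flows whose desired image differs from the registered one."""
    return sorted(
        name for name, image in desired.items() if registered.get(name) != image
    )


# Image tags currently registered for each flow.
registered = {
    "test-flow-1": "xxx/prefect:v1",
    "test-flow-2": "xxx/prefect:v1",  # hypothetical name for the flow in flow_2.py
}

# After the fix, only flow_1 is moved to the v2 image.
desired = {
    "test-flow-1": "xxx/prefect:v2",
    "test-flow-2": "xxx/prefect:v1",
}

print(flows_to_reregister(registered, desired))
```

Only the flow whose image tag changed shows up, which matches the answer: flow_2 can stay on v1 untouched as long as the v1 image remains available in the registry.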
Christopher Chong Tau Teng
12/06/2021, 9:35 AM