Alexandru Sicoe
03/22/2021, 11:11 PM
Chris White
Kyle Moon-Wright
03/22/2021, 11:53 PM
dev and the other with prod), so we can submit Flow Runs to each environment by matching up labels with those corresponding Agents polling our Cloud tenant. If you configure your Flow labels in your Storage or Run_config with the schedules you mentioned, these will be submitted to each environment respectively (assuming you have two Agents with each label) with the dynamic values you’ve configured per schedule.
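For example, a minimal sketch of that label matching (assuming the Prefect 0.14-era run_configs API; the flow name and label here are only illustrative):

from prefect import Flow
from prefect.run_configs import UniversalRun

# A flow run labeled "dev" will only be picked up by an Agent that also
# carries the "dev" label, e.g. one started with:
#   prefect agent local start --label dev
with Flow("example_flow", run_config=UniversalRun(labels=["dev"])) as flow:
    ...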
Kyle Moon-Wright
03/22/2021, 11:53 PM
Alexandru Sicoe
03/23/2021, 8:38 AM
repo1/
    Dockerfile
    pkg1/
        mod1_1.py
        mod1_2.py
    pkg2/
        mod2_1.py
        mod2_2.py
repo2/
    Dockerfile
    pkg3/
        mod3_1.py
        mod3_2.py
        mod3_3.py
...
We have 2 environments: dev and prod.
Most of these jobs need to be scheduled on the same schedule for both dev and prod.
How would we bring all these jobs into Prefect in a uniform way?
For starters we were thinking of applying the pattern suggested here: https://docs.prefect.io/orchestration/recipes/configuring_storage.html#configuring-docker-storage
We were thinking of having another Dockerfile in each repo tailored for Prefect; call it Dockerfile-prefect.
We would also have a Python script at the top level of each repo, one for each package, that would create the flow for that package and register it. It would roughly do:
1. Create a task for every job in every module of that package.
2. Add them to a Flow.
3. The Flow would use the Docker Storage pointing to the Dockerfile-prefect file, using the files keyword to point to the module files.
Then we would apply the pattern here: https://docs.prefect.io/core/concepts/schedules.html#varying-parameter-values
4. The flow would also have a Schedule with 2 identical clocks, one for each environment (dev and prod), but these Clocks will obviously have different parameter_defaults configs.
5. Call register on the flow.
At step 4 we have a problem: these configs are things like database hostnames, connection strings, etc. for either dev or prod. How do we load them dynamically for the various environments? Do we load them from env vars that would live in CI, which ultimately will have to run the prefect_*.py files in the associated repo?
E.g. steps 1-5 for pkg1 in repo1 above would produce a prefect_pkg1.py script like:
import datetime
import os

from pkg1 import mod1_1, mod1_2
from prefect import task, Flow
from prefect.schedules import clocks, Schedule
from prefect.storage import Docker

now = datetime.datetime.utcnow()

# Create our Docker storage object
storage = Docker(registry_url="gcr.io/dev/",
                 dockerfile="../Dockerfile-prefect")

# Create our Schedule
clock1 = clocks.IntervalClock(start_date=now,
                              interval=datetime.timedelta(hours=1),
                              parameter_defaults={
                                  "db_server_name": os.getenv("DEV_DB_SERVER_NAME")
                              })  # there will be more
clock2 = clocks.IntervalClock(start_date=now,
                              interval=datetime.timedelta(hours=1),
                              parameter_defaults={
                                  "db_server_name": os.getenv("PROD_DB_SERVER_NAME")
                              })  # there will be more
schedule = Schedule(clocks=[clock1, clock2])

@task
def task1():
    mod1_1.execute()

@task
def task2():
    mod1_2.execute()

flow = Flow("flow_pkg1", tasks=[task1, task2], schedule=schedule, storage=storage)
flow.register()
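To cover step 3, the files keyword on that same Docker storage would look roughly like this (the destination paths inside the image are just illustrative):

import os
from prefect.storage import Docker

# files maps absolute local paths to paths inside the built image
storage = Docker(
    registry_url="gcr.io/dev/",
    dockerfile="../Dockerfile-prefect",
    files={
        os.path.abspath("pkg1/mod1_1.py"): "/pkg1/mod1_1.py",
        os.path.abspath("pkg1/mod1_2.py"): "/pkg1/mod1_2.py",
    },
)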
So the structure for repo1 would become:
repo1/
    Dockerfile
    Dockerfile-prefect
    pkg1/
        mod1_1.py
        mod1_2.py
    pkg2/
        mod2_1.py
        mod2_2.py
    prefect_pkg1.py
    prefect_pkg2.py
And then we would need to configure GitHub Actions to execute each script that starts with "prefect_" at the top of the repo.
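Roughly, that CI step would do the equivalent of the following (a sketch; the workflow itself would run this, or an equivalent shell loop, from the repo root):

import glob
import subprocess

# Run every registration script at the repo root; each one builds its
# Docker storage and registers its flow against our Prefect Cloud tenant.
for script in sorted(glob.glob("prefect_*.py")):
    subprocess.run(["python", script], check=True)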
Does that look ok? Is there a better pattern?
I apologise for the very long message ... would greatly appreciate any feedback!
Alexandru Sicoe
03/23/2021, 8:42 AM