# ask-community
c
hey all, i'm in the process of migrating over and i'm trying to wrap my head around the CD part of it all. my use case is that I can have hundreds of flows, each with a potentially different schedule, which should eventually get deployed to prod or staging workspaces. my current thinking is that I can put together a prefect.yaml and then during CD I can loop over my Flows:
```bash
for flow in $flows; do prefect deploy $flow -n prod; done
```
How do I define the (cron) schedule for each of those flows? the decorator doesn't have it as an argument. Do I need to define a deployment in the Flow file and do a `build_from_flow`?
r
we have a `deployment.py` per flow; we just iterate all flow folders (if they have changed) and deploy
s
we have one `.sh` file with lines like this for each flow
```bash
prefect deployment build -n test_flow -p prod -q default -o ~/deployments/test_flow -t monitoring --skip-upload --cron "55 6 * * */3" ~/prefect/flows/monitoring/test_flow.py:test_flow -a
```
c
this is exactly the kinda thing i want to avoid; when users author a flow, I don't want them to worry about the deployment part. i don't want them to both write the flow and update an `sh` file somewhere else - I don't think it's a great user experience. ideally all the deployment details of a flow are abstracted away from the user.
n
@Constantino Schillebeeckx there's a few different ways to do this
> when users author a flow, I don't want them to worry about the deployment part.
agreed! this is one of the things we were thinking about when developing the `prefect deploy` / `prefect.yaml` deployment UX! but it does sound like in your case, someone will have to pick a schedule. if you don't want to set a schedule at deployment time, you don't have to - the deployment creator can pop into the UI after `prefect deploy` and click the buttons to get a custom schedule. if you want it to be declarative, you could have `definitions` in your `prefect.yaml` for schedules that a user could select from and attach to their new deployment (like this)
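something like this, roughly (a sketch - the schedule name and entrypoint here are just illustrative):
```yaml
definitions:
  schedules:
    daily_at_noon: &daily_at_noon
      cron: "0 12 * * *"

deployments:
  - name: my-flow-prod
    entrypoint: flows/my_flow.py:my_flow
    schedule: *daily_at_noon
```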
c
ideally we're doing nothing in the UI 🙂 we want all things IaC! if only there were a way to extend flow attributes 😉
r
The flow is orthogonal to the deployment(s) though
n
we've intentionally separated concerns in prefect 2 so that flows know as little about their deployments as possible, unlike prefect 1, where the flow was this massive object that was also the thing you deployed
my second suggestion above should work for you
e
not sure if this helps but we have 2 general types of flows. the first set are fully locked in code, including the schedules. these are checked in and deployed via CD. the second type are generated. these are flows for things like Fivetran, dbt CLI, or Databricks. these flows use a template and a JSON config file to build each flow .py file during CD and then deploy. this means for our 100+ Fivetran flows, each has its own schedule that is defined in a JSON config, which can be changed by our data team at any time and updated by simply running our dispatcher. this has netted ~20 static, fully checked-in flows and several hundred generated flows where the only thing that's checked in is the template
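to make that concrete, one of those JSON configs might look roughly like this (a sketch - the exact field names here are just illustrative):
```json
{
  "flow_name": "fivetran_salesforce_sync",
  "connector_name": "salesforce",
  "connector_id": "abc_123",
  "timeout_seconds": 3600,
  "flow_interval_in_seconds": 86400,
  "is_schedule_active": true
}
```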
c
yep I get that @Nate; as I put myself in the shoes of a Flow author, I'm thinking they don't care exactly where it runs (i.e. DATAOPS does all the AWS ECS, worker, pool, etc. setup). what they do care about is what (their code) runs and when.
n
gotcha, so the *what* sounds like the flow they just wrote, and then the *when* is the schedule definition that you have for them to select from the list and attach to their new entry in the `prefect.yaml` - do you have a problem with that deployment ux?
c
yep, as soon as they need a new schedule, they ping me and are like, I need a cron for 12:43 🙂
@Emerson Franks that's kinda the route I'm thinking, especially the template and JSON config - is there more you can share with me?
n
> yep, as soon as they need a new schedule, they ping me and are like, I need a cron for 12:43
hmm it's fairly straightforward to define a schedule - just adding this would be the declarative way, if you're interested
```yaml
schedule:
  cron: "43 12 * * *"
```
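in context, that lives under a deployment entry in `prefect.yaml`, e.g. (sketch - name and entrypoint are placeholders):
```yaml
deployments:
  - name: my-flow-prod
    entrypoint: flows/my_flow.py:my_flow
    schedule:
      cron: "43 12 * * *"
```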
e
At a high level, our CD runs a dispatcher that internally leverages what we call a FlowFactory that knows how to build flows. That factory is more or less doing what you mentioned above in that it knows how to call `Deployment.build_from_flow` to build each flow from code, rather than .yaml. The output of the factory method is a deployment w/ an `.apply` method that our dispatcher invokes. So every flow that we have ends up at this set of code:
```python
def deploy(self):
    deployment = self.flow_factory.generate_flow_deployment(
        self.flow_name,
        self.flow_logic,
        self.is_schedule_active,
        self.flow_interval_in_seconds,
        description=self.description,
        flow_parameters=self.flow_parameters,
        extra_tags=self.tags,
        schedule_anchor=self.schedule_anchor,
        job_ttl=self.job_ttl)
    deployment.apply()
```
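for reference, the `Deployment.build_from_flow` call inside a factory like that could look roughly like this (a sketch, not our exact code - the argument names and the schedule import path, which varies by prefect 2 version, are illustrative):
```python
from datetime import timedelta

from prefect.deployments import Deployment
from prefect.server.schemas.schedules import IntervalSchedule


def generate_flow_deployment(flow_name, flow_logic, is_schedule_active, interval_in_seconds):
    # build (but don't yet register) a deployment from the flow function;
    # the caller invokes .apply() on the result to register it with the API
    return Deployment.build_from_flow(
        flow=flow_logic,
        name=flow_name,
        schedule=IntervalSchedule(interval=timedelta(seconds=interval_in_seconds)),
        is_schedule_active=is_schedule_active,
        apply=False,
    )
```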
For the generated flows themselves, we have checked-in template files that are leveraged by generators to create the .py files during CD. The templates are really 'simple' for the most part and just have parameters that get filled during CD using values from the JSON config files. So for Fivetran, we have something like this:
```python
# THIS CODE IS GENERATED BY THE FIVETRAN FLOW CREATOR, DO NOT MANUALLY UPDATE
from prefect import flow, get_run_logger
from prefect.blocks.system import Secret

from fivetran_provider import FivetranProvider


@flow(timeout_seconds={timeout_seconds})
def {flow_name}():
    logger = get_run_logger()
    fivetran_key = Secret.load('five-tran-key')
    fivetran_secret = Secret.load('five-tran-secret')
    fivetran_provider = FivetranProvider(fivetran_key.get(), fivetran_secret.get(), logger)

    logger.info('running sync for connector_name: {connector_name}')

    fivetran_provider.sync_connector('{connector_id}', {timeout_seconds}-60)


if __name__ == "__main__":
    {flow_name}()
```
This code is filled using a method like this:
```python
def create_flow_file_from_template(self):
    with open('fivetran_flow_template.py', 'r') as flow_file_template:
        flow_file = flow_file_template.read().format(
            flow_name=self.flow_name,
            timeout_seconds=self.five_tran_flow_config.timeout_seconds,
            connector_name=self.five_tran_flow_config.connector_name,
            connector_id=self.five_tran_flow_config.connector_id
        )

    with open(f'{self.flow_name}.py', 'w') as py_file:
        py_file.write(flow_file)
```
You then end up needing to import the modules before you can do any of the deployment. That is done using `importlib` and calling `import_module`, then finally calling `getattr`. From there, the code is just pushed into our FlowFactory as described above.
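i.e. something roughly like this (a sketch - the module / flow names are placeholders):
```python
import importlib

# import the generated module by its (string) name, then grab the flow function off it
module = importlib.import_module("fivetran_salesforce_sync")
flow_fn = getattr(module, "fivetran_salesforce_sync")
```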
n
because you mentioned
> i'm in the process of migrating over
I'll say that I'd recommend the `prefect.yaml` / `prefect deploy` deployment UX over the infra / storage block & `Deployment.build_from_flow` UX, because with the latter you cannot leverage workers fully (the `pull` step, for example) and eventually the block/agent-based deployments will no longer be our main recommendation. as a pretty heavy user on both sides (deployment creator / work pool creator) i'll say that there's nothing I can think of that you can do with `build_from_flow` + python that you can't do with `prefect.yaml` + custom deployment steps (which can be arbitrary python)
e
That's very unfortunate to hear about `build_from_flow` being deprecated. Most of my team really despises yaml. I'll definitely have to follow up with our AM as this would be a pretty big show stopper for us.
c
@Nate i've been wondering about the `prefect.yaml` vs `Deployment.build_from_flow` UX - I'm guessing the former is the "new" way of doing things, and the latter is a bit older? I don't have much context for feature development in 2.0 - it's made getting up to speed a bit more difficult, as the docs don't really highlight the differences very well IMHO
n
> the former is the "new" way of doing things, and the latter is a bit older?
correct. @Emerson Franks `Deployment.build_from_flow` will be around for a while, and by the time it's deprecated, we should have an analogous python interface for people who don't want to use yaml
c
> as a pretty heavy user on both sides (deployment creator / work pool creator)
@Nate what exactly do you mean by work pool creator?
n
out of curiosity @Emerson Franks and @redsquare - are you fans of `build_from_flow` for a specific reason (other than disliking yaml in general)? have you explored the new UX at all? (i do understand there's been a bit of whiplash around deployment ux)
> what exactly do you mean by work pool creator?
a dev ops person that supports deployment authors; for example, I set up a k8s cluster for my team, who shouldn't have to worry about infra
r
@Nate not really explored - time poor currently and everything is working well
We spent a good while perfecting our CI/CD across three environments - would hate to have to revisit it without a good reason
Still don't see a benefit with a worker over an agent - we run everything in k8s
e
I think we're similar but also, I come from a C#/.NET world and really prefer to write "real" code. Using `build_from_flow` allows me to stay 100% in Python (which I guess is real code 😉) and not have to worry about yaml. We, of course, have to use yaml for things like our CI/CD pipeline and k8s deployments, but these are really static and after they are set up, we can wash our hands of yaml 🙂 +1 to not seeing why I would ever run a worker instead of k8s.
r
C# too here 🙂
n
> worker instead of k8s
it's not worker instead of a specific infra, it's worker instead of agent. workers/agents generally support the same infras. workers are just strongly typed (instead of agents, which try to submit work anywhere) and have extra capabilities (like executing arbitrary job setup in a `pull` step) which may not be relevant for you all. but point taken
> would hate to have to revisit it without a good reason
i know it's a pain to migrate (i have lots of deployments to manage too 🙂) - anyways, if/when you wanna switch (again, a python interface will exist before you have to worry about it), happy to help smooth over rough edges
c
@Nate what's the best way to get notified in the future of these largish new features?
n
i wouldn't expect large changes around deployment UX; it has been a lot of work to get where we are now with workers, and we understand it's a large ask to go agents -> workers. we're planning on enhancements / fixes in the future around this. that said, release notes are the best place for all the info, and #CKNSX5WG3 is the place for big stuff
c
@Nate above you mentioned custom deployment steps, I'm guessing you meant this exactly. Is there another place for more documentation regarding this? For example I can't tell in which action that runs (I guess any?) or how that docker image works (is it distinct to the one that might be defined in `build`?)
n
all of the deployment steps are just fully qualified function names, so `prefect_docker.deployments.steps.build_docker_image` lives here. in general, these steps like `build_docker_image` or `push_to_s3` are defined / documented in their service's collection (e.g. `prefect-docker`, `prefect-aws`) and, with the exception of the `pull` step, they're optional. if you want to build an image / push your code with other CI/CD, that's fine! the worker just needs to know where to get it at runtime
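for example, a minimal `pull` section might look like this (sketch - the repo URL is a placeholder):
```yaml
pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/your-org/your-repo.git
      branch: main
```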
and by custom deployment steps i mean that I could have a file called `my_steps.py` and write
```python
async def my_fancy_step(arg1: str, arg2: dict):
    # do whatever your step should do
    ...
```
and in my `prefect.yaml` have
```yaml
- my_steps.my_fancy_step:
    arg1: "foo"
    arg2:
      key: val
```
and `prefect deploy` will run that for you at deployment time, so long as you have `my_steps` available in the runtime
c
alrighty, I think I understand; thank you for the support!
n
sure! feel free to ask further questions as they come up
m
I have a different use case - our users are non-technical business users but know a little JSON / cron. I want them to take control of the schedule and job variables. But it seems to me that every time I run `prefect deploy`, all configurations done from the UI get overwritten by the prefect.yaml. Is there any way to avoid this?