# ask-community
c
hey all, i'm in the process of migrating over and i'm trying to wrap my head around the CD part of it all. my use case is that I can have hundreds of flows, each with a potentially different schedule, which should eventually get deployed to prod or staging workspaces. my current thinking is that I can put together a prefect.yaml and then during CD I can loop over my Flows:
```bash
for flow in $flows; do prefect deploy $flow -n prod; done
```
How do I define the (cron) schedule for each of those flows? the decorator doesn't have it as an argument. Do I need to define a deployment in the Flow file and do a `build_from_flow`?
r
we have a `deployment.py` per flow; we just iterate all flow folders (if they have changed) and deploy
s
we have one `.sh` file with lines like this for each flow
```bash
prefect deployment build -n test_flow -p prod -q default -o ~/deployments/test_flow -t monitoring --skip-upload --cron "55 6 * * */3" ~/prefect/flows/monitoring/test_flow.py:test_flow -a
```
c
this is exactly the kinda thing i want to avoid; when users author a flow, I don't want them to worry about the deployment part. i don't want them to both write the flow and update an `sh` file somewhere else - I don't think it's a great user experience. ideally all the deployment details of a flow are abstracted away from the user.
n
@Constantino Schillebeeckx there's a few different ways to do this
> when users author a flow, I don't want them to worry about the deployment part.
agreed! this is one of the things we were thinking about when developing the `prefect deploy` / `prefect.yaml` deployment UX! but it does sound like in your case, someone will have to pick a schedule. if you don't want to set a schedule at deployment time, you don't have to - the deployment creator can pop into the UI after `prefect deploy` and click the buttons to get a custom schedule. if you want it to be declarative, you could have `definitions` in your `prefect.yaml` for schedules that a user could select from and attach to their new deployment (like this)
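something like this, roughly (a sketch - the schedule name and entrypoint here are just illustrative):
```yaml
definitions:
  schedules:
    daily_at_noon: &daily_at_noon
      cron: "0 12 * * *"

deployments:
  - name: my-flow-prod
    entrypoint: flows/my_flow.py:my_flow
    schedule: *daily_at_noon
```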
c
ideally we're doing nothing in the UI 🙂 we want all things IaC! if only there were a way to extend flow attributes 😉
r
The flow is orthogonal to the deployment(s) though
n
we've intentionally separated concerns in prefect 2 so that flows know as little about their deployments as possible, unlike prefect 1, where the flow was this massive object that was also the thing you deployed
my second suggestion above should work for you
e
not sure if this helps but we have 2 general types of flows. the first set are fully locked in code, including the schedules. these are checked in and deployed via CD. the second type are generated. these are flows for things like Fivetran, dbt CLI, or Databricks. these flows use a template and a JSON config file to build each flow .py file during CD and then deploy. this means for our 100+ Fivetran flows, each has its own schedule that is defined in a JSON config, which can be changed by our data team at any time and updated by simply running our dispatcher. this has netted ~20 static, fully checked-in flows and several hundred generated flows where the only thing that's checked in is the template
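to make that concrete, one of those JSON configs might look roughly like this (a sketch - the exact field names here are just illustrative):
```json
{
  "flow_name": "fivetran_salesforce_sync",
  "connector_name": "salesforce",
  "connector_id": "abc_123",
  "timeout_seconds": 3600,
  "flow_interval_in_seconds": 86400,
  "is_schedule_active": true
}
```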
c
yep I get that @Nate; as I put myself in the shoes of a Flow author, I'm thinking they don't care exactly where it runs (i.e. DATAOPS does all the AWS ECS, worker, pool, etc. setup). what they do care about is what (their code) runs and when.
n
gotcha, so the *what* sounds like the flow they just wrote, and then the *when* is the schedule definition that you have for them to select from the list and attach to their new entry in the `prefect.yaml` - do you have a problem with that deployment ux?
c
yep, as soon as they need a new schedule, they ping me and are like, I need a cron for 12:43 🙂
@Emerson Franks that's kinda the route I'm thinking, especially the template and JSON config - is there more you can share with me?
n
> yep, as soon as they need a new schedule, they ping me and are like, I need a cron for 12:43
hmm it's fairly straightforward to define a schedule - just adding this would be the declarative way, if you're interested
```yaml
schedule:
  cron: "43 12 * * *"
```
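in context, that lives under a deployment entry in `prefect.yaml`, e.g. (sketch - name and entrypoint are placeholders):
```yaml
deployments:
  - name: my-flow-prod
    entrypoint: flows/my_flow.py:my_flow
    schedule:
      cron: "43 12 * * *"
```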
e
At a high level, our CD runs a dispatcher that internally leverages what we call a FlowFactory that knows how to build flows. That factory is more or less doing what you mentioned above in that it knows how to call `Deployment.build_from_flow` to build each flow from code, rather than .yaml. The output of the factory method is a deployment w/ an `.apply` method that our dispatcher invokes. So every flow that we have ends up at this set of code:
```python
def deploy(self):
    deployment = self.flow_factory.generate_flow_deployment(
        self.flow_name,
        self.flow_logic,
        self.is_schedule_active,
        self.flow_interval_in_seconds,
        description=self.description,
        flow_parameters=self.flow_parameters,
        extra_tags=self.tags,
        schedule_anchor=self.schedule_anchor,
        job_ttl=self.job_ttl)
    deployment.apply()
```
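for reference, the `Deployment.build_from_flow` call inside a factory like that could look roughly like this (a sketch, not our exact code - the argument names and the schedule import path, which varies by prefect 2 version, are illustrative):
```python
from datetime import timedelta

from prefect.deployments import Deployment
from prefect.server.schemas.schedules import IntervalSchedule


def generate_flow_deployment(flow_name, flow_logic, is_schedule_active, interval_in_seconds):
    # build (but don't yet register) a deployment from the flow function;
    # the caller invokes .apply() on the result to register it with the API
    return Deployment.build_from_flow(
        flow=flow_logic,
        name=flow_name,
        schedule=IntervalSchedule(interval=timedelta(seconds=interval_in_seconds)),
        is_schedule_active=is_schedule_active,
        apply=False,
    )
```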
For the generated flows themselves, we have checked-in template files that are leveraged by generators to create the .py files during CD. The templates are really 'simple' for the most part and just have parameters that get filled during CD using values from the JSON config files. So for Fivetran, we have something like this:
```python
# THIS CODE IS GENERATED BY THE FIVETRAN FLOW CREATOR, DO NOT MANUALLY UPDATE
from prefect import flow, get_run_logger
from prefect.blocks.system import Secret

from fivetran_provider import FivetranProvider


@flow(timeout_seconds={timeout_seconds})
def {flow_name}():
    logger = get_run_logger()
    fivetran_key = Secret.load('five-tran-key')
    fivetran_secret = Secret.load('five-tran-secret')
    fivetran_provider = FivetranProvider(fivetran_key.get(), fivetran_secret.get(), logger)

    logger.info('running sync for connector_name: {connector_name}')

    fivetran_provider.sync_connector('{connector_id}', {timeout_seconds}-60)


if __name__ == "__main__":
    {flow_name}()
```
This code is filled using a method like this:
```python
def create_flow_file_from_template(self):
    with open('fivetran_flow_template.py', 'r') as flow_file_template:
        flow_file = flow_file_template.read().format(
            flow_name=self.flow_name,
            timeout_seconds=self.five_tran_flow_config.timeout_seconds,
            connector_name=self.five_tran_flow_config.connector_name,
            connector_id=self.five_tran_flow_config.connector_id
        )

    with open(f'{self.flow_name}.py', 'w') as py_file:
        py_file.write(flow_file)
```
You then end up needing to import the modules before you can do any of the deployment. That is done using `importlib` and calling `import_module`, then finally calling `getattr`. From there, the code is just pushed into our FlowFactory as described above.
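i.e. something roughly like this (a sketch - the module / flow names are placeholders):
```python
import importlib

# import the generated module by its (string) name, then grab the flow function off it
module = importlib.import_module("fivetran_salesforce_sync")
flow_fn = getattr(module, "fivetran_salesforce_sync")
```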
n
because you mentioned
> i'm in the process of migrating over
I'll say that I'd recommend the `prefect.yaml` / `prefect deploy` deployment UX over the infra / storage block & `Deployment.build_from_flow` UX, because with the latter you cannot leverage workers fully (the `pull` step, for example) and eventually the block/agent-based deployments will no longer be our main recommendation. as a pretty heavy user on both sides (deployment creator / work pool creator) i'll say that there's nothing I can think of that you can do with `build_from_flow` + python that you can't do with `prefect.yaml` + custom deployment steps (which can be arbitrary python)
e
That's very unfortunate to hear about `build_from_flow` being deprecated. Most of my team really despises yaml. I'll definitely have to follow up with our AM as this would be a pretty big show stopper for us.
c
@Nate i've been wondering about the `prefect.yaml` vs `Deployment.build_from_flow` UX - I'm guessing the former is the "new" way of doing things, and the latter is a bit older? I don't have much context for feature development in 2.0 - it's made getting up to speed a bit more difficult, as the docs don't really highlight the differences very well IMHO
n
> the former is the "new" way of doing things, and the latter is a bit older?
correct. @Emerson Franks `Deployment.build_from_flow` will be around for a while, and by the time it's deprecated, we should have an analogous python interface for people who don't want to use yaml
c
> as a pretty heavy user on both sides (deployment creator / work pool creator)
@Nate what exactly do you mean by work pool creator?
n
out of curiosity @Emerson Franks and @redsquare - are you fans of `build_from_flow` for a specific reason (other than disliking yaml in general)? have you explored the new UX at all? (i do understand there's been a bit of whiplash around deployment ux)
> what exactly do you mean by work pool creator?
a dev ops person that supports deployment authors; for example, I set up a k8s cluster for my team, who shouldn't have to worry about infra
r
@Nate not really explored - time poor currently and everything is working well
We spent a good while perfecting our CI/CD across three environments - would hate to have to revisit it without a good reason
Still don't see a benefit with a worker over an agent - we run everything in k8s
e
I think we're similar but also, I come from a C#/.NET world and really prefer to write "real" code. Using `build_from_flow` allows me to stay 100% in Python (which I guess is real code 😉) and not have to worry about yaml. We, of course, have to use yaml for things like our CI/CD pipeline and k8s deployments, but these are really static and after they are set up, we can wash our hands of yaml 🙂 +1 to not seeing why I would ever run a worker instead of k8s.
r
C# too here 🙂
n
> worker instead of k8s
it's not worker instead of a specific infra, it's worker instead of agent. workers/agents generally support the same infras. workers are just strongly typed (instead of agents, which try to submit work anywhere) and have extra capabilities (like executing arbitrary job setup in a `pull` step) which may not be relevant for you all. but point taken
> would hate to have to revisit it without a good reason
i know it's a pain to migrate (i have lots of deployments to manage too 🙂) - anyways, if/when you wanna switch (again, a python interface will exist before you have to worry about it), happy to help smooth over rough edges
c
@Nate what's the best way to get notified in the future of these largish new features?
n
i wouldn't expect large changes around deployment UX; it has been a lot of work to get where we are now with workers, and we understand it's a large ask to go agents -> workers. we're planning on enhancements / fixes in the future around this. that said, release notes are the best place for all the info, and #CKNSX5WG3 is the place for big stuff
c
@Nate above you mentioned custom deployment steps, I'm guessing you meant this exactly. Is there another place for more documentation regarding this? For example I can't tell in which action that runs (I guess any?) or how that docker image works (is it distinct to the one that might be defined in `build`?)
n
all of the deployment steps are just fully qualified function names, so `prefect_docker.deployments.steps.build_docker_image` lives here. in general, these steps like `build_docker_image` or `push_to_s3` are defined / documented in their service's collection (e.g. `prefect-docker`, `prefect-aws`) and, with the exception of the `pull` step, they're optional. if you want to build an image / push your code with other CI/CD, that's fine! the worker just needs to know where to get it at runtime
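for example, a minimal `pull` section might look like this (sketch - the repo URL is a placeholder):
```yaml
pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/your-org/your-repo.git
      branch: main
```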
and by custom deployment steps i mean that I could have a file called `my_steps.py` and write
```python
async def my_fancy_step(arg1: str, arg2: dict):
    # do whatever your step should do
    ...
```
and in my `prefect.yaml` have
```yaml
- my_steps.my_fancy_step:
    arg1: "foo"
    arg2:
      key: val
```
and `prefect deploy` will run that for you at deployment time, so long as you have `my_steps` available in the runtime
c
alrighty, I think I understand; thank you for the support!
n
sure! feel free to ask further questions as they come up
m
I have a different use case - our users are non-technical business users but know a little JSON / cron. I want them to take control of the schedule and job variables. But it seems to me that every time I run `prefect deploy`, all configurations done from the UI get overwritten by the prefect.yaml. Is there any way to avoid this?