# ask-community
a
In Prefect 2, you could use an object-oriented approach to handling deployments: with the Deployment class you could .apply() or .delete() a deployment. In Prefect 3, it seems like you can only run a Flow.deploy() method to push a deployment to Prefect Cloud, but there's no object-oriented interface. The Deployment class is removed, and you have to implement one yourself if you want that kind of interface. Am I wrong here? This seems like a big regression.
n
`flow.from_source(...).deploy(...)` is the analogue to `Deployment.build_from_flow(...).apply(...)`, which creates the deployment on the server. Are you saying it's harder to delete deployments now? There's a client method, `delete_deployment`. Otherwise, can you explain what's harder to do now?
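for example, a minimal sketch of doing that through the client (the "my-flow/my-deployment" name is a placeholder):

import asyncio

from prefect import get_client


async def delete_by_name(name: str) -> None:
    # look up the deployment by its "<flow name>/<deployment name>" identifier,
    # then delete it by ID through the client
    async with get_client() as client:
        deployment = await client.read_deployment_by_name(name)
        await client.delete_deployment(deployment.id)


asyncio.run(delete_by_name("my-flow/my-deployment"))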
a
oh nice that client method is helpful
n
yeah, the delete method on `Deployment` was basically just a passthrough to that method
a
yeah, just in general: in Prefect 2 we are using a very object-oriented approach in CI/CD to compare the deployments in our codebase vs. those in the cloud, so we can maintain our cloud deployments declaratively. With the exclusively functional API in Prefect 3, we'll basically have to write the whole object layer ourselves that Prefect 2 provided out of the box.
n
how do you mean this?
> exclusively functional API in prefect 3
in general we have 2 interfaces for creating deployments (since mid Prefect 2.x, when we switched to workers and deprecated `build_from_flow`):
• prefect.yaml (examples)
• `.deploy` (examples)
a
prefect.yaml offers a declarative approach, yes, but in Prefect 2 we had an in-Python declarative approach using the Deployment class
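(for reference, the Prefect 2 pattern we rely on looks roughly like this; my_flow and its module are placeholders:)

from prefect.deployments import Deployment

from my_flows import my_flow  # placeholder module containing the flow

# Prefect 2-style in-Python, declarative deployment definition
deployment = Deployment.build_from_flow(
    flow=my_flow,
    name="my-deployment",
    work_queue_name="default",
)
deployment.apply()     # create/update the deployment on the server
# deployment.delete()  # and the same object could remove it again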
n
that link I shared above should show how to map between these methods: `flow.from_source(...).deploy(...)` is the analogue to `Deployment.build_from_flow(...).apply(...)`. The `Deployment` class is coupled to agents, which have been superseded by workers over the last couple of years. The kwargs you passed into `build_from_flow` can generally be passed to `.deploy` (with the exception of infra/storage blocks, which are superseded by work pools and `from_source` / pull steps, respectively).
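to make the mapping concrete, a rough Prefect 3 sketch (the repo URL, entrypoint, and work pool name are placeholders):

from prefect import flow

# load the flow from remote source and create/update the deployment on the server,
# roughly what Deployment.build_from_flow(...).apply(...) did in Prefect 2
flow.from_source(
    source="https://github.com/org/repo.git",  # placeholder repo
    entrypoint="flows/my_flow.py:my_flow",     # path:function within that source
).deploy(
    name="my-deployment",
    work_pool_name="my-work-pool",             # work pools supersede infra blocks
)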
k
IDK if this is the "object-oriented" approach that you're looking for, but we use the `RunnerDeployment` object along with the `deploy` method. Code example: `flows.py`:
from prefect.deployments.runner import RunnerDeployment

# Flow code omitted

def create_deployments() -> list[RunnerDeployment]:
    foo_deployment = RunnerDeployment.from_entrypoint(
        "flows:foo",
        name="foo",
    )
    return [foo_deployment]
`ci_deploy.py`:
import os

from prefect import deploy

from flows import create_deployments

# `settings` and `logger` are this project's own config / logging helpers (imports omitted)

private_deployment_ids = deploy(
    *create_deployments(),
    work_pool_name=settings.prefect_private_work_pool_name,
    image=os.environ["DOCKER_IMAGE"],
    build=False,
    push=False,
    print_next_steps_message=True,
)
logger.info("Private deployments: {}", private_deployment_ids)
It’s a bit rough but I hope it helps to understand the basic idea
a
ahh word RunnerDeployment might be what I want
thank you!
k
Basically the way it works is that we have each flow file define its own deployments, then you import those in the common `ci_deploy.py` file, which can be run in CI
a
Hm, I'm having a problem here where the RunnerDeployment infers the `path` attribute from the local filesystem, and contrary to the documentation it can't be overridden by passing `path` as an initialization variable
So even if the `entrypoint` is correctly specified, the deployment will still try to find the path relative to the absolute path of the filesystem where `deployment.apply()` was run, which won't work if the local filesystem is different from the production filesystem
n
sorry to repeat myself, but I would recommend using `from_source` and `.deploy()`, and if you really need a class, you could pretty easily make a class that uses these methods under the hood (see the sketch after this message). I wouldn't recommend using `RunnerDeployment` directly, because if you look at the implementation of `from_source` and `deploy` you'll see we handle a lot of the cases associated with different storage options there, so you'd be responsible for re-implementing that if you used `RunnerDeployment` directly.
Post-agents, you shouldn't have to set a `path`. Like these examples show, if you're using the pythonic interface, you set `source` to the repo or bucket or whatever you have, and then you set `entrypoint` relative to that source.
This might be a good resource if you haven't spent time using workers yet.
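e.g. a rough sketch of such a wrapper class (all names and the source URL here are illustrative only):

import asyncio

from prefect import flow, get_client


class ManagedDeployment:
    """Thin object-oriented wrapper around flow.from_source(...).deploy(...)."""

    def __init__(self, source: str, entrypoint: str, flow_name: str, name: str, work_pool_name: str):
        self.source = source          # e.g. a git repo URL or storage location
        self.entrypoint = entrypoint  # "path/to/file.py:flow_function" relative to source
        self.flow_name = flow_name    # the flow's registered name
        self.name = name              # deployment name
        self.work_pool_name = work_pool_name

    def apply(self) -> None:
        # create or update the deployment on the server
        flow.from_source(source=self.source, entrypoint=self.entrypoint).deploy(
            name=self.name,
            work_pool_name=self.work_pool_name,
        )

    def delete(self) -> None:
        # remove the deployment via the client, mirroring the old Deployment.delete()
        async def _delete() -> None:
            async with get_client() as client:
                d = await client.read_deployment_by_name(f"{self.flow_name}/{self.name}")
                await client.delete_deployment(d.id)

        asyncio.run(_delete())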
a
we're using a worker that runs in a persistent AWS ECS service, in a docker image built from our flows repo, so we don't actually need code storage defined at all really
already been using a worker for a long time
n
gotcha, then yeah you don't need from_source, but instead to set an image like this, which you can build/push during .deploy (or not) via the `build` / `push` kwargs
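roughly, something like this (the image and work pool names are placeholders):

from prefect import flow


@flow
def my_flow():
    ...


# deploy against an existing image without building or pushing it as part of .deploy
my_flow.deploy(
    name="my-deployment",
    work_pool_name="my-ecs-pool",
    image="123456789.dkr.ecr.us-east-1.amazonaws.com/flows:latest",  # placeholder image
    build=False,  # image is built elsewhere (e.g. in CI)
    push=False,   # and already pushed to the registry
)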
a
I feel confused about why I need to specify an image or a remote storage location. The worker is already on a docker image with the flows available on the local filesystem.
The flows are always already implicitly available to the worker - they don't need to be downloaded from somewhere else
In our Prefect 2 worker setups, this works without needing to pull in remote storage
n
there's a step for that situation, which is this one, so you don't have to pull the code but just set the working directory where your flow code is
a
ah, is there an equivalent way to do this in the python API? I'm experimenting with flow.deploy() right now
I'd prefer not to need to migrate our whole Prefect deployment CI/CD setup into yaml if possible
n
sorry, I forgot you're using the python interface. hrm, usually people use `from_source` and point it at a git repo, or bake their source into an image. sounds like all your code is on something like a VM and you have a process worker going there?
a
thats right
well it is in an image
but the worker is running in that image (in an AWS ECS service)
basically following the original tutorial showing how to set up prefect 2 in AWS ECS: https://github.com/anna-geller/dataflow-ops
trying to migrate this setup into prefect 3
n
hmm, I'm not sure I'm following this
> well it is in an image
> but the worker is running in that image (in an AWS ECS service)
are you saying your flow code is stored in the same image as the one used to run `prefect worker start` in an ECS service? but yeah, unfortunately that guide is an early 2.x agent / infra blocks setup with a couple of conceptual differences from workers and work pools, where I'd refer again to those docs I linked above, or the video I linked where I go over how to use `from_source` and `deploy`
a
yes exactly
the flow code is stored on the same image as the one used to run the worker
in an ECS service
I think the closest paradigm in Prefect 3 is the push work pool with manually configured infrastructure, but the lines about how to configure the deployment just say to name that work pool, and don't cover how to deal with this code storage question
n
I see. my recommendation would be to move to storing code in GitHub or S3 and not in that image. yep, then I would propose one of the following:
• use the ECS push pool like you just linked
• run a docker work pool someplace with systemd, like I discuss in the video
> don't cover how to deal with this code storage question
correct, we encourage a 3rd-party code storage location like GitHub, S3, or in the runtime image (for containerized-type work pools)
a
That seems like it would add unnecessary latency and extra network costs to every run, downloading the code from somewhere else. Is it impossible in Prefect 3 to allow the flow code to be local to the worker itself?
n
no, it's not. you can use the `set_working_directory` pull step I showed earlier, either in yaml or by updating the deployment object's pull steps with the client directly. the reason this is not a common pattern is that presumably you have to redeploy your worker every time your code changes, because the code is in the worker's image?
a
that's almost correct, but the worker actually spawns a new ECS task based on the same task definition (and docker image) for each run. so if the image updates with updated code, when a new flow run is spawned it will have that updated code
so all we need to do is redeploy the docker image on code updates
can you tell me a little more explicitly how I can modify the "set_working_directory" pull step using the python API?
I've looked and don't see a way to modify those options on flow.deploy() or flow.to_deployment() or on the RunnerDeployment object or in RunnerDeployment.apply(), but it sounds like I'm looking in the wrong place or missing it
n
so I think if you're looking to keep all your code in a long-lived container and run the code as a subprocess, I would follow this guide. you'd be fighting the design of workers / work pools (inherently for dynamic dispatch of infra) a lot less, because
serve(*[f.to_deployment(...) for f in list_of_flows])
is going to effectively do the following:
• start a process that listens for scheduled runs of each deployment
• run the flow run as a subprocess when it finds a scheduled run
so exactly the same thing as a process worker, except you don't have to engage with work pools / workers / pull steps etc
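as a rough sketch of that pattern (flow and deployment names are placeholders):

from prefect import flow, serve


@flow
def foo():
    ...


@flow
def bar():
    ...


if __name__ == "__main__":
    # one long-lived process that watches for scheduled runs of these deployments
    # and executes each flow run as a subprocess in this container
    serve(
        foo.to_deployment(name="foo"),
        bar.to_deployment(name="bar"),
    )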
a
We actually really do rely on the AWS ECS worker spinning up a new ECS task for each flow run - that enables the infra to be fully auto-scaling. we can't actually run all the flows in the same container
n
sorry, I'm not sure I understand your setup then, I thought you said you were running a process worker as an ECS service
a
so this was possible in Prefect 2 and maybe it's not in Prefect 3, but:
• we have a prefect worker running in a long-lived AWS ECS service. The image that defines this ECS task includes all our flow code on the filesystem. When Prefect Cloud triggers a flow run, this worker gets the signal and spins up a new ECS task based on the same task definition and docker image to run the flow.
the push pool concept seems very similar but maybe actually skips having a worker running at all? so Prefect Cloud itself maybe actually needs direct credentials to spin up ECS tasks?
n
correct on the second message
and I assume with this
> spins up a new ECS Task
you were probably doing something like `ECS(...).run()` and using the infra block class without using it as a "prefect deployment"? and just calling that from the flows that spun up as subprocesses of your process worker
a
that's not right - all our flows are published as deployments in prefect cloud, with a prefect-aws.ecs.ECSTask infrastructure block
n
> prefect worker running in a long-lived AWS ECS service
do you know the command that you use to start this process?
a
prefect worker start -q dataflowops
n
I'm not sure that's right; `prefect worker start` requires a work pool (`-p`), whereas `prefect agent start` can be started with only a reference to a queue (`-q`)
a
hm you're right, it looks like we are still using a prefect agent on
prefect agent start -q dataflowops
n
yep. so I think this docs section and this video should hopefully give some color on the differences between workers and agents. going to sign out, but feel free to open a discussion around redundant code fetching if it still ends up being a problem for you
a
Thanks for your help - would you mind indicating where I can modify the "set_working_directory" pull step using the python API that you mentioned? I do think I'm going to need to do this one way or another
the prefect-aws ecs guide seems to include this in the yaml config description but again, I'd like to keep things in python if at all possible https://prefecthq.github.io/prefect-aws/ecs_guide/
n
I think we'll need to update `DeploymentUpdate` both client- and server-side to allow that, because right now `pull_steps` is not a field on that schema: https://github.com/prefecthq/prefect/blob/main/src/prefect/server/schemas/actions.py#L238. So even if you hit the REST endpoint directly, server-side validation will fail with a 422 because of the server-side deployment schema.
a
hm ok, so just to make sure I understand: it sounds like we won't be able to migrate to Prefect 3 with the architecture that's working for us on Prefect 2, and instead we'll have to change at least one of the following:
• using the Python API for deployments must be replaced with using yaml for deployments, to enable us to modify the pull steps
• flow code can't be kept local to the worker, but must be stored remotely and retrieved as part of flow runs
n
> using the Python API for deployments must be replaced with using yaml for deployments, to enable us to modify the pull steps
`DeploymentUpdate` can be updated via a simple PR to allow patching `pull_steps`; it wouldn't be a hard change
> flow code can't be kept local to the worker, but must be stored remotely and retrieved as part of flow runs
in general this is not true; this is what `.serve` is for, and you can always use `run_deployment` to trigger work on other infrastructures like ECS (see the sketch below)
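e.g. roughly (the deployment name and parameters are placeholders):

from prefect import flow
from prefect.deployments import run_deployment


@flow
def orchestrator():
    # from a flow that is itself served locally, trigger a run of a deployment
    # that executes on other infrastructure (e.g. an ECS work pool)
    return run_deployment(
        name="my-ecs-flow/my-ecs-deployment",  # placeholder "<flow>/<deployment>" name
        parameters={"x": 1},                   # placeholder parameters
    )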