# ask-community
a
In Prefect 2, you could use an object-oriented approach to handling deployments: with the Deployment class you could .apply() or .delete() a deployment. In Prefect 3, it seems like you can only run a Flow.deploy() method to push a deployment to Prefect Cloud, but there's no object-oriented interface. The Deployment class is removed, and you have to implement one yourself if you want that kind of interface. Am I wrong here? This seems like a big regression.
n
`flow.from_source(...).deploy(...)` is the analogue to `Deployment.build_from_flow(...).apply(...)`, which creates the deployment on the server. Are you saying it's harder to delete deployments now? There's a client method, `delete_deployment`. Otherwise, can you explain what's harder to do now?
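for example, a minimal sketch of doing that through the client (the "my-flow/my-deployment" name is a placeholder):

import asyncio

from prefect import get_client


async def delete_by_name(name: str) -> None:
    # look up the deployment by its "<flow name>/<deployment name>" identifier,
    # then delete it by ID through the client
    async with get_client() as client:
        deployment = await client.read_deployment_by_name(name)
        await client.delete_deployment(deployment.id)


asyncio.run(delete_by_name("my-flow/my-deployment"))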
a
oh nice that client method is helpful
n
yeah, the delete method on `Deployment` was basically just a passthrough to that method
a
yeah, just in general: in Prefect 2 we are using a very object-oriented approach in CI/CD to compare the deployments in our codebase vs. those in the cloud, so we can maintain our cloud deployments declaratively. With the exclusively functional API in Prefect 3, we'll basically have to write the whole object layer ourselves that Prefect 2 provided out of the box.
n
how do you mean this?
> exclusively functional API in prefect 3
in general we have 2 interfaces for creating deployments (since mid Prefect 2.x, when we switched to workers and deprecated `build_from_flow`):
• prefect.yaml (examples)
• `.deploy` (examples)
a
prefect.yaml offers a declarative approach, yes, but in Prefect 2 we had an in-Python declarative approach using the Deployment class
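(for reference, the Prefect 2 pattern we rely on looks roughly like this; my_flow and its module are placeholders:)

from prefect.deployments import Deployment

from my_flows import my_flow  # placeholder module containing the flow

# Prefect 2-style in-Python, declarative deployment definition
deployment = Deployment.build_from_flow(
    flow=my_flow,
    name="my-deployment",
    work_queue_name="default",
)
deployment.apply()     # create/update the deployment on the server
# deployment.delete()  # and the same object could remove it again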
n
that link I shared above should show how to map between these methods: `flow.from_source(...).deploy(...)` is the analogue to `Deployment.build_from_flow(...).apply(...)`. The `Deployment` class is coupled to agents, which have been superseded by workers over the last couple of years. The kwargs you passed into `build_from_flow` can generally be passed to `.deploy` (with the exception of infra/storage blocks, which are superseded by work pools and `from_source` / pull steps, respectively).
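to make the mapping concrete, a rough Prefect 3 sketch (the repo URL, entrypoint, and work pool name are placeholders):

from prefect import flow

# load the flow from remote source and create/update the deployment on the server,
# roughly what Deployment.build_from_flow(...).apply(...) did in Prefect 2
flow.from_source(
    source="https://github.com/org/repo.git",  # placeholder repo
    entrypoint="flows/my_flow.py:my_flow",     # path:function within that source
).deploy(
    name="my-deployment",
    work_pool_name="my-work-pool",             # work pools supersede infra blocks
)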
k
IDK if this is the "object-oriented" approach that you're looking for, but we use the `RunnerDeployment` object along with the `deploy` method. Code example: `flows.py`:
from prefect.deployments.runner import RunnerDeployment

# Flow code omitted

def create_deployments() -> list[RunnerDeployment]:
    foo_deployment = RunnerDeployment.from_entrypoint(
        "flows:foo",
        name="foo",
    )
    return [foo_deployment]
`ci_deploy.py`:
import os

from prefect import deploy

from flows import create_deployments

# `settings` and `logger` are this project's own config / logging helpers (imports omitted)

private_deployment_ids = deploy(
    *create_deployments(),
    work_pool_name=settings.prefect_private_work_pool_name,
    image=os.environ["DOCKER_IMAGE"],
    build=False,
    push=False,
    print_next_steps_message=True,
)
logger.info("Private deployments: {}", private_deployment_ids)
It’s a bit rough but I hope it helps to understand the basic idea
a
ahh word RunnerDeployment might be what I want
thank you!
k
Basically the way it works is that we have each flow file define its own deployments, then you import those in the common `ci_deploy.py` file, which can be run in CI
a
Hm, I'm having a problem here where the RunnerDeployment infers the `path` attribute from the local filesystem, and contrary to the documentation it can't be overridden by passing `path` as an initialization variable
So even if the `entrypoint` is correctly specified, the deployment will still try to find the path relative to the absolute path of the filesystem where `deployment.apply()` was run, which won't work if the local filesystem is different from the production filesystem
n
sorry to repeat myself, but I would recommend using `from_source` and `.deploy()`, and if you really need a class, you could pretty easily make a class that uses these methods under the hood (see the sketch after this message). I wouldn't recommend using `RunnerDeployment` directly, because if you look at the implementation of `from_source` and `deploy` you'll see we handle a lot of the cases associated with different storage options there, so you'd be responsible for re-implementing that if you used `RunnerDeployment` directly.
Post-agents, you shouldn't have to set a `path`. Like these examples show, if you're using the pythonic interface, you set `source` to the repo or bucket or whatever you have, and then you set `entrypoint` relative to that source.
This might be a good resource if you haven't spent time using workers yet.
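e.g. a rough sketch of such a wrapper class (all names and the source URL here are illustrative only):

import asyncio

from prefect import flow, get_client


class ManagedDeployment:
    """Thin object-oriented wrapper around flow.from_source(...).deploy(...)."""

    def __init__(self, source: str, entrypoint: str, flow_name: str, name: str, work_pool_name: str):
        self.source = source          # e.g. a git repo URL or storage location
        self.entrypoint = entrypoint  # "path/to/file.py:flow_function" relative to source
        self.flow_name = flow_name    # the flow's registered name
        self.name = name              # deployment name
        self.work_pool_name = work_pool_name

    def apply(self) -> None:
        # create or update the deployment on the server
        flow.from_source(source=self.source, entrypoint=self.entrypoint).deploy(
            name=self.name,
            work_pool_name=self.work_pool_name,
        )

    def delete(self) -> None:
        # remove the deployment via the client, mirroring the old Deployment.delete()
        async def _delete() -> None:
            async with get_client() as client:
                d = await client.read_deployment_by_name(f"{self.flow_name}/{self.name}")
                await client.delete_deployment(d.id)

        asyncio.run(_delete())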
a
we're using a worker that runs in a persistent AWS ECS service, in a docker image built from our flows repo, so we don't actually need code storage defined at all really
already been using a worker for a long time
n
gotcha, then yeah you don't need from_source, but instead to set an image like this, which you can build/push during .deploy (or not) via the `build` / `push` kwargs
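roughly, something like this (the image and work pool names are placeholders):

from prefect import flow


@flow
def my_flow():
    ...


# deploy against an existing image without building or pushing it as part of .deploy
my_flow.deploy(
    name="my-deployment",
    work_pool_name="my-ecs-pool",
    image="123456789.dkr.ecr.us-east-1.amazonaws.com/flows:latest",  # placeholder image
    build=False,  # image is built elsewhere (e.g. in CI)
    push=False,   # and already pushed to the registry
)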
a
I feel confused about why I need to specify an image or a remote storage location. The worker is already on a docker image with the flows available on the local filesystem.
The flows are always already implicitly available to the worker - they don't need to be downloaded from somewhere else
In our Prefect 2 worker setups, this works without needing to pull in remote storage
n
there's a step for that situation, which is this one, so you don't have to pull the code but just set the working directory where your flow code is
a
ah, is there an equivalent way to do this in the python API? I'm experimenting with flow.deploy() right now
I'd prefer not to need to migrate our whole Prefect deployment CI/CD setup into yaml if possible
n
sorry, I forgot you're using the python interface. hrm, usually people use `from_source` and point it at a git repo, or bake their source into an image. sounds like all your code is on something like a VM and you have a process worker going there?
a
thats right
well it is in an image
but the worker is running in that image (in an AWS ECS service)
basically following the original tutorial showing how to set up prefect 2 in AWS ECS: https://github.com/anna-geller/dataflow-ops
trying to migrate this setup into prefect 3
n
hmm, I'm not sure I'm following this
> well it is in an image
> but the worker is running in that image (in an AWS ECS service)
are you saying your flow code is stored in the same image as the one used to run `prefect worker start` in an ECS service? but yeah, unfortunately that guide is an early 2.x agent / infra blocks setup with a couple of conceptual differences from workers and work pools, where I'd refer again to those docs I linked above, or the video I linked where I go over how to use `from_source` and `deploy`
a
yes exactly
the flow code is stored on the same image as the one used to run the worker
in an ECS service
I think the closest paradigm in Prefect 3 is the push work pool with manually configured infrastructure, but the lines about how to configure the deployment just say to name that work pool, and don't cover how to deal with this code storage question
n
I see. my recommendation would be to move to storing code in GitHub or S3 and not in that image. yep, then I would propose one of the following:
• use the ECS push pool like you just linked
• run a docker work pool someplace with systemd, like I discuss in the video
> don't cover how to deal with this code storage question
correct, we encourage a 3rd-party code storage location like GitHub, S3, or in the runtime image (for containerized-type work pools)
a
That seems like it would add unnecessary latency and extra network costs to every run, downloading the code from somewhere else. Is it impossible in Prefect 3 to allow the flow code to be local to the worker itself?
n
no, it's not. you can use the `set_working_directory` pull step I showed earlier, either in yaml or by updating the deployment object's pull steps with the client directly. the reason this is not a common pattern is that presumably you have to redeploy your worker every time your code changes, because the code is in the worker's image?
a
that's almost correct, but the worker actually spawns a new ECS task based on the same task definition (and docker image) for each run. so if the image updates with updated code, when a new flow run is spawned it will have that updated code
so all we need to do is redeploy the docker image on code updates
can you tell me a little more explicitly how I can modify the "set_working_directory" pull step using the python API?
I've looked and don't see a way to modify those options on flow.deploy() or flow.to_deployment() or on the RunnerDeployment object or in RunnerDeployment.apply(), but it sounds like I'm looking in the wrong place or missing it
n
so I think if you're looking to keep all your code in a long-lived container and run the code as a subprocess, I would follow this guide. you'd be fighting the design of workers / work pools (inherently for dynamic dispatch of infra) a lot less, because
serve(*[f.to_deployment(...) for f in list_of_flows])
is going to effectively do the following:
• start a process that listens for scheduled runs of each deployment
• run the flow run as a subprocess when it finds a scheduled run
so exactly the same thing as a process worker, except you don't have to engage with work pools / workers / pull steps etc
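as a rough sketch of that pattern (flow and deployment names are placeholders):

from prefect import flow, serve


@flow
def foo():
    ...


@flow
def bar():
    ...


if __name__ == "__main__":
    # one long-lived process that watches for scheduled runs of these deployments
    # and executes each flow run as a subprocess in this container
    serve(
        foo.to_deployment(name="foo"),
        bar.to_deployment(name="bar"),
    )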
a
We actually really do rely on the AWS ECS worker spinning up a new ECS task for each flow run - that enables the infra to be fully auto-scaling. we can't actually run all the flows in the same container
n
sorry, I'm not sure I understand your setup then, I thought you said you were running a process worker as an ECS service
a
so this was possible in Prefect 2 and maybe it's not in Prefect 3, but:
• we have a prefect worker running in a long-lived AWS ECS service. The image that defines this ECS task includes all our flow code on the filesystem. When Prefect Cloud triggers a flow run, this worker gets the signal and spins up a new ECS task based on the same task definition and docker image to run the flow.
the push pool concept seems very similar but maybe actually skips having a worker running at all? so Prefect Cloud itself maybe actually needs direct credentials to spin up ECS tasks?
n
correct on the second message
and I assume with this
> spins up a new ECS Task
you were probably doing something like `ECS(...).run()` and using the infra block class without using it as a "prefect deployment"? and just calling that from the flows that spun up as subprocesses of your process worker
a
that's not right - all our flows are published as deployments in prefect cloud, with a prefect-aws.ecs.ECSTask infrastructure block
n
> prefect worker running in a long-lived AWS ECS service
do you know the command that you use to start this process?
a
prefect worker start -q dataflowops
n
I'm not sure that's right; `prefect worker start` requires a work pool (`-p`), whereas `prefect agent start` can be started with only a reference to a queue (`-q`)
a
hm you're right, it looks like we are still using a prefect agent on
prefect agent start -q dataflowops
n
yep. so I think this docs section and this video should hopefully give some color on the differences between workers and agents. going to sign out, but feel free to open a discussion around redundant code fetching if it still ends up being a problem for you
a
Thanks for your help - would you mind indicating where I can modify the "set_working_directory" pull step using the python API that you mentioned? I do think I'm going to need to do this one way or another
the prefect-aws ecs guide seems to include this in the yaml config description but again, I'd like to keep things in python if at all possible https://prefecthq.github.io/prefect-aws/ecs_guide/
n
I think we'll need to update `DeploymentUpdate` both client- and server-side to allow that, because right now `pull_steps` is not a field on that schema: https://github.com/prefecthq/prefect/blob/main/src/prefect/server/schemas/actions.py#L238. So even if you hit the REST endpoint directly, server-side validation will fail with a 422 because of the server-side deployment schema.
a
hm ok, so just to make sure I understand: it sounds like we won't be able to migrate to Prefect 3 with the architecture that's working for us on Prefect 2, and instead we'll have to change at least one of the following:
• using the Python API for deployments must be replaced with using yaml for deployments, to enable us to modify the pull steps
• flow code can't be kept local to the worker, but must be stored remotely and retrieved as part of flow runs
n
> using the Python API for deployments must be replaced with using yaml for deployments, to enable us to modify the pull steps
`DeploymentUpdate` can be updated via a simple PR to allow patching `pull_steps`; it wouldn't be a hard change
> flow code can't be kept local to the worker, but must be stored remotely and retrieved as part of flow runs
in general this is not true; this is what `.serve` is for, and you can always use `run_deployment` to trigger work on other infrastructures like ECS (see the sketch below)
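e.g. roughly (the deployment name and parameters are placeholders):

from prefect import flow
from prefect.deployments import run_deployment


@flow
def orchestrator():
    # from a flow that is itself served locally, trigger a run of a deployment
    # that executes on other infrastructure (e.g. an ECS work pool)
    return run_deployment(
        name="my-ecs-flow/my-ecs-deployment",  # placeholder "<flow>/<deployment>" name
        parameters={"x": 1},                   # placeholder parameters
    )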