# ask-community
n
👋 Hello peops, currently there are two approaches to deploying flows: 1. Python SDK 2. YAML. I would like to confirm whether there is a preferred deployment approach? My concern is that one of these approaches will stop being supported and in a few years we will have a time-consuming migration on our hands. I couldn't find anything in the docs, GitHub, community forums, Google. Hence asking here 🙏
j
I had this same exact question, and both ways seem to have their caveats and expectations that aren't well documented. It reaaaaally wants you to use Docker or some other containerized runtime like K8s, which may be overkill for what you're wanting to do. I have two workers set up on EC2 instances: one that spins up Docker containers and the other that's just a process worker that runs shell commands (dbt).

What ended up being easiest for me was to do it in a separate Python script. With the YAML file you have to remember the commands to do it, some are deprecated or unintuitive, and you'll get confusing errors if it's not exactly perfect. It'll generate incomplete YAML files and fail to apply. You might get lucky like me and get an error that says "got None, expected None" 🙃

Here's an example of how I deploy to a process worker, but Docker is very similar. I'm still doing those in YAML just because I'm not fully switched over to using scripts, and I'm terrified to touch Prefect deployments at all for fear it'll all fall apart (it has).
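The original attachment isn't preserved in this log, but the kind of script described above might look roughly like the following hedged sketch. The repo URL, entrypoint, pool name, and schedule are all placeholders, and it assumes Prefect 3.x; the actual Prefect calls are kept behind the main guard because registering a deployment needs a running Prefect API.

```python
def dbt_command(target: str) -> list[str]:
    """Build the shell command the process worker will end up running."""
    return ["dbt", "test", "--target", target]

if __name__ == "__main__":
    # Assumes prefect>=3 is installed and a process work pool named
    # "shell-pool" already exists -- both placeholders, not from the thread.
    from prefect import flow

    flow.from_source(
        source="https://github.com/acme/data-pipelines",  # placeholder repo
        entrypoint="flows/dbt_flow.py:run_dbt",           # placeholder entrypoint
    ).deploy(
        name="dbt-nightly",
        work_pool_name="shell-pool",
        cron="0 2 * * *",
    )
```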
For any Prefect folks, the examples in the documentation are mostly not helpful because they demonstrate super basic non-prod ideas, like pulling from a local Git repo or not even using any remote storage. The examples that do show this are all over the place and I felt like none of them were close enough to what I was trying to do. I found myself bouncing around all the different doc pages until I had 15 tabs open and was completely lost.
n
hi @Nick and @Jack Goslin - thanks for the feedback! could i ask that you share what specifically didn't work for you when you tried it? we will continue to try to improve the docs, and any concrete suggestions on that would be appreciated. we plan to support yaml / .deploy indefinitely
n
From our perspective everything worked with both options. We have several hundred flows/deployments running in our k8s env. My question was about long-term planning, since, as I mentioned, we don't want to end up in a migration situation in a year or so. It is not explicit in the docs which method is preferred and whether both will be supported. Hence my question. Thanks for clarifying!

Re Python SDK - it would be great to have an example of a real-world CI/CD deployment pipeline, i.e. how to use this method to deploy flows from different files in a repo that contains hundreds of flows. We have a hacky setup for this which I can talk about (but I want to avoid overloading the thread with details for now). From this perspective, deploying via the yaml option is much easier.
n
> Re Python SDK - it would be great to have an example of a real-world CI/CD deployment pipeline, i.e. how to use this method to deploy flows from different files in a repo that contains hundreds of flows

this is a great suggestion! I can add an issue for us to include an example like this in the docs, since it's a common ask. and yeah, i think you got it already: one method isn't necessarily preferred in general, they serve different preferences. operators who are used to yaml may like `prefect.yaml`, while those who have to write complex logic to gather what they need to make deployments can use `.deploy`. i will also note that our steps like `prefect.deployments.steps.*` are just fully qualified function names, so you're free to write python to do arbitrary deployment setup like this (used like this)
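To illustrate the "steps are just fully qualified function names" point, a hedged sketch of a `prefect.yaml` pull section is below. `my_project.deploy_steps.fetch_secrets` is a hypothetical user-defined function and the repository URL is a placeholder; only the `prefect.deployments.steps.git_clone` step is a built-in.

```yaml
pull:
  # built-in step, referenced by its fully qualified name
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/acme/data-pipelines  # placeholder
      branch: main
  # any importable python callable can act as a step (hypothetical module/function)
  - my_project.deploy_steps.fetch_secrets:
      environment: prod
```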
j
Bear in mind that I am not familiar with any other orchestration software, and my server management and AWS skills are rudimentary. Prefect was by far the hardest software I've ever had to learn. I think I'm in the extreme minority though. If I knew how time consuming it would be to go down this path I would have chosen something way simpler than Prefect. Again, probably a skill issue, and I went through all of this about 10 months ago, so these may not all be fair criticisms at this point, but it's what goes through my head when I have to do a deployment.

The examples are either too simple or too complex for what I'm trying to do. A big issue for me was the differences between `.serve()`, `.deploy()`, `.to_deployment()`, and `prefect deploy apply`. Some examples even show creating a flow and running it, but not serving or deploying. What does that even do? Does it schedule it? Is it in the UI? 🤷 The docs show how to use `.serve()`, which doesn't create a deployment, so it's useless since I'm using Prefect as a UI'd cron. It says don't use process workers in prod but continues to show how to use them in examples. I feel like half the docs tell you to do it one way and the other half contradicts that. I think this has changed now, but it would show you how to use agents, then say they're deprecated, then use them in examples. It really needs to show, in one page, the infrastructure details being used in the example, how the worker is set up, how the deployment is created, how the flows and tasks execute, and how it should look in the UI. Instead, everything is spread out across the entirety of the documentation, which links to itself.

The biggest headache with Docker is that there's no option to have it remove the container after the flow is complete, so it quickly fills up the storage space on the infrastructure it runs on, requiring me to make cron jobs that clean up the Docker containers. Eventually I won't be able to remove them fast enough and I'll need to build more servers. As you add more deployments it fills up faster and you have to run the cleanup even more frequently. I have been told to use serverless architecture. That was my plan after learning Docker deployments, which are simpler to start and learn with. I don't think I'll be using K8s any time soon given how much of a hassle Docker was. The docs are also not clear on where the working directories are: on the worker machine or in the container.

The YAML file `prefect init` generates has missing data in it, so when you go to edit and deploy it, it throws validation errors that don't explain what is failing. `prefect deploy` walks you through setting up a deployment but really only fills out a few places in the YAML; the rest is empty and will error. And when you tell it you're using Git and you give it the PAT, it creates a new secret block instead of asking to use an existing block. So now I have like 5 different GitHub keys it saved. I deleted probably 10 of them, but they were all named something like `deployment-xyz-docker-deployment-xyz-othername-repo-token`. I was scared to delete them because it would break the deployment, but some of those were left over from my failures to get the deployment to work, and you can't edit the deployment in the UI. You have to delete the deployment and redo it, but the secret blocks are still saved and you have no way of knowing which one is actually in use.

It asks if you're pulling from a Git repo and doesn't actually do anything with it. It tries to find your flow code in that deployment wizard (and finds a bunch of test flows in the venv I didn't create), but my flow code is in the Git repo it doesn't actually pull, so it says it can't find the flow code. Like, brother, I know you can't find it, that's why you should be pulling the thing I said to pull first before you look for it. It all made me want to build my own scripts independent of Prefect that would do this. If you pull the repo first and then apply the YAML, it sees the existing repo and instead of just running pull on it, it clones the repo into the same directory and appends the branch name. I pulled `dbt_core`. It clones the same repo into `dbt_core-main`, but the deployment is looking for the flow in `dbt_core`. That means it pulls into `dbt_core-main` and then executes the flow in `dbt_core`, because that's the flow code I had when I ran `prefect deploy`; otherwise it would error and say it can't find the flow. That was by far the most frustrating part of all this.

Then you have the issue where you're trying to find implementation details in the API reference, and all the code is YAML??? I've never seen documentation that shows you Python objects in YAML. I have to dig to find the Python docs, and they sometimes don't match the other page that has the YAML. This may have changed because I can't find examples now.

Me: how do I use an agent
Docs: don't use agents, use workers
Me: what's a worker
Docs: it's like an agent, but you need to understand work pools first
Me: what's a work pool
Docs: you should learn what a work queue is first
Me: what's a work queue
Docs: well, you don't really need to use them, but here's an example of how to do a process worker
Me: what's a process worker
Docs: don't use them. Here's an example that shows how to run a flow but not serve it, and it doesn't use a worker.
Me: how do I deploy something
Docs: serving is easier, so do that first
Me: how do I serve a flow on a schedule
Docs: if you want to schedule something you need to use a deployment
Me: the whole reason I'm here is to schedule things
Docs: to deploy you need a worker, here's an example of a process worker (don't use it), but you should really just use serve. Here are 50 examples of how to serve a flow that's in the same script that's doing the serving.
Me: I just want to run a Python script on a schedule on an EC2 instance 😭
Docs: here's how to use YAML to deploy using Kubernetes

Finally, there are waaay too many methods to do the same thing, and they're all shown in the examples. Do I use `GitHubCredentials` or do I load a Block? Some examples show using a superclass, some show a subclass. Am I using `AwsCredentials.get_client('s3')` or `AwsCredentials.get_s3_client()`? Should I use `DbtCoreOperation().run('dbt test')` or `prefect_dbt.cli.commands.run_dbt_test()`? The first one you can put in a task you created and can name, but `run_dbt_test()` creates its own task that you can't name.
n
thanks for the feedback @Jack Goslin! feel free to codify these information architecture thoughts in a discussion - I'm sure some of the frictions you've felt are not unique to you. just to clarify a couple points:
• agents and `prefect deployment apply` and all that are removed in 3.x, intentionally, to make the deployment interface easier to understand:
◦ static infra? -> use serve
◦ dynamically dispatched infra? -> use workers and work pools (`.deploy` or `prefect deploy`)
• serve does create a deployment, it's just that the infra is the machine that's running the script, instead of some dynamically dispatched infra common among users of containerized runtimes

> The docs show how to use `.serve()`, which doesn't create a deployment

◦ the reason serve is featured is because it's the easiest way to use prefect as a "UI'd cron" and/or play with deployment features like the flow run param form or event triggers
a
@Jack Goslin dm'd