Nick
09/18/2024, 11:39 AM
Jack Goslin
09/18/2024, 1:38 PM
Jack Goslin
09/18/2024, 1:40 PM
Nate
09/18/2024, 2:10 PM
Nick
09/18/2024, 2:49 PM
Nate
09/18/2024, 3:14 PM
prefect.yaml, while those who maybe have to write complex logic to gather what they need to make deployments can use .deploy.
i will also note that our steps like prefect.deployment.steps.*
are just fully qualified function names, so you're free to write python to do arbitrary deployment setup like this (used like this).
Jack Goslin
09/18/2024, 3:54 PM
.serve(), .deploy(), .to_deployment(), prefect deploy apply. Some examples even show creating a flow and running it, but not serving or deploying. What does that even do? Does it schedule it? Is it in the UI? 🤷 The docs show how to use .serve(), which doesn't create a deployment, so it's useless since I'm using Prefect as a UI'd cron.
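Nate's note above (steps are just fully qualified function names) could be sketched as a prefect.yaml fragment like this; the module path and keyword argument here are invented for illustration, not a documented step:

```yaml
# prefect.yaml (fragment) - hypothetical custom step.
# "my_project.deploy_steps.prepare" stands in for any importable
# module.function path; Prefect imports it and calls it with the kwargs.
pull:
  - my_project.deploy_steps.prepare:
      environment: staging
```

Any setup logic too awkward for the built-in steps can live in that function instead.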
It says don't use process workers in prod but continues to show how to use them in examples. I feel like half the docs tell you to do it one way and the other half contradicts that. I think this has changed now, but it would show you how to use agents, then say they're deprecated, then use them in examples. It really needs to show, in one page, the infrastructure details being used in the example, how the worker is set up, how the deployment is created, how the flows and tasks execute, and how it should look in the UI. Instead, everything is spread out across the entirety of the documentation, which just links back to itself.
The biggest headache with Docker is that there's no option to have it remove the container after the flow is complete, so it quickly fills up the storage space on the infrastructure it runs on, requiring me to make cron jobs that clean up the Docker containers. Eventually I won't be able to remove them fast enough and I'll need to build more servers. As you add more deployments it fills it faster and you have to run the cleanup even more frequently. I have been told to use serverless architecture. That was my plan after learning Docker deployments, which is simpler to start and learn with. I don't think I'll be using K8s any time soon due to how much of a hassle Docker was. The docs are also not clear on where the working directories are - on the worker machine or in the container.
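The cleanup cron job described above could be scripted roughly like this. This is a sketch, not anything from Prefect: it assumes the Docker CLI is on PATH, and the 24-hour window is an arbitrary choice.

```python
# Sketch of a cleanup job: remove stopped containers older than a cutoff
# so finished flow-run containers stop filling the disk.
import shutil
import subprocess


def build_prune_command(until: str = "24h") -> list[str]:
    # `docker container prune` only removes stopped containers,
    # so currently running flows are untouched.
    return ["docker", "container", "prune", "--force", "--filter", f"until={until}"]


def prune_stopped_containers(until: str = "24h") -> None:
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on PATH")
    subprocess.run(build_prune_command(until), check=True)
```

Run from cron (e.g. hourly) it keeps disk usage bounded without touching in-flight runs.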
The YAML file prefect init generates has missing data in it, so when you go to edit and deploy it, it throws validation errors that don't explain what is failing. prefect deploy walks you through setting up a deployment but really only fills out a few places in the YAML; the rest is empty and will error. And when you tell it you're using Git and give it the PAT, it creates a new secret block instead of asking to use an existing block. So now I have like 5 different GitHub keys it saved. I deleted probably 10 of them, but they were all named something like deployment-xyz-docker-deployment-xyz-othername-repo-token. I was scared to delete them because it would break the deployment, but some of those were left over from my failed attempts to get the deployment to work, and you can't edit the deployment in the UI. You have to delete the deployment and redo it, but the secret blocks are still saved and you have no way of knowing which one is actually in use.
It asks if you're pulling from a Git repo and doesn't actually do anything with it. It tries to find your flow code in that deployment wizard (and finds a bunch of test flows in the venv I didn't create), but my flow code is in the Git repo it doesn't actually pull, so it says it can't find the flow code. Like, brother, I know you can't find it, that's why you should be pulling the thing I said to pull first before you look for it. It all made me want to build my own scripts independent of Prefect that would do this. If you pull the repo first and then apply the YAML, it sees the existing repo and, instead of just running pull on it, clones the repo into the same directory with the branch name appended. I pulled dbt_core. It clones the same repo into dbt_core-main, but the deployment is looking for the flow in dbt_core. That means it pulls into dbt_core-main and then executes the flow in dbt_core, because that's the flow code I had when I ran prefect deploy; otherwise it would error and say it can't find the flow. That was by far the most frustrating part of all this.
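For reference, the pull step being fought with here is configured like this; the repo URL and block name are placeholders. Pointing access_token at an existing Secret block also avoids letting prefect deploy mint a new token block every time:

```yaml
# prefect.yaml (fragment): git_clone pull step reusing an existing Secret block
pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/example-org/dbt_core.git
      branch: main
      access_token: "{{ prefect.blocks.secret.my-repo-token }}"
```

git_clone checks code out into a directory named repo-branch (here dbt_core-main), which is why a manually pre-pulled dbt_core directory ends up sitting next to the clone instead of being reused.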
Then you have the issue where you're trying to find implementation details in the API reference, and all the code is YAML??? I've never seen documentation that shows you Python objects in YAML. I have to dig to find the Python docs and they sometimes don't match the other page that has the YAML. This may have changed because I can't find examples now.
Me: how do I use an agent
Docs: don't use agents, use workers
Me: what's a worker
Docs: it's like an agent, but you need to understand work pools first
Me: what's a work pool
Docs: you should learn what a work queue is first
Me: what's a work queue
Docs: well you don't really need to use them, but here's an example of how to do a process worker
Me: what's a process worker
Docs: don't use them. Here's an example that shows how to run a flow but not serve it and it doesn't use a worker.
Me: how do I deploy something
Docs: serving is easier so do that first
Me: how do I serve a flow on a schedule
Docs: if you want to schedule something you need to use a deployment
Me: the whole reason I'm here is to schedule things
Docs: to deploy you need a worker, here's an example of a process worker (don't use it), but you should really just use serve. Here are 50 examples of how to serve a flow that's in the same script that's doing the serving.
Me: I just want to run a Python script on a schedule on an EC2 instance 😭
Docs: here's how to use YAML to deploy using Kubernetes
Finally, there are waaay too many methods to do the same thing and they're all shown in the examples. Do I use GitHubCredentials or do I load a Block? Some examples show using a superclass, some show a subclass. Am I using AwsCredentials.get_client('s3') or AwsCredentials.get_s3_client()? Should I use DbtCoreOperation().run('dbt test') or prefect_dbt.cli.commands.run_dbt_test()? The first one you can put in a task you created and can name, but run_dbt_test() creates its own task that you can't name.
Nate
09/18/2024, 4:03 PM
prefect deployment apply and all that is removed in 3.x, intentionally to make the deployment interface easier to understand.
◦ static infra? -> use serve
◦ dynamically dispatched infra? -> use workers and work pools (.deploy or prefect deploy)
• serve does create a deployment, it's just that the infra is the machine that's running the script, instead of some dynamically dispatched infra common among users of containerized runtimes
> The docs show how to use .serve(), which doesn't create a deployment
◦ the reason serve is featured is because it's the easiest way to use prefect as a "UI'd cron" and/or play with deployment features like the flow run param form or event triggers
Alexander Azzam
09/18/2024, 4:09 PM