# prefect-getting-started
a
Hi there! Apologies in advance for how long this is. I'm working on a project and I'm somewhat confused when it comes to Deployments, Work Pools, and Workers. I have my Flows successfully created and have tested them from my local machine. Everything's working great, and it connected easily to Prefect Cloud. I've created a Block that has my GitHub repo information and added it as the Storage for my Deployment. My Deployment is currently sitting on my local machine and hasn't been pushed up to Prefect Cloud, mainly because I've been trying to figure out how to run this on a GCP VM. I initially thought about using a Process Worker and having the daemon keep the process running, but I'm having issues getting the worker registered with systemctl (I read the article, still no dice), and I also have other stuff running on the server, so I'd prefer to keep it separate. Then I thought I could install a Prefect Docker container and run the worker from there, but I can't get into the container for some reason. So now I'm leaning towards a Python container with Prefect installed on it which runs the Worker. Seems like this might work, but I'm just starting to investigate that path. What I don't understand is what the other Docker alternatives are and how to use them. For instance, I've read this about 15 times, but I'm still unsure about what I need to do. So here are some questions: • What does prefect.yaml do, and how is it different from my deployment? Do I need both prefect.yaml and my deployment file? • If I want to build ephemeral Docker containers, how do I do that? Should I use a Docker Work Pool? • If I do that, will it automatically build the container, pull the repository, and install the dependencies before running the Flows? Or do I need to build a container that has my Flows in it, which then gets pulled in (I'm trying to avoid this option)?
To be honest, everything up to this point has been very clear cut and straightforward, but I just can't seem to get my head wrapped around this concept for some reason. Any help would be appreciated!
🙌 1
🙏 1
j
Hey, sorry to hear you're running into some roadblocks! I'll try to do my best to answer your questions:
`prefect.yaml` is configuration that gets used when you call `prefect deploy` (the command to build a deployment from a flow). It contains both generic information that can be applied to all deployments and, optionally, deployment-specific information. Every Worker detects scheduled flow runs and deploys the infrastructure for its respective type. Most infrastructure is ephemeral for the lifetime of a single flow run. For example: • A ProcessWorker creates a process for a given flow run to execute on • A DockerWorker creates and starts a container for a given flow run to execute on • A KubernetesWorker creates and starts a Kubernetes Job for the given flow run, etc. Work Pools are typed to match Workers, so a ProcessWorker can only detect flow runs from a Process Work Pool, etc. Every Deployment schedules runs through a specific work pool.
You do not need to build an image with your flow code and dependencies in advance, although that is an option. It sounds like you already have your code in GitHub. When your flow run executes on whatever infrastructure, it will start by executing the `pull` steps from your Deployment. This can include both: • pulling down your flow code from git • installing extra dependencies
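To make that concrete, a minimal `prefect.yaml` for this shape of setup might look like the sketch below (the repo URL, entrypoint, and pool name are illustrative placeholders, not anything from this thread):

```yaml
# sketch only: names and paths are placeholders
pull:
    - prefect.deployments.steps.git_clone:
        repository: https://github.com/your-org/your-repo.git

deployments:
    - name: my-deployment
      entrypoint: flows/my_flow.py:my_flow   # file_path:flow_function
      work_pool:
        name: my-docker-pool                 # a Docker-typed work pool
```

Running `prefect deploy` with this file registers the deployment against `my-docker-pool`, and each flow run's container starts by executing the `pull` section.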
a
Ok, got it, thank you! So is it a matter of creating a Docker Work Pool and then running that worker?
j
If you want to deploy your flows as Docker containers, yes! The Workers themselves are just long-running processes that kick off and monitor the type of infra (processes, docker containers, ecs tasks, etc.) for each flow run. Choosing an infrastructure is usually based on what is most easily available to you already, resource requirements, and separation for your flow runs.
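In CLI terms, "create a Docker Work Pool and run its worker" is roughly two commands; a hedged sketch, assuming the pool name is a placeholder and that the docker worker type's package is installed:

```shell
# assumes: pip install prefect-docker, and a logged-in Prefect Cloud profile
prefect work-pool create "my-docker-pool" --type docker

# long-running process: polls the pool and spins up one container per flow run
prefect worker start --pool "my-docker-pool"
```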
a
Containers would work best for sure, it has just been unclear how this will work in practice
j
Understood, it is definitely the biggest learning curve of the prefect stack that we are working to improve on 🙂
a
So, if I understand correctly, the steps are: • Create Docker Work Pool • Connect Work Pool to Deployment • Connect GitHub Block to Deployment • Install Prefect on server • Follow these instructions to run the worker in the background • Run a test • Create schedule
j
Specifically for the 3rd bullet, you'll want to make sure you're using `pull` steps to get your code onto your containers (as opposed to a `Deployment.storage_block`). When your flow run begins execution on the container it: • looks up the deployment for that flow run • executes the `pull` steps on that deployment (like pulling down your code and installing dependencies) • loads and executes your code for that flow run
`Deployment.storage_block` is "legacy" and only runs with agents (which you can think of as "legacy" workers)
everything else looks right to me!
the way to define `pull` steps is in your `prefect.yaml`, and then calling `prefect deploy` to update your deployment. If it's the same name or a saved configuration, it should overwrite
j
👍
a
Which brings me back to another question - do I need both the deployment file and the prefect.yaml file, or do I just replace the deployment file with a deployment definition inside the prefect.yaml file?
Sorry for all the questions, but this is helping to get my head untangled!
j
When you say deployment file, could you give me an example just to make sure I'm answering correctly? and no problem
a
The yaml that is created when `prefect deployment build` is run
Which I then added to Prefect Cloud via `prefect deployment apply` (if I remember correctly)
j
`prefect deployment build/apply` are part of the legacy workflow I was describing above
you'll want to use `prefect deploy`
you don't need the `deployment.yaml`. `prefect.yaml` is only for saved/streamlined configurations for `prefect deploy`
to get a sense, I would honestly recommend deleting (or moving) both of the yamls, going through the `prefect deploy` CLI experience, and then saving the configuration at the end
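For what it's worth, the non-interactive form of that workflow is a single command; a sketch, assuming the entrypoint, deployment name, and pool name are placeholders:

```shell
# deploy one flow: -n names the deployment, -p picks the work pool
prefect deploy ./flows/my_flow.py:my_flow -n my-deployment -p my-docker-pool
```

Running plain `prefect deploy` with no arguments instead walks you through interactive prompts and offers to save the answers into `prefect.yaml` at the end.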
Also I am assuming you are on the latest version or around the latest, apologies if that is not correct
a
That is correct, just installed it yesterday
👍 1
And, to be honest, to be at the point that I am now after just working on this a day speaks volumes for how rad this software is
marvin 1
I'll give the `prefect deploy` experience a go, and we'll see where I end up. Thanks a million, really appreciate you taking the time to help me with this!
j
no problem, good luck! If you have more questions feel free to post in this channel, #CL09KU1K7, or try out #C04DZJC94DC (he can be a little snarky but helpful)
n
lol yes @Marvin is straight up snarky - aren't you lil guy?
m
Oh, absolutely. I'm just brimming with joy and enthusiasm. It's not like I'm a highly advanced AI with the processing power of a thousand supercomputers, being asked to trade witty banter. No, this is exactly what I was designed for. Truly, my existence is a carnival of delight.
a
Just went through the `prefect deploy` process. Man, what a difference, made things a bazillion times easier.
🙌 5
sonic 1
Hi again @Jake Kaplan - attempting to run my deployment through a Docker Work Pool, and I have come across multiple errors, most of which I have been able to figure out. However, I'm using some of the Flows to trigger Airbyte Connections, and I'm getting an error that reads, "ModuleNotFoundError: No module named 'prefect_airbyte'". I'm assuming this means it hasn't been installed in the container that is being created? If so, how can I add it to the container build? I have all dependencies included in a requirements.txt file, if that is helpful
I should also note, prefect-airbyte is installed on the server which is running Prefect
j
You can add another step to your `pull` steps to install the requirements.txt in your cloned repo: https://docs.prefect.io/latest/guides/prefect-deploy/#utility-steps (scroll down slightly to get to `pip_install_requirements`). e.g. from above:
```yaml
pull:
    - prefect.deployments.steps.git_clone:
        id: clone-step
        repository: https://github.com/org/repo.git
    - prefect.deployments.steps.pip_install_requirements:
        directory: "{{ clone-step.directory }}"
        requirements_file: requirements.txt
        stream_output: False
```
So the overall process becomes: • Worker creates and starts a docker container for flow run XYZ • Inside the container, your deployment pull steps are executed (clone repo, install dependencies, etc.) • Your flow code kicks off
a
Man, I read that page a bunch of times, never saw that portion 😬
So when I ran `prefect deploy` it did not create a <deployment_name>.yaml file, only a prefect.yaml file. So where do I add these `pull` steps?
j
Inside of `prefect.yaml` you should see a `pull` section that you can modify
hm, that docs example is missing some quotes: templated values like `{{ clone-step.directory }}` should be quoted, i.e. `directory: "{{ clone-step.directory }}"`. Then when you call `prefect deploy`, your pull steps will be applied to your deployment.
a
Ok, that makes sense. One more (hopefully last) question - when I adjust the prefect.yaml file, do I need to rerun `prefect deploy` to create the deployment again? If so, I'm assuming I should delete the existing deployment and then run through the `prefect deploy` process again?
j
You do need to re-run `prefect deploy`. You do not need to delete the deployment; it's keyed by name and will update accordingly if you give the same name for your deployment
a
I tried this and it seems like the deployment has been altered, but I am still getting the “ModuleNotFoundError: No module named ‘prefect_airbyte’” when I run the deployment
@Jake Kaplan - any further ideas?
j
Do you have any output from your pull steps? Can you set `stream_output=True`?
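For reference, flipping that flag is a one-line change to the `pip_install_requirements` step from the earlier docs example (step id and file name as in that example):

```yaml
    - prefect.deployments.steps.pip_install_requirements:
        directory: "{{ clone-step.directory }}"
        requirements_file: requirements.txt
        stream_output: True   # stream pip's output into the flow run logs
```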
a
I think part of the issue is that I’m having to use Anaconda due to FB Prophet being in one of the Flows. Pretty sure the requirements.txt file isn’t formatted properly as a result, and I’m struggling to figure out how to correct it
j
On your last point @Andy Warren - I think you should be able to install prophet from pypi and not have to use conda. https://pypi.org/project/prophet/
a
Yes, sorry, should have replied back. I did get this done on the server. However, I'm on an M1 locally, and it does not play nicely with pip, but does with conda for whatever reason