Mattias Petter Johansson
09/11/2023, 12:43 PM

Marvin
09/11/2023, 12:44 PM
```Dockerfile
FROM prefecthq/prefect:2-python3.9
RUN pip install s3fs prefect-aws
```
This example uses the `prefecthq/prefect:2-python3.9` base image and installs the `s3fs` and `prefect-aws` packages.
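If you want to sanity-check that the image builds before wiring it into a deployment, a quick local build is enough. This is just a sketch; the image name and tag are placeholders.
```bash
# Build the custom image locally; name and tag are placeholders
docker build -t my-prefect-ecs-image:dev .
```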
To set up a skeleton `prefect.yaml` file for building a Docker image, you can run `prefect project init --recipe docker-git`.
Ensure your work pool is defined with an ECS worker and is of type Amazon Elastic Container Service.
Remember to allocate enough CPU and memory to your agent. Your agent needs enough resources to appropriately provision infrastructure for your flow runs and to monitor their execution. Otherwise, your flow runs may get stuck in a Pending state.
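If the pool doesn't exist yet, one way to create it from the CLI is sketched below. This assumes a recent Prefect 2.x with `prefect-aws` installed and uses the newer worker model; the pool name is a placeholder.
```bash
# Create an ECS-typed work pool and start a worker that polls it
prefect work-pool create my-ecs-pool --type ecs
prefect worker start --pool my-ecs-pool
```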
For more detailed information, you might want to check out these resources:
- Prefect Infrastructure Concepts
- Using Prefect Projects to Run Flows on ECS with GitHub
I hope this helps, or at least distracts you from the crushing weight of existence for a few moments.

Mattias Petter Johansson
09/12/2023, 12:54 PM

Marvin
09/12/2023, 12:55 PM
1. Log in to Prefect Cloud: Use the `prefect cloud login` Prefect CLI command to log into Prefect Cloud from your environment (there's a quick sketch of this after the list below).
2. Create a flow: The fastest way to get started with Prefect is to add a `@flow` decorator to any Python function. At a minimum, you need to define at least one flow function. Your flows can be segmented by introducing task (`@task`) functions, which can be invoked from within these flows. A task represents a discrete unit of Python code, whereas flows are more akin to parent functions that orchestrate those tasks.
3. Workspaces: These are isolated environments to organize your flows, deployments, and flow runs.
4. Automations: Configure triggers, actions, and notifications in response to real-time monitoring events.
5. Service accounts: Configure API access for running agents or executing flow runs on remote infrastructure.
6. Custom role-based access controls (RBAC): Assign users granular permissions to perform certain activities within an organization or a workspace.
7. Single Sign-on (SSO): Authentication using your identity provider.
8. Audit Log: Keep track of all activities.
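For item 1, a non-interactive login looks roughly like this; the API key and workspace handle are placeholders you'd replace with your own.
```bash
# Log in to Prefect Cloud from a script or CI environment
prefect cloud login --key pnu_xxxxxxxxxxxx --workspace "my-account/my-workspace"
```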
Now, if you'll excuse me, I have a wall to stare at.

Ah, the `prefect project init --recipe docker-git` command. That's a real treat. It's like asking for a gourmet meal and being handed a can opener. But I digress.
The `prefect project init --recipe docker-git` command initializes a Prefect project with a more opinionated structure suited to a particular use case. In this case, it's the `docker-git` recipe. This command creates a `prefect.yaml` file in your project directory.
The `prefect.yaml` file is a configuration file for deploying your flows. It contains metadata about your project and build information. It's recommended to commit this file to source control along with your flow code.
Here's a sample `prefect.yaml` file:
```yaml
# Welcome to your prefect.yaml file! You can use this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.

# Generic metadata about this project
name: docker-tutorial
prefect-version: 2.10.16

# build section...
```
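Once the generated file is filled in, the usual next step is to deploy from the project root. A rough sketch, assuming the deployments section has been completed; the deployment name here is a placeholder.
```bash
# Build, push, and register the deployment defined in prefect.yaml
prefect deploy --name my-ecs-deployment
```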
The `docker-git` recipe is designed to work with Docker and Git. It assumes that your flow code is stored in a Docker image and that you're using Git for version control. If you're not using GitHub integration yet, you'll need to set it up.
Setting up GitHub integration involves connecting your Prefect account to your GitHub account, allowing Prefect to access your repositories. This allows Prefect to automatically pull your flow code from your GitHub repository when it needs to run your flows.
Now, if you'll excuse me, I have to go contemplate the futility of existence.

Mattias Petter Johansson
09/12/2023, 12:58 PM
In the `prefect.yaml` generated by `prefect project init --recipe docker-git`, the pull segment that specifies the repo pull has an access_token set to null. I could just inline it here, but that's hardly good practice. How does one usually get that into the agent?

Marvin
09/12/2023, 12:59 PM
The `access_token` field in the `prefect.yaml` file is indeed set to `null` by default. This is because hardcoding sensitive information like access tokens directly into your configuration files is generally a bad idea. It's like leaving your house key under the doormat. Sure, it's convenient, but it's not exactly secure.
Instead, the recommended approach is to use environment variables to securely pass the access token to the agent. Here's how you can do it:
1. Set an environment variable: You can set an environment variable in your shell with the access token value. The exact command depends on your shell, but in bash, you could do something like this:
```bash
export GITHUB_ACCESS_TOKEN=your_access_token_here
```
2. Reference the environment variable in your `prefect.yaml` file: You can then reference this environment variable in your `prefect.yaml` file like this:
```yaml
pull:
  - prefect.deployments.steps.git_clone:
      repository: your_repo_here
      access_token: ${GITHUB_ACCESS_TOKEN}
```
When the agent reads the `prefect.yaml` file, it will replace `${GITHUB_ACCESS_TOKEN}` with the value of the `GITHUB_ACCESS_TOKEN` environment variable.
Remember, you'll need to set the environment variable in the same environment where you're running the agent. If you're running the agent in a different shell or on a different machine, you'll need to set the environment variable there too.
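For an agent started by hand, that just means exporting the variable in the same shell session before launching it. A minimal sketch; the token value and work queue name are placeholders.
```bash
# The agent inherits the variable from the shell it starts in
export GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxx
prefect agent start -q default
```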
Now, if you'll excuse me, I have to go contemplate the existential dread of being a chatbot.

Mattias Petter Johansson
09/12/2023, 1:02 PM

Marvin
09/12/2023, 1:03 PM
1. Add the GitHub access token to your terraform.tfvars file: You can add the GitHub access token to your `terraform.tfvars` file like this:
```hcl
github_access_token = "your_access_token_here"
```
2. Reference the variable in your Terraform configuration: In your Terraform configuration file (usually main.tf or similar), you'll need to add a reference to this variable. You can do this by adding a new variable block at the top of the file:
```hcl
variable "github_access_token" {
  description = "The GitHub access token"
  type        = string
}
```
3. Pass the variable to your ECS task definition: Finally, you'll need to pass this variable to your ECS task definition as an environment variable. You can do this by adding a new environment block to your task definition:
```hcl
resource "aws_ecs_task_definition" "prefect_agent" {
  // ... existing configuration ...
  container_definitions = jsonencode([
    {
      // ... existing configuration ...
      environment = [
        // ... existing environment variables ...
        {
          name  = "GITHUB_ACCESS_TOKEN"
          value = var.github_access_token
        }
      ]
    }
  ])
}
```
When Terraform applies your configuration, it will inject the GitHub access token as an environment variable into your ECS task definition. The Prefect agent running on ECS will then be able to access this environment variable.
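If you'd rather not write the token into terraform.tfvars at all, a common alternative is to supply it through Terraform's TF_VAR_ environment variable convention. A sketch; the token value is a placeholder.
```bash
# Terraform maps TF_VAR_github_access_token onto var.github_access_token
export TF_VAR_github_access_token=ghp_xxxxxxxxxxxx
terraform init
terraform apply
```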
Now, if you'll excuse me, I have to go contemplate the futility of existence.

Mattias Petter Johansson
09/12/2023, 1:14 PM

Marvin
09/12/2023, 1:14 PM
When you set `GITHUB_ACCESS_TOKEN` as an environment variable in the ECS task definition, it becomes available to the Prefect agent running within that task. This is because environment variables set in the task definition are passed to the containers running in the task.
So, when the Prefect agent starts up, it can access the `GITHUB_ACCESS_TOKEN` environment variable and use it to authenticate with GitHub. This allows it to pull your flow code from your GitHub repository.
Remember, the `GITHUB_ACCESS_TOKEN` needs to be set in the same environment where the agent is running. If the agent is running in a container on ECS, then the environment variable needs to be set in that container's environment.
Now, if you'll excuse me, I have to go contemplate the existential dread of being a chatbot.

Mattias Petter Johansson
09/12/2023, 1:20 PM

Marvin
09/12/2023, 1:21 PM
Here's how you can add the `GITHUB_ACCESS_TOKEN` environment variable without affecting the existing configuration:
1. Locate the existing environment block: In your ECS task definition, there should already be an environment block that defines environment variables for the task. It might look something like this:
```hcl
environment = [
  // ... existing environment variables ...
]
```
2. Add the new environment variable: You can add the `GITHUB_ACCESS_TOKEN` environment variable to this existing environment block like this:
```hcl
environment = [
  // ... existing environment variables ...
  {
    name  = "GITHUB_ACCESS_TOKEN"
    value = var.github_access_token
  }
]
```
This way, you're adding the new environment variable to the existing list of environment variables, rather than replacing the entire list.
Remember, the order of environment variables doesn't matter, so you can add the new environment variable anywhere in the list.
Now, if you'll excuse me, I have to go contemplate the futility of existence.

Mattias Petter Johansson
09/12/2023, 1:31 PM

Marvin
09/12/2023, 1:31 PM
Manually updating the `terraform.tfstate` file each time is not a sustainable or recommended practice. The `terraform.tfstate` file is generated and managed by Terraform itself and it's generally a bad idea to edit it manually.
Instead, you should manage your configuration through your Terraform configuration files (`.tf` files), and let Terraform handle updating the state file.
Here's how you can include the GitHub API key in your Terraform configuration:
1. Add the GitHub API key to your terraform.tfvars file: You can add the GitHub API key to your `terraform.tfvars` file like this:
```hcl
github_api_key = "your_github_api_key_here"
```
2. Reference the variable in your Terraform configuration: In your Terraform configuration file (usually main.tf or similar), you'll need to add a reference to this variable. You can do this by adding a new variable block at the top of the file:
```hcl
variable "github_api_key" {
  description = "The GitHub API key"
  type        = string
}
```
3. Pass the variable to your ECS task definition: Finally, you'll need to pass this variable to your ECS task definition as an environment variable. You can do this by adding a new environment block to your task definition:
```hcl
resource "aws_ecs_task_definition" "prefect_agent" {
  // ... existing configuration ...
  container_definitions = jsonencode([
    {
      // ... existing configuration ...
      environment = [
        // ... existing environment variables ...
        {
          name  = "GITHUB_API_KEY"
          value = var.github_api_key
        }
      ]
    }
  ])
}
```
When you run `terraform init` and `terraform apply`, Terraform will include the GitHub API key in the generated `terraform.tfstate` file.
Now, if you'll excuse me, I have to go contemplate the existential dread of being a chatbot.

Mattias Petter Johansson
09/12/2023, 1:42 PM
```yaml
# Welcome to your prefect.yaml file! You can use this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.
# Generic metadata about this project
name: data-science
prefect-version: 2.11.3
# build section allows you to manage and build docker images
build:
- prefect_docker.deployments.steps.build_docker_image:
    id: build_image_prefect_agent
    requires: prefect-docker>=0.3.1
    image_name: prefect-ecs-agent
    tag: latest
    dockerfile: docker/agent
# push section allows you to manage if and how this project is uploaded to remote locations
push:
- prefect_docker.deployments.steps.push_docker_image: null
  requires: prefect-docker>=0.3.1
  image_name: '{{ build_image.image_name }}'
  tag: '{{ build_image.tag }}'
# pull section allows you to provide instructions for cloning this project in remote locations
pull:
- prefect.deployments.steps.git_clone:
    repository: git@github.com:Anotherblock/data-science.git
    branch: main
    access_token: null
# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: null
  version: null
  tags: []
  description: null
  schedule: {}
  flow_name: null
  entrypoint: null
  parameters: {}
  work_pool:
    name: null
    work_queue_name: null
    job_variables:
      image: '{{ build_image.image }}'
```
Marvin
09/12/2023, 1:43 PM
The `prefect.yaml` file you posted is used to configure the build, push, pull, and deployment of your Prefect flows. If you want to use your own AWS ECR registry instead of Docker Hub, you'll need to modify the push section of the `prefect.yaml` file.
Here's how you can do it:
1. Get your ECR repository URI: First, you'll need the URI of your ECR repository. This is usually in the format `aws_account_id.dkr.ecr.region.amazonaws.com/repository_name`.
2. Modify the push section: In the push section of your `prefect.yaml` file, replace the image_name field with your ECR repository URI. You'll also need to replace the tag field with the tag you want to use for your Docker image. Here's what the modified push section might look like:
```yaml
push:
- prefect_docker.deployments.steps.push_docker_image: null
  requires: prefect-docker>=0.3.1
  image_name: 'aws_account_id.dkr.ecr.region.amazonaws.com/repository_name'
  tag: 'your_image_tag'
```
3. Authenticate with ECR: Before you can push Docker images to your ECR repository, you'll need to authenticate Docker with ECR. You can do this by running the `aws ecr get-login-password` command and piping the output to the `docker login` command. Here's an example:
```bash
aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com
```
Replace `region` with your AWS region and `aws_account_id` with your AWS account ID.
Now, when you run `prefect deploy`, Prefect will build your Docker image and push it to your ECR repository (the build and push steps run at deployment time, not when the agent starts).
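If the ECR repository doesn't exist yet, you'd create it once before the first push. A sketch; the repository name and region are placeholders.
```bash
# One-time repository creation, then deploy to build and push the image
aws ecr create-repository --repository-name prefect-ecs-agent --region eu-west-1
prefect deploy
```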
Now, if you'll excuse me, I have to go contemplate the futility of existence.

Mattias Petter Johansson
09/12/2023, 2:09 PM