David 05/03/2023, 4:38 PM
from prefect.deployments import Deployment
from prefect.filesystems import GitHub
from etl_web_to_gcs import etl
github_block = GitHub.load("github-block")
# Deployment Object
github_dep = Deployment.build_from_flow(
    flow=etl,
    name="github-flow",
    infrastructure=github_block
)

if __name__ == "__main__":
    github_dep.apply()
When I run the code above, I get the following error:
...
cls_init(__pydantic_self__, **data)
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Deployment
infrastructure
Infrastructure block must have 'run-infrastructure' capabilities. (type=value_error)
Any idea how to fix this?
redsquare 05/03/2023, 4:38 PM
redsquare 05/03/2023, 4:38 PM
David 05/03/2023, 4:39 PM
redsquare 05/03/2023, 4:40 PM
redsquare 05/03/2023, 4:40 PM
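The ValidationError above occurs because GitHub is a storage (filesystem) block, not an infrastructure block, so it cannot be passed as infrastructure. A minimal sketch of the usual fix, passing the block as storage instead (block name and flow import as in David's snippet):

from prefect.deployments import Deployment
from prefect.filesystems import GitHub
from etl_web_to_gcs import etl

github_block = GitHub.load("github-block")

github_dep = Deployment.build_from_flow(
    flow=etl,
    name="github-flow",
    storage=github_block,  # storage, not infrastructure: GitHub is a filesystem block
)

if __name__ == "__main__":
    github_dep.apply()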
Ali Mir 07/07/2023, 4:20 PM
Process. Yesterday we did the Prefect associate certificate training, and part of that was hosting code on GitHub (basically telling the flow where the code that needs to be executed lives).
Our team just recently migrated from Prefect 1 to 2 and we're still trying to figure out how to work with Prefect 2 efficiently; a few flows that we have migrated from 1 to 2 are hosted on an S3 bucket, which is a headache for us since all of our code already lives on GitHub.
Could you advise me on how I can configure Prefect 2, in both the cloud (using ECS) and for local testing/dev (using Process), to fetch the code from GitHub? The GitHub repos are of course private.
Is it as simple as creating a GitHub block and passing credentials?
Side question: during training the usage of prefect.yaml came up. How does that come into play in this context? Is it necessary? Do you write your own yaml file before deployment of a flow if required to connect to GitHub?
Ali Mir 07/07/2023, 4:20 PM
Emil Christensen 07/07/2023, 5:00 PM
Storage blocks (like the GitHub block) and infrastructure blocks are intended for use with agents and are generally incompatible with workers and the prefect.yaml way of deploying flows. When using workers and prefect.yaml, the prefect.deployments.steps.git_clone pull step serves the same purpose as the GitHub storage block.
Regardless of whether you have an agent or a worker, what happens is that at run-time the flow will pull code from git. This happens regardless of the infrastructure. The caveat here is that I'm assuming you are running a deployment that gets executed by an agent or worker. If you're just running your flow directly (e.g. python my_flow.py), then no code gets downloaded.
Emil Christensen 07/07/2023, 5:02 PM
1. If you want to use the prefect.yaml file and the prefect deploy CLI command, add a prefect.deployments.steps.git_clone step to your pull section (example). Now every time a worker executes a flow run, it'll pull the code from GitHub.
2. Alternatively, if you want to define your deployments in Python, create a GitHub storage block and pass it to your deployment. Now, when an agent executes a flow run, it'll pull code from GitHub.
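For option 1, a sketch of what the pull section of a prefect.yaml could look like (the repository URL, branch, and Secret block name here are placeholders, not from the thread):

pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/your-org/your-repo.git
      branch: main
      access_token: "{{ prefect.blocks.secret.github-access-token }}"

Option 2 corresponds to the storage-block snippet shown earlier in this thread.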
Ali Mir 07/07/2023, 5:02 PM
Emil Christensen 07/07/2023, 5:03 PM
Ali Mir 07/07/2023, 5:03 PM
Ali Mir 07/07/2023, 5:03 PM
Emil Christensen 07/07/2023, 5:03 PM
Credentials go in the GitHub block or your prefect.deployments.steps.git_clone step.
Emil Christensen 07/07/2023, 5:04 PM
(e.g. access_token)
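A minimal sketch of registering such a block for a private repo (repository URL and token are placeholders):

from prefect.filesystems import GitHub

github_block = GitHub(
    repository="https://github.com/your-org/your-private-repo.git",
    reference="main",          # branch or tag to check out
    access_token="ghp_XXXX",   # placeholder personal access token with repo scope
)
github_block.save("github-block", overwrite=True)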
Ali Mir 07/07/2023, 5:05 PM
Ali Mir 07/07/2023, 5:58 PM
Ali Mir 07/07/2023, 6:02 PM
Emil Christensen 07/07/2023, 10:09 PM
> how this file gets created
You can initialize it with prefect init, optionally from a recipe. Also, if you do prefect deploy and deploy a flow, you can write out a new file at the end of that.
> Also do you have a single prefect.yaml for the whole repo or is it flow/script specific?
Generally one per repo. Within it you can have multiple different deployments (usually one per flow).
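As a sketch of those two commands (flow path, flow name, deployment name, and pool name are placeholders; assumes a Prefect 2.10+ CLI with the built-in git recipe):

prefect init --recipe git
prefect deploy flows/read_csv.py:my_flow --name my-deployment --pool test-pool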
Ali Mir 07/07/2023, 10:43 PM
Ali Mir 07/11/2023, 6:58 PM
I deployed the flow using prefect deploy. I provided the repo URL and the branch name, then I generated an access token and provided that as well. I selected process as our ECS setup is not fully fleshed out yet.
Now, the flow uses two external packages, pandas and s3fs. They are imported in this order:
import pandas as pd
import s3fs
After deploying, I started a worker for the pool using this command: prefect worker start --pool 'test-pool'
I ran the flow using the UI. Now I faced an issue with the packages: prefect.exceptions.ScriptError: Script at 'flows/read_csv.py' encountered an exception: ModuleNotFoundError("No module named 's3fs'")
Now I am confused, and have a few questions:
1. Why did the flow execute locally and not in the cloud?
2. How do I make sure that the Prefect process has access to the packages that the script requires?
3. And oddly enough, why did it fail on s3fs and not on pandas, even though the latter was imported first?
4. How can I remedy this?
Thank you so much for helping me out with this
Ali Mir 07/11/2023, 9:06 PM
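On questions 1, 2, and 4 above: a process work pool runs each flow run as a subprocess on the machine where the worker was started, so "locally" here means the worker's own Python environment, and imports resolve against whatever is installed there; pandas evidently was installed in that environment while s3fs was not. A minimal sketch of the usual remedy, assuming the worker is started from a shell where pip targets that same environment:

# install the flow's dependencies into the environment that runs the worker
pip install pandas s3fs

# restart the worker so new flow runs pick the packages up
prefect worker start --pool 'test-pool'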
Mitch 07/13/2023, 2:41 PM
We are using a Deployment.build_from_flow deployment, so we are not using a prefect.yaml file to deploy. I have created a GitHub block with the credentials. Is there a port mapping or security group which may be required when the task is spinning up in ECS? Is there a way to validate, print, or pause the container so I can view whether the files are cloned in the task container? Any help would be greatly appreciated, as I have spent a lot of time on this seemingly trivial implementation 🙂
Mitch 07/13/2023, 2:42 PM
07/17/2023, 4:10 PMwhich prefect
? Can you run the flow successfully with something like python flow.py
? If so, what’s the output of which python
?Emil Christensen
Emil Christensen 07/17/2023, 4:16 PM
When you call Deployment.build_from_flow, are you passing the GitHub block you created? You should see a log in the agent that says something to the effect of "pulling files from repo …".
You could peek at the files by adding something like the following to your flow:
import os

# log the working directory and its contents to confirm the repo was cloned
print(f"Current dir is {os.path.abspath('.')}")
print(f"Files: {os.listdir('.')}")