d
Hey everyone, I’m trying to deploy one of my flows on Vertex AI (GCP) and I’m getting the following error:
File "/usr/local/lib/python3.9/site-packages/gcsfs/credentials.py", line 84, in _connect_google_default
    raise ValueError(msg.format(self.project, project))
ValueError: User-provided project 'my_project' does not match the google default project 'some_generated_id'. Either
Some code snippets and information about my flow -
from prefect import Flow
from prefect.storage import Docker
from tasks.extract_product_data import extract_data
from prefect.run_configs import VertexRun
from prefect.schedules import IntervalSchedule
from datetime import timedelta

schedule = IntervalSchedule(interval=timedelta(days=1))

with Flow("extract_comparable_products_data",
          storage=Docker(registry_url="us-central1-docker.pkg.dev/xxxx/",
                         dockerfile="./Dockerfile"), schedule=schedule) as flow:
    extract_data()

flow.run_config = VertexRun(machine_type='e2-standard-16', labels=["ml"],
                            service_account='prefect_service_account.iam.gserviceaccount.com')
The flow has only one task for now, for testing purposes. My task uses data from multiple Google Cloud projects in my organization, so in every Google client interaction I pass a specific project as a parameter, for example -
storage_client = storage.Client(project='my_pro_1')
The service account that Prefect uses has permissions in all of the relevant projects (in general: Storage, BigQuery, Artifactory, Vertex AI). Is anyone familiar with this issue? Thanks.
a
@Dekel R I thought that the way it works is that you spin up an agent in project XYZ and then all your flows that should be deployed to this agent will be also deployed to the same project, not a different one. And then to use other GCP services inside of this flow you would need a service account that has permissions for all those services. So you could have potentially one Vertex agent per project, but let me dig deeper to check this
d
So let me clarify - I have 2 Prefect agents in one of my projects (one regular Docker agent and one Vertex agent), and my flow runs there but accesses other projects as well. I use a dedicated service account that has permissions in all of the relevant projects (read/write on Storage and BigQuery in the “other” projects, plus Artifactory and Vertex AI permissions in the project where Prefect is running). Let me know if any additional details are needed.
a
@Dekel R I would try to run this GCS part that results in this error
File "/usr/local/lib/python3.9/site-packages/gcsfs/credentials.py", line 84, in _connect_google_default
    raise ValueError(msg.format(self.project, project))
without Prefect first and see whether it works locally, and the same with Vertex - perhaps you can start a Vertex AI notebook and try to run the same code there and see if there is no GCS error? This way you could see if the service account works correctly on Vertex AI without Prefect for now. Based on the error you sent it looks like GCS permissions don’t work with the provided project. Perhaps you could create a new fresh service account with only those permissions and test it out this way?
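When testing this outside Prefect, it may help to keep in mind what the traceback line seems to be checking. The message reads like a client-side comparison of two project IDs rather than a denied request. Here is a minimal sketch of that comparison, assuming the library compares the project passed by the caller with the default project resolved from the environment. `check_project` is a hypothetical stand-in, not the actual gcsfs function, and the message is shortened to the part visible in the error above.

```python
def check_project(user_project, default_project):
    """Hypothetical stand-in for the check in gcsfs's _connect_google_default."""
    # Assumption: nothing is raised when the caller did not pass a project.
    if user_project and user_project != default_project:
        raise ValueError(
            f"User-provided project '{user_project}' does not match "
            f"the google default project '{default_project}'."
        )
    # When the projects agree (or none was passed), the default is used.
    return default_project
```

On the Vertex machine, google.auth.default() returns a (credentials, project) pair, so printing the second element shows which default project the environment resolves to.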
👀 2
d
Hey, as you suggested I spawned a machine on Vertex (Jupyter notebook) and ran the piece of code that triggered the error, and it works just fine. I get no error, and the service account has sufficient privileges to access the current project and other projects in my organization (when configuring the machine I used the same service account that Prefect uses).
a
ok, great! So you tested that this service account works on Vertex AI notebook. A good next step would be to test it in a local flow run e.g. with flow.run() or
prefect run -p your_flow.py
Then if that works, I would assign the same service account to the VertexRun and then register and run the flow with the backend.
prefect register --project xxx -p your_flow.py
prefect run --project xxx --name yyy --watch
You could cross-check the Vertex agent permissions. Based on the docstring:
service_account = Specifies the service account to use
    as the run-as account in vertex. The agent submitting jobs must have
    act-as permission on this run-as account.
E.g.:
from prefect.run_configs import VertexRun

run_config = VertexRun(service_account="your_account")
d
To make troubleshooting easier, I wrote this dummy flow -
from prefect import Flow, task
from prefect.storage import Docker
from tasks.extract_product_data import extract_data
from prefect.run_configs import VertexRun
from prefect.schedules import IntervalSchedule
from datetime import timedelta
from src.utils import send_slack_alert

schedule = IntervalSchedule(interval=timedelta(days=1))

@task()
def extract():
    send_slack_alert('testing stuff - extraction flow')


with Flow("extract_comparable_products_data",
          storage=Docker(registry_url="us-central1-docker.pkg.dev/xxx/",
                         dockerfile="./Dockerfile"), schedule=schedule) as flow:
    # extract_data()
    extract()

flow.run_config = VertexRun(machine_type='e2-standard-16', labels=["ml"],
                            service_account='prefect-integration@xxx.iam.gserviceaccount.com')
Here is the code for send_slack_alert (it works just fine in non-Vertex flows) -
import requests
from prefect.client import Secret

def send_slack_alert(message):
    secret_slack = Secret("SLACK_WEBHOOK_URL").get()
    requests.post(secret_slack, json={"text": message})
Still doesn’t work, I’m still getting the same error:
ValueError: User-provided project 'my_project' does not match the google default project 'xxx-tp'. Either
At first I thought the error was somehow connected to the fact that I’m getting data from multiple projects, but this flow suggests it’s not the case…
a
What if you skip the service account on VertexRun? It then should take the default one set on the agent which would likely match the google default project?
👀 1
d
It actually worked now - I got a Slack message. But then I tried running my original flow (which uses different projects and Google tools such as Storage and BigQuery) and got an (expected) permissions error. So not specifying a service account did help the dummy flow, but it’s necessary for my more complicated one. I’m now trying to understand the difference between the Vertex default account and my custom Prefect service account.
a
nice work! So this confirms that the Vertex agent service account needs to have permission to those projects that you use in your flow. This means that it must be configured on the agent’s service account rather than on VertexRun.
d
It should work, but I would like to run different flows with different permissions… If I use only the default Vertex service account, that won’t be possible.
I think my own service account needs an additional role in order to be identical to the default service account of Vertex AI, which is -
roles/aiplatform.customCodeServiceAgent
As you can see in this link - https://cloud.google.com/iam/docs/understanding-roles#aiplatform.customCodeServiceAgent
👍 1
a
@Dekel R this section explains how to grant access to Vertex AI to resources in a different project: https://cloud.google.com/vertex-ai/docs/general/access-control#foreign-project Can you try this approach for all projects you use in your flow?
d
Hey, after a deep dive, this was the issue. I tried using VertexRun to train one of our models but kept getting 2 kinds of errors:
1. Authentication errors
2. Something about the “metadata server”, as I posted in the original thread - google.auth.exceptions.RefreshError: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Compute Engine Metadata server unavailable
It now seems like it’s all connected to the metadata server error. You can read about it here - https://cloud.google.com/compute/docs/metadata/overview
I read more and found 2 threads about this issue:
https://github.com/googleapis/google-auth-library-python/issues/211
https://github.com/googleapis/google-auth-library-python/issues/814
They give 2 options:
1. Update the google-auth library (did it, didn’t work)
2. Pass credentials explicitly
I used Prefect secrets to pass the credentials, and it works like a charm. Not a Prefect or Vertex AI issue after all… just a Google Compute issue. Thanks for helping with the investigation!!
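For anyone landing here later, passing credentials explicitly might look roughly like the sketch below. It assumes the secret holds the full JSON key file of the service account; the helper names `parse_service_account_secret` and `build_storage_client` and the secret name used in the note afterwards are hypothetical, not taken from the flow in this thread.

```python
import json

# Fields that Credentials.from_service_account_info needs to build a signer.
REQUIRED_KEYS = {"client_email", "private_key", "token_uri"}


def parse_service_account_secret(raw):
    """Normalize a secret value (JSON string or dict) into a key-file dict."""
    info = json.loads(raw) if isinstance(raw, str) else dict(raw)
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError(f"service account key is missing fields: {sorted(missing)}")
    return info


def build_storage_client(info, project):
    """Create a GCS client from explicit credentials instead of the metadata server."""
    # Imported lazily so the parsing helper works without the Google libraries.
    from google.cloud import storage
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_info(info)
    return storage.Client(project=project, credentials=credentials)
```

Inside a task this could be called as build_storage_client(parse_service_account_secret(Secret("GCP_CREDENTIALS").get()), project='my_pro_1'), where GCP_CREDENTIALS is an assumed secret name. Because the credentials come from the key file, the client should not need to query the metadata server for a default identity.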
🙏 1
🙌 1