    Dekel R

    9 months ago
    Hey everyone, I’m trying to deploy one of my flows on Vertex AI (GCP) and I’m getting the following error:
    File "/usr/local/lib/python3.9/site-packages/gcsfs/credentials.py", line 84, in _connect_google_default
        raise ValueError(msg.format(self.project, project))
    ValueError: User-provided project 'my_project' does not match the google default project 'some_generated_id'. Either
    Some code snippets and information about my flow -
    from prefect import Flow
    from prefect.storage import Docker
    from tasks.extract_product_data import extract_data
    from prefect.run_configs import VertexRun
    from prefect.schedules import IntervalSchedule
    from datetime import timedelta
    
    schedule = IntervalSchedule(interval=timedelta(days=1))
    
    with Flow("extract_comparable_products_data",
              storage=Docker(registry_url="us-central1-docker.pkg.dev/xxxx/",
                             dockerfile="./Dockerfile"), schedule=schedule) as flow:
        extract_data()
    
    flow.run_config = VertexRun(machine_type='e2-standard-16', labels=["ml"],
                                service_account='prefect_service_account.iam.gserviceaccount.com')
    The flow has only one task for now, for testing purposes. My task uses data from multiple Google Cloud projects in my organization, so in every Google client interaction I pass a specific project as a parameter, for example -
    storage_client = storage.Client(project='my_pro_1')
    The service account that Prefect uses has permissions on all of the relevant projects (in general: Storage, BigQuery, Artifactory, Vertex AI). Is anyone familiar with this issue? Thanks.
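    For context, the traceback suggests that gcsfs compares the project passed by the caller with the default project resolved from the environment, and refuses to continue when the two differ. Below is a simplified sketch of that kind of check. It is not the library's actual code, and `check_project` is a hypothetical name used only for illustration.

```python
from typing import Optional


def check_project(user_project: Optional[str], default_project: Optional[str]) -> Optional[str]:
    """Return the project to use, or raise if the caller's project disagrees
    with the environment's default project.

    Simplified illustration only; the real gcsfs logic may differ in detail.
    """
    if user_project and user_project != default_project:
        raise ValueError(
            f"User-provided project {user_project!r} does not match "
            f"the google default project {default_project!r}."
        )
    return user_project or default_project


# The mismatch case mirrors the situation reported in this thread.
try:
    check_project("my_project", "some_generated_id")
except ValueError as exc:
    print(exc)
```

    Under this reading, the error is about which default project the runtime credentials resolve to, not about missing permissions on a bucket.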
    Anna Geller

    9 months ago
    @Dekel R I thought that the way it works is that you spin up an agent in project XYZ and then all your flows that should be deployed to this agent will be also deployed to the same project, not a different one. And then to use other GCP services inside of this flow you would need a service account that has permissions for all those services. So you could have potentially one Vertex agent per project, but let me dig deeper to check this
    Dekel R

    9 months ago
    So just let me clarify it - I have 2 Prefect agents in one of my projects (one regular docker agent and one Vertex agent) - and my flow is running there but accessing other projects as well. I use a dedicated service account that has permissions in all of the relevant projects (read write storage and Bigquery in the “other” projects, and also Artifactory and Vertex AI permissions in the project where Prefect is running). Let me know if any additional details are needed.
    Anna Geller

    9 months ago
    @Dekel R I would try to run this GCS part that results in this error
    File "/usr/local/lib/python3.9/site-packages/gcsfs/credentials.py", line 84, in _connect_google_default
        raise ValueError(msg.format(self.project, project))
    without Prefect first and see whether it works locally, and the same with Vertex - perhaps you can start a Vertex AI notebook and try to run the same code there and see if there is no GCS error? This way you could see if the service account works correctly on Vertex AI without Prefect for now. Based on the error you sent it looks like GCS permissions don’t work with the provided project. Perhaps you could create a new fresh service account with only those permissions and test it out this way?
    Dekel R

    9 months ago
    Hey, as you suggested I spawned a machine on Vertex (Jupyter notebook) and ran the piece of code that triggered the error, and it works just fine: I get no error and the service account has sufficient privileges to access the current project and the other projects in my organization (when configuring the machine I used the same service account that Prefect uses).
    Anna Geller

    9 months ago
    ok, great! So you tested that this service account works on Vertex AI notebook. A good next step would be to test it in a local flow run e.g. with flow.run() or
    prefect run -p your_flow.py
    Then if that works, I would assign the same service account to the VertexRun and then register and run the flow with the backend.
    prefect register --project xxx -p your_flow.py
    prefect run --project xxx --name yyy --watch
    You could cross-check the Vertex agent permissions. Based on the docstring:
    service_account = Specifies the service account to use
        as the run-as account in vertex. The agent submitting jobs must have
        act-as permission on this run-as account.
    E.g.:
    from prefect.run_configs import VertexRun
    
    run_config = VertexRun(service_account="your_account")
    Dekel R

    9 months ago
    So in order to make troubleshooting easier I wrote this dummy flow -
    from prefect import Flow, task
    from prefect.storage import Docker
    from tasks.extract_product_data import extract_data
    from prefect.run_configs import VertexRun
    from prefect.schedules import IntervalSchedule
    from datetime import timedelta
    from src.utils import send_slack_alert
    
    schedule = IntervalSchedule(interval=timedelta(days=1))
    
    @task()
    def extract():
        send_slack_alert('testing stuff - extraction flow')
    
    
    with Flow("extract_comparable_products_data",
              storage=Docker(registry_url="us-central1-docker.pkg.dev/xxx/",
                             dockerfile="./Dockerfile"), schedule=schedule) as flow:
        # extract_data()
        extract()
    
    flow.run_config = VertexRun(machine_type='e2-standard-16', labels=["ml"],
                                service_account='prefect-integration@xxx.iam.gserviceaccount.com')
    See the code for send_slack_alert - (works just fine in non Vertex flows)
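    import requests
    from prefect.client import Secret
    def send_slack_alert(message):
        # Read the webhook URL from a Prefect secret and post the message to Slack.
        secret_slack = Secret("SLACK_WEBHOOK_URL").get()
        requests.post(secret_slack, json={"text": message})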
    Still doesn’t work, I’m still getting the same error:
    ValueError: User-provided project 'my_project' does not match the google default project 'xxx-tp'. Either
    At first I thought the error was somehow connected to the fact that I’m getting data from multiple projects - but this flow suggests it’s not the case…
    Anna Geller

    9 months ago
    What if you skip the service account on VertexRun? It then should take the default one set on the agent which would likely match the google default project?
    Dekel R

    9 months ago
    It actually worked now - I got a Slack message. But then I tried running my original flow (which uses different projects and Google tools such as Storage and BigQuery) and got an (expected) permissions error. So not specifying a service account helped the dummy flow, but it’s necessary for my more complicated one. I’m now trying to understand the difference between the Vertex default account and my custom Prefect service account.
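    One way to compare the two accounts is to log, from inside a running task, which project and identity the default credentials resolve to. A minimal sketch, assuming the `google-auth` package is installed in the flow's image; `describe_identity` is a hypothetical helper, and its `auth_default` parameter exists only so the function can be exercised without real credentials.

```python
from typing import Callable, Optional, Tuple


def describe_identity(
    auth_default: Optional[Callable[[], Tuple[object, Optional[str]]]] = None,
) -> dict:
    """Report the default project and service account seen at runtime."""
    if auth_default is None:
        # Imported lazily so the helper can be tested without google-auth.
        import google.auth

        auth_default = google.auth.default
    credentials, project = auth_default()
    return {
        "project": project,
        # Not every credential type exposes this attribute, hence getattr.
        "service_account_email": getattr(credentials, "service_account_email", None),
        "credential_type": type(credentials).__name__,
    }
```

    Calling `describe_identity()` once with the service account set on VertexRun and once without it should show whether the default project changes between the two runs.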
    Anna Geller

    9 months ago
    nice work! So this confirms that the Vertex agent service account needs to have permission to those projects that you use in your flow. This means that it must be configured on the agent’s service account rather than on VertexRun.
    Dekel R

    9 months ago
    It should work - but I would like to run different flows with different permissions… If I use only the default service account of Vertex, that will not be possible.
    I think that my own service account needs an additional role in order to be identical to the default service account of Vertex AI, which is -
    roles/aiplatform.customCodeServiceAgent
    As you can see in this link - https://cloud.google.com/iam/docs/understanding-roles#aiplatform.customCodeServiceAgent
    Anna Geller

    9 months ago
    @Dekel R this section explains how to grant access to Vertex AI to resources in a different project: https://cloud.google.com/vertex-ai/docs/general/access-control#foreign-project Can you try this approach for all projects you use in your flow?
    Dekel R

    9 months ago
    Hey, so after a deep dive this was the issue - I tried using Vertex run in order to train one of our models but kept getting 2 kinds of errors:
    1. Authentication errors
    2. Something about the “metadata server”, as I posted in the original thread - google.auth.exceptions.RefreshError: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Compute Engine Metadata server unavailable
    Now it seems like it’s all connected to the metadata server error - you can read about it here - https://cloud.google.com/compute/docs/metadata/overview
    I read more and found 2 threads about this issue -
    https://github.com/googleapis/google-auth-library-python/issues/211
    https://github.com/googleapis/google-auth-library-python/issues/814
    They give 2 options:
    1. Update the google auth library (did it, didn’t work)
    2. Pass credentials explicitly. I used Prefect secrets in order to pass these - and it works like a charm.
    Not a Prefect or a Vertex AI issue after all… just a Google Compute issue. Thanks for helping with the investigation!!
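    The fix described above, passing credentials explicitly instead of relying on the metadata server, could look roughly like the sketch below. It assumes the service-account key is available as a JSON string and that `google-auth` and `google-cloud-storage` are installed; the helper names are hypothetical and this is not the exact code used in the thread.

```python
import json

# Fields that a service-account key needs for building credentials.
REQUIRED_KEYS = ("client_email", "private_key", "token_uri")


def parse_service_account_info(raw: str) -> dict:
    """Parse a service-account key stored as a JSON string and sanity-check it."""
    info = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError(f"service account key is missing fields: {missing}")
    return info


def make_storage_client(raw_key: str, project: str):
    """Build a GCS client from explicit credentials rather than the metadata server."""
    # Imported lazily; requires google-auth and google-cloud-storage.
    from google.cloud import storage
    from google.oauth2 import service_account

    info = parse_service_account_info(raw_key)
    credentials = service_account.Credentials.from_service_account_info(info)
    return storage.Client(project=project, credentials=credentials)
```

    Inside a task, `raw_key` would come from a Prefect secret, for example `Secret("GCP_CREDENTIALS").get()`, where the secret name is an assumption. If the secret is returned as a dict rather than a string, it can be passed to `from_service_account_info` directly.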