Thread
#prefect-community
    John Lee

    1 year ago
    Hi all, I wonder if someone can help me make sense of some errors I am encountering with GCP auth. I am trying to store GCP credentials on the agent as described here so that my tasks are able to use Google Storage/BigQuery. I am setting PREFECT__CONTEXT__SECRETS__GCP_CREDENTIALS on the agent via a helm chart to a string containing the JSON credentials. This seems to propagate GOOGLE_APPLICATION_CREDENTIALS to each Prefect job, and the creds are different from what I set on the agent, but this var is set to the JSON contents rather than to a path in the container containing the credentials. This causes errors for the Prefect Google utilities and the Google API in Python. I can hack a fix for this by running something like the following, but I am wondering if this is expected behaviour or if I am setting up the agent incorrectly?
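    import os
    import tempfile

    import google.auth

    # Workaround: the variable holds the JSON itself, so write it to a file
    # and point GOOGLE_APPLICATION_CREDENTIALS at that file's path instead.
    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        f.write(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = f.name
    google.auth.default()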
    Kevin Kho

    1 year ago
    Hey @John Lee, will try this myself and check with the team on this
    Are you on server or Cloud?
    So when Prefect gets the Client, it passes the credentials from GCP_CREDENTIALS. It would only default to GOOGLE_APPLICATION_CREDENTIALS if this is missing (lines 32-33 here). So if you provide the secret, I don’t think it should hit that, unless you use another client inside your script or maybe use your own GCP task?
    If you have to provide both your GCP_CREDENTIALS and the GOOGLE_APPLICATION_CREDENTIALS as a file, it does seem weird, and I would honestly just go with GOOGLE_APPLICATION_CREDENTIALS, cuz all the Prefect code in the task library just uses the Google Client anyway, which will fall back to that. No sense in using both, for sure.
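    A minimal sketch of the lookup order described above. resolve_gcp_credentials is a hypothetical helper written for illustration, not Prefect's actual code; it only reports which source would be used and does not build a client.

```python
import os
from typing import Mapping, Optional


def resolve_gcp_credentials(
    secrets: Mapping[str, object],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Report which credential source would win (hypothetical helper)."""
    env = os.environ if env is None else env
    # 1. An explicit GCP_CREDENTIALS secret takes priority.
    if secrets.get("GCP_CREDENTIALS"):
        return "GCP_CREDENTIALS secret"
    # 2. Otherwise the Google client falls back to its default lookup,
    #    which expects a *file path* in GOOGLE_APPLICATION_CREDENTIALS.
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "GOOGLE_APPLICATION_CREDENTIALS file"
    # 3. Nothing configured here.
    return "none"
```

    If the secret never reaches the flow pod, the first branch is skipped and everything depends on GOOGLE_APPLICATION_CREDENTIALS pointing at a readable file.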
    Vinicius Cerutti

    1 year ago
    Hi @Kevin Kho, similar situation as above, but I wanted to check something. If the PREFECT__CONTEXT__SECRETS__GCP_CREDENTIALS variable is set in the helm chart for the kubernetes agent, does it need to be directly set in the agent start arg as well (e.g. prefect agent kubernetes start -e PREFE...GCP_CREDENTIALS=$...), or will it be directly recognized as a secret?
    Kevin Kho

    1 year ago
    Do you mean specifying with --env PREFECT__CONTEXT__SECRETS__GCP_CREDENTIALS=….?
    Vinicius Cerutti

    1 year ago
    yes
    I thought that was the case (grab directly from env), but I couldn't spot the values when executing credentials = prefect.context.get("secrets", {}).get("GCP_CREDENTIALS") inside the flow pod
    Kevin Kho

    1 year ago
    I think this should work without setting it on the agent. I say this because environment variables with the PREFECT__CONTEXT__… prefix are loaded into the context, and the Secret will be held in the context. In general though, not all env variables are copied over. Also, environment variables are not copied over to Dask workers, but the prefect context is. So if it is already in prefect.context.secrets, I think it will make it to the Dask worker.
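    To illustrate the naming convention only: a sketch of how variables with the PREFECT__CONTEXT__SECRETS__ prefix (double underscores) could be collected into a secrets mapping. secrets_from_env is a hypothetical helper, not Prefect's own config loader.

```python
from typing import Dict, Mapping

# Double underscores separate the nesting levels in the variable name.
PREFIX = "PREFECT__CONTEXT__SECRETS__"


def secrets_from_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Collect prefixed variables, keyed by the rest of the name (hypothetical)."""
    return {
        name[len(PREFIX):]: value
        for name, value in env.items()
        if name.startswith(PREFIX)
    }
```

    Running something like secrets_from_env(os.environ) inside the flow pod is a quick way to check whether the variable arrived there at all; a variable that is only set on the agent pod will not show up.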
    Vinicius Cerutti

    1 year ago
    I see, so there is a chance that the secret might not be passed to the context?
    Kevin Kho

    1 year ago
    Do you use Cloud secrets as well? Maybe you can try setting "PREFECT__CLOUD__USE_LOCAL_SECRETS" = "true" and see if that helps? Secret.get() will look locally. Also, I think the syntax would be Secret("GCP_CREDENTIALS").get() if you want it to pull the environment variable?
    Vinicius Cerutti

    1 year ago
    "PREFECT__CLOUD__USE_LOCAL_SECRETS" = "true"
     and see if that helps?
    Amazing, this seems very reasonable too. I will try that out. Uhm, I will try using Secrets as well. thanks
    Kevin Kho

    1 year ago
    Actually, you might indeed need the env flag because your agent is not the same pod that the Flow runs in.
    I just read the docs again
    Vinicius Cerutti

    1 year ago
    oh nice!! thanks, I read through it but didn't spot that detail
    thanks! I will try that out and see how it goes
    John Lee

    1 year ago
    Thanks for the help with this @Kevin Kho. We are using the cloud UI. The GOOGLE_APPLICATION_CREDENTIALS was a red herring (I had originally tried this and then removed it, but it leaked back in during a rebase). The error I was seeing occurred because GCP_CREDENTIALS was not being set as we expected. The env variable was being set on the k8s agent and not on the flow pod. We want to set credentials as part of the deployment (via the agent instead of the web UI), so we will pursue the --env option.
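    A small sketch of the --env approach, assuming the agent is started with prefect agent kubernetes start as mentioned earlier in the thread. agent_start_command is a hypothetical helper that only builds the argument list; it does not start anything.

```python
from typing import List, Mapping


def agent_start_command(flow_env: Mapping[str, str]) -> List[str]:
    """Build the agent start command with one --env pair per variable (hypothetical)."""
    cmd = ["prefect", "agent", "kubernetes", "start"]
    for name, value in flow_env.items():
        # Each --env NAME=VALUE is meant for the flow run pods,
        # not just the agent's own pod.
        cmd += ["--env", f"{name}={value}"]
    return cmd
```

    The resulting list can be handed to subprocess.run, or joined with shlex.join to paste into a container spec. The secret value ends up in the process arguments, so keep that in mind for anything that logs them.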
    Vinicius Cerutti

    11 months ago
    Hi @Kevin Kho, it's me again. Thanks for the overall discussion last time, I was able to solve that problem. Unfortunately I'm now in a situation where I need some Google authentication in my flow pipeline. I got the impression when reading the docs that once I've submitted the GCP_CREDENTIALS as a secret to Prefect, I would have all the necessary gcloud auth set automatically, but that didn't seem to be the case when I tested. Do you know, or have an idea, what the best approach would be to do such authentication for every flow? I had the idea of reaching out to the Prefect Secrets, as a mounted service, and storing the necessary credentials in it. Is that possible?
    Kevin Kho

    11 months ago
    Are you using some Client like the BigQuery Client and it’s not working? What error do you get? Most of the tasks do use GCP_CREDENTIALS
    Vinicius Cerutti

    11 months ago
    yup, I've tried that approach as well. And it didn't work
    Most of the tasks do use GCP_CREDENTIALS
    That's one of my doubts: do we need to use the GCP task in order to access the GCP secret for the run, or will just running a simple task be enough?
    as a toy example that explains my pipeline process:
    • we set the gcp secret under the -e arg for the agent start command
    • we then succeed in authenticating the flow, using prefect login (it's another credentials file), and send the flow pipeline to prefect.io
    • but the real problem is that, for instance, executing a pd.read_csv or using the client in the flow ends up in an error. We were getting anonymous caller insufficient permission to access Google cloud storage
    I was able to solve the above error using the GCSFileSystem directly in the pipeline, but it only worked for a specific part of my pipeline. So if I could get a better understanding of how prefect handles the gcp secret and authenticates with it, or maybe another way of doing it, it would be immensely helpful
    Kevin Kho

    11 months ago
    You need to use the GCP Task because the underlying code does some if else to use the GCP_CREDENTIALS. If you use the GCP Client or GCSFS, I think it looks for GOOGLE_APPLICATION_CREDENTIALS. GCP_CREDENTIALS is a Prefect thing. GOOGLE_APPLICATION_CREDENTIALS is the default GCP thing.
    I think all the Prefect GCP Tasks use this utility function under the hood to authenticate
    If you pass the env variable through the -e flag, it gets added to the context, and prefect.context.get("secrets", {}).get("GCP_CREDENTIALS") will be able to grab it and pass it to the Client. Otherwise, it falls back to this line, and the default GCP Client falls back to GOOGLE_APPLICATION_CREDENTIALS. This is my current understanding. John here might know way more
    Vinicius Cerutti

    11 months ago
    I see, my current understanding is the same as you explained, so it's good to know that it all makes sense as well 😄 .
    Thanks for providing the links for the client functions there, it gave me an idea. Besides that, do you think it's possible to add an optional set_google_application_credentials phase to get_google_client as well? I think it would help keep it in sync with different tools that support it under the hood, e.g. pandas
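    A rough sketch of what such a phase could look like as a standalone function, assuming the secret is either a JSON string or an already-parsed dict. The function below is hypothetical and not part of Prefect; it writes the credentials to a temp file and points GOOGLE_APPLICATION_CREDENTIALS at it.

```python
import json
import os
import tempfile
from typing import Mapping, Union


def set_google_application_credentials(
    secret: Union[str, Mapping[str, object]],
) -> str:
    """Write the credentials JSON to a temp file and export its path (hypothetical)."""
    payload = secret if isinstance(secret, str) else json.dumps(dict(secret))
    json.loads(payload)  # fail early if the secret is not valid JSON
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        handle.write(payload)
    # Libraries that rely on Google's default credential lookup read this path.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path
```

    It would need to run at the start of each task process that touches GCS, since env variables set in one process do not carry over to other workers. The temp file is not cleaned up automatically.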
    Kevin Kho

    11 months ago
    get_google_client can take in credentials and use those instead of the secret. So you might be able to import that function and use it to create a Client by passing credentials directly?
    Vinicius Cerutti

    11 months ago
    yup, that's the current workaround I did to get some of the process working
    Kevin Kho

    11 months ago
    Sorry, I’m not following. What more would a set_google_application_credentials phase give you?
    Vinicius Cerutti

    11 months ago
    also, not related to credentials... but is there a reason why there is no enable option for the different resources in the prefect-server helm charts?
    Kevin Kho

    11 months ago
    That…you would need to post in community because my kubernetes is not good enough to answer. I’d need to find another team member. 😅
    Vinicius Cerutti

    11 months ago
    hshs, no problem I was about to create an issue for that as well. Thank you so much for the help 😄
    Kevin Kho

    11 months ago
    oh yeah issue might be better. things are a bit hectic this week, so there may be a delay in a response but if there is none in a couple of days, you can ping me and i can follow up
    Vinicius Cerutti

    11 months ago
    wonderful, thanks 👍