Hi!! Anybody has issues using credentials in DbtSh...
# prefect-community
Hi!! Anybody has issues using credentials in DbtShellTask in Prefect Cloud? I'm working with BigQuery. Do I need to pass the "keyfile" param inside dbt_kwargs? Or if I have a GCP_CREDENTIALS Secret this is not need?
Here is one way of approaching it using Prefect Secrets and `dbt_kwargs`:
Copy code
postgres_user = PrefectSecret("POSTGRES_USER")
    postgres_pass = PrefectSecret("POSTGRES_PASS")
    db_credentials = get_dbt_credentials(postgres_user, postgres_pass)
    dbt_run = dbt(
        command="dbt run", task_args={"name": "DBT Run"}, dbt_kwargs=db_credentials
full example: https://www.prefect.io/blog/flow-of-flows-orchestrating-elt-with-prefect-and-dbt/
can be tackled the same way
Thanks @Anna Geller. Something like this:
Copy code
def get_dbt_credentials(gcp_json):
    return {"keyfile": gcp_json}

def format_command(target_env):
    return f"dbt build --target {target_env}"

with Flow("DBT_flow") as flow:
    target_env = Parameter("target_env", required=True)
    credentials = PrefectSecret("GCP_CREDENTIALS")
    db_credentials = get_dbt_credentials(credentials)
    dbt_run = dbt(
        task_args={"name": "DBT Run"},
I will try it.
this looks good - the only issue you need to be careful about is whether dbt expects the contents of the GCP service account or just a path to it - the Prefect Secret GCP_CREDENTIALS expects the contents of a SA
Yes, that is exactly the issue, because I know that DBT require a path
Should I create a Docker image with this file inside and use an ENV variable in the Dockerfile?
we are working with a custom image to run the flows with Vertex
I can't dive deeper into this atm but this repo may help you - I did a flow of flows example with dbt + GCP here - using a local agent but this should work the same way with Docker agent https://github.com/anna-geller/prefect-monte-carlo/tree/workshop/flows I didn't have to specify any credentials other than setting the SA path on the dbt profile
Sure, Thanks! simple smile
please keep us posted here about your progress - I think using the same approach setting the path on your dbt profile and coping or mounting the SA file to your container should work
In case I get the secret from Prefect Cloud, should I be able to paste this content in a new file, right? I think that will be easier to do. And then use that path file as the keyfile param.
dbt seems to support multiple auth methods so you'd need to experiment which option works best for you, hard to say, sorry I'm not more useful here
Yes, the main trade off with mounting the file in the Docker image is that probably is going to work when I build the image locally (where I have the SA file), but we are using cloudbuild to build and push the image to Google Registry, so in that case we will probably need to use AWS Secrets Manager, and I honestly I have not to much idea about it
no problem Anna! Thanks for your help!
👍 1
@Mateo Merlo where are you running your flows and agent?
You can it seems specify the individual parts of the keyfile directly in the profile.yml -> https://prefect-community.slack.com/archives/CL09KU1K7/p1650625646905089?thread_ts=1650371481.279719&cid=CL09KU1K7 So maybe you could template in the secret bits from prefect secrets? I am running DBT on k8s - I created a k8s secret file and then mounted that into the pods environment via the agent default template - if you are running on k8s would be happy to discuss how I did that
Thanks for your response @Joshua Greenhalgh! Yes, I saw your thread as well.
We are using Vertex and docker.
Copy code
flow.storage = GCS(bucket="influsense_test_bucket", secrets=["GCP_CREDENTIALS"])
flow.executor = LocalDaskExecutor(scheduler="threads")
flow.run_config = VertexRun(
Google Cloud Storage
Did you solve it using the individual parts? or just getting the entire file from k8s secret?
I just mounted the entire file
Would you mind to share how you did it? I've already been working with Kubernetes and running the agent but I've never used secrets
Of course
Is the terraform ok?
Copy code
data "sops_file" "dbt_bq_user_service_account_key" {
  source_file = "./secret_data/dbt-bq-user-sa-key.json"

resource "kubernetes_secret" "dbt_bq_user_service_account_key" {
  metadata {
    name      = "dbt-bq-user-service-account-key"
    namespace = "prefect"

  data = { "dbt-bq-user-sa-key.json" = jsonencode(data.sops_file.dbt_bq_user_service_account_key.data) }
bit is just so I can keep the file in a git repo encrypted - basicly
is the full key for the service account... The I have the following agent template;
Copy code
apiVersion: batch/v1
kind: Job
        - name: flow
          imagePullPolicy: Always
            - mountPath: /service_account_key/
              name: dbt-bq-user-service-account-key
        - name: dbt-bq-user-service-account-key
            secretName: dbt-bq-user-service-account-key
And my profile which is baked into my image looks like;
Copy code
      dataset: mbk_dwh
      fixed_retries: 1
      location: EU
      method: service-account
      priority: interactive
      project: data-analytics-mbk
      threads: 1
      timeout_seconds: 300
      type: bigquery
      keyfile: /service_account_key/dbt-bq-user-sa-key.json
Thanks so much!
I'm currently using AWS Codebuild to build and push the docker image, so I will probably need to do something similar with AWS Secrets Manager to build the image from the cd pipeline
you run all the terraform commands locally, right?
How do you point to this file in the KubernetesRunConfig?
I don't need to specify anything in the run config since it just exists on the pod at
- all I have to do is bake the profile into the image with a ref to that location
I try to run the terraform on CI most of the time...doesn't always work that way however
I mean, how do you say to Prefect that the flow show use that image?
Ahh ok;
Copy code
run_config=KubernetesRun(image=JOB_IMAGE_NAME, labels=[DEFAULT_AGENT_LABEL]),
I am always just using
Thanks so much @Joshua Greenhalgh! I will try first work with the individual parts of the keyfile and if I have not success then I would move to Kubernetes
I think that use a task that read the GCP_CREDENTIALS variable and create a new file with that content can be a good approach as well
And then use that as the keyfile inside profiles.yml
So I started out using vertex but its pretty expensive - if you are only running one gke autopilot cluster the flat fee is discounted so you only pay for the compute you use
k8s is complicated of course but I don't think its insurmountable
👀 1
Totally agree. With one cluster you mean the agent that will be running there, right?