Felipe Saldana

03/04/2021, 7:40 PM
Question about using secrets. I have my flow running successfully ... I would like to know if this is the best practice for using secrets:
• creating them outside the flow
• calling the .run() method on them
aurora_user_val = EnvVarSecret("AURORA_USERNAME", raise_if_missing=True).run()
aurora_pass_val = EnvVarSecret("AURORA_PASSWORD", raise_if_missing=True).run()
aurora_host_val = EnvVarSecret("AURORA_HOST", raise_if_missing=True).run()

with Flow("test_vars") as flow:

Zanie

03/04/2021, 7:48 PM
Hi @Felipe Saldana -- I would not recommend doing it this way. You should pass the secret into a task, and it'll be resolved to its value when the flow runs. I'll write a quick example.
from prefect import Flow, task
from prefect.tasks.secrets.env_var import EnvVarSecret

@task
def show_shell(shell):
    print("Shh, it's secret. This is my shell: {}".format(shell))

with Flow("env-secret-passing") as flow:
    shell = EnvVarSecret("SHELL")

    # Pass the secret in, it'll be resolved to a value at runtime
    show_shell(shell)

    # Task doesn't care if it's a secret or a normal value passed in
    show_shell("my-fake-shell")

flow.run()

Felipe Saldana

03/04/2021, 8:01 PM
Thanks for the quick reply. I will be back shortly and will post the issues I get when I don't have the code like that.
@Zanie This is what I am trying to do. I am trying to use EnvVarSecret in the constructor of my custom task object.
class MyGenericTask(Task):
    def __init__(self, auroraUser, auroraPass, auroraHost, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.auroraUser = auroraUser
        self.auroraPass = auroraPass
        self.auroraHost = auroraHost

    def _do_work(self, views: list) -> None:
        i = 0

    def run(self, different_per_run):
        logger.info(f'user: {self.auroraUser}')
        logger.info(f'pass: {self.auroraPass}')
        logger.info(f'host: {self.auroraHost}')
        logger.info(f'different_per_run: {different_per_run}')
        self._do_work(different_per_run)


with Flow("test_vars") as flow:
    aurora_user_val = EnvVarSecret("AURORA_USERNAME", raise_if_missing=True)
    aurora_pass_val = EnvVarSecret("AURORA_PASSWORD", raise_if_missing=True)
    aurora_host_val = EnvVarSecret("AURORA_HOST", raise_if_missing=True)

    refresh_views = MyGenericTask(
        auroraUser=aurora_user_val,
        auroraPass=aurora_pass_val,
        auroraHost=aurora_host_val,
        name="refresh_views"
    )

    refresh_views.bind(different_per_run="testing")
My first question is: why aren't the vars used in the run() method getting evaluated after they have been set in the constructor?
Anyone get a chance to look at this?

Zanie

03/05/2021, 4:05 PM
import os
from prefect import Flow, task, Task
from prefect.tasks.secrets.env_var import EnvVarSecret

@task
def show_shell(shell):
    print("Shh, it's secret. This is my shell: {}".format(shell))


class ShowShell(Task):
    def __init__(self, secret_name: str = "SHELL", **kwargs):
        self.secret_name = secret_name

        super().__init__(**kwargs)

    def run(self):
        # You can use the EnvVarSecret here
        shell = EnvVarSecret(self.secret_name).run()
        # but really there's no reason to not just pull it from the env 
        # shell = os.environ.get(self.secret_name)
        print("Shh, it's secret. This is my shell: {}".format(shell))


# Initialize some subclass style tasks with configuration
show_shell_task = ShowShell("SHELL")
show_fake_shell_task = ShowShell("FAKE_SHELL")

with Flow("env-secret-passing") as flow:
    shell = EnvVarSecret("SHELL")

    # Pass the secret in, it'll be resolved to a value at runtime
    show_shell(shell)

    # Task doesn't care if it's a secret or a normal value passed in
    show_shell("my-fake-shell")

    # Run the subclass style tasks in our flow
    show_shell_task()
    show_fake_shell_task()

# Set the FAKE_SHELL env var before runtime
os.environ["FAKE_SHELL"] = "my-fake-shell"
flow.run()
Generally, you should never call `.run()` on a task while defining your flow. Task runs are meant to be deferred, and if you run them beforehand then it'll be confusing when they contain the wrong values. Best practice for subclass-style tasks using secrets is to pass the name of the secret to the task init, then retrieve the secret at runtime.
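As a rough illustration of that last point, applied to the Aurora task from earlier in the thread, here is a minimal sketch. It uses a tiny stand-in `Task` base class (an assumption, so the snippet runs without Prefect installed) and hypothetical constructor parameters `user_var`, `pass_var`, and `host_var`; in a real flow the class would subclass `prefect.Task` instead.

```python
import os


class Task:
    """Stand-in for prefect.Task (assumption: lets this sketch run without Prefect)."""

    def __init__(self, name=None):
        self.name = name


class MyGenericTask(Task):
    def __init__(self, user_var, pass_var, host_var, **kwargs):
        # Store only the *names* of the environment variables at build time.
        self.user_var = user_var
        self.pass_var = pass_var
        self.host_var = host_var
        super().__init__(**kwargs)

    def run(self, different_per_run):
        # Look the values up only when the task actually runs.
        user = os.environ[self.user_var]
        password = os.environ[self.pass_var]
        host = os.environ[self.host_var]
        # Return a summary that avoids exposing the password itself.
        return {
            "user": user,
            "host": host,
            "password_set": bool(password),
            "different_per_run": different_per_run,
        }


# Nothing is read from the environment here; only the names are recorded.
refresh_views = MyGenericTask(
    "AURORA_USERNAME", "AURORA_PASSWORD", "AURORA_HOST", name="refresh_views"
)
```

Because the lookup happens inside `run()`, the variables only need to exist in the environment where the task executes, not where the flow is defined.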

Felipe Saldana

03/05/2021, 4:14 PM
Thanks for the response. Ok, that makes sense to pass the name into the custom task. So my two options are to call the .run() on the EnvVarSecret or simply grab it directly using os.environ.get. Which one would you go with?
def run(self):
    # You can use the EnvVarSecret here
    shell = EnvVarSecret(self.secret_name).run()
    # but really there's no reason to not just pull it from the env
    # shell = os.environ.get(self.secret_name)
    print("Shh, it's secret. This is my shell: {}".format(shell))

Zanie

03/05/2021, 4:18 PM
The main use for `EnvVarSecret` is to automatically convert an environment value to a secret, as in the non-subclass `@task` way that I showed first. If you're not using it like that, it makes more sense to just use `os.environ.get(...)`, in my opinion.
👍 1

Felipe Saldana

03/05/2021, 4:22 PM
I appreciate it @Zanie

Adam

03/09/2021, 7:02 PM
@Zanie this is a super interesting discussion. We’ve created a little custom Task to reduce the boilerplate when querying Postgres. We don’t really want every task to have to `EnvVarSecret` all the credentials and pass them in to `PostgresFetch` (we also prefer the NamedTupleCursor), so we created the task below. Should we rather use `os.environ.get` instead of using `EnvVarSecret` as we’re doing below?
import psycopg2 as pg
from prefect import Task
from prefect.tasks.secrets import EnvVarSecret
from prefect.utilities.tasks import defaults_from_attrs
from psycopg2.extras import NamedTupleCursor

from sable_batch.utils.sql import read_sql


class PostgresQuery(Task):
    def __init__(self, query: str = None, sql_file: str = None, **kwargs):
        self.query = query
        self.sql_file = sql_file
        super().__init__(**kwargs)

    def run(
        self,
    ):

        if not self.query and not self.sql_file:
            raise ValueError("A query string or path must be provided")

        if not self.query and self.sql_file:
            self.query = read_sql(self.sql_file)

        pg_user = EnvVarSecret("POSTGRES_USER").run()
        pg_password = EnvVarSecret("POSTGRES_PASSWORD").run()
        pg_host = EnvVarSecret("POSTGRES_HOST").run()

        conn = pg.connect(
            dbname="xxx",
            user=pg_user,
            password=pg_password,
            host=pg_host,
            port=5432,
        )
        try:
            with conn, conn.cursor(cursor_factory=NamedTupleCursor) as cursor:
                cursor.execute(query=self.query)
                records = cursor.fetchall()
                return records
        finally:
            conn.close()

Zanie

03/09/2021, 7:21 PM
Hey @Adam, they're basically equivalent, but I'd argue it's clearer to just pull from the environment directly. It's less brittle since you're not relying on that task's behavior. That task exists to simplify the passing of variables into other tasks, and if you're not passing them in your flow then it's simplest to just use `os.environ` instead of using a Prefect utility that's designed to be used differently.
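One possible way to do that for the Postgres task above, sketched with the standard library only. The helper names `require_env` and `postgres_connect_kwargs` are hypothetical, not part of Prefect or psycopg2; the variable names and port are taken from the task above, and the database name is left out because it was redacted there.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset."""
    value = os.environ.get(name)
    if value is None:
        # os.environ.get() silently returns None, so fail loudly instead.
        raise ValueError(f"Environment variable {name!r} is not set")
    return value


def postgres_connect_kwargs() -> dict:
    """Collect connection settings at call time, i.e. when the task runs."""
    return {
        "user": require_env("POSTGRES_USER"),
        "password": require_env("POSTGRES_PASSWORD"),
        "host": require_env("POSTGRES_HOST"),
        "port": 5432,
    }
```

Inside `run()`, the result could then be unpacked into the connection call alongside the database name, so the credentials are read once and a missing variable fails with a clear message rather than a `None` reaching the driver.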

Adam

03/09/2021, 7:27 PM
Prefect, thanks! Makes sense 🙂