
Charles Leung

03/16/2021, 3:02 PM
Hey Team! i have some general questions about secrets and script storage. When i use GitLab storage to register flows, it seems they still look for secrets i plan to add to the agent at execution time; an error is thrown even when i'm just registering the flow:
ValueError: Local Secret "VAULT_TOKEN" was not found.
Is this the expected behavior? should all secrets be registered wherever flows are created/registered?

Zanie

03/16/2021, 3:04 PM
Could you show how you're adding the secrets?

Charles Leung

03/16/2021, 3:06 PM
from prefect.client import Secret

Secret("VAULT_TOKEN").get()
Using the old secret method

Zanie

03/16/2021, 3:06 PM
So you won't want to call get() on the secret manually; that's where it tries to get the value
If you pass the secret to a task, Prefect will resolve it to the value for you
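Roughly like this (an untested sketch; the flow and task names are just placeholders):
from prefect import Flow, task
from prefect.tasks.secrets import PrefectSecret

@task
def use_token(vault_token):
    # by the time this runs, the upstream PrefectSecret task has been resolved,
    # so vault_token is already the plain secret value
    ...

with Flow("secret-example") as flow:
    token = PrefectSecret("VAULT_TOKEN")  # nothing is fetched at registration time
    use_token(token)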

Charles Leung

03/16/2021, 3:07 PM
oh i see, is there an idiomatic way to do this without writing it in as an argument?
I find it a bit cumbersome to create a task to access vault and return a configuration each time

Zanie

03/16/2021, 3:08 PM
You can pass it to the storage object, i.e. flow.storage = GitHub(..., secrets=["VAULT_TOKEN"]). Then within a task you could call PrefectSecret("VAULT_TOKEN").get(), I believe.
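Putting that together for your GitLab setup, something like this (a sketch, not tested; the host/repo/path values are placeholders and the flow name is made up):
from prefect import Flow, task
from prefect.client import Secret
from prefect.storage import GitLab

@task
def use_vault():
    # with "VAULT_TOKEN" listed under the storage secrets below, this lookup
    # should resolve from prefect.context.secrets at flow runtime
    token = Secret("VAULT_TOKEN").get()
    # ... use the token with hvac here

with Flow("vault-example") as flow:
    use_vault()

flow.storage = GitLab(
    host="<gitlab host>",
    repo="<repo path>",
    path="<path in repo>",
    access_token_secret="GITLAB_ACCESS_TOKEN",
    secrets=["VAULT_TOKEN"],
)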

Charles Leung

03/16/2021, 3:09 PM
i see, would i then access my secret in the same manner?
OK i'll give that a shot 🙂 thanks
does it have to be within a task?
e.g., i want to use the extracted configuration as a global for my script

Zanie

03/16/2021, 3:13 PM
Anything that's not in a task is executed immediately instead of being deferred to flow runtime, hence the typical pattern of generating a config in a task then passing it around.
(so no, you can't have it at the top level unless you also want to set it in your config.toml during development)
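i.e. something along these lines (rough sketch; the helper and flow names here are made up):
import prefect
from prefect import Flow, task
from prefect.tasks.secrets import PrefectSecret

@task
def build_config(vault_token):
    # hypothetical helper: use the token to pull whatever settings the
    # downstream tasks need and return them as a plain dict
    return {"token": vault_token}

@task
def do_work(config):
    logger = prefect.context.get("logger")
    logger.info("got config keys: %s", list(config))

with Flow("config-in-a-task") as flow:
    token = PrefectSecret("VAULT_TOKEN")
    config = build_config(token)
    do_work(config)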

Charles Leung

03/16/2021, 3:15 PM
curiously, how are globals handled in prefect? i thought with script-based storage the entire script would be executed, rather than all globals/locals being pickled

Zanie

03/16/2021, 3:17 PM
Well when you call your script to register your flow, the Secret.get() code will be called, and if it errors you won't reach registration.
You can pass stored_as_script to storage and the entire script is executed rather than being pickled.
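For example with Local storage (just a sketch; the path is a placeholder). GitLab storage already pulls the flow file from the repo and executes it at runtime, so as far as I know it's script-based by nature:
from prefect.storage import Local

# run the script itself at flow runtime instead of unpickling a serialized flow
flow.storage = Local(
    path="/path/to/flows/my_flow.py",
    stored_as_script=True,
)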

Charles Leung

03/16/2021, 4:01 PM
Hey Michael! i tried the approach from earlier of including the secrets parameter in storage, but i still got the same error unfortunately -
i guess maybe there needs to be some clarification in the "using secrets elsewhere" portion of the docs; secrets must be accessed inside tasks, or else the secret has to be set in the environment where the flow is being registered
if i have sensitive globals, how are those stored for the flow during runtime?

Zanie

03/16/2021, 5:01 PM
Can you explain what you mean by sensitive globals and what you mean by stored? Global variables in Python are in-memory. Prefect doesn't know anything about your global variables, they're just handled by Python.

Charles Leung

03/16/2021, 5:04 PM
For example, if i fetch user/pass configurations from Vault in my environment, then use them as globals in my script, then store the flow in GitLab, how does the flow know those values at runtime? I'll paste a code sample below
import socket

import boto3
import hvac
import prefect
from prefect import Flow
from prefect.run_configs import DockerRun
from prefect.storage import GitLab
from smb.SMBConnection import SMBConnection

# Get Job Credentials from Vault (runs at registration time, outside any task)
vault = hvac.Client(url='<URL>', verify=f'<cert>')
aws_config = vault.read('<path>').get('data')
smb_config = vault.read('<path>').get('data')
smb_config.update(dict(
    my_name=socket.getfqdn(),
    use_ntlm_v2=True
))

@prefect.task
def upload_smb_files():
    # Get File Difference from SMB to S3
    # Setup
    s3_client = boto3.client('s3', **aws_config)
    smb_connection = SMBConnection(
        remote_name='<remote>',
        **smb_config
    )
    smb_connection.connect('')

    try:
        # move files with config
        ...
    finally:
        smb_connection.close()
    return len(files)  # 'files' comes from the file-moving logic elided above

# flow definition (elided from the original paste)
with Flow('<flow name>') as flow:
    upload_smb_files()

flow.run_config = DockerRun(
    image='<image>'
)

flow.storage = GitLab(
    host="<gitlab host>",
    repo="<repo path>",                          # name of repo
    path="<path in repo>",                       # location of flow file in repo
    access_token_secret="GITLAB_ACCESS_TOKEN",   # name of personal access token secret
)
when i execute prefect register on the CLI, it executes the code to get the configurations; how are those configurations then passed to the tasks, since they run at runtime in a different environment?

Zanie

03/16/2021, 5:20 PM
We don't store these values in the API so the behavior would likely depend on the type of storage you use (script storage would attempt to load those values in your new environment). You can easily inspect the serialized data with
from prefect import Flow, task

x = "FOO"

@task(log_stdout=True)
def display(value):
    print(value)

with Flow("constant-global") as flow:
    display(x)

serialized_flow = flow.serialize()

from pprint import pprint
pprint(serialized_flow)
❯ python example-constant-global.py
OrderedDict([('name', 'constant-global'),
             ('type', 'prefect.core.flow.Flow'),
             ('schedule', None),
             ('parameters', []),
             ('tasks',
              [{'__version__': '0.14.11+10.gb0a47a530.dirty',
                'auto_generated': False,
                'cache_for': None,
                'cache_key': None,
                'cache_validator': {'fn': 'prefect.engine.cache_validators.never_use',
                                    'kwargs': {}},
                'inputs': {'value': {'required': True, 'type': 'typing.Any'}},
                'max_retries': 0,
                'name': 'display',
                'outputs': 'typing.Any',
                'retry_delay': None,
                'skip_on_upstream_skip': True,
                'slug': 'display-1',
                'tags': [],
                'timeout': None,
                'trigger': {'fn': 'prefect.triggers.all_successful',
                            'kwargs': {}},
                'type': 'prefect.tasks.core.function.FunctionTask'}]),
             ('edges', []),
             ('reference_tasks', []),
             ('environment', None),
             ('run_config', None),
             ('__version__', '0.14.11+10.gb0a47a530.dirty'),
             ('storage', None)])
When the flow is imported to be run, those values will be resolved and passed.
Doing something like
from prefect import Flow, task, Parameter

x = "FOO"

@task(log_stdout=True)
def display(value):
    print(value)

with Flow("constant-global") as flow:
    x_param = Parameter("x", default=x)
    display(x_param)

serialized_flow = flow.serialize()

from pprint import pprint
pprint(serialized_flow)
would store your value in the Prefect API
('parameters',
              [{'__version__': '0.14.11+10.gb0a47a530.dirty',
                'default': 'FOO',
                'name': 'x',
                'outputs': 'typing.Any',
                'required': False,
                'slug': 'x',
                'tags': [],
                'type': 'prefect.core.parameter.Parameter'}]),

Charles Leung

03/16/2021, 5:24 PM
got it, thank you!