# prefect-community
k
Just ran into something I wasn’t expecting that I wanted to pass along. I’m in an Azure environment running Prefect on `AKS`. I’ve been using `Docker` storage for my flows and just recently started using `AzureResult` in my flows for result persistence. I’ve been letting the `AzureResult` use the `AZURE_STORAGE_CONNECTION_STRING` environment variable, which is the recommendation from the documentation and plays nicely with `k8s` secrets in `AKS`. After some infrastructure changes I found that the environment variable was not being used, and finally worked out that it was because the `Docker` storage was capturing the environment variable during registration and serializing the value via `cloudpickle` into the flow. I’ll be looking into using secrets; however, I wanted to suggest revisiting the documentation or process to prevent accidental credential exposure in the `Docker` storage.
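To make the capture concrete, here is a minimal sketch of the behavior described above. It assumes a hypothetical `EagerResult` class that reads the variable in its constructor, and it uses the standard-library `pickle` as a stand-in for `cloudpickle`; it is not Prefect's actual code, only an illustration of why a value read at construction time travels with the serialized object.

```python
import os
import pickle


class EagerResult:
    """Hypothetical stand-in for a result class that reads its credential in __init__."""

    def __init__(self):
        # The value is read once, when the object is constructed
        # (i.e. while the flow is being built and registered).
        self.connection_string = os.environ.get("AZURE_STORAGE_CONNECTION_STRING")


# "Registration": the environment of the machine that builds the flow.
os.environ["AZURE_STORAGE_CONNECTION_STRING"] = "registration-time-secret"
serialized_eager = pickle.dumps(EagerResult())

# "Execution": the pod provides a different value, e.g. from a k8s secret.
os.environ["AZURE_STORAGE_CONNECTION_STRING"] = "runtime-secret"
restored_eager = pickle.loads(serialized_eager)

# The restored object still carries the registration-time value,
# and that value is readable in the serialized payload itself.
print(restored_eager.connection_string)
print(b"registration-time-secret" in serialized_eager)
```

The second check is the credential-exposure concern: anyone who can read the serialized flow can read the connection string.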
k
Could you show me what this looked like in code?
k
Sure thing. You may recall a while back I was working on the location stuff on the `AzureResult`. I made a small helper method I use for my flows to set the result at the `Flow` level.
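```python
from functools import partial
from typing import Any

from prefect.engine.results import AzureResult

# azure_result_location_formatter is a separate helper, defined elsewhere (not shown).


def flow_based_azure_result(
    project_name: str,
    flow_name: str,
    container_name: str = "prefect-results",
    **kwargs: Any,
) -> AzureResult:
    """Create an AzureResult for a flow."""
    # Bind the project and flow names so the formatter only needs the runtime context.
    location_formatter = partial(
        azure_result_location_formatter, project_name, flow_name
    )
    return AzureResult(container_name, location=location_formatter, **kwargs)
```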
I use this when constructing the flow:
```python
with Flow(
    name=FLOW_NAME,
    schedule=schedule,
    executor=LocalDaskExecutor(),
    result=flow_based_azure_result(PROJECT_NAME, FLOW_NAME),
) as flow:
    ...  # tasks omitted
```
k
I think I know what you are saying. This is written differently from the AWS and GCS results.
k
Yeah, I expected the retrieval of the environment variable to be deferred to the `initialize_service` method like the secrets management is. I see that it falls back to `DefaultAzureCredential`. I didn’t see this documented, but eventually I would like to get all my pods running with a managed identity and/or service principal for Azure stuff.
This might also cause some issues if using `Azure` storage (I’m not), since it instantiates `AzureResult` internally: https://github.com/PrefectHQ/prefect/blob/0ba43607572cfbd7e88050e28f92377eab09e441/src/prefect/storage/azure.py#L59
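For comparison, a rough sketch of the deferred lookup mentioned above. `LazyResult` and its `initialize_service` method are hypothetical names borrowed from the discussion, not Prefect's implementation, and the standard-library `pickle` again stands in for `cloudpickle`. The point is only that nothing secret is stored on the object until it runs.

```python
import os
import pickle


class LazyResult:
    """Hypothetical result class that defers the credential lookup to run time."""

    def __init__(self):
        # No environment access here, so nothing secret is stored on the instance.
        self.connection_string = None

    def initialize_service(self):
        # Called at run time, in the environment where the flow executes.
        self.connection_string = os.environ["AZURE_STORAGE_CONNECTION_STRING"]


# "Registration": the object is built and serialized on the build machine.
os.environ["AZURE_STORAGE_CONNECTION_STRING"] = "registration-time-secret"
serialized_lazy = pickle.dumps(LazyResult())

# "Execution": the pod has its own value; the lookup happens only now.
os.environ["AZURE_STORAGE_CONNECTION_STRING"] = "runtime-secret"
restored_lazy = pickle.loads(serialized_lazy)
restored_lazy.initialize_service()

print(restored_lazy.connection_string)
print(b"registration-time-secret" in serialized_lazy)
```

With this shape the serialized payload contains no credential, and the value used is whatever the execution environment provides.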
k
Yeah, I’ll chat with someone and likely write an issue. Thanks for mentioning it.
k
We’ve experienced a similar issue, and my general understanding is that in some flow storage/registration configurations, any environment variables used will be evaluated and set at registration time rather than at flow execution time. A common pattern we’ve established is doing something like
```python
import os


def get_some_environment_variable():
    # Looked up each time the function is called, i.e. at execution time.
    return os.environ['SOME_ENV_VAR']
```
instead of doing something like this (at the “top level” of the flow, where it’s evaluated at import time):
```python
import os

# Looked up once, when the module is imported (i.e. at registration time).
some_environment_variable = os.environ['SOME_ENV_VAR']
```
k
@Kyle McChesney - Yeah, I think it helps to look at it from the perspective that at the top level you are building a flow, not running a flow. All environment variables that shouldn’t be “compiled” into the flow should be read from within tasks.
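A small plain-Python sketch of the "building versus running" distinction, with no Prefect involved. `build_flow` and the two inner functions are hypothetical stand-ins for flow construction and tasks; the environment is changed between the two phases to mimic registering in one place and executing in another.

```python
import os


def build_flow():
    """Hypothetical stand-in for flow construction; returns the 'tasks' to run later."""
    baked_in = os.environ["SOME_ENV_VAR"]  # read while building the flow

    def task_using_baked_value():
        # Returns whatever was captured at build time.
        return baked_in

    def task_reading_at_run_time():
        # Reads the variable only when the task actually runs.
        return os.environ["SOME_ENV_VAR"]

    return [task_using_baked_value, task_reading_at_run_time]


# Build phase (registration environment).
os.environ["SOME_ENV_VAR"] = "build-value"
tasks = build_flow()

# Run phase (execution environment).
os.environ["SOME_ENV_VAR"] = "run-value"
results = [task() for task in tasks]
print(results)  # ['build-value', 'run-value']
```

Only the second function sees the execution environment; the first keeps the value that was "compiled" in when the flow was built.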
k
You can chime in here. Thanks for investigating! I just formalized the write-up, but this is all you.