Ofir

04/28/2023, 5:13 PM
What’s the best way to persist artifacts (output files) from Prefect workflow runs in Kubernetes? I am running on AKS (Azure Kubernetes Service) and I would like to be able to access files generated during the run of the flows. I could mount a PersistentVolumeClaim into /my_data on the pod/container and have my workflow simply:
import os

PERSISTENT_OUT_DIR = '/my_data'
out_path = os.path.join(PERSISTENT_OUT_DIR, 'my_output.csv')
with open(out_path, 'w') as f:
    f.write('my,cool,csv,header')
And then use `kubectl cp`, or e.g. a filebrowser, to access them via a Web UI. However, I am sure there are standard ways to upload files generated during the workflow runs from Prefect, and I’d love to learn about the idiomatic approaches. I am guessing Prefect has support through its SDK to access S3 / Azure Blob Storage / Azure Files, etc. Your thoughts?
Henning Holgersen

04/28/2023, 5:29 PM
The few times I need to directly manage files in a flow, I use rclone to sync files to azure blobs. Rclone is a very versatile command line utility to copy/sync files between a lot of different storage solutions, and there is even a python-rclone library. But the normal azure storage library might be just as easy. I have not seen the need to use any prefect-specific abstractions for that purpose.
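A minimal sketch of the rclone route, assuming rclone is installed in the flow's container image; the container name (`my-artifacts`) and the env var names are made up for illustration:

```python
import os
import subprocess


def rclone_sync_cmd(local_dir: str, container: str, account: str) -> list:
    """Build an rclone invocation that syncs a local directory to an Azure
    blob container, using an on-the-fly ':azureblob:' remote so no
    rclone.conf file is needed. The account key is expected in the
    RCLONE_AZUREBLOB_KEY env var, which rclone reads natively."""
    return [
        "rclone", "sync", local_dir,
        f":azureblob:{container}",
        f"--azureblob-account={account}",
    ]


if __name__ == "__main__" and "AZURE_STORAGE_ACCOUNT" in os.environ:
    subprocess.run(
        rclone_sync_cmd("/my_data", "my-artifacts", os.environ["AZURE_STORAGE_ACCOUNT"]),
        check=True,
    )
```

Shelling out to the rclone binary avoids depending on the python-rclone wrapper; the same command works with the wrapper if you prefer it.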
Ofir

04/28/2023, 5:30 PM
That’s very cool, thanks @Henning Holgersen!
So I can create a new Azure Blob Storage container, pass the path and the credentials / tokens via env vars to the Prefect Agent, and use the rclone Python module to upload the files/dirs?
Henning Holgersen

04/28/2023, 5:42 PM
We store all our secrets in azure key vault, authenticate with managed identity, and if you use the azure storage library you can use managed identity to connect to the storage blob as well. No credentials needed. If I wanted to do it more simply, I would use a prefect secret block to store the credentials.
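A sketch of the managed-identity upload with azure-identity and azure-storage-blob; the account and container names here are hypothetical:

```python
ACCOUNT = "mystorageaccount"   # hypothetical storage account name
CONTAINER = "flow-artifacts"   # hypothetical container name


def account_url(account: str) -> str:
    """Blob service endpoint for a given storage account."""
    return f"https://{account}.blob.core.windows.net"


def upload_artifact(local_path: str, blob_name: str) -> None:
    # Azure SDK imports are kept local so the sketch can be imported
    # without azure-identity / azure-storage-blob installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    # DefaultAzureCredential picks up the pod's managed identity on AKS
    # (and falls back to env vars or `az login` locally) -- no stored secrets.
    service = BlobServiceClient(
        account_url=account_url(ACCOUNT),
        credential=DefaultAzureCredential(),
    )
    blob = service.get_blob_client(container=CONTAINER, blob=blob_name)
    with open(local_path, "rb") as f:
        blob.upload_blob(f, overwrite=True)
```

Call it from the end of a flow, e.g. `upload_artifact("/my_data/my_output.csv", "runs/my_output.csv")`.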
Ofir

04/28/2023, 5:47 PM
Is there an example for using a managed identity? I can mount the credentials as encrypted secrets (SOPS + Azure Key Vault) and have them decrypted and mounted as env vars to the pod
So like a service account / machine account that is passed to the pod
Henning Holgersen

04/28/2023, 5:54 PM
I don’t have a code example for writing to an azure storage account using managed identity at hand, but I have two flows that use azure identity in various ways (one of them with a storage account, to renew a SAS key for an entirely different purpose):
Somewhat involved solution to generate a SAS key: https://github.com/radbrt/orion_flows/blob/main/projects/warehouse/renew_stage/flow.py
Retrieve a secret from azure key vault: https://github.com/radbrt/orion_flows/blob/main/projects/jottings/blocky/flow.py#L11
Ofir

04/28/2023, 5:57 PM
Thank you very much! Just double checking, SAS stands for Shared Access Signature (token), right?
Henning Holgersen

04/28/2023, 5:58 PM
That is correct. But SAS keys probably aren’t the right solution for your use case.
Ofir

04/28/2023, 5:58 PM
Why?
Henning Holgersen

04/28/2023, 5:59 PM
If you do have a managed identity on your AKS cluster, you can use it directly with the azure-storage-blob library; or, if you use python-rclone, it is better to retrieve the storage account keys directly rather than generate a SAS token.
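A sketch of the key-retrieval variant for rclone, using azure-mgmt-storage with the managed identity; the subscription/resource-group parameters are placeholders you would fill in:

```python
def rclone_env(account: str, key: str) -> dict:
    """Env vars that configure rclone's azureblob backend on the fly
    (rclone maps RCLONE_AZUREBLOB_* env vars to --azureblob-* flags)."""
    return {
        "RCLONE_AZUREBLOB_ACCOUNT": account,
        "RCLONE_AZUREBLOB_KEY": key,
    }


def get_account_key(subscription_id: str, resource_group: str, account: str) -> str:
    """Fetch the primary storage account key using the pod's managed identity."""
    # Local imports so the sketch can be read without azure-mgmt-storage installed.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient

    client = StorageManagementClient(DefaultAzureCredential(), subscription_id)
    keys = client.storage_accounts.list_keys(resource_group, account)
    return keys.keys[0].value
```

The managed identity needs a role that allows listing keys (e.g. Storage Account Contributor) on the storage account for this to work.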
Ofir

04/28/2023, 6:03 PM
gotcha, thanks