Marvin: What's the best way to attach some data to a flow? (Prefect Community, #ask-marvin)


Adam Lear

09/03/2025, 1:57 PM

@Marvin What's the best way to attach some data to a flow run at runtime? I would like to add some value to the flow run so I can access it after the flow run has ended.

Marvin

09/03/2025, 1:57 PM

✅ thought for 379.5 seconds

Marvin

09/03/2025, 2:03 PM

Great question. In Prefect 3.x, the best way to attach data to a flow run at runtime (so it's available after the run completes) is to create an artifact associated with that run.

Why artifacts?
- Purpose-built for attaching outputs/metadata to runs
- Visible in the Prefect UI on the run page
- Easy to retrieve later via the API or CLI
- Support keys, so you can fetch "the latest" artifact for a key across runs

Common patterns
- Small/structured data: use a markdown or table artifact
- Links to external outputs (S3, GCS, dashboards): use a link artifact
- Progress or numeric status: use a progress artifact
- Large data: store it externally and attach a link artifact, to avoid bloating Prefect's database

Example: attach JSON-like metadata as a markdown artifact

```python
from prefect import flow
from prefect.artifacts import create_markdown_artifact
import json

@flow
def my_flow():
    # Whatever you want to persist
    metadata = {
        "rows_loaded": 12345,
        "output_uri": "s3://my-bucket/data/run_2025_09_03.parquet",
        "notes": "Backfilled partition=2025-09-03",
    }

    # Created inside the flow, so the artifact is linked to this flow run.
    # Artifact keys may only contain lowercase letters, numbers, and dashes.
    create_markdown_artifact(
        markdown="Run metadata:\n```json\n" + json.dumps(metadata, indent=2) + "\n```",
        key="my-flow-run-metadata",
        description="Per-run metadata for my_flow",
    )

if __name__ == "__main__":
    my_flow()
```
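
Keys are worth getting right: in Prefect 3.x, artifact key validation only accepts lowercase letters, numbers, and dashes, so a key like my_flow/run-metadata is rejected. If you build keys from flow names, a small slug helper keeps them valid. This is a sketch: to_artifact_key is a hypothetical helper (not part of Prefect), and the regex mirrors the key rule as I understand it, so treat it as an assumption.

```python
import re

# Assumption: Prefect's artifact-key rule is "lowercase letters, digits, and dashes only".
VALID_ARTIFACT_KEY = re.compile(r"^[a-z0-9-]+$")


def to_artifact_key(*parts: str) -> str:
    """Hypothetical helper: join and slugify parts into a valid artifact key."""
    raw = "-".join(parts).lower()
    # Replace each run of non-alphanumeric characters (including repeated dashes) with one dash
    key = re.sub(r"[^a-z0-9]+", "-", raw).strip("-")
    if not VALID_ARTIFACT_KEY.match(key):
        raise ValueError(f"cannot build a valid artifact key from {parts!r}")
    return key


print(to_artifact_key("my_flow", "run-metadata"))  # my-flow-run-metadata
```

You'd then pass key=to_artifact_key("my_flow", "run-metadata") to create_markdown_artifact.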

Example: attach a link to an external output (e.g., an S3/GCS object or dbt docs)

```python
from prefect.artifacts import create_link_artifact

# Call this inside your flow (or a task) so the artifact is attached to the current run
create_link_artifact(
    link="s3://my-bucket/data/run_2025_09_03.parquet",
    link_text="Output dataset",
    key="my-flow-output",
)
```
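
For the large-data pattern, the payload lives outside Prefect and only its URI is stored as a link artifact. Here's a minimal sketch of that split, assuming a local directory stands in for S3/GCS so it runs without credentials; write_run_output and the one-folder-per-run layout are hypothetical, not Prefect APIs.

```python
import json
from pathlib import Path


def write_run_output(base_dir: Path, run_id: str, rows: list[dict]) -> str:
    """Hypothetical helper: write a run's (potentially large) output outside Prefect
    and return a URI that a link artifact can point at.

    A local directory stands in for S3/GCS so the sketch runs without credentials.
    """
    run_dir = base_dir / run_id  # one folder per flow run keeps outputs from colliding
    run_dir.mkdir(parents=True, exist_ok=True)
    out_path = run_dir / "output.json"
    out_path.write_text(json.dumps(rows))
    return out_path.resolve().as_uri()  # e.g. file:///.../<run_id>/output.json


# Inside the flow, only the URI goes into Prefect's database
# (flow_run here would be prefect.runtime.flow_run):
# uri = write_run_output(Path("outputs"), str(flow_run.id), rows)
# create_link_artifact(link=uri, link_text="Output dataset", key="my-flow-output")
```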

Example: attach a small table/dict

```python
from prefect.artifacts import create_table_artifact

# Call this inside your flow (or a task) so the artifact is attached to the current run
create_table_artifact(
    table=[{"table": "events", "rows_loaded": 12345, "status": "ok"}],
    key="my-flow-load-stats",
    description="Load stats per run",
)
```
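
create_table_artifact accepts a list of dicts in which each dict is one row and its keys become the columns; the example above passes a single row. When a run loads several tables, you can build one row per table. A sketch, assuming per-table row counts are collected in a dict; load_stats_rows and its "ok"/"empty" status rule are illustrative, not part of Prefect.

```python
def load_stats_rows(row_counts: dict[str, int]) -> list[dict]:
    """Hypothetical helper: one table-artifact row per loaded table, sorted by name."""
    return [
        {
            "table": name,
            "rows_loaded": count,
            # Illustrative rule: flag tables that loaded nothing
            "status": "ok" if count > 0 else "empty",
        }
        for name, count in sorted(row_counts.items())
    ]


rows = load_stats_rows({"users": 0, "events": 12345})
# Inside the flow: create_table_artifact(table=rows, key="my-flow-load-stats")
```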

Retrieving artifacts later
- CLI

```bash
# List artifacts
prefect artifact ls
# View an artifact by its key
prefect artifact inspect "my-flow-run-metadata"
```

- Python API (filter by key or by flow_run_id)

```python
import asyncio
from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import ArtifactFilter, ArtifactFilterKey, ArtifactFilterFlowRunId
from prefect.client.schemas.sorting import ArtifactSort

async def fetch_latest_by_key(key: str):
    async with get_client() as client:
        # Newest artifact with this key, across all runs
        latest = await client.read_artifacts(
            artifact_filter=ArtifactFilter(key=ArtifactFilterKey(any_=[key])),
            sort=ArtifactSort.UPDATED_DESC,
            limit=1,
        )
        for a in latest:
            print(a.key, a.type, a.data)

async def fetch_all_for_run(run_id: str):
    async with get_client() as client:
        # Every artifact created during one specific flow run
        arts = await client.read_artifacts(
            artifact_filter=ArtifactFilter(
                flow_run_id=ArtifactFilterFlowRunId(any_=[run_id])
            )
        )
        for a in arts:
            print(a.key, a.type, a.data)

if __name__ == "__main__":
    asyncio.run(fetch_latest_by_key("my-flow-run-metadata"))
```
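
A markdown artifact's data field holds the markdown string itself, so getting the metadata dict back (e.g. from a.data in the loops above) means pulling the JSON out of the fenced block. A minimal sketch, assuming the artifact was written in exactly the "Run metadata:" plus json-fence format from the first example; parse_run_metadata is a hypothetical helper, not a Prefect API.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this snippet doesn't contain a literal fence
# Non-greedy match of the first json-tagged fenced block, spanning newlines
JSON_BLOCK = re.compile(FENCE + r"json\n(.*?)\n" + FENCE, re.DOTALL)


def parse_run_metadata(markdown: str) -> dict:
    """Hypothetical helper: recover the dict embedded by the markdown-artifact example."""
    match = JSON_BLOCK.search(markdown)
    if match is None:
        raise ValueError("no json block found in artifact data")
    return json.loads(match.group(1))


# e.g. metadata = parse_run_metadata(a.data) for a markdown artifact returned by read_artifacts
```

If you'd rather skip the parsing step, a table artifact keeps the values structured instead of embedding them in prose.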

Alternatives (use sparingly)
- Update flow run attributes via the API (e.g., name, tags) if you only need small labels/strings:

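```python
import asyncio
from prefect import flow
from prefect.client.orchestration import get_client
from prefect.runtime import flow_run

async def tag_current_run(extra_tags: list[str]) -> None:
    # flow_run.id and flow_run.tags are only populated while a flow run is active
    async with get_client() as client:
        await client.update_flow_run(
            flow_run_id=flow_run.id,
            # `tags` replaces the run's tag list, so merge in the existing tags
            tags=sorted(set(flow_run.tags) | set(extra_tags)),
        )

@flow
async def my_flow():
    await tag_current_run(["output:s3", "backfill"])

if __name__ == "__main__":
    asyncio.run(my_flow())
```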
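
The "output:s3" tag above follows a key:value naming habit. If you adopt that convention (it's a hypothetical convention, not something Prefect enforces), reading the values back is plain string splitting. tags_to_labels is an illustrative helper; you'd feed it the tags of a run fetched later, e.g. via client.read_flow_run(run_id).

```python
def tags_to_labels(tags: list[str], sep: str = ":") -> dict[str, str]:
    """Hypothetical helper: turn "key:value" tags into a dict; plain tags are ignored."""
    labels: dict[str, str] = {}
    for tag in tags:
        key, found, value = tag.partition(sep)  # split on the first separator only
        if found and key:
            labels[key] = value
    return labels


print(tags_to_labels(["output:s3", "backfill"]))  # {'output': 's3'}
```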

Note: tags are great for filtering/searching, but not for storing rich data. There is no generic "metadata" field on flow runs in 3.x, and mutating parameters mid-run is generally discouraged.

Docs and references
- Artifacts CLI (prefect artifact)
- Using the Prefect API client
- Flow runs API reference

Would you like help picking the right artifact type for your data (e.g., a JSON blob vs. a link) and deciding how you'll retrieve it (UI, API, or CLI)?
