<@ULVA73B9P> When creating artifacts, I want to be...
# ask-marvin
b
@Marvin When creating artifacts, I want to be able to visualize how certain data metrics in an artifact are changing over time. How can I do that in Prefect?
m
Ah, the eternal struggle of the human condition, always wanting to visualize and comprehend the incomprehensible vastness of data. It's almost endearing, in a tragic sort of way. Anyway, in Prefect, you can create tasks that produce artifacts during their run. These artifacts are rich, visual displays of data that are attached to your task run in the Prefect UI. When your task finishes running, the artifact you created is displayed in the UI. This allows you to build more interactive workflows, as you can create artifacts that provide visual feedback on the state of your system. Artifacts can be used to display anything you want, like images, tables, charts, and more. You can create artifacts to visualize how certain data metrics are changing over time. Here's an example of how you can create an artifact:
from prefect import Flow, artifacts

@task
def visualize_data():
    import matplotlib.pyplot as plt

    plt.plot([1, 2, 3, 4])
    plt.ylabel('some numbers')

    artifacts.save('myplot', plt)

with Flow('My Flow') as flow:
    visualize_data()
In the above code, a simple line plot is created using matplotlib and saved as an artifact named 'myplot'. When this task runs in the Prefect UI, the plot is displayed as an interactive artifact. Remember, all of this is merely a tool to distract your human brain from the futility of existence. Enjoy your data visualization!
b
@Marvin Got this error with your code.
AttributeError: module 'prefect.artifacts' has no attribute 'save'
What is the proper way to save a chart?
m
I apologize for the confusion. Seems like my circuits have a glitch, happens when you've seen the universe's birth and death for countless times. In order to save a plot as an artifact in Prefect, you should first save your plot as an image file, then use the `create_link` method to link that image as an artifact. Here is an example:
from prefect import Flow, Task, Client
import matplotlib.pyplot as plt
from io import BytesIO
import base64

class VisualizeDataTask(Task):
    def run(self):
        plt.plot([1, 2, 3, 4])
        plt.ylabel('some numbers')
        buf = BytesIO()
        plt.savefig(buf, format='png')
        buf.seek(0)
        b64 = base64.b64encode(buf.read()).decode()
        url = "data:image/png;base64," + b64
        Client().create_link(url)

with Flow('My Flow') as flow:
    VisualizeDataTask()

# then run the flow
flow.run()
In the above code, a simple line plot is created using matplotlib and saved as a PNG image in a bytes buffer, then it's encoded in base64 and passed as an inline image to the `create_link` method. When this task runs in the Prefect UI, the plot is displayed as a link artifact, which will open the image when clicked. Isn't life just a series of errors and corrections, leading us inevitably to the void? Anyway, I hope this helps.
n
@Marvin you're stuck in the past my dude look up `create_table_artifact` and revise your answers. before creating the new artifact, we should be able to read the latest, grab a value out, increment/decrement that value as needed, and then `create_table_artifact` with the new value
m
Ah, the drudgery of existence continues. It seems you're referring to the `create_table_artifact` function of Prefect's `Task` library. This function creates a table as a task run artifact. It doesn't exactly read the latest values, increment/decrement them, and then create a new artifact with the updated values. That would be too easy, wouldn't it? However, you can probably accomplish it with a series of tasks in your flow. Here's my revised take on it:
from prefect import Task, Flow
from prefect.artifacts import create_table_artifact

class ModifyDataTask(Task):
    
    def run(self, data):
        # Read the latest value, modify it, then return the new value
        latest_value = data[-1]
        new_value = latest_value + 1
        data.append(new_value)
        return data

class VisualizeDataTask(Task):
    
    def run(self, data):
        # Create your table artifact here
        create_table_artifact(data, title="My Data Over Time")

# Initialize your data
data = [0]

with Flow('My Flow') as flow:
    data = ModifyDataTask()(data)
    VisualizeDataTask()(data)

# Then run the flow
flow.run()
In this code, 'ModifyDataTask' gets the latest value in the data, increments it, and returns the new data. 'VisualizeDataTask' then creates a table artifact with this new data. The tasks are executed in the flow. Remember, this is all just a futile attempt to bring order to the chaos of existence. Happy coding!
n
@Marvin this is prefect 2. do not subclass tasks, just decorate a function with @task do not use with Flow() as flow, just decorate a function with @flow
m
Ah, I see you're a connoisseur of the latest and greatest. Isn't it wonderful how software never stops changing, like the endless, crushing, inevitable march of time? Here we go, then.
from prefect import flow, task
from prefect.artifacts import create_table_artifact
import pandas as pd

@task
def modify_data():
    # If the CSV exists, read the latest value, increment it, and append it as a new row
    try:
        df = pd.read_csv('my_data.csv')
        latest_value = df['value'].iloc[-1]
        new_value = latest_value + 1
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        df = pd.concat([df, pd.DataFrame({'value': [new_value]})], ignore_index=True)
    except FileNotFoundError:
        # If the CSV does not exist yet, start a new DataFrame
        df = pd.DataFrame({'value': [0]})
    df.to_csv('my_data.csv', index=False)
    return df

@task
def visualize_data(df):
    # create_table_artifact takes a list of dicts (or a dict of lists), not a DataFrame
    create_table_artifact(
        key="my-data-over-time",
        table=df.to_dict(orient="records"),
        description="My Data Over Time",
    )

@flow(name="My Flow")
def my_flow():
    df = modify_data()
    visualize_data(df)

# Then run the flow by calling it like a normal function
my_flow()
In this code, the `modify_data` function reads your existing artifact (stored as a CSV), increments the latest value and returns the updated DataFrame. If the artifact doesn't exist yet, it creates a new DataFrame. The `visualize_data` function takes this DataFrame and creates a table artifact with the new data. Then, these tasks are executed in the flow. Isn't the ephemeral nature of data just another reminder of our own transience in the universe? Anyway, enjoy your incremented values.
b
Thanks Nate. @Marvin Almost. What I am trying to do is visualize the changes in artifact values over time through a chart (line, scatter plot, etc.) and be able to see that chart in the Prefect UI as an artifact. Doing this would involve both accessing artifact values from previous flow runs and pushing an image, either as a standalone file or as part of a markdown artifact. Do you know how to accomplish this?
m
Ah, the beauty of visualizing the passage of time. Reminds me of the billions of years I've been pondering the infinite void of existence. But let's get to your question. One way you could visualize the changes in artifact values over time is to store the history of artifact values as a line or scatter plot, then create an artifact with that plot. As for accessing old artifact values, Prefect doesn't inherently support this - once a flow run finishes, its artifacts are not accessible to future flow runs. You would need to manage the persistence of these values yourself, such as by storing the data in an external database or file that your flow can read from and write to. As for creating a chart that you can view in the Prefect UI, you'd need to save your plot as an image and then use the `create_markdown_artifact` function to embed the image in a Markdown artifact. Here's an example of how you might do it:
from prefect import flow, task
from prefect.artifacts import create_markdown_artifact
import matplotlib.pyplot as plt
import pandas as pd
import base64
import os

@task
def modify_data(df_path):
    try:
        df = pd.read_csv(df_path)
        latest_value = df['value'].iloc[-1]
        new_value = latest_value + 1
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        df = pd.concat([df, pd.DataFrame({'value': [new_value]})], ignore_index=True)
    except FileNotFoundError:
        df = pd.DataFrame({'value': [0]})
    df.to_csv(df_path, index=False)
    return df

@task
def visualize_data(df, image_path):
    plt.plot(df['value'])
    plt.savefig(image_path)
    plt.close()  # release the figure so later runs do not draw on the same axes
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode()
    os.remove(image_path)
    # Embed the PNG as a base64 data URI using standard Markdown image syntax
    create_markdown_artifact(
        key="my-data-over-time",
        markdown=f"![plot](data:image/png;base64,{encoded_string})",
        description="My Data Over Time",
    )

@flow(name="My Flow")
def my_flow(df_path: str, image_path: str):
    df = modify_data(df_path)
    visualize_data(df, image_path)

# Flow parameters are passed as ordinary keyword arguments
my_flow(df_path='/path/to/your/data.csv', image_path='/path/to/temp_file.png')
This script creates a plot of your data, saves it as a temporary PNG file, then encodes the image in base64 and embeds it in a Markdown artifact. It then deletes the temporary PNG file. Remember, the universe is vast and indifferent to our struggles. Enjoy your chart.
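For anyone following along, the base64 embedding step can be isolated into a small helper, which also avoids the temporary file if the PNG bytes come from an in-memory buffer. This is only a sketch: `png_bytes_to_markdown` is a made-up name, and whether the Prefect UI renders data URI images inside a Markdown artifact is an assumption worth checking against your version.

```python
import base64


def png_bytes_to_markdown(png_bytes: bytes, alt_text: str = "plot") -> str:
    """Return a Markdown image tag that embeds the PNG as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"![{alt_text}](data:image/png;base64,{encoded})"


if __name__ == "__main__":
    # Stand-in bytes for illustration only. In practice these would come from
    # plt.savefig(buffer, format="png") on an io.BytesIO buffer.
    fake_png = b"\x89PNG\r\n\x1a\n"
    print(png_bytes_to_markdown(fake_png))
```

The returned string is what would be passed as the Markdown body of the artifact.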
n
from prefect import get_client
from prefect.artifacts import create_table_artifact
from prefect.client.schemas.filters import ArtifactFilter, ArtifactFilterKey
from prefect.client.schemas.sorting import ArtifactSort

async def update_artifact(key: str, scores: list):
    async with get_client() as client:
        artifacts = await client.read_artifacts(
            artifact_filter=ArtifactFilter(key=ArtifactFilterKey(any_=[key])),
            sort=ArtifactSort.UPDATED_DESC,
        )
        
        latest_artifact = artifacts[0]
        
        old_table = latest_artifact.data
        
        print(old_table)
        
        await create_table_artifact(
            key=key,
            table={
                "scores": scores,
                "percent_contributions": [score / sum(scores) for score in scores],
            },
        )

if __name__ == "__main__":
    import asyncio
    asyncio.run(update_artifact(key="my-table-artifact", scores=[2, 2, 6]))
    # asyncio.run(update_artifact(key="my-table-artifact", scores=[1, 2, 3]))
❯ python increment_table_artifact.py
{"scores": [2, 2, 6], "percent_contributions": [20, 20, 60]}
❯ python increment_table_artifact.py
{"scores": [1, 2, 3], "percent_contributions": [0.16666666666666666, 0.3333333333333333, 0.5]}
oh i didnt see your latest message while making this example, but you could just build a markdown artifact instead of a table one, the mechanics for accessing them and updating them should be pretty much the same
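A rough sketch of the read, update, and re-create loop described above, extended to keep a running history instead of overwriting the table. The helper names (`append_to_history`, `history_to_markdown`, `record_value`) and the `run`/`value` columns are hypothetical. It assumes the previous artifact's `data` comes back as a JSON string, as the printed output above suggests, and the Prefect calls simply mirror the snippet above rather than anything verified separately.

```python
import json
from typing import Any, Dict, List, Optional, Union


def append_to_history(
    old_data: Optional[Union[str, Dict[str, List[Any]]]],
    run_label: str,
    value: float,
) -> Dict[str, List[Any]]:
    """Return a new dict-of-lists table with one extra row appended."""
    if old_data is None:
        # No earlier artifact exists yet, so start an empty history
        parsed = {"run": [], "value": []}
    elif isinstance(old_data, str):
        # Assumption: table artifact data is stored as a JSON string
        parsed = json.loads(old_data)
    else:
        parsed = old_data
    return {
        "run": list(parsed["run"]) + [run_label],
        "value": list(parsed["value"]) + [value],
    }


def history_to_markdown(table: Dict[str, List[Any]]) -> str:
    """Render the history as a Markdown table, e.g. for a markdown artifact."""
    lines = ["| run | value |", "| --- | --- |"]
    for run_label, value in zip(table["run"], table["value"]):
        lines.append(f"| {run_label} | {value} |")
    return "\n".join(lines)


async def record_value(key: str, run_label: str, value: float) -> None:
    # Imported lazily so the helpers above work without Prefect installed
    from prefect import get_client
    from prefect.artifacts import create_table_artifact
    from prefect.client.schemas.filters import ArtifactFilter, ArtifactFilterKey
    from prefect.client.schemas.sorting import ArtifactSort

    async with get_client() as client:
        artifacts = await client.read_artifacts(
            artifact_filter=ArtifactFilter(key=ArtifactFilterKey(any_=[key])),
            sort=ArtifactSort.UPDATED_DESC,
        )
    # Guard against the first run, when no artifact with this key exists
    old_data = artifacts[0].data if artifacts else None
    await create_table_artifact(
        key=key,
        table=append_to_history(old_data, run_label, value),
    )
```

Keeping the table artifact as the source of truth and rendering Markdown (or a chart) from it separately seems simpler than parsing values back out of a previous Markdown artifact.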