# prefect-community
v
Hi, I’m having trouble using caching with tasks that take blocks as parameters. It seems that task_input_hash hashes the block differently on each flow run, which causes cache misses. I’ll attach the code snippets needed to reproduce the issue in this thread.
Define and save a custom block for API url/credentials:
# blocks/api_block.py

from pydantic import SecretStr

from prefect.blocks.core import Block


class ApiBlock(Block):
    base_url: str
    path: str
    api_key: SecretStr


test_block = ApiBlock(base_url="www.example.com", path="data", api_key="secret_key")
test_block.save("test-block", overwrite=True)
Flow definition with two almost identical tasks, one takes a block as parameters and the other takes strings:
# sample_flow.py

from datetime import timedelta
from pydantic import SecretStr

from prefect import flow, task, get_run_logger
from prefect.tasks import task_input_hash

from blocks.api_block import ApiBlock


@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(minutes=10))
def call_api_with_block(api_conf: ApiBlock):
    logger = get_run_logger()
    logger.info(f"Getting data from {api_conf.base_url}/{api_conf.path}, with key: {api_conf.api_key}")
    return "some data...."


@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(minutes=10))
def call_api_with_strings(base_url: str, path: str, api_key: SecretStr):
    logger = get_run_logger()
    logger.info(f"Getting data from {base_url}/{path}, with key: {api_key}")
    return "some data...."


@flow
def get_data_from_api():
    test_block = ApiBlock.load("test-block")

    call_api_with_block(test_block)
    call_api_with_strings(test_block.base_url, test_block.path, test_block.api_key)


get_data_from_api()
Running the get_data_from_api flow multiple times shows that call_api_with_block is executed on every run, while call_api_with_strings is cached correctly. Wrapping task_input_hash in a custom function and logging the cache key shows that the key is different on every run:
def cache_key_logger(context, params):
    logger = get_run_logger()
    cache_key = task_input_hash(context, params)
    logger.info(cache_key)
    return cache_key
I’m using Prefect 2.4.2
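A possible workaround until this is resolved might be a custom cache key function that hashes only the block's plain field values instead of the block object itself. This is a minimal sketch, not a confirmed fix: the name api_block_cache_key is hypothetical, it assumes the task parameter is called api_conf as in the flow above, and it deliberately leaves api_key out of the key on the assumption that the key does not change the returned data. The SimpleNamespace objects only stand in for two separately loaded ApiBlock instances.

```python
import hashlib
import json
from types import SimpleNamespace


def api_block_cache_key(context, parameters):
    """Build a cache key from the block's field values only (hypothetical helper)."""
    api_conf = parameters["api_conf"]  # assumes the parameter name used in the flow above
    # Only plain string fields go into the key; api_key is intentionally excluded.
    payload = json.dumps(
        {"base_url": api_conf.base_url, "path": api_conf.path},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Prefix with the task name so unrelated tasks are less likely to share a key.
    return f"call_api_with_block:{digest}"


# Stand-ins for two separately loaded blocks with identical field values.
first = SimpleNamespace(base_url="www.example.com", path="data")
second = SimpleNamespace(base_url="www.example.com", path="data")
assert api_block_cache_key(None, {"api_conf": first}) == api_block_cache_key(
    None, {"api_conf": second}
)
```

If this behaves as hoped, it would be passed as cache_key_fn=api_block_cache_key on call_api_with_block in place of task_input_hash, keeping the same cache_expiration.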
k
Thanks for raising this issue. This seems to be a bug. Do you mind opening an issue on the Prefect GitHub repository so we can take a closer look at this?
v
Sure, I’ll write up an issue!