John Horn

03/07/2023, 10:45 PM
I am trying to persist a task result to a GCP bucket and reuse that result for 24 hours, after which the cache should be re-created. I'm using an object store like a GCP bucket because I had trouble testing caching in a Docker container, then deploying and finding the location of the local storage. To keep testing and deployment simple, the cache should just live in the bucket. Here is what happens when I run the tasks back to back in a Docker container for testing.
This throws an error:
@task(
  cache_key_fn=task_input_hash,
  cache_expiration=timedelta(days=1),
  persist_result=True,
  result_storage=GCS(
    bucket_path="foo-bucket-unique",
    service_account_info=json.dumps(GcpCredentials.load('foo-gcp-service-account').service_account_info),
  ),
)
Error:
Path /root/.prefect/storage/XXXXXXXXXXXXXXXXXXXXXXX does not exist.
With no cache settings at all, the result persists to the bucket, but both runs still execute the task and neither run uses the cache:
@task(
  persist_result=True,
  result_storage=GCS(
    bucket_path="foo-bucket-unique",
    service_account_info=json.dumps(GcpCredentials.load('foo-gcp-service-account').service_account_info),
  ),
)
With cache_expiration set but cache_key_fn commented out, the result also persists to the bucket, but both runs still execute the task and neither run uses the cache:
@task(
  # cache_key_fn=task_input_hash,
  cache_expiration=timedelta(days=1),
  persist_result=True,
  result_storage=GCS(
    bucket_path="foo-bucket-unique",
    service_account_info=json.dumps(GcpCredentials.load('foo-gcp-service-account').service_account_info),
  ),
)

Zanie

03/07/2023, 10:48 PM
I think you want the first one, perhaps with refresh_cache=True on the first run?
That error sounds like the cache key is looking up an old persisted result that doesn't use GCS.

John Horn

03/07/2023, 11:01 PM
Genius! It worked!!!
Yeah, it was storing my previous runs, so I re-ran with refresh_cache=True. Then I took it off, and the subsequent runs all used the cache.
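
For anyone hitting the same error, here is a rough sketch of why a stale cache key fails and why one refreshed run fixes it. It is a toy model in plain Python, not Prefect's actual implementation: ToyCache, its run method, and the "local"/"bucket" storage names are all made up for illustration. The only assumption taken from the thread is that a cache key can point at a result persisted in an older storage location.

```python
from datetime import datetime, timedelta


class ToyCache:
    """Toy model of cache key -> persisted result lookup (hypothetical, not Prefect)."""

    def __init__(self):
        self.index = {}  # cache key -> (storage name, path, expires_at)
        self.storages = {"local": {}, "bucket": {}}

    def run(self, key, compute, storage, ttl, now, refresh_cache=False):
        entry = self.index.get(key)
        if entry is not None and not refresh_cache and now < entry[2]:
            stored_in, path, _ = entry
            # The key still points at wherever the result was written originally.
            if path not in self.storages[stored_in]:
                raise FileNotFoundError(f"Path {path} does not exist.")
            return self.storages[stored_in][path], "cached"
        # Cache miss, expired entry, or forced refresh: recompute and overwrite.
        value = compute()
        path = f"result-{key}"
        self.storages[storage][path] = value
        self.index[key] = (storage, path, now + ttl)
        return value, "computed"


calls = []


def expensive():
    calls.append(1)
    return 42


cache = ToyCache()
now = datetime(2023, 3, 7, 22, 0)
ttl = timedelta(days=1)

# An earlier run persisted locally; a fresh container then lost those files.
cache.run("k", expensive, "local", ttl, now)
cache.storages["local"].clear()

# Switching storage alone does not help: the key still resolves to the old path.
try:
    cache.run("k", expensive, "bucket", ttl, now)
    stale_error = None
except FileNotFoundError as exc:
    stale_error = str(exc)

# One refreshed run rewrites the entry; later runs then hit the cache.
first = cache.run("k", expensive, "bucket", ttl, now, refresh_cache=True)
second = cache.run("k", expensive, "bucket", ttl, now)
```

In this model the refreshed run recomputes and stores the result in the bucket, and the following run is served from the cache, which matches the behavior described above after refresh_cache=True was removed again.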