Federico Zambelli
04/20/2023, 1:37 PM
class MyClass:
    @task
    def doStuffWithArgs(self, myArg: str):
        ...

    def doStuffWithoutArgs(self):
        ...

    @flow
    def doMoreStuff(self):
        self.doStuffWithArgs("stuff")
        self.doStuffWithoutArgs()
You will see that for doStuffWithArgs Pylance complains that no overload matches that call (see screenshot):
flapili
04/20/2023, 1:39 PM
Federico Zambelli
04/20/2023, 1:41 PM
this.my_object instead of either passing it as an arg of each task or making it global
flapili
04/20/2023, 1:41 PM
Federico Zambelli
04/20/2023, 1:41 PM
flapili
04/20/2023, 1:42 PM
Federico Zambelli
04/20/2023, 1:43 PM
flapili
04/20/2023, 1:44 PM
Federico Zambelli
04/20/2023, 1:45 PM
flapili
04/20/2023, 1:47 PM
Federico Zambelli
04/20/2023, 1:48 PM
flapili
04/20/2023, 1:48 PM
Federico Zambelli
04/20/2023, 1:48 PM
> tasks should be idempotent
this I'm aware
flapili
04/20/2023, 1:50 PM
Federico Zambelli
04/20/2023, 1:53 PM
flapili
04/20/2023, 1:54 PM
Federico Zambelli
04/20/2023, 1:55 PM
flapili
04/20/2023, 1:58 PM
Federico Zambelli
04/20/2023, 1:58 PM
flapili
04/20/2023, 1:58 PM
from prefect import task, flow

def get_global_var():
    return 42

@task
def some_task():
    var = get_global_var()
    # do something with var
    return var

@flow
def main():
    r = some_task()
    print(r)
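The getter pattern above can be extended to share one object across tasks without a class attribute or a module-level global. A minimal sketch, assuming a hypothetical `SharedClient` class and `get_shared_client` helper (neither appears in the thread). The Prefect decorators are left out so the snippet runs on its own, and the caching is per process, so it would likely not carry over to task runners that use separate processes:

```python
from functools import lru_cache


class SharedClient:
    """Hypothetical stand-in for an expensive object (API client, DB session)."""

    def __init__(self, name: str) -> None:
        self.name = name


@lru_cache(maxsize=1)
def get_shared_client() -> SharedClient:
    # Built on the first call only; later calls return the cached instance.
    return SharedClient("demo")


def first_step() -> str:
    # In a Prefect project this function would carry the @task decorator.
    return get_shared_client().name


def second_step() -> bool:
    # Repeated lookups return the same object, so no argument passing is needed.
    return get_shared_client() is get_shared_client()
```

Each step asks the getter for the object instead of receiving it as a parameter, which keeps the task signatures free of the shared dependency.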
Federico Zambelli
04/20/2023, 1:59 PM
flapili
04/20/2023, 1:59 PM
Federico Zambelli
04/20/2023, 2:00 PM
flapili
04/20/2023, 2:04 PM
Federico Zambelli
04/20/2023, 2:07 PM
task(cache_key_fn=task_input_hash, cache_expiration=None)
I assume you mean this?
> but I'm wondering if you will not need a storage anyway
I'm mostly playing around. The idea behind my last question is as follows: imagine I'm reading data from some API that returns unpredictable results, and I'm writing them to S3. I don't want duplicate results, so if the write_to_s3 function receives the same input (e.g. result_key), skip execution.
flapili
04/20/2023, 2:22 PM
Federico Zambelli
04/20/2023, 2:23 PM
flapili
04/20/2023, 2:24 PM
Federico Zambelli
04/20/2023, 2:26 PM
> like if the API is a function f, f(5) will always return the same result?
It should in theory, but in practice it doesn't because of unavailability of the API itself. f(5) can sometimes return 1000 results and sometimes only 100, and those 100 can be a subset of the 1000. Each result, however, has a unique key. So imagine I have two tasks:
get_results_from_api ---> write_result_to_storage
For each result in get_results_from_api, write to storage ONLY if the result_key hasn't been seen before.
If I use cache with the 2nd task, does it skip writing to storage if the passed result_key was seen before?
flapili
04/20/2023, 2:33 PM
Federico Zambelli
04/20/2023, 2:36 PM
flapili
04/20/2023, 2:36 PM
Federico Zambelli
04/20/2023, 2:37 PM
flapili
04/20/2023, 2:37 PM
Federico Zambelli
04/20/2023, 2:38 PM
flapili
04/20/2023, 2:38 PM
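The two-step pipeline described at 2:26 PM (fetch results, then write each one only if its result_key is new) can be sketched without Prefect. This is an illustration under assumptions, not the approach settled on in the thread: `fake_storage` is an in-memory dict standing in for S3, `store_new_results` is a hypothetical helper, and the write step checks the key itself rather than relying on task caching.

```python
from typing import Dict, Iterable, List, Tuple

# In-memory stand-in for the real storage (e.g. an S3 bucket); an assumption.
fake_storage: Dict[str, str] = {}


def write_result_to_storage(result_key: str, payload: str) -> bool:
    """Write the payload unless result_key is already stored.

    Returns True if a write happened, False if it was skipped.
    """
    if result_key in fake_storage:
        return False
    fake_storage[result_key] = payload
    return True


def store_new_results(results: Iterable[Tuple[str, str]]) -> List[str]:
    """Write every (result_key, payload) pair and return the keys newly written."""
    return [key for key, payload in results if write_result_to_storage(key, payload)]


# The first API call returns a partial result set, the second a superset.
first_batch = [("a", "1"), ("b", "2")]
second_batch = [("a", "1"), ("b", "2"), ("c", "3")]

written_first = store_new_results(first_batch)    # writes "a" and "b"
written_second = store_new_results(second_batch)  # writes only "c"
```

Checking the key against the storage itself keeps the write repeatable even if a cache entry expires or the flow runs on another machine; whether input-hash caching alone gives the same guarantee depends on how the cache is persisted.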