# ask-community
a
Can I, or is it good practice to, put the result of a task inside `prefect.context` to be used in all subsequent tasks? I have a stateful variable that many tasks will use, and I would love to use it inside many tasks without passing it in as a parameter.
k
Hey @An Hoang, it is not good practice, because the context is not mutable in the sense that subsequent tasks won't see the change. Does it work in your tests? It's not something we recommend. It's likely better to pass it as a Parameter, or you can use the KV Store if it's below 10 KB.
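For reference, a minimal sketch of the KV Store option (this assumes Prefect 1.x with a Prefect Cloud backend; the key and value here are made up):
```python
from prefect.backend import get_key_value, set_key_value

# store a small piece of shared state (Prefect Cloud only, values under 10 KB)
set_key_value(key="shared_state", value={"last_processed": "2021-06-01"})

# read it back from any task in any flow run
state = get_key_value(key="shared_state")
```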
a
I would love to pass it as a Parameter, but the object is dynamically created from a string parameter. Example inside a flow context:
```python
config_path = Parameter("config_path")  # string param
data_catalog_object = get_data_catalog(config_path)  # custom-class object initialized from config parsed from `config_path`

task1_result = task1(data_catalog_object, *args)  # data_catalog_object's state might be modified here

# many more tasks using the `data_catalog_object` as input
```
It is rarely below 10 KB, so I don't think the KV Store would work.
k
Once it's instantiated with `data_catalog_object`, you can pass that to downstream tasks… unless it's not serializable?
a
Yes, it is serializable, but 90% of my tasks use this `data_catalog_object` and modify its state, so I wondered if there is an easy way to avoid passing it to the tasks every single time. It's also a hassle to have to return the modified `data_catalog_object` in every single task, as `target` caching will not work as intended.
k
I'll ask the team if there are any ideas. Not seeing any at the moment.
So when you mutate that `data_catalog_object`, it needs to be explicitly returned, because Prefect won't keep track of the state for operations like restarting the Flow Run from the point of failure. I'll try to figure out the target for multiple returns, though.
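A small sketch of that explicit-return pattern (the task and dataset names are hypothetical; `DataCatalog.add` is Kedro's method for registering a dataset):
```python
from prefect import task

@task
def register_dataset(catalog, name, dataset):
    # mutate the catalog in place...
    catalog.add(name, dataset)
    # ...then return it explicitly: Prefect tracks task results, not in-place
    # mutations, so a restart from failure would otherwise lose this change
    return catalog
```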
a
Thank you. What if the `data_catalog_object` is not mutated, just used many times in many tasks? Does that change anything?
k
Yes and no. If it were not instantiated from a Parameter, you might be able to use it as a global with script-based storage. But given that it's created at runtime through a Parameter, I don't think it's doable to get it as a global. But! What you can maybe do is create your own modified `@task_with_catalog` decorator that supplies the data catalog so you don't have to worry about it. You would still have to pass it in inside the `Flow`, but at least it doesn't affect the task definition?
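One possible reading of that idea, as a sketch (the decorator name comes from the thread, but the wiring is an assumption, not a Prefect API):
```python
from prefect import task

# `@task_with_catalog` here is just `@task` plus the convention that the
# catalog is always the first argument, so task bodies stay focused on logic
def task_with_catalog(fn):
    def _with_catalog(data_catalog, **kwargs):
        return fn(data_catalog, **kwargs)
    _with_catalog.__name__ = fn.__name__  # keep a readable task name
    return task(_with_catalog)

@task_with_catalog
def save_cars(data_catalog, cars_df=None):
    data_catalog.save("cars", cars_df)

# inside the Flow you still pass the catalog once per call, e.g.
# save_cars(data_catalog_object, cars_df=cars_df)
```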
Unfortunately, it looks like we don't support multiple targets. What is your `data_catalog_object`? I think the KV Store is really something that would help here if possible.
a
The `data_catalog_object` is Kedro's DataCatalog. I want to separate the data loading/saving from the Prefect code and let the data catalog handle all of that. So when I want to save `cars.csv`, assume that I have the YAML below at `config/path/folder/catalog.yaml`:
```yaml
cars:
  type: pandas.CSVDataSet
  filepath: data/01_raw/company/cars.csv
  load_args:
    sep: ','
  save_args:
    index: False
    date_format: '%Y-%m-%d %H:%M'
    decimal: .
```
I can then do:
```python
import yaml
from kedro.io import DataCatalog

cars_df = ...

# build the catalog from the YAML config
with open("config/path/folder/catalog.yaml") as f:
    data_catalog = DataCatalog.from_config(yaml.safe_load(f))

data_catalog.save("cars", cars_df)  # saves cars_df to data/01_raw/company/cars.csv
cars_df = data_catalog.load("cars")
```
I just use `with Flow('flow', result=LocalResult('path/to/outer-most/folder')):` and use this catalog to handle the loading and saving of sub-files/folders.
One thing that would be very helpful is if the `target` argument accepted a function that returns `True` or `False`; then I could write the function to check multiple outputs with complex logic.
k
Oh I see. I watched the Kedro tutorial today after you mentioned it (I've been meaning to for a while; they presented right before me at PyCon). In this case, I think there are two options. The first is to store the path to the `DataCatalog` you want to use in the Flow and then load it during the Flow run; if you store your flow as a script, it will be loaded and run at runtime. The second is to use the KV Store to point to the address of the DataCatalog and then load it in per task. You can also maybe mutate it, save it, and load it downstream again. You can create a helper function (non-task) that loads this in, and then your flows can use it. If you make your own decorator like `@mytask` to handle this, the function just has to take in `kwargs` for you to be able to pass in the configuration. The new decorator might be able to take care of this for you.
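A sketch of that helper-plus-decorator idea (the KV key, the file layout, and the `@mytask` wiring are all assumptions; `DataCatalog.from_config` is Kedro's constructor from a config dict):
```python
import yaml
from kedro.io import DataCatalog
from prefect import task
from prefect.backend import get_key_value

# non-task helper: resolve the catalog config path from the KV Store and
# build a fresh DataCatalog from it at runtime
def load_catalog():
    catalog_path = get_key_value(key="catalog_path")
    with open(catalog_path) as f:
        return DataCatalog.from_config(yaml.safe_load(f))

# "@mytask"-style decorator: inject a freshly loaded catalog as a kwarg so
# task signatures don't have to thread it through the whole flow
def mytask(fn):
    def _inner(**kwargs):
        if "catalog" not in kwargs:
            kwargs["catalog"] = load_catalog()
        return fn(**kwargs)
    _inner.__name__ = fn.__name__  # keep a readable task name
    return task(_inner)
```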
a
Thanks @Kevin Kho! I'll try it out and let you know how it goes.
> store the path to the `DataCatalog` you want to use in the Flow and then load it during the Flow run

There can only be one fixed path stored per flow, right? I need the path to be parameterized by the user. Also, what do you think about this? Would it be feasible to add in the near future?

> One thing that would be very helpful is if the `target` argument accepted a function that returns `True` or `False`; then I could write the function to check multiple outputs with complex logic
k
I think it's still not easy for the user, because `target`s still work in conjunction with serializers, and if you have two different `target`s you would still need to make your own custom serializer to handle the different types. Actually, the way it works right now is that your two returns come in as a tuple to the `Serializer`, so you can make your own serializer to provide custom logic for handling the tuple. If you need to parameterize it, then the best approach is really to parameterize the path, create the `DataCatalog`, and pass it throughout the flow. I think by design this just becomes really hard if you mutate it inside the tasks; the best design, I think, is to mutate them in their own tasks and pass them around like that.
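For reference, a minimal sketch of that custom-serializer idea (the class name is made up; the base class and method signatures are Prefect 1.x's `Serializer` interface):
```python
import cloudpickle
from prefect.engine.serializers import Serializer

# handle a task's multiple returns, which arrive here as a single tuple
class TupleSerializer(Serializer):
    def serialize(self, value) -> bytes:
        # `value` is the tuple of returns; add per-element logic here if the
        # elements need different handling
        return cloudpickle.dumps(value)

    def deserialize(self, value: bytes):
        return cloudpickle.loads(value)
```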