# show-and-tell
Just created a helper decorator to lazily materialize an asset based on the task input/output value. The usage is very similar to the original `materialize` decorator. Here's an example:
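from prefect import flow
# lazy_materialize comes from the gist linked below; adjust the import path to wherever you saved it
from my_package.utils.prefect import lazy_materialize

@lazy_materialize('{{out_dir}}', asset_deps = ['{{data_dir}}/raw'], output_as = 'out_dir')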
def preprocess_data(data_dir: str, preproc_folder: str):
    print(f'Using data from "{data_dir}/raw"')
    return f'{data_dir}/{preproc_folder}'

@flow(name = 'Dummy Test')
def main():
    data_dir = preprocess_data('data', 'processed')
    # Other task...
In the example above, the task's return value is simply aliased as `out_dir` and saved as an asset. The `data_dir` asset dependency works the same way, but is taken from the input argument instead. While it can be used as shown above, my main motivation for creating this decorator was to monkey-patch my existing code and make it run on Prefect without changing anything (not even writing a custom flow). An example of my actual flow (ETL for ML training):
from prefect import task, flow
from my_package.utils.prefect import lazy_materialize as lazym
# My original code somewhere on a different file
import my_package.preprocess as p


# Monkey patch all functions that will be called by the main function
# All {{names}} here are part of the task input argument, or the task output
p.pull_data = lazym('{{local_dir}}', asset_deps = ['{{remote_dir}}'], _f = p.pull_data)
p.preproc_data = lazym('{{out_dir}}', _f = p.preproc_data)
p.purge_remote_data = task(p.purge_remote_data) # Don't commit yet ;)
p.upload_data = task(p.upload_data)
p.commit_data = lazym(
    'lakefs://{{repo_id}}/{{commit_id}}', output_as = 'commit_id', _f = p.commit_data
)

@flow(name = 'Data Preprocessing')
def main():
    # The main function that will call all the patched functions above
    # Due to Python quirks, it must be contained within another function
    # If this is called directly, the patches above won't take effect
    return p.main()
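Roughly, the idea behind the `{{name}}` templates can be sketched like this. It's a simplified illustration, not the actual code in the gist: `resolve_template` and its parameters are made-up names, and I'm assuming placeholders are filled from the bound call arguments plus the return value under its `output_as` alias.

```python
import inspect
import re
from typing import Any, Callable, Optional

# Matches {{name}} with optional whitespace inside the braces
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve_template(
    template: str,
    func: Callable[..., Any],
    args: tuple,
    kwargs: dict,
    result: Any = None,
    output_as: Optional[str] = None,
) -> str:
    """Fill {{name}} placeholders from the call's arguments and, optionally, its result."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # include parameters the caller left at their defaults
    values = dict(bound.arguments)
    if output_as is not None:
        values[output_as] = result  # expose the return value under its alias

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"unknown placeholder: {name}")
        return str(values[name])

    return _PLACEHOLDER.sub(substitute, template)


def preprocess_data(data_dir: str, preproc_folder: str) -> str:
    return f"{data_dir}/{preproc_folder}"


call_args = ("data", "processed")
out = preprocess_data(*call_args)

# Asset key built from the output alias, dependency key built from an input argument
asset_key = resolve_template("{{out_dir}}", preprocess_data, call_args, {}, out, "out_dir")
dep_key = resolve_template("{{data_dir}}/raw", preprocess_data, call_args, {})
print(asset_key)  # data/processed
print(dep_key)    # data/raw
```

Binding through `inspect.signature` means the placeholder names match the function's parameter names whether the caller passed them positionally or by keyword, which is what lets the patched functions stay untouched.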
I think this is great for people who need to write orchestration code quickly but want to keep their original code clean (untouched). Here's the gist link to try: https://gist.github.com/AndhikaWB/7fb2a4450e05a731120dd936fb18d8db Obviously, there are some downsides to patching the code this way (e.g. you can't rely on `Secret.load` if you want the code to work without Prefect as well). However, I decided to share this because it feels pretty satisfying to use, especially for simple/personal projects.
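One more caveat about patching in general: it replaces the attribute on the module, so only code that looks the name up through the module at call time sees the wrapper. Anything that grabbed the function earlier (e.g. via `from ... import ...`) keeps the original. Here's a minimal sketch with no Prefect involved; the module is faked with `types.ModuleType`, and `traced` is a hypothetical stand-in for `task`/`lazym`.

```python
import types

# Build a tiny stand-in for my_package.preprocess (hypothetical contents)
p = types.ModuleType("preprocess")
exec(
    "def pull_data():\n"
    "    return 'pulled'\n"
    "\n"
    "def main():\n"
    "    # Looked up in the module's globals at call time, so patches are seen\n"
    "    return pull_data()\n",
    p.__dict__,
)

# Equivalent of `from preprocess import pull_data` done before patching
early_ref = p.pull_data


def traced(func):
    """Stand-in for task()/lazym(): wraps the function and tags its result."""
    def wrapper(*args, **kwargs):
        return f"wrapped({func(*args, **kwargs)})"
    return wrapper


p.pull_data = traced(p.pull_data)  # the monkey patch

print(p.main())     # wrapped(pulled): main() picks up the patched function
print(early_ref())  # pulled: the earlier reference is untouched
```

So if some other module imported the functions by name before the patches ran, those calls will quietly bypass Prefect.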