An Hoang
07/26/2021, 8:58 PM
df.parquet file:
parquet_result = LocalResult(dir="./test_prefect", serializer=PandasSerializer("parquet"))
@task
def test_task(df1, df2):
    parquet_result.write(df1, location="df1.parquet", **context)
    parquet_result.write(df2, location="df2.parquet", **context)
Currently I have to set the location attribute at the time of instantiating the LocalResult object. The code below works:
parquet_result_partial = partial(LocalResult, dir="./test_prefect", serializer=PandasSerializer("parquet"))
@task
def test_task(df1, df2):
    parquet_result_partial(location="df1.parquet").write(df1, **context)
    parquet_result_partial(location="df2.parquet").write(df2, **context)
So it seems the location kwarg to Result.write does not do anything. Is this by design? Or am I missing something?
Kevin Kho
@task(result=LocalResult(location="xxx.csv"))
This would save the return of the task.
With the specific code snippet above, I think it might be easier to use the Pandas interface directly with df.to_parquet("xxx.parquet", index=False)
If you want to use the Result class though, I think you can do something like templating the result. In this case, you can map over df1 and df2 and just provide the task_run_id or map_index to create a new file path. You can also use the value, so maybe you can pull the name of the dataframe to create the filename. With this, you can do test_task.map([df1, df2]).
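To illustrate the templating idea without pulling in Prefect itself (a sketch only: the template string and the context values below are made up for illustration, though task_name and map_index are among the context keys Prefect exposes), a templated result location is essentially a format string filled from the runtime context:

```python
# A templated location, in the spirit of
# LocalResult(location="{task_name}-{map_index}.parquet").
# Prefect fills the placeholders from context at run time; this sketch
# mimics that with plain str.format on hypothetical context values.
template = "{task_name}-{map_index}.parquet"

# One mapped run per dataframe: each element of test_task.map([df1, df2])
# gets its own map_index, so each write lands in a distinct file.
paths = [
    template.format(task_name="test_task", map_index=i)
    for i in range(2)  # two mapped elements, e.g. [df1, df2]
]
print(paths)  # ['test_task-0.parquet', 'test_task-1.parquet']
```

This is why mapping plus a templated location avoids the collision you would otherwise get from writing every result to the same path.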
You can also template the first LocalResult and then maybe fill it with the kwargs later like this.
You can also use the LocalResult inside the task, but I guess you’re asking because there is a lot of boilerplate code when you use the same result in a lot of different tasks?