Kyle McChesney
08/02/2021, 8:01 PM
@task(result=S3Result('bucket', location='example.out'))
def example():
    return [1, 2, 3]
Is it just a pickle file that, when loaded, recreates the list [1, 2, 3]? How does it work for more complicated returns, for example a task that returns a tuple or a pandas DataFrame?
Kevin Kho
Serializers. The default is a JSONSerializer or PickleSerializer (not super sure right now). For a pandas DataFrame you would explicitly define the PandasSerializer with your result, like S3Result(…, serializer=PandasSerializer()).
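(For illustration, a minimal sketch of attaching a serializer to a result, assuming Prefect 1.x; the bucket, location, file_type, and the make_df task are all placeholders:)
from prefect import task
from prefect.engine.results import S3Result
from prefect.engine.serializers import PandasSerializer

import pandas as pd

# Store the returned DataFrame as CSV in S3 instead of a pickle.
csv_result = S3Result(
    bucket='bucket',
    location='example.csv',
    serializer=PandasSerializer(file_type='csv'),
)

@task(result=csv_result)
def make_df():
    return pd.DataFrame({'a': [1, 2, 3]})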
Kevin Kho
It's the PickleSerializer. The Serializer is used for both reading and writing.
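(A quick sketch of that read/write symmetry, assuming Prefect 1.x serializers:)
from prefect.engine.serializers import PickleSerializer

serializer = PickleSerializer()
blob = serializer.serialize([1, 2, 3])   # bytes that get written to the result location
value = serializer.deserialize(blob)     # the same serializer restores the object on read
assert value == [1, 2, 3]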
Kyle McChesney
08/02/2021, 8:12 PM
import os
from typing import Tuple

import pandas
from prefect import task
from prefect.engine.results import S3Result

@task(result=S3Result('bucket', location='example.out'))
def data(output_url) -> Tuple[pandas.DataFrame, pandas.DataFrame]:
    res_path = os.path.join(output_url, 'results.csv')
    res_summary_path = os.path.join(output_url, 'summary.csv')
    res = pandas.read_csv(res_path)
    res_summary = pandas.read_csv(res_summary_path)
    return res, res_summary
output_url is actually an s3 "directory" URL, like s3://bucket/location/. Would I need a custom serializer to handle this? I ran this (without a serializer specified) and it seemed to produce a file on S3 which unpickles to the second data frame.
Kevin Kho
@task()
def data():
    s3_res = S3Result(...)
    s3_res.write(res)
    s3_res.write(res_summary)

@task()
def data2():
    s3_res = S3Result(...)
    # read() takes a stored location (a string), not the value itself
    s3_res.read(res_location)
    s3_res.read(res_summary_location)
But at this point, the easiest way to achieve this is using the native df.to_csv + s3fs to write directly to the s3 location.
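(A minimal sketch of that alternative; it assumes s3fs is installed, which lets pandas resolve s3:// URLs directly, and the bucket/path and DataFrames are placeholders:)
import pandas as pd

res = pd.DataFrame({'a': [1, 2, 3]})
res_summary = pd.DataFrame({'rows': [3]})

# With s3fs installed, pandas writes straight to the s3:// location.
res.to_csv('s3://bucket/location/results.csv', index=False)
res_summary.to_csv('s3://bucket/location/summary.csv', index=False)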
Kyle McChesney
08/02/2021, 8:34 PM
Kevin Kho
@task()
def data():
    s3_res = S3Result(...)
    # write() returns a new Result whose .location is the final, formatted path
    res_result = s3_res.write(res)
    summary_result = s3_res.write(res_summary)
    return res_result.location
or in your case both locations.
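(And a hypothetical downstream task that reads those locations back; assumes Prefect 1.x, where S3Result.read(location) returns a Result holding the deserialized object on .value; the bucket and the load task are placeholders:)
from prefect import task
from prefect.engine.results import S3Result

@task
def load(locations):
    s3_res = S3Result(bucket='bucket')  # placeholder bucket
    # read() returns a Result; .value holds the deserialized object
    return [s3_res.read(loc).value for loc in locations]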
Kyle McChesney
08/02/2021, 8:38 PM