bruno.corucho

07/23/2020, 9:32 AM
Hey guys, I'd like to read millions of database records using Dask's read_sql_table function (which works using Dask alone) from within Prefect, while still partitioning my data into n partitions, computing them in parallel, and merging them all together at the end. Do you have any best practices for using Dask-specific functionality within Prefect? What should the procedure look like after this call:
from dask.dataframe import read_sql_table

df = read_sql_table(table='peanuts', uri=connection,
                    index_col='peanut_id', columns=['peanut_details', 'peanut_date'],
                    npartitions=1000)
Should I return these partitions and do the delaying and computing with Prefect's map() from within the flow's scope definition? Thanks in advance! 🙂 And have a great weekend!

Chris White

07/23/2020, 3:30 PM
Hi @bruno.corucho - we don’t recommend returning dask-aware objects from Prefect tasks at this time. Check out this thread for some more info: https://prefect-community.slack.com/archives/CL09KU1K7/p1594664640066200?thread_ts=1594661552.064000&cid=CL09KU1K7
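For reference, a minimal sketch of the pattern this advice suggests: build and compute the Dask dataframe entirely inside a single task, so only a plain pandas result, never a Dask-aware object, crosses the task boundary. The flow name, task name, and connection URI below are placeholders; the read_sql_table keywords mirror the snippet above.

import dask.dataframe as dd
from prefect import Flow, task

@task
def load_peanuts(uri):
    # Build the lazy, partitioned frame entirely inside the task...
    df = dd.read_sql_table(table='peanuts', uri=uri,
                           index_col='peanut_id',
                           columns=['peanut_details', 'peanut_date'],
                           npartitions=1000)
    # ...and materialize it here, so the task returns a plain pandas
    # DataFrame (for millions of rows you would likely aggregate or
    # write out here instead of returning everything).
    return df.compute()

with Flow('read-peanuts') as flow:  # hypothetical flow name
    load_peanuts('postgresql://user:pass@host/db')  # placeholder URI

flow.run()

In this sketch Dask parallelizes the 1000 partitions itself during compute(), while Prefect's map() parallelizes across ordinary Python inputs, which is part of why returning lazy Dask objects from tasks is discouraged.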