07/23/2020, 9:32 AM
Hey guys, I'd like to read millions of database records using Dask's read_sql_table function (which works when using Dask alone) from within Prefect, while still partitioning my data into n partitions, computing them in parallel, and merging them all together at the end. Do you have any best practices for Dask-specific functionality within Prefect? Here's my method so far:
df = read_sql_table(table='peanuts', uri=connection,
                    index_col="peanut_id",
                    columns=["peanut_details", "peanut_date"])
Should I return these partitions and do the delaying and computing with Prefect's map() from within the flow's scope definition? Thanks in advance! 🙂 And have a great weekend!
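For context, one way to sketch the "read in n partitions, then merge" idea without returning Dask objects is to issue one bounded query per partition and concatenate the results; each read_partition call below could become a Prefect mapped task. This is only an illustrative sketch: the table/column names mirror the example above, and the in-memory SQLite engine stands in for the real database connection.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stand-in for the real database (hypothetical schema).
engine = create_engine("sqlite://")
pd.DataFrame({
    "peanut_id": range(10),
    "peanut_details": [f"detail-{i}" for i in range(10)],
}).to_sql("peanuts", engine, index=False)

def read_partition(lo, hi):
    # One bounded query per partition; in a Prefect flow this function
    # could be a task invoked via .map() over the list of bounds.
    query = (
        "SELECT peanut_id, peanut_details FROM peanuts "
        f"WHERE peanut_id >= {lo} AND peanut_id < {hi}"
    )
    return pd.read_sql(query, engine)

# Split the index range into n chunks, read each, then merge at the end.
bounds = [(0, 5), (5, 10)]
parts = [read_partition(lo, hi) for lo, hi in bounds]
df = pd.concat(parts, ignore_index=True)
```

Each partition here is a plain pandas DataFrame, so the pieces passed between tasks stay ordinary Python objects rather than Dask collections.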

Chris White

07/23/2020, 3:30 PM
Hi @bruno.corucho - we don’t recommend returning dask-aware objects from Prefect tasks at this time. Check out this thread for some more info: