
    bruno.corucho

    2 years ago
    Hey guys, I'd like to read millions of database records using Dask's read_sql_table function (which works when using Dask alone) from within Prefect, while still partitioning my data into n partitions, computing them in parallel, and merging them all together at the end. Do you guys have any recommended approach/best practices for Dask-specific functionality within Prefect? How should the procedure continue after this call?
    from dask.dataframe import read_sql_table

    df = read_sql_table(table='peanuts', uri=connection,
                        index_col="peanut_id",
                        columns=["peanut_details", "peanut_date"],
                        npartitions=1000)
    Should I return these partitions and do the delaying and computing using Prefect's map() from within the flow's scope definition? Thanks in advance! 🙂 And have a great weekend!
    Chris White

    2 years ago
    Hi @bruno.corucho - we don’t recommend returning dask-aware objects from Prefect tasks at this time. Check out this thread for some more info: https://prefect-community.slack.com/archives/CL09KU1K7/p1594664640066200?thread_ts=1594661552.064000&cid=CL09KU1K7
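    One way to follow this advice is to avoid returning the Dask DataFrame at all and instead let Prefect's own map() provide the partition-level parallelism. Below is a minimal sketch, assuming Prefect 1.x and a numeric peanut_id key; the table and column names come from the question above, while the connection URI, key range, and the make_bounds helper are placeholders/assumptions for illustration, not from this thread.

    import pandas as pd
    from prefect import Flow, task

    CONNECTION = "postgresql://user:pass@host/db"  # placeholder connection URI

    @task
    def make_bounds(n_partitions, low, high):
        # Split the half-open key range [low, high) into contiguous (start, stop) bounds.
        step = (high - low) // n_partitions + 1
        return [(start, min(start + step, high)) for start in range(low, high, step)]

    @task
    def read_chunk(bounds):
        # Each mapped run reads its own key-range slice as a plain pandas DataFrame,
        # so no Dask-aware object is ever returned from a task.
        start, stop = bounds
        query = (
            "SELECT peanut_id, peanut_details, peanut_date "
            f"FROM peanuts WHERE peanut_id >= {start} AND peanut_id < {stop}"
        )
        return pd.read_sql(query, CONNECTION)

    @task
    def combine(chunks):
        # Merge the mapped results into a single DataFrame at the end.
        return pd.concat(chunks, ignore_index=True)

    with Flow("read-peanuts") as flow:
        bounds = make_bounds(1000, 0, 10_000_000)  # assumed key range for illustration
        chunks = read_chunk.map(bounds)
        df = combine(chunks)

    Running the flow with Prefect's DaskExecutor (import path varies by Prefect version) would then execute the mapped read_chunk runs in parallel on a Dask cluster, recovering the parallelism of npartitions without passing Dask collections between tasks.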