What can cause latency/delays between tasks? As ca...
# ask-community
g
What can cause latency/delays between tasks? As can be seen below, there is a delay of roughly 15 minutes (10:28:45 to 10:44:11) between the end of one task and the start of the next. I am working with datasets that use about 2 GB of memory when loaded as a pandas dataframe. Could this affect the latency? The delay only seems to happen when working with larger data.
10:28:42.647 | INFO | Flow run 'glorious-woodpecker' - Created task run 'extract_maxmind_databases-0' for task 'extract_maxmind_databases'
10:28:42.648 | INFO | Flow run 'glorious-woodpecker' - Executing 'extract_maxmind_databases-0' immediately...
10:28:45.525 | INFO | Task run 'extract_maxmind_databases-0' - Finished in state Completed()
10:44:11.395 | INFO | Flow run 'glorious-woodpecker' - Created task run 'transform_firsttime_data-0' for task 'transform_firsttime_data'
10:44:11.396 | INFO | Flow run 'glorious-woodpecker' - Executing 'transform_firsttime_data-0' immediately...
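To put a number on the gap, the two timestamps around it can be subtracted directly. This is a minimal sketch; the helper name gap_seconds is made up for illustration, and it assumes both timestamps fall on the same day, since the log lines carry no date.

```python
from datetime import datetime


def gap_seconds(earlier: str, later: str) -> float:
    """Return the seconds between two HH:MM:SS.mmm log timestamps.

    Assumes both timestamps are from the same day.
    """
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds()


# End of 'extract_maxmind_databases-0' vs. creation of 'transform_firsttime_data-0'
delay = gap_seconds("10:28:45.525", "10:44:11.395")
print(f"{delay:.3f} s = {delay / 60:.1f} min")  # 925.870 s = 15.4 min
```

Running the same subtraction over every consecutive pair of log lines is a quick way to spot which transition the time is going into.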
j
Can you try running your task with the quote annotation? Prefect introspects your task parameters to look for futures; quote() tells it to skip that for the wrapped value.
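from prefect import flow, task
from prefect.utilities.annotations import quote

@task
def do_pandas_stuff(df):
    ...

@flow
def my_flow():
    df = get_my_df()  # placeholder for your own code that loads the dataframe
    # quote() stops Prefect from walking the dataframe in search of futures
    do_pandas_stuff(quote(df))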
j
Is there a performance impact of doing that? I had a similar issue: 9000+ items being mapped over, with an unmapped dataframe passed to each task instance. I solved it by substantially reducing the size of the dataframe I needed to pass. I'm interested in whether the quote approach has a performance cost later on. Does skipping the inspection of futures give up performance later?
j
quote() basically tells Prefect not to look at the parameter. That can give performance gains when the parameter is really big, because Prefect doesn't spend time inspecting it. It's not free, though: if Prefect can't inspect the parameter, you lose task result linking, for example (the record that the result of one task was passed to another task; see screenshots). In the dataframe case, you're not giving up later performance or anything.
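The trade-off can be illustrated without Prefect at all. The sketch below is not Prefect's implementation: Quoted and find_futures are hypothetical stand-ins, and plain nested lists stand in for a large dataframe. It only shows the general idea that a recursive search for futures touches every element of a big argument, while a quote-style wrapper short-circuits the walk and, as a consequence, hides any futures inside it.

```python
from concurrent.futures import Future


class Quoted:
    """Hypothetical stand-in for a quote()-style wrapper (not Prefect's class)."""

    def __init__(self, value):
        self.value = value


def find_futures(obj, visited):
    """Recursively collect futures, counting visited objects in visited[0]."""
    if isinstance(obj, Quoted):
        return []  # wrapped values are skipped entirely
    visited[0] += 1
    if isinstance(obj, Future):
        return [obj]
    found = []
    if isinstance(obj, dict):
        for value in obj.values():
            found.extend(find_futures(value, visited))
    elif isinstance(obj, (list, tuple, set)):
        for item in obj:
            found.extend(find_futures(item, visited))
    return found


# 100 lists of 1000 ints, standing in for a large dataframe
big = [list(range(1000)) for _ in range(100)]

count_plain = [0]
find_futures({"df": big}, count_plain)
print(count_plain[0])  # 100102 objects visited

count_quoted = [0]
find_futures({"df": Quoted(big)}, count_quoted)
print(count_quoted[0])  # 1 object visited (just the outer dict)

# The cost of skipping: a future hidden inside a quoted value is never found,
# which is why the link between upstream and downstream tasks is lost.
fut = Future()
linked = find_futures([fut], [0])
unlinked = find_futures(Quoted([fut]), [0])
print(len(linked), len(unlinked))  # 1 0
```

The visit counts are the point here, not the timings: the work done by the walk grows with the size of the argument, and the wrapper reduces it to a constant at the price of the lost link.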