Andreas Jung

12/14/2020, 11:04 AM
I just started with Prefect and wrote this example script to measure execution speed. The script contains 4 tasks, of which only the "parse" task does any real processing on the data (taking less than 1 ms). However, the overall execution time of the whole flow is always between 1000 ms and 1500 ms. Why is this? Is there such a huge overhead in the underlying task scheduler/executor?
import time
from base64 import b64decode

from prefect import Flow, task

import reportparser


@task
def fetch_message():
    # Decode a fixed base64 payload; no network access is involved.
    ts = time.time()
    report_data = b64decode(
        b"ew4AAAAAAADmAgAABgAAAQAAH+EH3QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        b"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIBPwgQAA4AAAAA"
    )
    print("fetch  ", time.time())
    print("fetch  ", time.time() - ts)
    return report_data


@task
def classify_message(msg):
    msg_type = 0
    ts = time.time()
    print("classify", time.time())
    print("classify", time.time() - ts)
    return msg_type


@task
def parse_message(msg, msg_type):
    # The only task that does real work.
    ts = time.time()
    result = reportparser.parse(msg)
    print("parse  ", time.time())
    print("parse  ", time.time() - ts)
    return result


@task
def deliver_message(result):
    ts = time.time()
    print("deliver ", time.time())
    print("deliver ", time.time() - ts)


def main():
    with Flow("reportparser") as flow:
        msg = fetch_message()
        msg_type = classify_message(msg)
        result = parse_message(msg, msg_type)
        deliver_message(result)

    # Run the same flow 20 times and print the wall-clock time of each run.
    for i in range(20):
        print()
        ts = time.time()
        flow.run()
        print("total", time.time() - ts)


if __name__ == "__main__":
    main()
Execution time (absolute time, relative time per task):
fetch   1607943883.0552974
fetch   0.0001933574676513672
classify 1607943883.3690984
classify 0.00015401840209960938
parse   1607943883.6593442
parse   0.0009167194366455078
deliver 1607943883.9757109
deliver 0.0001609325408935547
total 1.3962466716766357
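A rough reading of the numbers above, as a sketch: the differences between the absolute timestamps show how much wall-clock time passes between consecutive tasks, compared with the time spent inside the tasks themselves. The values are copied from the output; the variable names are only for illustration.

```python
# Absolute timestamps and per-task durations copied from the output above.
absolute = {
    "fetch": 1607943883.0552974,
    "classify": 1607943883.3690984,
    "parse": 1607943883.6593442,
    "deliver": 1607943883.9757109,
}
durations = {
    "fetch": 0.0001933574676513672,
    "classify": 0.00015401840209960938,
    "parse": 0.0009167194366455078,
    "deliver": 0.0001609325408935547,
}
total = 1.3962466716766357

# Wall-clock gap between the timestamps printed by consecutive tasks.
names = list(absolute)
gaps = {
    f"{a} -> {b}": absolute[b] - absolute[a]
    for a, b in zip(names, names[1:])
}
for label, gap in gaps.items():
    print(f"{label}: {gap:.3f} s")

# Time spent inside the task bodies versus everywhere else in the run.
task_time = sum(durations.values())
outside_time = total - task_time
print(f"inside tasks:  {task_time * 1000:.2f} ms")
print(f"outside tasks: {outside_time:.3f} s")
```

Each gap comes out at roughly 0.3 s, while the four task bodies together take under 2 ms, so almost all of the run time is spent between tasks rather than in them.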

Marwan Sarieddine

12/14/2020, 6:16 PM
@Andreas Jung I did some profiling once and found the context dictionary resolving function (merge_dicts) to be taking up considerable time (at least 1s on my machine, as far as I recall). See the issue I opened in case you want more details: https://github.com/PrefectHQ/prefect/issues/2909
Note that merge_dicts is called prior to every task run ...
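For anyone who wants to repeat this kind of profiling, here is a minimal sketch using the standard library's cProfile and pstats. It is not the exact method used for the issue above. `run_flow` and `merge_step` are hypothetical stand-ins so the snippet runs without Prefect installed; in practice the body of `run_flow` would be replaced by the real `flow.run()` call.

```python
import cProfile
import io
import pstats
import time


def merge_step():
    # Hypothetical stand-in for per-task bookkeeping; this is not Prefect code.
    time.sleep(0.01)


def run_flow():
    # Hypothetical stand-in for flow.run(); swap in the real call when profiling.
    for _ in range(4):
        merge_step()


# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
run_flow()
profiler.disable()

# Sort by cumulative time so the most expensive call chains come first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time makes a function that is cheap per call but invoked before every task stand out, which is the pattern described above.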