Trevor Campbell

    Trevor Campbell

    1 year ago
    Hi folks! I am trying to use Prefect, and found that it is rather slow for my use case. I thought it might be a mistake I made, but I've created a simple reproducible example that illustrates the issue. Basically: it seems context creation at the start of
    task_run
    is what is causing a huge amount of overhead. I would expect context creation to be very light weight... more details in the thread below.
    Here's the reproducible example (it's a much simplified version of what I'm actually trying to do). Basic idea: ā€¢ create a list of 4,000 simple list objects ā€¢ run 3 tasks on each, very simple processing
    from prefect import Flow, task
    import cProfile, pstats, io
    from pstats import SortKey
    
    @task
    def create_lis():
        return [[3, 0, 4]]*4000
    
    @task
    def task1(li):
        li.append(1)
        return li
    
    @task
    def task2(li):
        li.sort()
        return li
    
    @task
    def task3(li):
        return li[0]
    
    with Flow("blah") as flow:
        lis = create_lis()
        lis = task1.map(lis)
        lis = task2.map(lis)
        lis = task3.map(lis)
    
    flow.run()
    This takes about 380 seconds according to
    cProfile
    It looks like the vast majority of that time is spent creating contexts.
    1128368941 function calls (1095574869 primitive calls) in 378.017 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000  378.022  378.022 /home/trevor/.local/lib/python3.8/site-packages/prefect/core/flow.py:1187(run)
            1    0.001    0.001  378.022  378.022 /home/trevor/.local/lib/python3.8/site-packages/prefect/core/flow.py:1014(_run)
            1    0.000    0.000  378.008  378.008 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/flow_runner.py:206(run)
     120029/4    0.165    0.000  377.998   94.499 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/runner.py:42(inner)
            1    0.000    0.000  377.998  377.998 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/executors.py:56(inner)
            1    0.186    0.186  377.997  377.997 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/flow_runner.py:385(get_flow_run_state)
        12004    0.107    0.000  377.601    0.031 /home/trevor/.local/lib/python3.8/site-packages/prefect/executors/local.py:12(submit)
        12004    0.082    0.000  377.494    0.031 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/flow_runner.py:730(run_task)
    972412/84181    0.279    0.000  372.034    0.004 {built-in method builtins.next}
        72026    0.173    0.000  371.840    0.005 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/context.py:109(__call__)
        36016    0.026    0.000  369.878    0.010 /usr/lib/python3.8/contextlib.py:108(__enter__)
    756273/36013    4.987    0.000  369.296    0.010 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/collections.py:118(merge_dicts)
    23696554/11776251   25.761    0.000  332.769    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:533(__setitem__)
    23696554/11776251   32.180    0.000  312.634    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:460(__convert_and_store)
        12004    0.014    0.000  252.352    0.021 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/executors.py:543(wrapper)
        12004    0.279    0.000  252.338    0.021 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/task_runner.py:197(run)
    2088754/756273    8.121    0.000  235.959    0.000 /home/trevor/.local/lib/python3.8/site-packages/prefect/configuration.py:25(copy)
     23696554   89.220    0.000  202.136    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:731(_safe_attr)
    5473976/4141495   18.303    0.000  183.682    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:202(__init__)
     27766023   69.437    0.000  108.927    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:441(__box_config)
    416492331   32.862    0.000   32.862    0.000 {method 'startswith' of 'str' objects}
    103305581/99920359   11.851    0.000   19.727    0.000 {built-in method builtins.isinstance}
      5473976    9.510    0.000   17.614    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:159(__new__)
    222484138   12.514    0.000   12.514    0.000 {method 'append' of 'list' objects}
     23708558    8.287    0.000   10.859    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:355(keys)
     10947952    6.791    0.000    6.791    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:549(__setattr__)
      3385222    1.574    0.000    6.198    0.000 /usr/lib/python3.8/typing.py:768(__instancecheck__)
      3385222    1.475    0.000    4.624    0.000 /usr/lib/python3.8/typing.py:771(__subclasscheck__)
     27802042    4.453    0.000    4.453    0.000 {method 'copy' of 'dict' objects}
     11688386    4.221    0.000    4.221    0.000 {method 'update' of 'dict' objects}
     23733670    3.711    0.000    3.712    0.000 {method 'join' of 'str' objects}
     23696554    3.393    0.000    3.393    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:448(__recast)
     39650452    3.318    0.000    3.318    0.000 {method 'items' of 'dict' objects}
      7826839    2.604    0.000    3.291    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:486(__getitem__)
      6230249    2.378    0.000    2.977    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:378(items)
      3469255    1.358    0.000    2.967    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:387(get)
      3417440    1.237    0.000    2.814    0.000 {built-in method builtins.issubclass}
     24608876    2.639    0.000    2.639    0.000 {method 'keys' of 'dict' objects}
    In contrast, if I run that code without Prefect, it finishes essentially instantaneously
    lis = create_lis()
    for i in range(len(lis)):
        lis[i].append(1)
    
    for i in range(len(lis)):
        lis[i].sort()
    
    for i in range(len(lis)):
        lis[i] = lis[i][0]
    8005 function calls in 0.048 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         4000    0.048    0.000    0.048    0.000 {method 'sort' of 'list' objects}
         4000    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
            1    0.000    0.000    0.000    0.000 prefect_test.py:6(create_lis)
            3    0.000    0.000    0.000    0.000 {built-in method builtins.len}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    In my application, I need to run a flow of this rough size every few minutes -- 400 seconds for just context creation will definitely pose problems for me. I know the way contexts are set up is fairly fundamental to the design of Prefect. But looking at the context objects themselves, they aren't super heavy-duty. So I'm a bit surprised that context creation is eating up that much time. Any ideas whether this is a bottleneck that can be feasibly fixed?
    Thanks for the help šŸ™‚ and if the answer is just "no, this is not a problem" that's OK. I can try to be creative about instantiating fewer tasks....
    Chris White

    Chris White

    1 year ago
    Hey Trevor -- thanks for the detailed information! I definitely agree that we could probably make some of this more efficient, but I do also want to call out that most Prefect flow runs work with a Prefect API (self hosted or Cloud) so there is always some added latency per task run (whether it be networking or context creation). Not to suggest we don't want to optimize this further, but this amount of latency per run is typically acceptable for Prefect use cases. Back to the real issue though - it looks like at least some of this is coming from this
    merge_dicts
    call in the init method: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/utilities/context.py#L92 That call is really only necessary for the very first initialization of context, so maybe there's a way to optimize that out for future copies / manipulations of context šŸ§
    maybe we could introduce an init kwarg that we use in the first initialization here: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/utilities/context.py#L134 and otherwise we bypass that; I'm sure there are some edge cases to think through, but maybe that's a start for your investigations
    Trevor Campbell

    Trevor Campbell

    1 year ago
    Oooh yes. @Chris White just for context, I was actually originally doing this stuff using the server backend as well. But that was causing 600MB of network traffic and 600MB of write to disk (in the postgres container) with the above flow
    so I decided to just try doing things with core only for the moment
    Chris White

    Chris White

    1 year ago
    gotcha, makes sense -- we're working on a more streamlined approach to Server that should reduce a lot of that noise; expect some cool announcements next month šŸ˜‰ in the meantime though I'd definitely be curious to hear how far we can push context creation
    Trevor Campbell

    Trevor Campbell

    1 year ago
    nice šŸ˜„
    yes, i'm totally happy working with just core for now
    eventually I'll want to use that awesome UI (it was the reason I decided to switch my code to Prefect in the first place!)
    Chris White

    Chris White

    1 year ago
    šŸ˜„ yea! I'll make a note to ping you next month, it'd be good to hear your feedback on the performance side
    Trevor Campbell

    Trevor Campbell

    1 year ago
    many thanks, I would really appreciate that. OK as for your suggestions re: bypassing the
    merge_dicts
    call, I will take some time to ponder that
    I'm in a crunch to get this system working (hack or no hack) by tonight. but after that I should have some breathing room to see whether I can come up with something useful
    Chris White

    Chris White

    1 year ago
    ah gotcha; in that case my recommendation would be to put all the mapped task steps into a single mapped task. Should still be meaningful enough granularity (per-item processing) and would be a big reduction in runs
    Trevor Campbell

    Trevor Campbell

    1 year ago
    I've actually already bundled up tasks in a reasonably sane way šŸ™‚
    it runs at a totally reasonable pace now
    Chris White

    Chris White

    1 year ago
    awesome
    Trevor Campbell

    Trevor Campbell

    1 year ago
    i'll get back to you on context setup when I get the chance. Would it be worthwhile to open an issue on the github about it?
    Chris White

    Chris White

    1 year ago
    yup definitely, these slack threads are easy to lose track of
    Trevor Campbell

    Trevor Campbell

    1 year ago
    roger that. Thanks for your help. Looking forward to the server improvements!