Hi folks! I am trying to use Prefect, and found th...
# ask-community
t
Hi folks! I am trying to use Prefect, and found that it is rather slow for my use case. I thought it might be a mistake I made, but I've created a simple reproducible example that illustrates the issue. Basically: it seems context creation at the start of
task_run
is what is causing a huge amount of overhead. I would expect context creation to be very light weight... more details in the thread below.
Here's the reproducible example (it's a much simplified version of what I'm actually trying to do). Basic idea: • create a list of 4,000 simple list objects • run 3 tasks on each, very simple processing
Copy code
from prefect import Flow, task
import cProfile, pstats, io
from pstats import SortKey

@task
def create_lis():
    return [[3, 0, 4]]*4000

@task
def task1(li):
    li.append(1)
    return li

@task
def task2(li):
    li.sort()
    return li

@task
def task3(li):
    return li[0]

with Flow("blah") as flow:
    lis = create_lis()
    lis = task1.map(lis)
    lis = task2.map(lis)
    lis = task3.map(lis)

flow.run()
This takes about 380 seconds according to
cProfile
It looks like the vast majority of that time is spent creating contexts.
Copy code
1128368941 function calls (1095574869 primitive calls) in 378.017 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  378.022  378.022 /home/trevor/.local/lib/python3.8/site-packages/prefect/core/flow.py:1187(run)
        1    0.001    0.001  378.022  378.022 /home/trevor/.local/lib/python3.8/site-packages/prefect/core/flow.py:1014(_run)
        1    0.000    0.000  378.008  378.008 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/flow_runner.py:206(run)
 120029/4    0.165    0.000  377.998   94.499 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/runner.py:42(inner)
        1    0.000    0.000  377.998  377.998 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/executors.py:56(inner)
        1    0.186    0.186  377.997  377.997 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/flow_runner.py:385(get_flow_run_state)
    12004    0.107    0.000  377.601    0.031 /home/trevor/.local/lib/python3.8/site-packages/prefect/executors/local.py:12(submit)
    12004    0.082    0.000  377.494    0.031 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/flow_runner.py:730(run_task)
972412/84181    0.279    0.000  372.034    0.004 {built-in method builtins.next}
    72026    0.173    0.000  371.840    0.005 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/context.py:109(__call__)
    36016    0.026    0.000  369.878    0.010 /usr/lib/python3.8/contextlib.py:108(__enter__)
756273/36013    4.987    0.000  369.296    0.010 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/collections.py:118(merge_dicts)
23696554/11776251   25.761    0.000  332.769    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:533(__setitem__)
23696554/11776251   32.180    0.000  312.634    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:460(__convert_and_store)
    12004    0.014    0.000  252.352    0.021 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/executors.py:543(wrapper)
    12004    0.279    0.000  252.338    0.021 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/task_runner.py:197(run)
2088754/756273    8.121    0.000  235.959    0.000 /home/trevor/.local/lib/python3.8/site-packages/prefect/configuration.py:25(copy)
 23696554   89.220    0.000  202.136    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:731(_safe_attr)
5473976/4141495   18.303    0.000  183.682    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:202(__init__)
 27766023   69.437    0.000  108.927    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:441(__box_config)
416492331   32.862    0.000   32.862    0.000 {method 'startswith' of 'str' objects}
103305581/99920359   11.851    0.000   19.727    0.000 {built-in method builtins.isinstance}
  5473976    9.510    0.000   17.614    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:159(__new__)
222484138   12.514    0.000   12.514    0.000 {method 'append' of 'list' objects}
 23708558    8.287    0.000   10.859    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:355(keys)
 10947952    6.791    0.000    6.791    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:549(__setattr__)
  3385222    1.574    0.000    6.198    0.000 /usr/lib/python3.8/typing.py:768(__instancecheck__)
  3385222    1.475    0.000    4.624    0.000 /usr/lib/python3.8/typing.py:771(__subclasscheck__)
 27802042    4.453    0.000    4.453    0.000 {method 'copy' of 'dict' objects}
 11688386    4.221    0.000    4.221    0.000 {method 'update' of 'dict' objects}
 23733670    3.711    0.000    3.712    0.000 {method 'join' of 'str' objects}
 23696554    3.393    0.000    3.393    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:448(__recast)
 39650452    3.318    0.000    3.318    0.000 {method 'items' of 'dict' objects}
  7826839    2.604    0.000    3.291    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:486(__getitem__)
  6230249    2.378    0.000    2.977    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:378(items)
  3469255    1.358    0.000    2.967    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:387(get)
  3417440    1.237    0.000    2.814    0.000 {built-in method builtins.issubclass}
 24608876    2.639    0.000    2.639    0.000 {method 'keys' of 'dict' objects}
In contrast, if I run that code without Prefect, it finishes essentially instantaneously
Copy code
lis = create_lis()
for i in range(len(lis)):
    lis[i].append(1)

for i in range(len(lis)):
    lis[i].sort()

for i in range(len(lis)):
    lis[i] = lis[i][0]
Copy code
8005 function calls in 0.048 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4000    0.048    0.000    0.048    0.000 {method 'sort' of 'list' objects}
     4000    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 prefect_test.py:6(create_lis)
        3    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
In my application, I need to run a flow of this rough size every few minutes -- 400 seconds for just context creation will definitely pose problems for me. I know the way contexts are set up is fairly fundamental to the design of Prefect. But looking at the context objects themselves, they aren't super heavy-duty. So I'm a bit surprised that context creation is eating up that much time. Any ideas whether this is a bottleneck that can be feasibly fixed?
Thanks for the help šŸ™‚ and if the answer is just "no, this is not a problem" that's OK. I can try to be creative about instantiating fewer tasks....
c
Hey Trevor -- thanks for the detailed information! I definitely agree that we could probably make some of this more efficient, but I do also want to call out that most Prefect flow runs work with a Prefect API (self hosted or Cloud) so there is always some added latency per task run (whether it be networking or context creation). Not to suggest we don't want to optimize this further, but this amount of latency per run is typically acceptable for Prefect use cases. Back to the real issue though - it looks like at least some of this is coming from this
merge_dicts
call in the init method: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/utilities/context.py#L92 That call is really only necessary for the very first initialization of context, so maybe there's a way to optimize that out for future copies / manipulations of context 🧐
maybe we could introduce an init kwarg that we use in the first initialization here: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/utilities/context.py#L134 and otherwise we bypass that; I'm sure there are some edge cases to think through, but maybe that's a start for your investigations
t
Oooh yes. @Chris White just for context, I was actually originally doing this stuff using the server backend as well. But that was causing 600MB of network traffic and 600MB of write to disk (in the postgres container) with the above flow
so I decided to just try doing things with core only for the moment
c
gotcha, makes sense -- we're working on a more streamlined approach to Server that should reduce a lot of that noise; expect some cool announcements next month šŸ˜‰ in the meantime though I'd definitely be curious to hear how far we can push context creation
t
nice šŸ˜„
yes, i'm totally happy working with just core for now
eventually I'll want to use that awesome UI (it was the reason I decided to switch my code to Prefect in the first place!)
c
šŸ˜„ yea! I'll make a note to ping you next month, it'd be good to hear your feedback on the performance side
t
many thanks, I would really appreciate that. OK as for your suggestions re: bypassing the
merge_dicts
call, I will take some time to ponder that
I'm in a crunch to get this system working (hack or no hack) by tonight. but after that I should have some breathing room to see whether I can come up with something useful
c
ah gotcha; in that case my recommendation would be to put all the mapped task steps into a single mapped task. Should still be meaningful enough granularity (per-item processing) and would be a big reduction in runs
t
I've actually already bundled up tasks in a reasonably sane way šŸ™‚
šŸ˜„ 1
it runs at a totally reasonable pace now
c
awesome
t
i'll get back to you on context setup when I get the chance. Would it be worthwhile to open an issue on the github about it?
c
yup definitely, these slack threads are easy to lose track of
t
roger that. Thanks for your help. Looking forward to the server improvements!
šŸ‘ 1