Hi folks I am trying to use Prefect and found that it is rat Prefect Community #ask-community

Hi folks! I am trying to use Prefect, and found th...

Trevor Campbell

09/11/2021, 12:33 AM

Hi folks! I am trying to use Prefect, and found that it is rather slow for my use case. I thought it might be a mistake I made, but I've created a simple reproducible example that illustrates the issue. Basically: it seems context creation at the start of

task_run

is what is causing a huge amount of overhead. I would expect context creation to be very light weight... more details in the thread below.

Trevor Campbell

09/11/2021, 12:41 AM

Here's the reproducible example (it's a much simplified version of what I'm actually trying to do). Basic idea: • create a list of 4,000 simple list objects • run 3 tasks on each, very simple processing

Copy code

from prefect import Flow, task
import cProfile, pstats, io
from pstats import SortKey

@task
def create_lis():
    return [[3, 0, 4]]*4000

@task
def task1(li):
    li.append(1)
    return li

@task
def task2(li):
    li.sort()
    return li

@task
def task3(li):
    return li[0]

with Flow("blah") as flow:
    lis = create_lis()
    lis = task1.map(lis)
    lis = task2.map(lis)
    lis = task3.map(lis)

flow.run()

This takes about 380 seconds according to

cProfile

It looks like the vast majority of that time is spent creating contexts.

Copy code

1128368941 function calls (1095574869 primitive calls) in 378.017 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  378.022  378.022 /home/trevor/.local/lib/python3.8/site-packages/prefect/core/flow.py:1187(run)
        1    0.001    0.001  378.022  378.022 /home/trevor/.local/lib/python3.8/site-packages/prefect/core/flow.py:1014(_run)
        1    0.000    0.000  378.008  378.008 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/flow_runner.py:206(run)
 120029/4    0.165    0.000  377.998   94.499 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/runner.py:42(inner)
        1    0.000    0.000  377.998  377.998 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/executors.py:56(inner)
        1    0.186    0.186  377.997  377.997 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/flow_runner.py:385(get_flow_run_state)
    12004    0.107    0.000  377.601    0.031 /home/trevor/.local/lib/python3.8/site-packages/prefect/executors/local.py:12(submit)
    12004    0.082    0.000  377.494    0.031 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/flow_runner.py:730(run_task)
972412/84181    0.279    0.000  372.034    0.004 {built-in method builtins.next}
    72026    0.173    0.000  371.840    0.005 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/context.py:109(__call__)
    36016    0.026    0.000  369.878    0.010 /usr/lib/python3.8/contextlib.py:108(__enter__)
756273/36013    4.987    0.000  369.296    0.010 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/collections.py:118(merge_dicts)
23696554/11776251   25.761    0.000  332.769    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:533(__setitem__)
23696554/11776251   32.180    0.000  312.634    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:460(__convert_and_store)
    12004    0.014    0.000  252.352    0.021 /home/trevor/.local/lib/python3.8/site-packages/prefect/utilities/executors.py:543(wrapper)
    12004    0.279    0.000  252.338    0.021 /home/trevor/.local/lib/python3.8/site-packages/prefect/engine/task_runner.py:197(run)
2088754/756273    8.121    0.000  235.959    0.000 /home/trevor/.local/lib/python3.8/site-packages/prefect/configuration.py:25(copy)
 23696554   89.220    0.000  202.136    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:731(_safe_attr)
5473976/4141495   18.303    0.000  183.682    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:202(__init__)
 27766023   69.437    0.000  108.927    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:441(__box_config)
416492331   32.862    0.000   32.862    0.000 {method 'startswith' of 'str' objects}
103305581/99920359   11.851    0.000   19.727    0.000 {built-in method builtins.isinstance}
  5473976    9.510    0.000   17.614    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:159(__new__)
222484138   12.514    0.000   12.514    0.000 {method 'append' of 'list' objects}
 23708558    8.287    0.000   10.859    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:355(keys)
 10947952    6.791    0.000    6.791    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:549(__setattr__)
  3385222    1.574    0.000    6.198    0.000 /usr/lib/python3.8/typing.py:768(__instancecheck__)
  3385222    1.475    0.000    4.624    0.000 /usr/lib/python3.8/typing.py:771(__subclasscheck__)
 27802042    4.453    0.000    4.453    0.000 {method 'copy' of 'dict' objects}
 11688386    4.221    0.000    4.221    0.000 {method 'update' of 'dict' objects}
 23733670    3.711    0.000    3.712    0.000 {method 'join' of 'str' objects}
 23696554    3.393    0.000    3.393    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:448(__recast)
 39650452    3.318    0.000    3.318    0.000 {method 'items' of 'dict' objects}
  7826839    2.604    0.000    3.291    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:486(__getitem__)
  6230249    2.378    0.000    2.977    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:378(items)
  3469255    1.358    0.000    2.967    0.000 /home/trevor/.local/lib/python3.8/site-packages/box/box.py:387(get)
  3417440    1.237    0.000    2.814    0.000 {built-in method builtins.issubclass}
 24608876    2.639    0.000    2.639    0.000 {method 'keys' of 'dict' objects}

Trevor Campbell

09/11/2021, 12:43 AM

In contrast, if I run that code without Prefect, it finishes essentially instantaneously

Copy code

lis = create_lis()
for i in range(len(lis)):
    lis[i].append(1)

for i in range(len(lis)):
    lis[i].sort()

for i in range(len(lis)):
    lis[i] = lis[i][0]

Copy code

8005 function calls in 0.048 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4000    0.048    0.000    0.048    0.000 {method 'sort' of 'list' objects}
     4000    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 prefect_test.py:6(create_lis)
        3    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Trevor Campbell

09/11/2021, 12:46 AM

In my application, I need to run a flow of this rough size every few minutes -- 400 seconds for just context creation will definitely pose problems for me. I know the way contexts are set up is fairly fundamental to the design of Prefect. But looking at the context objects themselves, they aren't super heavy-duty. So I'm a bit surprised that context creation is eating up that much time. Any ideas whether this is a bottleneck that can be feasibly fixed?

Trevor Campbell

09/11/2021, 12:47 AM

Thanks for the help 🙂 and if the answer is just "no, this is not a problem" that's OK. I can try to be creative about instantiating fewer tasks....

Chris White

09/11/2021, 12:51 AM

Hey Trevor -- thanks for the detailed information! I definitely agree that we could probably make some of this more efficient, but I do also want to call out that most Prefect flow runs work with a Prefect API (self hosted or Cloud) so there is always some added latency per task run (whether it be networking or context creation). Not to suggest we don't want to optimize this further, but this amount of latency per run is typically acceptable for Prefect use cases. Back to the real issue though - it looks like at least some of this is coming from this

merge_dicts

call in the init method: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/utilities/context.py#L92 That call is really only necessary for the very first initialization of context, so maybe there's a way to optimize that out for future copies / manipulations of context 🧐

Chris White

09/11/2021, 12:52 AM

maybe we could introduce an init kwarg that we use in the first initialization here: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/utilities/context.py#L134 and otherwise we bypass that; I'm sure there are some edge cases to think through, but maybe that's a start for your investigations

Trevor Campbell

09/11/2021, 12:54 AM

Oooh yes. @Chris White just for context, I was actually originally doing this stuff using the server backend as well. But that was causing 600MB of network traffic and 600MB of write to disk (in the postgres container) with the above flow

Trevor Campbell

09/11/2021, 12:54 AM

so I decided to just try doing things with core only for the moment

Chris White

09/11/2021, 12:55 AM

gotcha, makes sense -- we're working on a more streamlined approach to Server that should reduce a lot of that noise; expect some cool announcements next month 😉 in the meantime though I'd definitely be curious to hear how far we can push context creation

Trevor Campbell

09/11/2021, 12:56 AM

nice 😄

Trevor Campbell

09/11/2021, 12:56 AM

yes, i'm totally happy working with just core for now

Trevor Campbell

09/11/2021, 12:56 AM

eventually I'll want to use that awesome UI (it was the reason I decided to switch my code to Prefect in the first place!)

Chris White

09/11/2021, 12:57 AM

😄 yea! I'll make a note to ping you next month, it'd be good to hear your feedback on the performance side

Trevor Campbell

09/11/2021, 12:59 AM

many thanks, I would really appreciate that. OK as for your suggestions re: bypassing the

merge_dicts

call, I will take some time to ponder that

Trevor Campbell

09/11/2021, 12:59 AM

I'm in a crunch to get this system working (hack or no hack) by tonight. but after that I should have some breathing room to see whether I can come up with something useful

Chris White

09/11/2021, 1:00 AM

ah gotcha; in that case my recommendation would be to put all the mapped task steps into a single mapped task. Should still be meaningful enough granularity (per-item processing) and would be a big reduction in runs

Trevor Campbell

09/11/2021, 1:00 AM

I've actually already bundled up tasks in a reasonably sane way 🙂

😄 1

Trevor Campbell

09/11/2021, 1:01 AM

it runs at a totally reasonable pace now

Chris White

09/11/2021, 1:01 AM

awesome

Trevor Campbell

09/11/2021, 1:01 AM

i'll get back to you on context setup when I get the chance. Would it be worthwhile to open an issue on the github about it?

Chris White

09/11/2021, 1:02 AM

yup definitely, these slack threads are easy to lose track of

Trevor Campbell

09/11/2021, 1:03 AM

roger that. Thanks for your help. Looking forward to the server improvements!

👍 1

7 Views

Open in Slack

Previous Next