Has anyone been using the new web runner (as of 2.14.17)? Prefect Community #ask-community

Daryl

02/28/2024, 6:16 AM

Has anyone been using the new web runner (as of 2.14.17, I'm using 2.16.0) and having problems with the server runner process crashing? I have a really nice flow and subflow that take a whack load of files and then submit them for processing, but for some reason, and over vastly different directories of files, the main flow will crash somewhere between 1300 and 1500 files. This occurs consistently whether I'm running 1 server and 50 workers or 1 server and 10 workers. It's a synchronous flow and it runs fine in parallel, except for the fact it kaks out at the limits I mentioned. I imagine this is something I might have done in the flow, subflow or task code (some sort of memory leak?), but there is no crash info to speak of in the UX, so two things I'd like to know:
1. Where is a good place to look to find the logs for this sort of issue (it's the standard docker image for 2.16.0)?
2. Has anyone who's been using the (experimental but excellent!) web runner seen similar mystifying issues? (I'm putting together a blog post on what I've done here since there's not a lotta docs on this yet, but wanna make sure everything works as advertised before pushing live.)
Thanks y'all!

Nate

02/28/2024, 1:50 PM

hey again @Daryl - do you happen to have an open source version of your code? or at least a representative script that would function as an MRE? i will be taking some time soon to do some more stress testing here (as you mentioned, it is still experimental 🙂 ) but I would super appreciate hearing any consistent failure modes you've been noticing

in general to answer your questions:
• first place I would want to look is the logs for the serve process, so like `docker logs MYCONTAINERID` if you're still using docker compose
• I have not yet noticed crashes, but as mentioned I haven't really pushed the envelope yet

Daryl

02/28/2024, 2:58 PM

@Nate Oh heya! I thought of pinging you but you've been so helpful already I felt it was pushing my luck a little. And didn't want to exhaust you. Happy to DM you the code direct if you wanna take a peek, and yes, still running (I can just add you to the GitHub repo too if that's preferable btw...). The pipeline ingests a bunch of imagery files and calls, via a RESTful Go API, a quirky backend legacy API written in Lisp. 8-/ (So lotsa sleeps while waiting for it to return some intensive sci processing.) Lemme take a look in the logs and see if anything interesting is going on there first, but happy to share the code - it's hardly a closely guarded secret.

🙏 1

Daryl

02/28/2024, 3:28 PM

Hey @Nate ... Looking in the docker logs (thanks for that tip), this may be some sort of natural limitation going on with sqlite at scale. Seeing this sort of error repeated over and over (unclear if it happens at the point of 1300-1500 files processed, or if it's a constant log issue that adds up and possibly causes the crash):

00:57:31.866 | ERROR   | prefect.server.services.flowrunnotifications - Unexpected error in: OperationalError('(sqlite3.OperationalError) database is locked')
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1960, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 146, in execute
    self._adapt_connection._handle_exception(error)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 298, in _handle_exception
    raise error
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 128, in execute
    self.await_(_cursor.execute(operation, parameters))
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 127, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 192, in greenlet_spawn
    value = await result
            ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiosqlite/cursor.py", line 48, in execute
    await self._execute(self._cursor.execute, sql, parameters)
  File "/usr/local/lib/python3.11/site-packages/aiosqlite/cursor.py", line 40, in _execute
    return await self._conn._execute(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiosqlite/core.py", line 132, in _execute
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiosqlite/core.py", line 115, in run
    result = function()
             ^^^^^^^^^^
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/server/services/loop_service.py", line 79, in start
    await self.run_once()
  File "/usr/local/lib/python3.11/site-packages/prefect/server/database/dependencies.py", line 119, in async_wrapper
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/server/services/flow_run_notifications.py", line 38, in run_once
    notifications = await db.get_flow_run_notifications_from_queue(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/server/database/interface.py", line 365, in get_flow_run_notifications_from_queue
    return await self.queries.get_flow_run_notifications_from_queue(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/server/database/query_components.py", line 1030, in get_flow_run_notifications_from_queue
    await session.execute(delete_stmt)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/ext/asyncio/session.py", line 452, in execute
    result = await greenlet_spawn(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 197, in greenlet_spawn
    result = context.throw(*sys.exc_info())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2306, in execute
    return self._execute_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2191, in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/bulk_persistence.py", line 1946, in orm_execute_statement
    return super().orm_execute_statement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/context.py", line 293, in orm_execute_statement
    result = conn.execute(
             ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1408, in execute
    return meth(
           ^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 513, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1630, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1839, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1979, in _exec_single_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2335, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1960, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 146, in execute
    self._adapt_connection._handle_exception(error)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 298, in _handle_exception
    raise error
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 128, in execute
    self.await_(_cursor.execute(operation, parameters))
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 127, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 192, in greenlet_spawn
    value = await result
            ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiosqlite/cursor.py", line 48, in execute
    await self._execute(self._cursor.execute, sql, parameters)
  File "/usr/local/lib/python3.11/site-packages/aiosqlite/cursor.py", line 40, in _execute
    return await self._conn._execute(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiosqlite/core.py", line 132, in _execute
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiosqlite/core.py", line 115, in run
    result = function()
             ^^^^^^^^^^

Nate

02/28/2024, 3:28 PM

oooh!

Nate

02/28/2024, 3:28 PM

yeah i was gonna ask if you were on sqlite or pg, but then i forgot

Nate

02/28/2024, 3:29 PM

I would anticipate a significant performance boost with highly concurrent stuff on pg

Daryl

02/28/2024, 3:29 PM

There's about 120000 lines of that error repeated, soooo...
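With that many repeated lines, a quick tally by level and logger can show whether one service accounts for nearly all of the noise. The following is only a rough sketch: it assumes the `HH:MM:SS.mmm | LEVEL | logger - message` layout seen in the log excerpt above, and `tally` is a hypothetical helper name, not part of Prefect.

```python
import re
from collections import Counter

# Matches records laid out like the excerpt above, for example:
# 00:57:31.866 | ERROR   | prefect.server.services.flowrunnotifications - Unexpected error in: ...
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \| (?P<level>\w+)\s*\| (?P<logger>.+?) - (?P<message>.*)$"
)


def tally(lines):
    """Count log records by (level, logger); traceback lines do not match and are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match:
            counts[(match["level"], match["logger"])] += 1
    return counts


# Small illustrative sample; in practice the lines would come from the container logs.
sample = [
    "00:57:31.866 | ERROR   | prefect.server.services.flowrunnotifications - Unexpected error in: OperationalError",
    "Traceback (most recent call last):",
    "sqlite3.OperationalError: database is locked",
    "00:57:35.101 | ERROR   | prefect.server.services.flowrunnotifications - Unexpected error in: OperationalError",
]
for (level, logger), count in tally(sample).most_common():
    print(count, level, logger)
```

To run it against real output, the saved result of `docker logs MYCONTAINERID` can be opened as a file and passed to `tally` in place of `sample`.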

Daryl

02/28/2024, 3:30 PM

The strange thing is this happens at the 1300-1500 file mark at 10 workers or 50... so, wondering if it causes some sort of memory leak issue. Tho easy enough for me to add in a postgres container and see if that sorts it.
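For anyone trying the same swap: a Prefect 2.x server is pointed at Postgres through the `PREFECT_API_DATABASE_CONNECTION_URL` setting, which takes a `postgresql+asyncpg://` URL. Below is a small sketch of building that value; the user, password, host and database names are made up for illustration and `postgres_connection_url` is a hypothetical helper.

```python
from urllib.parse import quote


def postgres_connection_url(user, password, host, database, port=5432):
    """Build an asyncpg-style SQLAlchemy URL, percent-encoding the credentials."""
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials and compose service name, for illustration only.
url = postgres_connection_url("prefect", "s3cret/pw", "postgres", "prefect")
print(f"PREFECT_API_DATABASE_CONNECTION_URL={url}")
```

The printed line is the kind of entry that would go into the server container's environment. Percent-encoding matters when the password contains characters such as `/` or `@`.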

Nate

02/28/2024, 3:30 PM

hmm - let me know how it goes!

Nate

02/28/2024, 3:30 PM

ill update this to use pg when i have some time

Daryl

02/28/2024, 3:31 PM

Though surprised sqlite would buckle under that. It's pretty performant I find.
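For context on the `database is locked` message: SQLite is fast for a single writer, but only one connection can hold the write lock at a time, and another writer that cannot get it within its timeout raises exactly this error. Here is a minimal standard-library sketch of that behavior; the table and file names are invented, and this is not how Prefect itself talks to the database.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are explicit.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, state TEXT)")

# timeout=0 makes the second connection fail at once instead of waiting.
other = sqlite3.connect(path, timeout=0, isolation_level=None)

writer.execute("BEGIN IMMEDIATE")  # take the single write lock and hold it
writer.execute("INSERT INTO runs (state) VALUES ('RUNNING')")

try:
    other.execute("INSERT INTO runs (state) VALUES ('PENDING')")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

writer.execute("COMMIT")  # release the write lock
other.execute("INSERT INTO runs (state) VALUES ('PENDING')")  # now succeeds

print(error)
row_count = other.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
```

With many concurrent workers reporting state through one server, writers queue up behind that single lock, which is consistent with the advice above to try Postgres for highly concurrent work.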

Daryl

02/28/2024, 3:31 PM

Oh, an update would be kind, thank you. I think I might have another example with postgres somewheres *looks around room*, just defaulted to sqlite cause it's well... simpler... 🙂

Daryl

02/28/2024, 3:34 PM

Late here in HK so will try this tomorrow when I get a break at work. Thanks for responding on the "where do I look?" newbie question... 🙂

Nate

02/28/2024, 3:34 PM

sounds good!

Nate

02/28/2024, 3:34 PM

catjam

Daryl

03/01/2024, 2:47 PM

@Nate Sadly no joy using postgres as the database, though we get a warning message rather than a db lock error. But the pipeline died at about 1285 files out of about 2865 (and that's just for one comet - I've got about 2000). Anyway... interesting new warning message via docker logs, though not that helpful in diagnosing the problem. 😭 Are there any other deeper logging mechanisms I can use to try to run this to ground? docker logs below:

Daryl

03/01/2024, 2:47 PM

03:14:57.169 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 20.986676 seconds to run, which is longer than its loop interval of 20.0 seconds.
03:17:40.882 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 23.537173 seconds to run, which is longer than its loop interval of 20.0 seconds.
03:20:41.330 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 20.137331 seconds to run, which is longer than its loop interval of 20.0 seconds.
03:23:48.898 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 27.209079 seconds to run, which is longer than its loop interval of 20.0 seconds.
03:26:36.752 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 27.380853 seconds to run, which is longer than its loop interval of 20.0 seconds.
03:29:39.883 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 22.671117 seconds to run, which is longer than its loop interval of 20.0 seconds.
03:32:24.949 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 24.604528 seconds to run, which is longer than its loop interval of 20.0 seconds.
03:34:49.689 | WARNING | prefect.server.services.cancellationcleanup - CancellationCleanup took 24.256178 seconds to run, which is longer than its loop interval of 20.0 seconds.
03:35

Daryl

03/01/2024, 2:47 PM

Stumped tbh. Though it may be something I'm doing with a pipeline task that's leaking memory possibly?

Daryl

03/02/2024, 12:43 AM

(tbh, these look more like messages about what happened after the crash, which may be the same thing as what we see with the sqlite db locked messages. So, not sure it's the source of the crash.)

Nate

03/02/2024, 1:01 AM

hey @Daryl still up for this?

Happy to DM you the code

id love to stress test and see if we can update our handling

Nate

03/02/2024, 1:02 AM

i can back out the general structure and simulate some heavy disk io -ish work if thats whats breaking for you

Daryl

03/02/2024, 2:08 AM

@Nate Hiya! I can't imagine it's the disk stuff since it happens so consistently at a certain range point. Suspect more of a data leak. Lemme share the [WIP] code base. It's a lot of sync API calls that combine data, but nothing on the surface of it that I think should drop the server flow. That said, it'd be a good idea if you could specify more than one server process to run/split things, so one can take over in case of a crash/fail like this (if you can do this already and I just haven't puzzled it out... do lemme know).

Daryl

03/02/2024, 8:48 AM

@Nate Added your gh handle to repo `coma-prefect`, though since most stuff happens on the server and via api calls, not sure how much it will help. I guess the key thing to note is that the behaviour is quite strange. Interestingly, it always occurs after a full run of 50 concurrent has finished. And always around the 1250-1450 mark in terms of files to be processed. Am going to spend some more time this evening trying to figure out if it's my code or there is an issue in the web runner somehow. Strangely, not much in terms of logs to run it to ground.

Daryl

03/02/2024, 8:49 AM

Still... everything is working great except for that, so if I can just get over that hump (and scale up to the many thousands of files we have), it's actually all done. It def works.

Daryl

03/12/2024, 12:50 AM

@Nate Did you get any time to work on the web runner, and did that give any joy on the mysterious prefect server flow crashing (and stopping the subflow)? I'm still seeing it every 20-ish task runs (in 50s), so I think it may be a memory leak of some sort in the web runner. Have upgraded to 2.16.3 but still (sadly) seeing it. As mentioned above, everything is working fine other than that. You did mention you thought it might be due to i/o, and we are copying large stellar photometry files after that, but honestly I'm stumped atm as to what could be causing it, as the code itself is pretty clean. Weirdly, it seems to happen precisely after a block of 50 has occurred, so at the start of a new subflow iteration of all workers (using 50 workers and 1 server). Is there any other way to increase the verbosity of the logs to give me more info than what I provided in the last post with the `docker logs` output (since that seemed to be indicative of the crashed server flow controlling the subflow)? Is there a way to have a backup server worker to fall back to in the web runner to take over for the other workers? (Realizing that that may be getting a bit advanced.)
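On the verbosity question, which nobody in the thread answered directly: Prefect 2.x has logging settings that can be supplied as environment variables, `PREFECT_LOGGING_LEVEL` for flow and task run loggers and `PREFECT_LOGGING_SERVER_LEVEL` for the API server. Below is a hedged sketch of turning both up for a container; the helper `as_docker_env_flags` is hypothetical, and whether DEBUG output would surface the cause of this particular crash is an open question.

```python
# Setting names as documented for Prefect 2.x; the DEBUG values are illustrative.
debug_env = {
    "PREFECT_LOGGING_LEVEL": "DEBUG",         # flow and task run loggers
    "PREFECT_LOGGING_SERVER_LEVEL": "DEBUG",  # API server loggers
}


def as_docker_env_flags(env):
    """Render a mapping as `-e KEY=VALUE` arguments for `docker run`."""
    flags = []
    for key, value in sorted(env.items()):
        flags.extend(["-e", f"{key}={value}"])
    return flags


print(" ".join(as_docker_env_flags(debug_env)))
```

In a docker compose setup the same two keys would go under the service's `environment:` section instead of being passed as flags.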

Matthew Bell

03/14/2024, 9:53 PM

I'm also experiencing a memory leak, not sure if it's prefect related or my own doing yet. Going to do some more testing

Matthew Bell

03/14/2024, 10:11 PM

I'm almost certain prefect has a memory leak. Here is a minimum viable example using python's `memory_profiler` module (found here):

from memory_profiler import profile
from prefect import flow, task


@task
@profile
def with_task():
    return [{"abc": "123"} for _ in range(10_000)]


@profile
def without_task():
    return [{"abc": "123"} for _ in range(10_000)]


@flow
@profile
def main():
    for i in range(10):
        # with_task()
        without_task()
    print("DONE")


main()

Run that with `with_task` and with `without_task`, and watch how memory grows exponentially when we use tasks, but does not grow when we don't.

Alexander Azzam

03/14/2024, 10:16 PM

I know @Andrew Brookins has been thinking about some related problems this last week - tagging him in here

Matthew Bell

03/14/2024, 10:17 PM

Thank you Adam, much appreciated

Alexander Azzam

03/14/2024, 10:21 PM

No worries! As an aside, if you’d do us the kindness of leaving a GitHub issue, we might surface other folks who have hit this who aren’t on Slack (and make it easier for us to track and investigate).

Matthew Bell

03/14/2024, 10:22 PM

Will do shortly!

🙇 1

🙌 1

Andrew Brookins

03/14/2024, 10:24 PM

@Matthew Bell What version are you on? 2.16.3 fixed two memory leaks, 2.16.0 fixed others. (I'm trying to root them all out.)

Matthew Bell

03/14/2024, 10:26 PM

You're the man Andrew. I'm on 2.15.0 -- I will upgrade cheers

Andrew Brookins

03/14/2024, 10:27 PM

Perfect. Let me know if you still see anything odd!

👍 1

Alexander Azzam

03/14/2024, 10:27 PM

saved the day

Matthew Bell

03/15/2024, 2:52 AM

@Andrew Brookins I'm testing that same piece of code now on v2.16.4, and it looks like it's still leaking

Matthew Bell

03/15/2024, 5:10 AM

I've added a comment outlining my findings to the existing Github issue that you'd recently closed. I think this still needs further investigating 🙂

Andrew Brookins

03/15/2024, 5:07 PM

I'll take a look! Memory growth while using tasks is expected. If Python never reclaims the memory, that's a problem, and consistently in my testing with 2.16.3, it does (where before, it didn't). Let me try out your code in my profiling harness!
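The distinction drawn here, growth while work runs versus memory that is never given back, can be checked outside Prefect with the standard library. The sketch below is not Andrew's profiling harness: it calls a plain function with the same payload shape as the snippet above, forces a garbage collection, and records what `tracemalloc` still sees as allocated. The names `make_payload` and `retained_bytes` are invented for illustration.

```python
import gc
import tracemalloc


def make_payload():
    # Same shape of data as the task in the snippet above.
    return [{"abc": "123"} for _ in range(10_000)]


def retained_bytes(fn, iterations=5):
    """Return the traced bytes still allocated after each call to fn, measured after a GC pass."""
    tracemalloc.start()
    readings = []
    for _ in range(iterations):
        fn()          # the result is discarded, so it should be collectable
        gc.collect()  # force a full collection before measuring
        current, _peak = tracemalloc.get_traced_memory()
        readings.append(current)
    tracemalloc.stop()
    return readings


readings = retained_bytes(make_payload)
print(readings)
```

For a plain function the readings should stay roughly flat from one iteration to the next. Readings that keep climbing by about the payload size would suggest something is still holding references to the results.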

Matthew Bell

03/15/2024, 5:12 PM

Python is never reclaiming the memory here when it should be. If we loop more times memory will continue to grow infinitely.

Matthew Bell

03/15/2024, 5:14 PM

The leak seems to scale with the size of the variables used in our task. If we're reading large amounts of data in our task the leak grows extremely quickly.

Matthew Bell

03/15/2024, 5:22 PM

I'm seeing maybe it has something to do with this line here in `tracemalloc`

Andrew Brookins

03/15/2024, 5:22 PM

@Matthew Bell I'll also post on the GH issue, but can you paste console text so I can see the full output you're getting? It may be different than mine, which does not suggest a leak. Can you also share the output of:

prefect version

It should look like this, with extended output, not just the version #:

prefect version
Version:             2.16.3
API version:         0.8.4
Python version:      3.12.2
Git commit:          e3f02c00
Built:               Thu, Mar 7, 2024 4:56 PM
OS/Arch:             darwin/arm64
Profile:             default
Server type:         cloud

Also, Python version and OS version? I suppose Python will be in that output, so just OS would help.

Matthew Bell

03/15/2024, 5:24 PM

System info:

Python 3.12.1

Apple M3, Sonoma 14.2.1

prefect version                                      
Version:             2.16.4
API version:         0.8.4
Python version:      3.12.1
Git commit:          e3e7df9d
Built:               Thu, Mar 14, 2024 5:11 PM
OS/Arch:             darwin/arm64
Profile:             production
Server type:         cloud

Andrew Brookins

03/15/2024, 5:24 PM

Excellent, thank you!

Matthew Bell

03/15/2024, 5:25 PM

Would it be helpful at all to jump on a call and show you what I'm seeing? Otherwise will just dump console text here

Andrew Brookins

03/15/2024, 5:26 PM

Not as much as sharing output. What I need is to establish a reproduction locally.

👍 1

Andrew Brookins

03/15/2024, 5:27 PM

But thank you for the offer! 🙌

Matthew Bell

03/15/2024, 5:29 PM

Here's console output below from this snippet. Note that I'm only running 5 loops here for brevity of output, but if we increase that loop, memory grows infinitely.

from memory_profiler import profile
from prefect import flow, task


@task
@profile
def with_task():
    return [{"abc": "123"} for _ in range(10_000)]


@profile
def without_task():
    return [{"abc": "123"} for _ in range(10_000)]


@flow
@profile
def main():
    for i in range(5):
        with_task()
        # without_task()
    print("DONE")


main()

13:27:57.390 | INFO    | prefect.engine - Created flow run 'tomato-civet' for flow 'main'
13:27:57.393 | INFO    | Flow run 'tomato-civet' - View at <https://app.prefect.cloud/account/9a597790-3884-4982-99c8-7f9f55834ae7/workspace/38e23fa4-ca5f-4a46-9bcf-7b9ec03077ef/flow-runs/flow-run/6a95fe30-0b63-4ac6-aafd-063663325c45>
13:27:57.989 | INFO    | Flow run 'tomato-civet' - Created task run 'with_task-0' for task 'with_task'
13:27:57.991 | INFO    | Flow run 'tomato-civet' - Executing 'with_task-0' immediately...
Filename: /Users/matthewbell/github/kpi-pipes/src/cdk_flow/test_flow.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     5    188.4 MiB    188.4 MiB           1   @task
     6                                         @profile
     7                                         def with_task():
     8    190.4 MiB      1.9 MiB       10001       return [{"abc": "123"} for _ in range(10_000)]


13:27:58.662 | INFO    | Task run 'with_task-0' - Finished in state Completed()
13:27:58.803 | INFO    | Flow run 'tomato-civet' - Created task run 'with_task-1' for task 'with_task'
13:27:58.805 | INFO    | Flow run 'tomato-civet' - Executing 'with_task-1' immediately...
Filename: /Users/matthewbell/github/kpi-pipes/src/cdk_flow/test_flow.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     5    193.0 MiB    193.0 MiB           1   @task
     6                                         @profile
     7                                         def with_task():
     8    194.8 MiB      1.8 MiB       10001       return [{"abc": "123"} for _ in range(10_000)]


13:27:59.607 | INFO    | Task run 'with_task-1' - Finished in state Completed()
13:27:59.737 | INFO    | Flow run 'tomato-civet' - Created task run 'with_task-2' for task 'with_task'
13:27:59.739 | INFO    | Flow run 'tomato-civet' - Executing 'with_task-2' immediately...
Filename: /Users/matthewbell/github/kpi-pipes/src/cdk_flow/test_flow.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     5    195.7 MiB    195.7 MiB           1   @task
     6                                         @profile
     7                                         def with_task():
     8    197.5 MiB      1.9 MiB       10001       return [{"abc": "123"} for _ in range(10_000)]


13:28:00.372 | INFO    | Task run 'with_task-2' - Finished in state Completed()
13:28:00.526 | INFO    | Flow run 'tomato-civet' - Created task run 'with_task-3' for task 'with_task'
13:28:00.528 | INFO    | Flow run 'tomato-civet' - Executing 'with_task-3' immediately...
Filename: /Users/matthewbell/github/kpi-pipes/src/cdk_flow/test_flow.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     5    198.6 MiB    198.6 MiB           1   @task
     6                                         @profile
     7                                         def with_task():
     8    200.4 MiB      1.8 MiB       10001       return [{"abc": "123"} for _ in range(10_000)]


13:28:01.269 | INFO    | Task run 'with_task-3' - Finished in state Completed()
13:28:01.424 | INFO    | Flow run 'tomato-civet' - Created task run 'with_task-4' for task 'with_task'
13:28:01.425 | INFO    | Flow run 'tomato-civet' - Executing 'with_task-4' immediately...
Filename: /Users/matthewbell/github/kpi-pipes/src/cdk_flow/test_flow.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     5    200.9 MiB    200.9 MiB           1   @task
     6                                         @profile
     7                                         def with_task():
     8    202.7 MiB      1.8 MiB       10001       return [{"abc": "123"} for _ in range(10_000)]


13:28:02.108 | INFO    | Task run 'with_task-4' - Finished in state Completed()
DONE
Filename: /Users/matthewbell/github/kpi-pipes/src/cdk_flow/test_flow.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    16    185.1 MiB    185.1 MiB           1   @flow
    17                                         @profile
    18                                         def main():
    19    205.5 MiB      0.0 MiB           6       for i in range(5):
    20    205.5 MiB     20.4 MiB           5           with_task()
    21                                                 # without_task()
    22    205.5 MiB      0.0 MiB           1       print("DONE")


13:28:02.291 | INFO    | Flow run 'tomato-civet' - Finished in state Completed('All states completed.')
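To put a number on the growth in that output, the "Mem usage" value reported at the entry line of each `with_task` call can be differenced. This is only arithmetic on the figures printed above, not a new measurement.

```python
# "Mem usage" at the entry line of each with_task call, in MiB,
# copied from the memory_profiler output above.
baselines = [188.4, 193.0, 195.7, 198.6, 200.9]

# Growth between consecutive task runs.
deltas = [round(later - earlier, 1) for earlier, later in zip(baselines, baselines[1:])]

# Average growth per task run across the four intervals.
mean_growth = sum(deltas) / len(deltas)

print(deltas)
print(f"{mean_growth:.1f} MiB per task run on average")
```

That works out to roughly 3 MiB retained per task run, which lines up with the flow-level profile, where `main` goes from 185.1 MiB to 205.5 MiB (20.4 MiB) over five calls. Over these five runs the growth looks closer to linear than exponential.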

Andrew Brookins

03/15/2024, 5:32 PM

Perfect, thank you!

Matthew Bell

03/15/2024, 5:47 PM

`flow_run_context.task_run_results` isn't getting cleared as we loop through our tasks. Commenting out line 2688 in `prefect/engine.py` reduces the leak substantially, albeit not entirely.

Matthew Bell

03/15/2024, 6:06 PM

I wonder whether it's a combination of none of these tracking objects clearing as we loop

Matthew Bell

03/15/2024, 10:30 PM

One note, I'm realizing I didn't set `cache_result_in_memory=False`. This helps a ton with memory consumption and I believe addresses my above concern. But the leak still persists even with this.

🙏 1

Matthew Bell

03/25/2024, 3:24 PM

Just to confirm here, since I realize the above message may get misinterpreted, there is still a memory leak that I'm really hoping can be addressed

Alexander Azzam

03/25/2024, 3:26 PM

Hey Matt! Andrew - our memory leak plugger in chief - is OOO today so heads up this thread might be crickets today but we’re working on it

Matthew Bell

03/25/2024, 3:27 PM

Thanks Adam, really appreciate your communication on this 🙂

Alexander Azzam

03/25/2024, 3:47 PM

🙇 no sweat!
