# ask-community
@Marvin, hi Marvin, I see you answered a question about `PREFECT_TASK_INTROSPECTION_WARN_THRESHOLD` 2m ago. Could you kindly help me understand where exactly in the Prefect code that performance degradation happens? What line of code in `engine.py` takes so much longer if I pass a dataframe with 10M rows compared to a dataframe with 1 row? I'm really confused because at a glance I don't see it, and there is a problem where the introspection time is calculated wrong, see https://github.com/PrefectHQ/prefect/issues/11480. I'm migrating from Prefect 1 and have multiple warnings saying introspection takes 5-10 minutes (which correlates with the run time of upstream tasks).
Marvin:
It seems you are referencing an open issue on Prefect's GitHub repository, #11480, which describes the `PREFECT_TASK_INTROSPECTION_WARN_THRESHOLD` warning incorrectly including the runtime of upstream tasks. That causes confusion: the warning suggests that introspection of task parameters is taking longer than expected, when in reality the measured time is inflated by waiting for dependencies.

The introspection warning is designed to caution users when inspecting task parameters takes too long, which can indicate that large objects are being passed directly into tasks. This inspection happens so Prefect can record metadata about the task run and capture any data that should be logged. For very large data structures, such as a 10-million-row dataframe, Prefect does not do anything that inherently degrades performance simply because of the object's size. The warning is misleading when the introspection time is calculated incorrectly: it should measure only the actual introspection step, not the runtime of upstream tasks. The issue specifically points at the `orchestrate_task_run` function in `engine.py` of Prefect's core library as the place where the problem originates.
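To illustrate the kind of conflation being reported, here is a minimal, hypothetical sketch (not Prefect's actual code) of how a timer placed around the whole input-resolution step can absorb upstream wait time and report it as "introspection":

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_upstream_task() -> list[int]:
    # Stand-in for an upstream task; imagine 5-10 minutes of real work.
    time.sleep(2)
    return list(range(10))


def resolve_inputs(parameters: dict) -> dict:
    # Hypothetical stand-in for input resolution: waiting on upstream
    # futures and walking the resolved values happen in one step.
    return {
        name: value.result() if isinstance(value, Future) else value
        for name, value in parameters.items()
    }


with ThreadPoolExecutor() as pool:
    upstream = pool.submit(slow_upstream_task)

    # If the "introspection" timer wraps the whole resolution step, the
    # time spent waiting on the upstream future is reported as
    # introspection time, even though inspecting the resolved values is
    # nearly instant.
    start = time.perf_counter()
    resolved = resolve_inputs({"df": upstream, "n": 1})
    print(f"'introspection' took {time.perf_counter() - start:.1f}s")
```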
The issue does not, however, pin down the exact line where the miscalculation happens; a more in-depth look at the code would be needed for that. As for large parameters like a 10M-row dataframe, any real performance impact is not due to introspection itself but would more likely come from serializing or logging large parameters when they are not quoted. For now, the suggested workaround for the warning is to wrap large task parameters in `prefect.utilities.annotations.quote`.
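As a rough sketch of that workaround (assuming Prefect 2.x, where `quote` tells the engine not to visit the wrapped value during input resolution; the `summarize` and `my_flow` names here are just placeholders):

```python
import pandas as pd

from prefect import flow, task
from prefect.utilities.annotations import quote


@task
def summarize(df: pd.DataFrame) -> int:
    return len(df)


@flow
def my_flow() -> int:
    df = pd.DataFrame({"x": range(10_000_000)})  # large parameter
    # Wrapping the dataframe in quote() skips introspection of its
    # contents when the task's inputs are resolved.
    return summarize(quote(df))


if __name__ == "__main__":
    my_flow()
```

Keep in mind that quoting opts the value out of the engine's usual traversal, so futures nested inside a quoted value would not be resolved; it is best used on plain data like a dataframe.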
Quoting the parameter prevents unnecessary introspection and logging of the object and should also quiet the erroneous warning. Since this issue seems to be unresolved at the moment, you may want to follow the GitHub issue for updates or add your specific case to the conversation there. If the warnings are causing too much noise in the meantime, you can temporarily disable them by setting `PREFECT_TASK_INTROSPECTION_WARN_THRESHOLD=0`, as suggested in the warning message itself.
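If it helps, a minimal way to do that from Python, assuming the standard Prefect 2.x behavior of reading `PREFECT_*` settings from environment variables (`prefect config set` is the persistent CLI alternative):

```python
import os

# Silence the introspection warning for this process; the variable must
# be set before Prefect loads its settings.
os.environ["PREFECT_TASK_INTROSPECTION_WARN_THRESHOLD"] = "0"

from prefect import flow


@flow
def my_flow() -> str:
    return "no introspection warnings expected here"


if __name__ == "__main__":
    my_flow()
```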