# prefect-community
x
Hi Prefect Community, I am running a machine learning model as a workflow using Prefect. Without Flows and Tasks, the model completes within 90 minutes. When I convert the same code to a Flow and Tasks, it takes 300 minutes to complete. I think most of the extra time is spent in the dropna() and drop() calls, though probably not only there. Is there any way I can allocate more memory to each task to make it run faster, since it is a memory-bound process? Or are there any known memory allocation restrictions when running a task or flow? Please shed some light. I even tried Dask parallel processing, but it didn't help when running with a Flow and Tasks. BTW, I am using Prefect Orion 2.4.5 on an on-premise Linux server.
r
Hi Xavier! It sounds like you're working with DataFrames. Are you returning any large DataFrames from tasks? Prior to Prefect 2.6.0, task and flow results were always pickled and persisted to the local filesystem. This can add significant overhead if you are passing large objects between tasks. Starting with Prefect 2.6.0, result persistence is fully configurable. It is turned off by default, but you can enable it where and when you need it. So if your code involves passing your DataFrames (or other large datasets) between tasks, it's worth trying a newer release of Prefect.
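To get a rough feel for that overhead outside of Prefect, here is a minimal sketch using only the standard library. It is not Prefect's actual persistence code. It only compares handing an object to the next step in memory against pickling it to a local file and reading it back, and the helper names (`make_rows`, `pass_in_memory`, `pass_via_disk`, `timed`) are made up for illustration. A list of dicts stands in for a large DataFrame.

```python
import pickle
import tempfile
import time
from pathlib import Path


def make_rows(n):
    """Build a list of dicts as a stand-in for a large DataFrame."""
    return [{"id": i, "value": float(i), "label": f"row-{i}"} for i in range(n)]


def pass_in_memory(rows):
    """Hand the object to the next step directly: no copy, no I/O."""
    return rows


def pass_via_disk(rows, directory):
    """Pickle the object to a local file, then load it back.

    Assumption: this only mimics the general idea of persisting a result
    to the local filesystem between two steps.
    """
    path = Path(directory) / "result.pkl"
    with open(path, "wb") as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as f:
        return pickle.load(f)


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    rows = make_rows(500_000)
    _, t_mem = timed(pass_in_memory, rows)
    with tempfile.TemporaryDirectory() as tmp:
        _, t_disk = timed(pass_via_disk, rows, tmp)
    print(f"in memory: {t_mem:.6f}s")
    print(f"via disk:  {t_disk:.6f}s")
```

The actual numbers depend on the machine and the data, but the disk round trip is paid on every hand-off, so a pipeline that returns large objects from many tasks pays it many times.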
x
Ryan, Yes, we are using larger dataframes. Let me use 2.6 and test it out. Do you have any doc link where I can read more about result persistence configuration? Please share.
I just now noticed I am using Prefect 2.6.5
r
Even with Prefect >= 2.6.0, there are features that will automatically turn on result persistence if you use them. This section of the docs gives a good overview.
x
Good info. Got it. Thanks for sharing.
r
You're welcome! Feel free to post again in this thread if you have any other questions.