Hi Prefect Community I am running a machine learning model a Prefect Community #ask-community

Xavier Babu

11/22/2022, 4:19 PM

Hi Prefect Community, I am running a machine learning model as a workflow using Prefect. Without Flows and Tasks, the model completes within 90 minutes, but once I convert it to a Flow with Tasks, the same model takes 300 minutes. I think most of the extra time is spent in the dropna() or drop() functions (and probably elsewhere too). Is there any way to allocate more memory to each task to make it run faster, since it is a memory-based process? Or are there any known memory allocation restrictions when running a task or flow? Please shed some light. I also tried Dask parallel processing, but it didn't help when running with a Flow and Tasks. BTW, I am using Prefect Orion 2.4.5 on an on-premises Linux server.

Ryan Peden

11/22/2022, 4:50 PM

Hi Xavier! It sounds like you're working with DataFrames. Are you returning any large DataFrames from tasks? Prior to Prefect 2.6.0, task and flow results were always pickled and persisted to the local filesystem. This can add significant overhead if you are passing large objects between tasks. Starting with Prefect 2.6.0, result persistence is fully configurable. It is turned off by default, but you can enable it where and when you need it. So if your code involves passing your DataFrames (or other large datasets) between tasks, it's worth trying a newer release of Prefect.
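To get a rough feel for the overhead described above, here is a minimal standard-library sketch that pickles a large in-memory object to disk and reads it back, which is roughly what persisting a result between two tasks involves. The helper name `persist_roundtrip` and the list-of-lists stand-in for a DataFrame are assumptions made for illustration; this is not Prefect's actual persistence code, and real timings depend on your data and disk.

```python
import os
import pickle
import tempfile
import time


def persist_roundtrip(obj):
    """Pickle obj to a temp file and load it back.

    Returns (copy, elapsed_seconds, size_in_bytes).
    """
    start = time.perf_counter()
    fd, path = tempfile.mkstemp(suffix=".pkl")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
        size = os.path.getsize(path)
        with open(path, "rb") as fh:
            copy = pickle.load(fh)
    finally:
        os.remove(path)
    return copy, time.perf_counter() - start, size


# Hypothetical stand-in for a large DataFrame: 200,000 rows of 5 floats.
rows = [[float(i + j) for j in range(5)] for i in range(200_000)]
copy, seconds, size = persist_roundtrip(rows)
print(f"round trip: {seconds:.3f}s for {size / 1e6:.1f} MB")
```

If every task in a pipeline pays this cost on a multi-gigabyte DataFrame, the total can plausibly add up to a large share of the run time, which is why checking the persistence settings is a sensible first step.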

Xavier Babu

11/22/2022, 4:52 PM

Ryan, yes, we are using large DataFrames. Let me use 2.6 and test it out. Do you have a doc link where I can read more about configuring result persistence? Please share.

Xavier Babu

11/22/2022, 4:52 PM

I just now noticed I am using Prefect 2.6.5

Khuyen Tran

11/22/2022, 4:56 PM

Docs for persisting results: https://docs.prefect.io/concepts/results/#persisting-results

Ryan Peden

11/22/2022, 4:56 PM

Even with Prefect >= 2.6.0, there are features that will automatically turn on result persistence if you use them. This section of the docs gives a good overview.

Xavier Babu

11/22/2022, 4:58 PM

Good info. Got it. Thanks for sharing.

Ryan Peden

11/22/2022, 5:03 PM

You're welcome! Feel free to post again in this thread if you have any other questions.
