Tony Yun

11/04/2022, 11:51 AM
When we have to loop over a task so that it runs multiple times, how can we avoid memory issues? I found that each time the task runs, memory usage steadily goes up and never comes back down, even though the variable is overwritten every time. (I'm using Prefect 2.)

Mathijs Carlu

11/04/2022, 2:07 PM
You can persist results, instead of caching them in memory (which indeed happens by default). More info on this page, particularly the sections "Persisting results" and "Caching of results in memory"
🙌 1

Nate

11/04/2022, 3:25 PM
Hey @Tony Yun - can you share your code where you're looping a task to run several times?

Tony Yun

11/05/2022, 2:09 AM
Hey @Nate, this is my code (video_data is a large data frame, which is memory-intensive):
    batch_size = 100
    to_fetch_channels = []
    for i, upload_id in enumerate(uploadIds):
        counter = i + 1
        to_fetch_channels.append(upload_id)
        if counter % batch_size == 0:    
            base_video_data = get_upload_details2(youtube_api_keys, to_fetch_channels)
            to_fetch_channels = []

            video_ids = [i for i in base_video_data if i not in excluded_video_data]
            
            # this video_data is a large data frame which is memory intense
            video_data = get_video_details(youtube_api_keys, video_ids)

            logger.info(f'Loading {counter}/{len(uploadIds)} video IDs to Snowflake...')

            try:
                videos_table = build_video_table(video_data)
                snowflake_load_data(
                    "video_data",
                    merge_videos("video_data"),
                    snowflake_auth,
                    videos_table
                    )
            except DataEmptyError as e:
                logger.error(e)
@Mathijs Carlu got it. Let me try that way.