    Naga Sravika Bodapati

    4 months ago
    Hi all, we are seeing all our flows on Prefect 1.0 fail with the error: cannot allocate memory. Can you name a scenario/reason why this could happen? It happens most often when the flows are long-running, and there is no way for us to debug it using the logs in Prefect. Please help.
    Anna Geller

    4 months ago
    Can u name a scenario/reason why this could happen?
    1. Not enough memory on your VM/server
    2. Not enough memory allocated to the Kubernetes pod, Docker container, or ECS task
    3. General execution-layer issues outside of Prefect
    I could name more, but I'm not sure that would be helpful. Can you explain your use case more? Long-running jobs are generally difficult to troubleshoot; check this for more background
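    Since the Prefect logs are the main debugging surface here, one option worth considering (a sketch, not official Prefect advice) is to log the process's peak memory at task boundaries using only the Python standard library; note that the resource module is POSIX-only, and ru_maxrss is reported in KiB on Linux but bytes on macOS:

    ```python
    import logging
    import resource

    logger = logging.getLogger("memory-probe")

    def log_peak_memory(label):
        # Log and return the peak resident set size of this process so far.
        # Calling this at the start/end of each task narrows down which
        # step drives memory toward the "cannot allocate memory" failure.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        logger.info("%s: peak RSS so far = %s", label, peak)
        return peak
    ```

    Comparing the logged values across a week of runs should show whether memory grows steadily (a leak in the loop) or spikes on one particular step.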
    Naga Sravika Bodapati

    4 months ago
    We see this issue once a week and are really not sure why it is happening. We have not seen any "no heartbeat detected" errors, but even then, how would changing the heartbeat settings address the issue? Yes, these runs are long-running, so they might be dead. And these are local-agent-run flows. For context, the flows connect to our databases in a loop and fetch new/updated data rows into BigQuery.
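    A common cause of "cannot allocate memory" in this kind of extract loop (an assumption, since the flow code isn't shown) is materializing whole result sets with fetchall(); streaming rows in fixed-size chunks via the DB-API fetchmany keeps memory flat. A minimal sketch, with sqlite3 standing in for the real databases:

    ```python
    import sqlite3

    def stream_rows(cursor, batch_size=1000):
        # Yield rows in fixed-size chunks instead of calling fetchall(),
        # so a long-running loop never holds the full result set in memory.
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                return
            yield from rows

    # Usage with an in-memory stand-in database:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5000)])
    cur = conn.execute("SELECT x FROM t ORDER BY x")
    row_count = sum(1 for _ in stream_rows(cur, batch_size=500))
    ```

    The same chunking can then feed BigQuery load calls batch by batch rather than accumulating everything first.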
    Anna Geller

    4 months ago
    Can you share your code/pseudo code to show what you are doing? What's your storage and run config?
    how will making change with respect to heart beat address the issue?
    by default, the heartbeat uses processes, and threads work better for long-running jobs; that's the only reason
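    If switching the heartbeat to threads is worth trying, Prefect 1.0 exposes this as the cloud.heartbeat_mode setting, which can be overridden per flow through the run config's env. A hedged sketch (check the docs for your exact 1.0 release; the "thread" value and env var spelling below are what 1.0 documents):

    ```python
    from prefect.run_configs import LocalRun

    # Override the heartbeat mode for this flow only; the default is
    # "process". This assumes flow is an existing prefect.Flow object.
    flow.run_config = LocalRun(
        labels=["<agent>"],
        env={"PREFECT__CLOUD__HEARTBEAT_MODE": "thread"},
    )
    ```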
    Naga Sravika Bodapati

    4 months ago
    flow.run_config = LocalRun(labels=["<agent>"])
    flow.storage = GCS(bucket="<bucketname>")
    Anna Geller

    4 months ago
    that's already good to hear, since a local agent is probably the easiest way to handle long-running jobs, as it runs them as a subprocess - can you share your flow too?
    The flows make a connection to our databases in a loop and fetch new/updated data rows into big query
    the reason I'm asking about the flow code is that we've seen similar issues when a user was sharing DB connections between tasks. This is not supported in Prefect 1.0 unless you use a resource manager https://docs.prefect.io/core/idioms/resource-manager.html
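    If connections are being shared across tasks, one fix short of the resource manager is to open a short-lived connection inside each task body and close it before returning. A plain-Python sketch of the idea (sqlite3 stands in for the real databases, and the table/column names are made up; the Prefect task wiring is omitted):

    ```python
    import sqlite3

    def fetch_new_rows(db_path, since_id):
        # Open the connection inside the task body and close it before
        # returning, instead of passing one live connection between tasks.
        conn = sqlite3.connect(db_path)
        try:
            cur = conn.execute(
                "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
                (since_id,),
            )
            return cur.fetchall()
        finally:
            conn.close()
    ```

    Each task then returns plain data (rows), which Prefect can pass between tasks safely, rather than a connection object.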
    Naga Sravika Bodapati

    4 months ago
    It's a bit of a long piece of code.
    Anna Geller

    4 months ago
    Thanks! Some feedback:
    • enable_ct_on_database uses a logger which is not defined - are you sure this code works? Perhaps you didn't share the full code?
    • same in delta_push and other non-task functions
    • Email, Content, Mail, To - imports are missing
    This line can be removed: flow.storage.build()
    Then, instead of:
    with Flow(flow_name, schedule=schedule) as flow:
        flow.run_config = LocalRun(labels=[""])
        flow.storage = GCS(bucket="")
    Try:
    with Flow(
        flow_name,
        schedule=schedule,
        run_config=LocalRun(labels=["your_host_name"]),
        storage=GCS(bucket="your_bucket"),
    ) as flow:
    the label with an empty string seems weird - are you sure you explicitly created a label with an empty string? If you didn't specify any labels on your local agent, this usually needs to be the hostname (your VM/server name)
    lastly, you should wrap the flow registration call in a main guard to avoid any weird issues:
    if __name__ == '__main__':
        flow.register(project_name=project_name)
    Naga Sravika Bodapati

    4 months ago
    Hey Anna, thanks for noting all these issues - yes, I removed some of the code because it contains sensitive information. I will make the changes you suggested and get back to you. The machine we have right now has 16 GB of RAM, and this is a recurring issue, so I'm not sure how much this will resolve it!
    Anna Geller

    4 months ago
    awesome, keep us posted on how it goes!