
kevin

03/11/2022, 4:08 AM
I have a flow that runs a
select * from table
to snowflake using a SnowflakeQuery task. It appears that this task uses
fetchall()
to return this data into memory: https://github.com/PrefectHQ/prefect/blob/5d2732a30563591410cb11fe0f7e7dfe65cc5669/src/prefect/tasks/snowflake/snowflake.py#L186 I expect that this causes performance issues with extremely large queries, so I am wondering what Prefect Cloud’s tolerance for this is. Ideally I think it would be preferable to lazy load query results and/or allow for query pagination? Perhaps there’s an architecture limitation I’m overlooking? I’d appreciate any insight 🙂

Kevin Kho

03/11/2022, 4:12 AM
What do you think you’d do without Prefect? I would break up my query if I was trying to write out multiple files. Like for example, I would just run Jan 2022 and then Feb 2022 and so on. You can even use mapping like this if Snowflake accepts multiple connections. I suspect this is easier to tackle from the query side rather than the collection side. I dunno though. Wondering what you think?
Anyway, you are limited by what you can hold in memory. Prefect has no inherent limitation here. It’s more a limitation of your system
👍 1
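A minimal sketch of the month-by-month split described above, which would let each chunk be mapped as its own task so no single task holds the full result set. The table and column names (`my_table`, `created_at`) and the helper itself are illustrative, not from any Prefect API:

```python
from datetime import date

def monthly_queries(table, date_col, start, end):
    """Split a full-table scan into one query per month.

    Each query can then be mapped over in a flow, so no single
    task has to hold the whole result set in memory.
    """
    queries = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {date_col} >= '{date(year, month, 1)}' "
            f"AND {date_col} < '{date(next_year, next_month, 1)}'"
        )
        year, month = next_year, next_month
    return queries

# e.g. "just run Jan 2022 and then Feb 2022":
queries = monthly_queries("my_table", "created_at", date(2022, 1, 1), date(2022, 2, 1))
```

Each string in `queries` could then be passed to a separate SnowflakeQuery task run.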

kevin

03/11/2022, 4:15 AM
My line of thinking was that a lot of Python cursor libraries have built-in handling for pagination, so it would be cool to provide some sort of page size parameter at the task level. I suppose the downside to this path is that it puts the load on the database to keep iterating. Implementing pagination at the query side does seem to be the most lightweight solution
I guess finding a method to paginate a table would be a pain point if there’s no time series stored on it
But I’m sure there’s always some sort of value you can key off of for pagination
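"Keying off of a value" is essentially keyset pagination. A rough sketch, using an in-memory SQLite table as a stand-in for a Snowflake cursor (the `keyset_pages` helper and the numeric-key assumption are mine, not part of any connector API):

```python
import sqlite3

def keyset_pages(cursor, table, key_col, page_size):
    """Yield result pages by seeking past the last key seen (keyset
    pagination). Unlike OFFSET, the database never re-scans and skips
    earlier rows, so later pages don't get progressively slower."""
    last_key = None
    while True:
        where = f"WHERE {key_col} > {last_key} " if last_key is not None else ""
        cursor.execute(
            f"SELECT * FROM {table} {where}ORDER BY {key_col} LIMIT {page_size}"
        )
        rows = cursor.fetchall()
        if not rows:
            return
        yield rows
        # assumes key_col is numeric and is the first selected column
        last_key = rows[-1][0]

# Demo: 25 rows fetched in pages of 10
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER)")
cur.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(25)])
pages = list(keyset_pages(cur, "events", "id", page_size=10))
```

Any monotonically increasing column (an id, a load timestamp) works as the key, even without a time series on the table.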

Kevin Kho

03/11/2022, 4:17 AM
I see LIMIT and OFFSET at the query level when I search, not at the cursor level. I’m not super used to Snowflake, of course
Oh this is nice:
# Execute a statement that will generate a result set.
sql = "select * from t"
cur.execute(sql)
 
# Fetch the result set from the cursor and deliver it as the Pandas DataFrame.
for df in cur.fetch_pandas_batches():
    my_dataframe_processing_function(df)

kevin

03/11/2022, 4:21 AM
woah the cursor converts it to a dataframe right away
is that right?

Kevin Kho

03/11/2022, 4:21 AM
Yes and there is also
fetch_pandas_all()

kevin

03/11/2022, 4:23 AM
it looks like snowflake supports querying using LIMIT and OFFSET https://dwgeek.com/snowflake-limit-and-offset-uses-examples.html/ which I think I’ll go and implement
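A sketch of what that LIMIT/OFFSET implementation could look like: generate one query per page and run each through its own task. The base query, page size, and row count are placeholders; note OFFSET generally needs a stable ORDER BY to give consistent pages:

```python
def paginated_queries(base_query, page_size, total_rows):
    """Generate LIMIT/OFFSET variants of a query, one per page.

    base_query should include an ORDER BY so page boundaries
    are deterministic across executions.
    """
    return [
        f"{base_query} LIMIT {page_size} OFFSET {offset}"
        for offset in range(0, total_rows, page_size)
    ]

qs = paginated_queries("SELECT * FROM my_table ORDER BY id", 10000, 25000)
```

The total row count would come from a cheap `SELECT COUNT(*)` first; each resulting query string can then be mapped over in the flow.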
I gotta be honest I was thinking too much with my python brain and not enough with my sql brain
thanks for the feedback 😄

Kevin Kho

03/11/2022, 4:24 AM
Of course!