# ask-community
c
Hello! I am building some ELT tooling from database tables to Snowflake. In testing, I have jobs failing with
Copy code
snowflake.connector.errors.ProgrammingError: 001795 (42601): SQL compilation error: error line 1 at position 115
maximum number of expressions in a list exceeded, expected at most 16,384, got 251,666
This is a common snowflake issue:
• https://community.snowflake.com/s/question/0D50Z00008759PiSAI/error-maximum-number-of-ex[…]sions-in-a-list-exceeded-expected-at-most-16384-got-29933
• https://stackoverflow.com/questions/62541089/snowflake-connector-sql-compilation-error-maximum-number-of-expressions-in-a-li
• https://docs.snowflake.com/developer-guide/python-connector/python-connector-example#label-python-connector-binding-batch-inserts
I've used
prefect_snowflake.database.SnowflakeConnector.execute_many
in my implementation:
Copy code
query_str = f"INSERT INTO {db}.{schema}.{table} (col1, col2, ...) VALUES (%(COL1)s, $(COL2)s, ...);"
    conn_obj.execute_many(query_str, seq_of_parameters=param_list)
This works perfectly fine for small tables, but not for tables with over 16k records to fetch. Reading https://stackoverflow.com/questions/62541089/snowflake-connector-sql-compilation-error-maximum-number-of-expressions-in-a-li I wonder if this is being executed as a client-side cursor, and if it were executed as a server-side cursor, then the issue would no longer manifest. To enforce a server-side cursor, I'd need to set qmark or numeric binding - but that needs setting before the cursor is created.... Can someone assist?
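A stopgap I'm considering in the meantime is just chunking the parameter list so each generated INSERT stays under the limit - untested sketch below, reusing the names from the snippet above:
Copy code
# Untested sketch: chunk the parameter list so each client-side expanded INSERT
# stays under Snowflake's 16,384-expression limit. `conn_obj`, `param_list`,
# `db`, `schema` and `table` are the same names as in the snippet above.
MAX_EXPRESSIONS = 16_384
N_COLS = 2                                   # number of bound columns in the INSERT
chunk_rows = MAX_EXPRESSIONS // N_COLS       # conservative: counts every bound value

query_str = f"INSERT INTO {db}.{schema}.{table} (col1, col2) VALUES (%(COL1)s, %(COL2)s);"

for start in range(0, len(param_list), chunk_rows):
    conn_obj.execute_many(query_str, seq_of_parameters=param_list[start:start + chunk_rows])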
s
Hi! Have you considered dumping the data into something like parquet files on S3, and creating an external stage/external table? That seems to be a best practice for getting lots of data into snowflake.
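Roughly what I have in mind, as a sketch only - the bucket, storage integration, stage, and table names are all placeholders, and it assumes the connector block exposes an execute() method:
Copy code
# Sketch: dump parquet files to S3 elsewhere, then point Snowflake at them.
# Bucket, storage integration, stage, and table names are placeholders.
from prefect_snowflake.database import SnowflakeConnector

connector = SnowflakeConnector.load("my-connector-block")  # placeholder block name

# External stage over the S3 prefix holding the parquet extracts.
connector.execute(
    "CREATE STAGE IF NOT EXISTS my_db.my_schema.my_ext_stage "
    "URL = 's3://my-bucket/exports/' "
    "STORAGE_INTEGRATION = my_s3_integration "
    "FILE_FORMAT = (TYPE = PARQUET)"
)

# Bulk-load the staged files into the target table.
connector.execute(
    "COPY INTO my_db.my_schema.my_table "
    "FROM @my_db.my_schema.my_ext_stage "
    "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
)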
j
I agree with Sean. I can convert most of my data to Pandas dataframes, then I use
.write_pandas()
on the Snowflake session. In the background this pushes it to a stage in Snowflake, then uses
COPY
commands to insert the data into the target table. But I'm just using the snowflake-snowpark-python package, not the prefect_snowflake one. Straight
insert
statements in Snowflake are s l o w.
🐢 1
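Roughly what that looks like for me - connection details and the table name here are placeholders, not my real config:
Copy code
# Sketch of the write_pandas route with snowflake-snowpark-python.
# Connection details and the table name are placeholders.
import pandas as pd
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "MY_DB",
    "schema": "MY_SCHEMA",
}).create()

# source_rows would be whatever the extract step pulled from the source database.
df = pd.DataFrame(source_rows)

# Behind the scenes this PUTs the data to a temporary stage and runs COPY INTO,
# so the 16,384-expression INSERT limit never comes into play.
session.write_pandas(df, table_name="MY_TABLE", auto_create_table=True)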
c
Ach! Never posted this on Friday! Sorry! OK. I've got a way to force a server-side cursor:
• ensure the Python
snowflake
module is loaded
• instantiate a
SnowflakeConnector
• force-close all connections by calling the
close()
method on the instance
• set the Snowflake connector parameter style:
snowflake.connector.paramstyle = 'qmark'
• create a sequence of parameters of the form
?, ?, ... , ?
and use this instead of the
[(col_name, value), ...]
list of tuples advised in the Prefect
SnowflakeConnector
documentation (rough sketch after this message).
I absolutely do not want to convert to a pandas data frame in the interim.... too messy and prone to horrible memory and conversion issues, particularly with lots of large DB ELT jobs.... Documentation really needs improving around ELT support - come on, Prefect team!
I am also very much trying to avoid the mess of creating an external stage and loading that way. That is a suitable approach for file-based stores. I am using Prefect here to build DB-to-Warehouse ELT.....
(@Sean Williams @Jack Goslin)
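Rough sketch of what I ended up with - block and table names are placeholders, adapt to your own setup:
Copy code
# Sketch of the qmark approach described above; block and table names are placeholders.
import snowflake.connector
from prefect_snowflake.database import SnowflakeConnector

# Must be set before any connection/cursor exists, otherwise the connector
# keeps using client-side (pyformat) binding and expands the VALUES list itself.
snowflake.connector.paramstyle = "qmark"

connector = SnowflakeConnector.load("my-connector-block")  # placeholder block name

# One ? placeholder per column; parameters are plain tuples of values.
insert_sql = "INSERT INTO MY_DB.MY_SCHEMA.MY_TABLE (COL1, COL2) VALUES (?, ?)"
rows = [(row["COL1"], row["COL2"]) for row in source_rows]  # source_rows fetched elsewhere

connector.execute_many(insert_sql, seq_of_parameters=rows)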
j
Can you push to a Snowflake internal stage?
s
Let’s start with the simplest method of data ingestion, the INSERT command. It is a suitable approach for bringing in a small amount of data, but it has limitations for large data sets exceeding the single-digit MB range, particularly around ease of use, scalability, and error handling. For larger data sets, data engineers typically have the option to use a variety of ETL/ELT tools to ingest data, or preferably use object storage as an intermediate step alongside COPY INTO or Snowpipe.
If you want to stick with insert, Snowflake's advice is to stick to very small amounts of data. https://www.snowflake.com/blog/best-practices-for-data-ingestion/
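If it helps, here's a rough sketch of the internal-stage route - file, database, schema, and table names are placeholders, and it assumes the connector block exposes an execute() method:
Copy code
# Sketch of the internal-stage route: PUT a local extract file to the table's
# stage, then COPY INTO the table. File, database, schema, and table names are
# placeholders.
from prefect_snowflake.database import SnowflakeConnector

connector = SnowflakeConnector.load("my-connector-block")  # placeholder block name

# Upload the extract (e.g. parquet) to the table's internal stage.
connector.execute("PUT file:///tmp/extract.parquet @MY_DB.MY_SCHEMA.%MY_TABLE")

# Bulk-load from the stage into the target table.
connector.execute(
    "COPY INTO MY_DB.MY_SCHEMA.MY_TABLE "
    "FROM @MY_DB.MY_SCHEMA.%MY_TABLE "
    "FILE_FORMAT = (TYPE = PARQUET) "
    "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
)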
c
@Jack Goslin @Sean Williams good points! I think PUT to internal stage + COPY INTO may be exactly what I need here. Thanks!
🙌 1
☃️ 1