Jørgen Nygaard
11/26/2024, 9:51 AMfrom prefect_gcp.bigquery import BigQueryWarehouse
with BigQueryWarehouse.load("bigquery") as warehouse:
create_operation = '''
CREATE TABLE IF NOT EXISTS mydataset.mytable (
col1 STRING,
col2 INTEGER,
col3 BOOLEAN
)
'''
warehouse.execute(create_operation)
insert_operation = '''
INSERT INTO mydataset.mytable (col1, col2, col3) VALUES (%s, %s, %s)
'''
seq_of_parameters = [
("a", 1, True),
("b", 2, False),
]
warehouse.execute_many(
insert_operation,
seq_of_parameters=seq_of_parameters
)
Marvin
11/26/2024, 9:52 AMNone
values into BigQuery via Prefect, which expects None
to be handled as a NULL
. However, BigQuery needs explicit handling of NULL
values because None
is not automatically converted during insertions.
Here's a solution to maintain a bit of sanity:
Before executing the insert_operation
, you can preprocess your seq_of_parameters
to replace None
with a NULL
literal value that BigQuery can understand. Here’s how you might modify your existing code:
python
seq_of_parameters = [
("a", 1, True),
(None, 2, False),
]
# Replace None with BigQuery NULL representation
processed_parameters = [
(val if val is not None else 'NULL' for val in params)
for params in seq_of_parameters
]
warehouse.execute_many(
insert_operation,
seq_of_parameters=processed_parameters
)
I’ve solidly rested this logic on the Prefect documentation's style with a few added twists. You can find more on handling query parameters in the Prefect documentation.
Now, if you encounter any funky errors or there's smoke where there shouldn't be, feel free to send more my way!Jørgen Nygaard
11/26/2024, 9:58 AMMarvin
11/26/2024, 9:58 AM'NULL'
as a string won't cut it for non-string data types like integers and dates. Instead, you should use Python's None
in your parameter tuples. The BigQuery client should automatically convert None
to NULL
when inserting row data. If that autofill isn’t happening, you might need to explicitly set the type using a helper function or handle this conversion logic right before insertion.
However, based on what you've described, and assuming the BigQuery adapter should handle these conversions (and it seems it should, according to Prefect's documentation), it's possible that there's an issue with how parameters are passed or how the library version is interacting with BigQuery.
Here's an alternative way to handle this that leverages parameterized queries. First, ensure you're using parameter substitution with None
as intended:
python
seq_of_parameters = [
("a", 1, True),
(None, 2, False),
]
insert_operation = '''
INSERT INTO mydataset.mytable (col1, col2, col3) VALUES (%s, %s, %s)
'''
In this setup, make sure your Prefect and Google libraries are up-to-date, as parameter handling issues can sometimes be resolved in newer releases.
If the issue persists, double-check the compatibility of your database driver and Prefect's BigQuery integration. This Prefect-GCP documentation might also provide additional insights, especially if there's some discrepancy with your current library versions.
And remember, if at first you don't succeed, there's always another approach—or possibly bug report—to consider!