Valentin Cathelain
10/31/2023, 12:27 AM
`bigquery_load_cloud_storage`: Provided Schema does not match.
I tried to load a CSV with the first row as a header. I thought there was an autodetect feature by default, but it doesn't seem to be working. So I tried to specify the schema instead, but it appears I'm not using the correct syntax:
```
in api_to_bq
    SchemaField("stationcode", field_type="STRING"),
NameError: name 'SchemaField' is not defined
```
Could anyone who has already used that function share their experience?
#### My flow looks like this:
```python
@flow(retries=3, retry_delay_seconds=5, log_prints=True)
def api_to_bq():
    extracted_data = extract_data()
    file_created_path = store_data(extracted_data)
    print(file_created_path)
    gcp_cred = GcpCredentials.load('bq-credentials')
    schema = [
        SchemaField("stationcode", field_type="STRING"),
        SchemaField("name", field_type="STRING"),
        SchemaField("is_installed", field_type="BOOLEAN"),
        SchemaField("capacity", field_type="INTEGER"),
        SchemaField("numdocksavailable", field_type="INTEGER"),
        SchemaField("numbikesavailable", field_type="INTEGER"),
        SchemaField("mechanical", field_type="INTEGER"),
        SchemaField("ebike", field_type="INTEGER"),
        SchemaField("is_renting", field_type="BOOLEAN"),
        SchemaField("is_returning", field_type="BOOLEAN"),
        SchemaField("duedate", field_type="TIMESTAMP"),
        SchemaField("lon", field_type="FLOAT64"),
        SchemaField("lat", field_type="FLOAT64"),
        SchemaField("nom_arrondissement_communes", field_type="STRING"),
        SchemaField("code_insee_commune", field_type="STRING"),
        SchemaField("created_at", field_type="TIMESTAMP")
    ]
    result = bigquery_load_cloud_storage(
        dataset=destination_dataset,
        table=destination_table,
        uri=f'gs://staging/{file_created_path}',
        schema=schema,
        gcp_credentials=gcp_cred
    )
    return result


if __name__ == "__main__":
    api_to_bq.serve(name='velib_api_to_bigquery')
```
Valentin Cathelain
10/31/2023, 12:58 AM
```python
from google.cloud.bigquery import SchemaField
```
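For anyone landing here later, below is a minimal sketch of the load step with that import in place. The dataset, table, and GCS path values are hypothetical placeholders, and the `job_config` argument assumes your prefect-gcp version forwards extra options to BigQuery's `LoadJobConfig` (worth verifying against the installed version); `skip_leading_rows` is how you tell BigQuery that the first CSV row is a header rather than data, which is a common cause of "Provided Schema does not match".

```python
from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.bigquery import bigquery_load_cloud_storage
from google.cloud.bigquery import SchemaField


@flow(log_prints=True)
def load_csv_to_bq(file_created_path: str = "velib/stations.csv"):  # hypothetical path
    # Shortened schema; keep the full list of SchemaField entries from the flow above.
    schema = [
        SchemaField("stationcode", field_type="STRING"),
        SchemaField("name", field_type="STRING"),
        SchemaField("capacity", field_type="INTEGER"),
    ]
    return bigquery_load_cloud_storage(
        dataset="velib",            # hypothetical dataset name
        table="station_status",     # hypothetical table name
        uri=f"gs://staging/{file_created_path}",
        schema=schema,
        gcp_credentials=GcpCredentials.load("bq-credentials"),
        # Assumption: job_config entries are passed through to LoadJobConfig.
        # skip_leading_rows=1 makes BigQuery ignore the CSV header row so the
        # data rows line up with the declared schema.
        job_config={"skip_leading_rows": 1},
    )


if __name__ == "__main__":
    load_csv_to_bq()
```

If `job_config` isn't supported in your version, stripping the header row from the file before uploading it to Cloud Storage achieves the same result.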