wei Liu

03/20/2023, 7:59 AM
Hi, I'm trying to load CSV files from GCS to BigQuery. I used the bigquery_load_file method, but it can't find the dataset in BigQuery. I am sure I have created the dataset.
Encountered exception during execution:
Traceback (most recent call last):
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 2452, in load_table_from_file
    response = self._do_resumable_upload(
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 2869, in _do_resumable_upload
    upload, transport = self._initiate_resumable_upload(
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 2942, in _initiate_resumable_upload
    upload.initiate(
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/google/resumable_media/requests/upload.py", line 420, in initiate
    return _request_helpers.wait_and_retry(
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/google/resumable_media/requests/_request_helpers.py", line 148, in wait_and_retry
    response = func()
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/google/resumable_media/requests/upload.py", line 416, in retriable_request
    self._process_initiate_response(result)
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/google/resumable_media/_upload.py", line 509, in _process_initiate_response
    _helpers.require_status_code(
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/google/resumable_media/_helpers.py", line 108, in require_status_code
    raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.CREATED: 201>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/prefect/engine.py", line 1438, in orchestrate_task_run
    result = await task.fn(*args, **kwargs)
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/prefect_gcp/bigquery.py", line 513, in bigquery_load_file
    result = await to_thread.run_sync(partial_load)
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/prefect_gcp/bigquery.py", line 40, in _result_sync
    result = func(*args, **kwargs).result()
  File "/Users/peter/opt/anaconda3/envs/zoom/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 2460, in load_table_from_file
    raise exceptions.from_http_response(exc.response)
google.api_core.exceptions.NotFound: 404 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/de-zoomcamp-378315/jobs?uploadType=resumable: Not found: Dataset de-zoomcamp-378315:who_disease_data_tf
Here is my code:
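from datetime import date

from prefect import flow
from prefect_gcp import GcpCredentials
from prefect_gcp.bigquery import bigquery_load_file


@flow
def load_data_to_bq():
    # Credentials block saved in Prefect under the name "zoomcamp-2"
    gcp_credentials_block = GcpCredentials.load("zoomcamp-2")
    day = date.today().strftime('%Y%m%d')
    for disease in ['monkeypox', 'covid-19']:
        # Load today's local CSV into the table named after the disease
        result = bigquery_load_file(
            dataset='who_disease_data_tf',
            table=disease,
            path=f"data/{disease}_{day}.csv",
            gcp_credentials=gcp_credentials_block
        )

    # Only the result of the last load is returned
    return result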
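
One way to narrow down the 404 is to ask BigQuery, with the same credentials, which datasets it can see in the project and where the target dataset lives. The sketch below is only a diagnostic, not a confirmed fix. check_dataset is a hypothetical helper name; it assumes the client behaves like google.cloud.bigquery.Client (project, list_datasets(), get_dataset()). Two causes it can help rule out, both of them guesses rather than findings: the credentials resolve to a different project than the one holding the dataset, or the dataset sits in a location other than the one the load job is sent to (as far as I know, bigquery_load_file has a location parameter that defaults to "US").

```python
def check_dataset(client, dataset_id):
    """Report whether dataset_id is visible to this client, and where it is.

    `client` is assumed to behave like google.cloud.bigquery.Client.
    """
    # Dataset IDs the credentials can list in the client's default project
    visible = sorted(item.dataset_id for item in client.list_datasets())
    report = {
        "project": client.project,
        "found": dataset_id in visible,
        "location": None,
        "visible_datasets": visible,
    }
    if report["found"]:
        # The list call does not expose the location, so fetch the dataset
        full_id = f"{client.project}.{dataset_id}"
        report["location"] = client.get_dataset(full_id).location
    return report


# Possible use with the credentials block from the flow above (assumes
# prefect-gcp is installed and the "zoomcamp-2" block exists):
#
#   from prefect_gcp import GcpCredentials
#   client = GcpCredentials.load("zoomcamp-2").get_bigquery_client()
#   print(check_dataset(client, "who_disease_data_tf"))
```

If "found" is False, the project in the report is worth comparing with the project where the dataset was created. If it is True but the location is not "US", passing that location to bigquery_load_file may be worth trying.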