Export Query Results from BigQuery to Google Cloud Storage using Python
This tutorial shows you how to programmatically export your BigQuery query results to Google Cloud Storage.
BigQuery is a fully managed, petabyte-scale data warehousing solution with built-in machine learning capabilities, offered by Google Cloud Platform. It supports blazing-fast analysis of complex data with deeply nested structures. BigQuery protects your data by encrypting it at rest, and you can further protect sensitive columns using AEAD encryption functions.
The Problem
While exporting data from BigQuery to GCS is straightforward for small tables, it becomes a challenge with large datasets. In addition, exporting data manually every time is time-consuming, which is where automation comes in. This article explores how to export query results from BigQuery to GCS using Python, providing an automated solution.
The Solution
Python is a popular programming language with a wide range of libraries for working with various data storage systems. The Google Cloud client libraries for Python provide a comprehensive set of tools for working with Google Cloud services such as BigQuery and GCS. The following steps show how to export query results from BigQuery to GCS using Python:
1. Authenticate with Google Cloud:
The first step is to authenticate with the Google Cloud Platform (GCP) using a service account key. This key is used to access GCP services programmatically. You can create a service account and generate a key in the GCP console.
from google.cloud import bigquery
from google.cloud import storage
# Authenticate with Google Cloud using a service account key
client = bigquery.Client.from_service_account_json('<path/to/service_account_key>')
storage_client = storage.Client.from_service_account_json('<path/to/service_account_key>')
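Alternatively, if the GOOGLE_APPLICATION_CREDENTIALS environment variable points to your key file, or you are otherwise running with Application Default Credentials, the clients can be constructed without any arguments:
from google.cloud import bigquery, storage
# Alternative: rely on Application Default Credentials, e.g. when the
# GOOGLE_APPLICATION_CREDENTIALS environment variable points to the key file
client = bigquery.Client()
storage_client = storage.Client()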
2. Define Query and Destination URI:
Next, define the BigQuery SQL query to retrieve the data and the destination URI where the query results will be stored in GCS.
# Define the BigQuery SQL query to retrieve the data
query = """
SELECT *
FROM `bigquery-public-data.samples.shakespeare`
LIMIT 100
"""
# Define the destination URI where the query results will be stored in GCS
bucket_name = 'my-bucket'
blob_name = 'shakespeare.csv'
destination_uri = f'gs://{bucket_name}/{blob_name}'
3. Run the Query and Export the Results:
After defining the query and the destination URI, run the query with the client's query method, load the results into a pandas DataFrame with to_dataframe (which requires the pandas and db-dtypes packages), and upload the DataFrame to the specified GCS URI in CSV format.
# Run the query and load the results into a pandas DataFrame
# (to_dataframe requires the pandas and db-dtypes packages)
query_job = client.query(query)
df = query_job.result().to_dataframe()

# Upload the DataFrame to the destination blob as a CSV file
destination_blob = storage_client.bucket(bucket_name).blob(blob_name)
destination_blob.upload_from_string(df.to_csv(index=False), content_type='text/csv')
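This approach pulls every row through the client, which is fine for small result sets like the 100-row sample above. For larger results, BigQuery can export server-side with an extract job instead. The following is a minimal sketch, assuming the same client, query, and destination_uri defined above; query_job.destination refers to the temporary table that holds the query results.
# Alternative for large result sets: export server-side with an extract job
query_job = client.query(query)
query_job.result()  # wait for the query to finish

extract_job = client.extract_table(
    query_job.destination,  # temporary table holding the query results
    destination_uri,        # gs://my-bucket/shakespeare.csv
    location='US',          # must match the location of the queried data
)
extract_job.result()  # wait for the export to complete
Note that exports larger than 1 GB must use a wildcard URI (for example, gs://my-bucket/shakespeare-*.csv) so BigQuery can shard the output across multiple files.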
4. Verify the Export:
Finally, verify that the query results are exported to GCS by checking the GCS bucket and the exported file.
# Verify that the query results were exported to GCS
bucket = storage_client.bucket(bucket_name)
blob = bucket.get_blob(blob_name)  # returns None if the object does not exist
if blob is not None:
    print(f'Exported {blob.size} bytes to gs://{bucket_name}/{blob_name}')
else:
    print('Export not found')
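As an extra sanity check, you can download the beginning of the exported file and print it, reusing the blob object from the snippet above:
# Optional sanity check: download and print the first kilobyte of the CSV
if blob is not None:
    preview = blob.download_as_bytes(start=0, end=1023).decode('utf-8')
    print(preview)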