Run BigQuery from Lambda

Introduction

This is a record of my investigation into issuing BigQuery queries from AWS Lambda. I mainly use AWS, but I regularly need to look things up in GCP's BigQuery, so I thought it would be convenient to run those queries easily from Lambda.

Environment overview

Lambda's Python runtime calls BigQuery through the GCP SDK, which is packaged as a Lambda layer. GCP authentication also has to be set up on the AWS side.
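Because the SDK comes from a layer and the credentials come from an environment variable, a small diagnostic run inside the function can confirm both are wired up before you try a real query. This is a minimal sketch using only the standard library; `module_available`, `check_environment`, and the diagnostic handler are hypothetical names, and `google.protobuf` is checked because this setup also turned out to need protobuf.

```python
import importlib.util
import os


def module_available(name):
    """Return True if `name` can be found (its parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package such as "google.cloud" is missing
        return False


def check_environment():
    """Report whether the layer's packages and the GCP key file are visible."""
    cred_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return {
        "bigquery_sdk": module_available("google.cloud.bigquery"),
        "protobuf": module_available("google.protobuf"),
        "credentials_file": bool(cred_path) and os.path.isfile(cred_path),
    }


def lambda_handler(event, context):
    # Hypothetical diagnostic handler: returns the check results as-is
    return check_environment()
```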


The SDK is the BigQuery Python client library (google-cloud-bigquery): https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html

Procedure

Prerequisites

- An AWS account.
- A GCP account.
- BigQuery is available through the API.
- A BigQuery table has already been created.
- An AWS access key.

Create a Lambda function.

Start with Python code that does nothing but run a BigQuery query, and get that working first.

- The Lambda function is as follows.

import json
from google.cloud import bigquery

def lambda_handler(event, context):
    # Credentials are read from the file named by GOOGLE_APPLICATION_CREDENTIALS
    client = bigquery.Client()
    sql = """
        SELECT *
        FROM `<my-project>.<my-dataset>.<my-table>`
        LIMIT 10
    """
    
    # Run a Standard SQL query using the environment's default project
    results = client.query(sql).result()
    for row in results:
        print(row)

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
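The handler above only prints rows to the log and returns a fixed message. If you want the query results in the response instead, one option is to turn each row into a dict and serialize the list. This is a sketch, not the author's code: `rows_to_json` is a hypothetical helper, and `default=str` is an assumption about how to handle values such as dates and decimals that `json` cannot serialize on its own. The SDK import is moved inside the handler only so the helper can be tried without the layer installed.

```python
import json


def rows_to_json(rows):
    """Serialize BigQuery rows (or any mapping-like objects) to a JSON array."""
    # google.cloud.bigquery Row objects expose .items(), like a dict;
    # default=str turns datetime.date, Decimal, etc. into strings
    return json.dumps([dict(row.items()) for row in rows], default=str)


def lambda_handler(event, context):
    # Imported here so rows_to_json works on machines without the SDK
    from google.cloud import bigquery

    client = bigquery.Client()
    sql = """
        SELECT *
        FROM `<my-project>.<my-dataset>.<my-table>`
        LIMIT 10
    """
    rows = client.query(sql).result()
    return {"statusCode": 200, "body": rows_to_json(rows)}
```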

Build the GCP SDK package to register as a Lambda layer.

To use `from google.cloud import bigquery` in Lambda's Python, add the SDK to a layer: install it with pip and zip it. Here are the steps to do this quickly by booting Linux on an EC2 Spot Instance and uploading the result to S3.

- Launch an Amazon Linux 2 EC2 Spot Instance.
  - A small instance type is enough.
  - The IAM role has only `AmazonEC2RoleforSSM` attached, so you can connect with Systems Manager Session Manager.
  - The security group has no inbound rules.
  - No key pair.

- Once the instance is running, connect to it from Systems Manager Session Manager.
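If you prefer launching the instance from code instead of the console, the settings listed above map onto parameters of boto3's `EC2.Client.run_instances`. This is a hedged sketch: the `t3.micro` instance type, the instance profile name, and the security group ID are hypothetical placeholders, and the AMI ID is looked up from AWS's public SSM parameter for the latest Amazon Linux 2 image.

```python
# Public SSM parameter that holds the latest Amazon Linux 2 AMI ID
AL2_AMI_PARAMETER = "/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2"


def spot_instance_params(ami_id, instance_profile, security_group_id,
                         instance_type="t3.micro"):
    """Build run_instances kwargs matching the settings listed above."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # small specs are enough (assumed type)
        "MinCount": 1,
        "MaxCount": 1,
        # Request a Spot Instance instead of On-Demand
        "InstanceMarketOptions": {"MarketType": "spot"},
        # Hypothetical profile whose role has only AmazonEC2RoleforSSM
        "IamInstanceProfile": {"Name": instance_profile},
        # Group with no inbound rules; KeyName is omitted, so no key pair
        "SecurityGroupIds": [security_group_id],
    }


def launch(instance_profile, security_group_id):
    import boto3  # lazy import so the builder above needs no AWS SDK

    ssm = boto3.client("ssm")
    ami_id = ssm.get_parameter(Name=AL2_AMI_PARAMETER)["Parameter"]["Value"]
    ec2 = boto3.client("ec2")
    resp = ec2.run_instances(
        **spot_instance_params(ami_id, instance_profile, security_group_id))
    return resp["Instances"][0]["InstanceId"]
```

For example, `launch("ssm-instance-profile", "sg-0123456789abcdef0")` would return the new instance ID; both arguments are placeholders.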

The commands are below. Replace each <...> with your own value.

# Switch to ec2-user
sudo su - ec2-user
# Install Python 3 and pip
sudo yum install python3 -y
curl -O https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py

export PATH=$PATH:/usr/local/bin
# Install the SDK under ./python/ (Lambda Python layers expect this directory)
pip install google-cloud-bigquery -t ./python/
# protobuf is also required, so install it into the same directory
pip install protobuf --upgrade -t ./python/
# Zip everything in one go
zip -r google-cloud-bigquery.zip python
# Configure the AWS CLI
aws configure
# Set the following:
  AWS Access Key ID [None]: <my-access-key>
  AWS Secret Access Key [None]: <my-secret-key>
  Default region name [None]: ap-northeast-1
  Default output format [None]: json
# Save to S3
aws s3 mb s3://<my-bucket>
aws s3 cp google-cloud-bigquery.zip s3://<my-bucket>
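Before uploading, it can be worth checking that the zip has the layout Lambda expects for Python layers: packages under a top-level `python/` directory, which Lambda extracts to `/opt/python`. This sketch uses the standard `zipfile` module; `check_layer_zip` and the list of required prefixes are assumptions based on the two packages this post installs.

```python
import zipfile

# Package directories this layer needs (assumed from the pip -t installs above)
REQUIRED_PREFIXES = (
    "python/google/cloud/bigquery/",
    "python/google/protobuf/",
)


def check_layer_zip(path, required=REQUIRED_PREFIXES):
    """Return the required package prefixes that are missing from the zip."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    return [p for p in required if not any(n.startswith(p) for n in names)]
```

For example, `check_layer_zip("google-cloud-bigquery.zip")` should return an empty list when both packages made it into the archive.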

Once the zip file is saved to S3, you can terminate the Spot Instance.

Register the library as a Lambda layer.

Go back to the Lambda console.

- Create a layer.

For the compatible runtimes, I added Python 3.7 and Python 3.8.

- Add the layer to the function.

- Select a layer and click Add layer.

Under "Custom layers", select the layer name, then select the version you created.

- The layer has been added.
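The console steps above correspond to two Lambda API calls, `publish_layer_version` and `update_function_configuration`, both available on boto3's `lambda` client. A sketch, assuming the zip is still in S3 under the bucket and key used earlier; the layer and function names are hypothetical. Note that `Layers` replaces the function's whole layer list, so any layer ARNs already attached must be passed along.

```python
def publish_layer_params(bucket, key, layer_name="google-cloud-bigquery"):
    """kwargs for publish_layer_version, using the zip uploaded to S3."""
    return {
        "LayerName": layer_name,  # hypothetical layer name
        "Content": {"S3Bucket": bucket, "S3Key": key},
        "CompatibleRuntimes": ["python3.7", "python3.8"],
    }


def attach_layer_params(function_name, layer_version_arn, existing_layers=()):
    """kwargs for update_function_configuration.

    Layers replaces the whole list, so keep any layers already attached.
    """
    return {
        "FunctionName": function_name,
        "Layers": list(existing_layers) + [layer_version_arn],
    }


def publish_and_attach(bucket, key, function_name):
    import boto3  # lazy import: the builders above need no AWS SDK

    lam = boto3.client("lambda")
    layer = lam.publish_layer_version(**publish_layer_params(bucket, key))
    # Assumes the function has no other layers attached yet
    lam.update_function_configuration(
        **attach_layer_params(function_name, layer["LayerVersionArn"]))
    return layer["LayerVersionArn"]
```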

Once the layer has been created, you can safely delete the zip file from S3.

Get a GCP credentials file.

- The client needs credentials to authenticate with GCP.

https://cloud.google.com/docs/authentication/production

- Create a service account key in JSON format.

Follow the link to the Create service account key page from the page above. I chose "BigQuery Admin" as the role.
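The downloaded key is a JSON file, and since it is copied and pasted into Lambda in the next step, a quick sanity check can catch truncated or mangled text. A minimal sketch: `type`, `project_id`, `client_email`, and `private_key` are standard fields of a service account key file, while `validate_key` itself is a hypothetical helper.

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def validate_key(text):
    """Parse a service account key and return its client_email.

    Raises ValueError if the text is not a usable service account key.
    """
    info = json.loads(text)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(info, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [f for f in REQUIRED_FIELDS if not info.get(f)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"unexpected key type: {info['type']}")
    return info["client_email"]
```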

Register the GCP credentials file with Lambda.

In the Lambda code editor, I created a new file and pasted the contents of the JSON key into it. Then add the environment variable GOOGLE_APPLICATION_CREDENTIALS, set to the path of that file.
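GOOGLE_APPLICATION_CREDENTIALS must hold a path to the key file. A file created in the Lambda code editor is deployed with the function code, under the directory given by the `LAMBDA_TASK_ROOT` environment variable (`/var/task`). A sketch, assuming the key was saved as `credentials.json` (a hypothetical name): resolve a relative path against that directory and hand it to the client explicitly with `bigquery.Client.from_service_account_json`, rather than relying on the working directory.

```python
import os


def resolve_credentials_path(default_name="credentials.json"):
    """Return an absolute path to the service account key file."""
    # Hypothetical default file name; the environment variable wins if set
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", default_name)
    if not os.path.isabs(path):
        # Function code is deployed under LAMBDA_TASK_ROOT (/var/task)
        task_root = os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
        path = os.path.join(task_root, path)
    return path


def make_client():
    from google.cloud import bigquery  # provided by the Lambda layer

    # The project is taken from the key file's project_id
    return bigquery.Client.from_service_account_json(resolve_credentials_path())
```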


Test run

I was able to run a test from the Lambda console!
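Besides the console's test button, the function can be invoked from a script with boto3's `Lambda.Client.invoke`; the response's `Payload` is a stream holding the handler's JSON return value, and a `FunctionError` key appears when the handler fails. A sketch; the function name is a placeholder and `parse_payload` is a hypothetical helper.

```python
import json


def parse_payload(response):
    """Decode the JSON returned by the handler from an invoke() response."""
    # "Payload" is a streaming body; read() returns bytes
    result = json.loads(response["Payload"].read())
    if "FunctionError" in response:
        # In this case the payload describes the error, not a return value
        raise RuntimeError(f"function error: {result}")
    return result


def invoke(function_name, event=None):
    import boto3  # lazy import so parse_payload can be tested without it

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=json.dumps(event or {}).encode("utf-8"),
    )
    return parse_payload(response)
```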

Sticking points

Without protobuf, I got an error and was stuck for a while. I found a similar case on Stack Overflow and solved it by adding protobuf to the layer.

In conclusion

I'm not sure this is the right way to do it, but it worked, so I'm posting it! Some open questions:

- Should the SDK go directly under python/ or under python/lib/python3.x/site-packages/? I put it directly under python/ so that the layer isn't tied to one Python version.

- Is this the right way to build the SDK package? I'm unsure where in the zip things should be added.

- Could the GCP credentials file be hidden better, for example in environment variables encrypted with KMS, or in Parameter Store?
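On the last question, one option is to store the key JSON as a SecureString parameter in Systems Manager Parameter Store (encrypted with KMS) and build the credentials in memory, so no key file ships with the function. This is a sketch, not a tested recipe: the parameter name and the query are hypothetical, the execution role would need `ssm:GetParameter` (plus `kms:Decrypt` for a customer-managed key), and `service_account.Credentials.from_service_account_info` comes from google-auth, which google-cloud-bigquery depends on.

```python
import json

# Hypothetical parameter holding the service account key JSON (SecureString)
PARAMETER_NAME = "/bigquery/service-account-key"


def load_service_account_info(ssm_client, name=PARAMETER_NAME):
    """Fetch and decrypt the key JSON from Parameter Store."""
    resp = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return json.loads(resp["Parameter"]["Value"])


def make_client(info):
    """Create a BigQuery client from key info without writing a file."""
    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_info(info)
    return bigquery.Client(credentials=credentials, project=info["project_id"])


def lambda_handler(event, context):
    import boto3  # included in the Lambda Python runtime

    client = make_client(load_service_account_info(boto3.client("ssm")))
    rows = client.query("SELECT 1 AS x").result()  # placeholder query
    return {"statusCode": 200,
            "body": json.dumps([dict(r.items()) for r in rows])}
```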
