Copy data from Amazon S3 to Google Cloud Storage with Python (boto)

When you load data from Amazon S3 (hereinafter S3) into Redshift, it is easy because both are AWS services. To do the same thing with BigQuery, however, it is convenient to go through Google Cloud Storage (hereinafter GCS). GCS offers an S3-compatible interoperability mode, which you can use to move data from S3 to GCS and vice versa.

This article describes how to do that using Python and boto.

Alternatives to consider

If all you need is a one-off copy on the command line, the following method is simpler and usually a better choice.

gsutil

Usually, the easiest way to copy data between S3 and GCS is to use the gsutil command. https://cloud.google.com/storage/docs/gsutil

$ gsutil cp s3://bucket/file gs://bucket/file
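
As an aside, gsutil can also copy recursively and synchronize whole buckets; a sketch using its -m (parallel) and -r (recursive) options:

$ gsutil -m cp -r s3://bucket/dir gs://bucket/dir
$ gsutil -m rsync -r s3://bucket gs://bucket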

Preparation

Interoperability access

Enable interoperability access from the GCS management console; this gives you an access key and secret for GCS that work like S3 credentials. (Reference article: Upload to Google Cloud Storage with AWS CLI or AWS SDK for PHP)

Necessary information

The following information is required for both S3 and GCS, so obtain it in advance. Naturally, the credentials used must have write permission on the destination.

- Bucket name
- Access key
- Secret access key

boto

Install boto before using it:

$ pip install boto

Implementation example

Conveniently, boto can read and write both S3 and GCS (the GCS classes are implemented as subclasses of the S3 classes), so we can use it to handle both from Python.

Preparing the bucket objects

bucket2bucket.py


from boto.gs.connection import GSConnection
from boto.s3.connection import S3Connection


# Connect with the interoperability access key and secret obtained above
gs_bucket = GSConnection(
    'GS_ACCESS_KEY', 'GS_SECRET_ACCESS_KEY').get_bucket('GS_BUCKET_NAME')

# Connect with the usual AWS access key and secret
s3_bucket = S3Connection(
    'S3_ACCESS_KEY', 'S3_SECRET_ACCESS_KEY').get_bucket('S3_BUCKET_NAME')

Read / write using files or StringIO

bucket2bucket.py


from StringIO import StringIO


def bucket2bucket(from_bucket, to_bucket, file_name):
    io = StringIO()
    try:
        # Download the source object into the in-memory buffer
        from_bucket.get_key(file_name).get_file(io)
        io.seek(0)  # rewind so the upload reads from the beginning
        key = to_bucket.new_key(key_name=file_name)
        # replace=True allows overwriting an existing key
        key.set_contents_from_file(io, replace=True)
    finally:
        io.close()

get_file and set_contents_from_file accept other arguments as well, so it is worth checking the boto documentation; one example follows.
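
For example, both methods accept a progress callback via cb and num_cb. A minimal sketch; progress and bucket2bucket_verbose are illustrative names of my own, not part of boto:

def progress(done, total):
    # boto calls this periodically with bytes transferred so far and the total
    print '%d / %d bytes' % (done, total)


def bucket2bucket_verbose(from_bucket, to_bucket, file_name):
    io = StringIO()
    try:
        # cb/num_cb request roughly num_cb progress callbacks per transfer
        from_bucket.get_key(file_name).get_file(io, cb=progress, num_cb=10)
        io.seek(0)
        key = to_bucket.new_key(key_name=file_name)
        key.set_contents_from_file(io, replace=True, cb=progress, num_cb=10)
    finally:
        io.close()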

Execution example

bucket2bucket.py


bucket2bucket(s3_bucket, gs_bucket, 'spam')
bucket2bucket(gs_bucket, s3_bucket, 'egg')

As shown, boto handles S3 and GCS through the same interface, so it is easy to exchange data between them. For copies within a single bucket, Bucket.copy_key is provided, and it is recommended to use that instead; a sketch follows.
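
A minimal sketch of Bucket.copy_key, reusing the s3_bucket object prepared above (the key names are illustrative):

# Server-side copy of 'spam' to 'spam-backup' within the same bucket;
# the data is not downloaded locally
s3_bucket.copy_key('spam-backup', s3_bucket.name, 'spam')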
