Loading data from Amazon S3 (hereinafter S3) into Redshift is straightforward because both are AWS services, but to do the same thing with BigQuery it is more convenient to go through Google Cloud Storage (hereinafter GCS). GCS offers an S3-compatible mode, which you can use to migrate data from S3 to GCS and vice versa.
This article describes how to do that using Python and boto.
For use cases such as simply copying files on the command line, the gsutil command described next is the better tool.
gsutil
Usually, the easiest way to copy data between S3 and GCS is to use the gsutil command.
https://cloud.google.com/storage/docs/gsutil
$ gsutil cp s3://bucket/file gs://bucket/file
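For gsutil to access S3, your AWS credentials also need to be set. A minimal sketch of the relevant section of the ~/.boto configuration file that gsutil reads (substitute your own keys):
~/.boto
[Credentials]
aws_access_key_id = AWS_ACCESS_KEY
aws_secret_access_key = AWS_SECRET_ACCESS_KEY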
Enable interoperable (S3-compatible) access from the GCS management console. (Reference article: Upload to Google Cloud Storage with AWS CLI or AWS SDK for PHP)
The following information is required for both S3 and GCS, so obtain it in advance. Of course, the writing side needs write permission.
- Bucket name
- Access key
- Secret access key
boto
Install boto before using it:
$ pip install boto
Conveniently, boto can read and write both S3 and GCS (the GCS-related classes are implemented as subclasses of the S3-related ones). Use it to read and write from Python.
bucket2bucket.py
from boto.gs.connection import GSConnection
from boto.s3.connection import S3Connection

gs_bucket = GSConnection(
    'GS_ACCESS_KEY', 'GS_SECRET_ACCESS_KEY').get_bucket('GS_BUCKET_NAME')
s3_bucket = S3Connection(
    'S3_ACCESS_KEY', 'S3_SECRET_ACCESS_KEY').get_bucket('S3_BUCKET_NAME')
bucket2bucket.py
from StringIO import StringIO

def bucket2bucket(from_bucket, to_bucket, file_name):
    io = StringIO()
    try:
        from_bucket.get_key(file_name).get_file(io)
        io.seek(0)
        key = to_bucket.new_key(key_name=file_name)
        key.set_contents_from_file(io, replace=True)  # replace=True allows overwriting
    finally:
        io.close()
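Note that bucket2bucket buffers the whole object in memory with StringIO, which can be a problem for large files. Here is a minimal sketch of a variant (bucket2bucket_large is a hypothetical name) that spools through a temporary file on disk instead; the boto calls are the same, only the buffer changes.
bucket2bucket.py
import tempfile

def bucket2bucket_large(from_bucket, to_bucket, file_name):
    # Spool the object through a temporary file instead of an in-memory buffer
    with tempfile.TemporaryFile() as tmp:
        from_bucket.get_key(file_name).get_file(tmp)
        tmp.seek(0)  # rewind before uploading
        key = to_bucket.new_key(key_name=file_name)
        key.set_contents_from_file(tmp, replace=True)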
Both get_file and set_contents_from_file accept additional arguments, so it is worth checking the documentation.
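For example, both methods accept a progress callback; a small sketch (the progress function and key name are just illustrations; boto calls cb with the bytes transferred so far and the total size, at most num_cb times):
bucket2bucket.py
from StringIO import StringIO

def progress(done, total):
    # invoked by boto while the transfer is running
    print '%d / %d bytes' % (done, total)

io = StringIO('spam and eggs')
key = gs_bucket.new_key(key_name='spam')  # gs_bucket from the connection snippet above
key.set_contents_from_file(io, replace=True, cb=progress, num_cb=10)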
bucket2bucket.py
bucket2bucket(s3_bucket, gs_bucket, 'spam')
bucket2bucket(gs_bucket, s3_bucket, 'egg')
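Combined with Bucket.list, the same function can also copy many objects at once; a minimal sketch (the 'logs/' prefix is just an example):
bucket2bucket.py
# Copy every key under a given prefix from S3 to GCS
for key in s3_bucket.list(prefix='logs/'):
    bucket2bucket(s3_bucket, gs_bucket, key.name)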
As shown, boto handles S3 and GCS through the same interface, so exchanging data between them is easy. For copies within the same bucket, Bucket.copy_key is provided, so use that instead.
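For reference, a copy within the same bucket looks like this (the key names are only examples):
bucket2bucket.py
# Server-side copy: new key name, source bucket name, source key name
s3_bucket.copy_key('spam-copy', s3_bucket.name, 'spam')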