When transferring data from S3 to Google Cloud Storage with Python, I previously wrote up the following two methods. In the end, though, using gsutil is more convenient, because it copies directly from S3 to Google Cloud Storage without my having to stage the files on my own server.
- Copy data from Amazon S3 to Google Cloud Storage with Python (boto)
- Access Google Cloud Storage from Python (boto) using service account and key file (p12)
Refer to the documentation and create a configuration file to be read when gsutil is executed. At a minimum, you need the [Credentials] and [GSUtil] sections.
[Credentials]
gs_service_key_file = /path.to/sample-KEYFILE.p12
gs_service_client_id = [email protected]
aws_access_key_id = AXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = sampleawssecretaccesskey1234
[Boto]
https_validate_certificates = True
[GSUtil]
content_language = ja
default_api_version = 2
default_project_id = sampleproject-994
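If you would rather generate this file from a script than write it by hand, the following is a minimal sketch using the standard `configparser` module. The helper name `write_boto_config` and the temporary output location are assumptions for illustration, not part of gsutil or boto, and the values are the placeholders from the sample above. `gs_service_client_id` is left out because its value is not legible in the sample; add it with your own service account address.

```python
import configparser
import os
import tempfile


def write_boto_config(path, settings):
    """Write a boto-style INI file from a {section: {option: value}} dict."""
    parser = configparser.ConfigParser()
    # Keep option names exactly as written instead of lowercasing them.
    parser.optionxform = str
    for section, options in settings.items():
        parser[section] = options
    with open(path, "w") as f:
        parser.write(f)
    return path


# Placeholder values copied from the sample configuration above.
SAMPLE_SETTINGS = {
    "Credentials": {
        "gs_service_key_file": "/path.to/sample-KEYFILE.p12",
        "aws_access_key_id": "AXXXXXXXXXXXXXXXXXXX",
        "aws_secret_access_key": "sampleawssecretaccesskey1234",
    },
    "Boto": {"https_validate_certificates": "True"},
    "GSUtil": {
        "content_language": "ja",
        "default_api_version": "2",
        "default_project_id": "sampleproject-994",
    },
}

# Written to a temporary directory here; point this at your real path instead.
config_path = write_boto_config(
    os.path.join(tempfile.mkdtemp(), "boto.ini"), SAMPLE_SETTINGS)
```

Because the file holds secret keys, it is worth keeping it out of version control and restricting its permissions.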
There are three ways to execute commands, `os.system`, `commands`, and `subprocess`, but `subprocess` is recommended: `commands` is deprecated, and the documentation recommends `subprocess` over `os.system`.
os.system
The subprocess module provides more powerful functionality for running new processes and getting results. It is recommended to use the subprocess module instead of this function. [^ 1]
commands
Deprecated in version 2.6: The commands module has been removed in Python 3.0. Use the subprocess module instead. [^ 2]
subprocess
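import os
import shlex
import subprocess

# Path to the boto configuration file created above
BOTO_PATH = '/path.to/boto.ini'
cmd = 'gsutil cp s3://bucket/name gs://bucket/name'

# env replaces the whole environment, so PATH is passed along with BOTO_PATH
popen = subprocess.Popen(
    shlex.split(cmd),
    stdout=subprocess.PIPE,
    env={'BOTO_PATH': BOTO_PATH, 'PATH': os.getenv('PATH')})
output = popen.communicate()[0]
# communicate() returns bytes on Python 3, so decode before printing
print(output.decode())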
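The command above only captures stdout and does not check whether the command succeeded. As a rough sketch of one way to handle that, the hypothetical helper below (`run_with_boto_config` is a name made up for this example) also captures stderr and returns the exit status. So that the sketch runs without gsutil installed, it uses the current Python interpreter as a stand-in child process; that stand-in is an assumption for demonstration only.

```python
import os
import subprocess
import sys


def run_with_boto_config(args, boto_path):
    """Run args with BOTO_PATH set; return (returncode, stdout, stderr) as text."""
    env = {'BOTO_PATH': boto_path, 'PATH': os.getenv('PATH', '')}
    popen = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env)
    out, err = popen.communicate()
    return popen.returncode, out.decode(), err.decode()


# Stand-in child process (not gsutil): it echoes BOTO_PATH to stdout,
# writes a message to stderr, and exits with status 3.
child = [sys.executable, '-c',
         'import os, sys; print(os.environ["BOTO_PATH"]); '
         'sys.stderr.write("warn"); sys.exit(3)']
code, out, err = run_with_boto_config(child, '/path.to/boto.ini')
if code != 0:
    print('command failed with status %d: %s' % (code, err))
```

For the real copy, `args` would be a list such as `['gsutil', 'cp', 's3://bucket/name', 'gs://bucket/name']`. Passing a list also avoids the need for `shlex.split`.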
The key point is the BOTO_PATH setting. Normally, gsutil reads the .boto configuration file in your home directory, but if you set an environment variable such as BOTO_PATH or BOTO_CONFIG, it reads the file specified there instead.
subprocess.Popen accepts environment variables through its `env` argument, so use 'BOTO_PATH' as the dict key and the path of the configuration file as the value. When `env` is specified, every environment variable the command needs must be supplied, so 'PATH' is specified as well.
If you give env as a specific value, you must give all the variables needed to run the program. In order to run a side-by-side assembly on Windows, env must include a valid SystemRoot. [^ 3]