When transferring data from S3 to Google Cloud Storage with Python, I previously wrote up the following two methods. In the end, though, gsutil is more convenient, because it can copy directly from S3 to Google Cloud Storage in a single command, without staging the files on my server.
- Copy data from Amazon S3 to Google Cloud Storage with Python (boto)
- Access Google Cloud Storage from Python (boto) using service account and key file (p12)
Refer to the documentation and create the configuration file that gsutil reads at run time. At a minimum, you need the [Credentials] and [GSUtil] sections.
[Credentials]
gs_service_key_file = /path.to/sample-KEYFILE.p12
gs_service_client_id = [email protected]
aws_access_key_id = AXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = sampleawssecretaccesskey1234
[Boto]
https_validate_certificates = True
[GSUtil]
content_language = ja
default_api_version = 2
default_project_id = sampleproject-994
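If the configuration file has to be generated by a script rather than written by hand, it could be produced with the standard-library configparser module. The following is only a minimal sketch under assumptions: it targets Python 3, the helper name write_boto_config is hypothetical, the file is written to a temporary directory, and every value is a placeholder copied from the sample above. gs_service_client_id is left out because its value is obscured in the sample, so add it to the credentials dict yourself.

```python
import configparser
import os
import tempfile


def write_boto_config(path, credentials, gsutil_options):
    """Write a boto-style INI file with the three sections shown above.

    Hypothetical helper for illustration; not part of boto or gsutil.
    """
    config = configparser.ConfigParser()
    config['Credentials'] = credentials
    config['Boto'] = {'https_validate_certificates': 'True'}
    config['GSUtil'] = gsutil_options
    with open(path, 'w') as f:
        config.write(f)


# All values are placeholders taken from the sample configuration.
config_path = os.path.join(tempfile.mkdtemp(), 'boto.ini')
write_boto_config(
    config_path,
    credentials={
        'gs_service_key_file': '/path.to/sample-KEYFILE.p12',
        'aws_access_key_id': 'AXXXXXXXXXXXXXXXXXXX',
        'aws_secret_access_key': 'sampleawssecretaccesskey1234',
    },
    gsutil_options={
        'content_language': 'ja',
        'default_api_version': '2',
        'default_project_id': 'sampleproject-994',
    },
)
print(config_path)
```

Because the file contains secrets, it is worth restricting its permissions after writing it.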
Commands can be executed with `os.system`, `commands`, or `subprocess`, but `subprocess` is the recommended choice.
`os.system` is discouraged and `commands` is deprecated
os.system
The subprocess module provides more powerful functionality for running new processes and getting results. It is recommended to use the subprocess module instead of this function. [^ 1]
commands
Deprecated in version 2.6: The commands module has been removed in Python 3.0. Use the subprocess module instead. [^ 2]
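As a rough illustration of the replacement the documentation recommends, capturing a command's output with subprocess might look like the sketch below. It assumes Python 3, the helper name get_output is hypothetical, and the current Python interpreter stands in for a real command so the example does not depend on any external tool.

```python
import subprocess
import sys


def get_output(args):
    """Run a command given as an argument list and return its stdout as text.

    Hypothetical helper for illustration.
    """
    # check_output raises CalledProcessError on a non-zero exit status,
    # whereas os.system only reports failure through its return value.
    return subprocess.check_output(args).decode().strip()


# Stand-in command: the current interpreter printing a fixed string.
result = get_output([sys.executable, '-c', 'print("hello")'])
print(result)
```

Passing the command as a list avoids going through a shell, so arguments containing spaces or special characters need no extra quoting.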
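subprocess
import os
import shlex
import subprocess

# Path to the configuration file that gsutil should read
BOTO_PATH = '/path.to/boto.ini'
cmd = 'gsutil cp s3://bucket/name gs://bucket/name'

# env replaces the whole environment of the child process,
# so PATH is passed along so that gsutil can be found
popen = subprocess.Popen(
    shlex.split(cmd),
    stdout=subprocess.PIPE,
    env={'BOTO_PATH': BOTO_PATH, 'PATH': os.getenv('PATH')})
output = popen.communicate()[0]
print(output)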
The key point is specifying BOTO_PATH. Normally, gsutil reads the .boto configuration file in your home directory, but if you set an environment variable such as BOTO_PATH or BOTO_CONFIG, it reads the file specified there instead.
Since subprocess.Popen accepts environment variables through its `env` argument, use 'BOTO_PATH' as a dict key and the path to the configuration file as its value. When `env` is specified, every environment variable the program needs must be supplied, so 'PATH' is specified as well.
If env is specified, it must provide all the variables required for the program to run. On Windows, in order to run a side-by-side assembly, the specified env must include a valid SystemRoot. [^ 3]
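One way to avoid listing every required variable by hand is to copy the current environment and add BOTO_PATH on top of it. The sketch below is an assumption-laden illustration rather than the method used above: it targets Python 3, the helper name build_env is hypothetical, and a child Python process stands in for gsutil so that the result can be checked without gsutil installed.

```python
import os
import subprocess
import sys


def build_env(boto_path, base_env=None):
    """Return a copy of the environment with BOTO_PATH set.

    Hypothetical helper for illustration. base_env defaults to os.environ.
    """
    env = dict(os.environ if base_env is None else base_env)
    env['BOTO_PATH'] = boto_path
    return env


env = build_env('/path.to/boto.ini')

# Stand-in for gsutil: a child process that echoes the variable it received.
out = subprocess.check_output(
    [sys.executable, '-c', 'import os; print(os.environ["BOTO_PATH"])'],
    env=env)
print(out.decode().strip())
```

Copying os.environ keeps PATH, and SystemRoot on Windows, intact, at the cost of passing the child process more variables than it strictly needs.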