Copy S3 files from Python to GCS using GSUtil

When transferring data from S3 to Google Cloud Storage with Python, I previously wrote about the following two methods. With gsutil, however, I can copy directly from S3 to Google Cloud Storage without having to stage the files on my own server, which is convenient.

- Copy data from Amazon S3 to Google Cloud Storage with Python (boto)
- Access Google Cloud Storage from Python (boto) using service account and key file (p12)

gsutil config file

Refer to the documentation and create the configuration file that gsutil reads when it runs. At a minimum, you need the [Credentials] and [GSUtil] sections.

[Credentials]
gs_service_key_file = /path.to/sample-KEYFILE.p12
gs_service_client_id = [email protected]
aws_access_key_id = AXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = sampleawssecretaccesskey1234

[Boto]
https_validate_certificates = True

[GSUtil]
content_language = ja
default_api_version = 2
default_project_id = sampleproject-994
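
Before handing the file to gsutil, it can help to check that the required sections are actually present. The following is a minimal sketch using Python 3's standard configparser module. The helper name `missing_sections` and the sample text are illustrative assumptions, not part of gsutil or boto.

```python
import configparser

# Sections this article treats as the minimum for gsutil
REQUIRED_SECTIONS = ("Credentials", "GSUtil")


def missing_sections(config_text, required=REQUIRED_SECTIONS):
    """Return the required section names that are absent from config_text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return [name for name in required if not parser.has_section(name)]


# Sample text modelled on the configuration file above (values are placeholders)
SAMPLE = """
[Credentials]
gs_service_key_file = /path.to/sample-KEYFILE.p12

[Boto]
https_validate_certificates = True

[GSUtil]
default_project_id = sampleproject-994
"""

if __name__ == "__main__":
    print(missing_sections(SAMPLE))
```

Section names are matched case-sensitively here, so a section written as [GS Util] or [gsutil] would be reported as missing.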

Execute command from Python

Commands can be executed with `os.system`, `commands`, or `subprocess`, but `subprocess` is the recommended choice.

`os.system` is discouraged and `commands` is deprecated

os.system

The subprocess module provides more powerful functionality for running new processes and getting results. It is recommended to use the subprocess module instead of this function. [^ 1]

commands

Deprecated in version 2.6: The commands module has been removed in Python 3.0. Use the subprocess module instead. [^ 2]
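
As a rough illustration of that recommendation, the sketch below captures a command's output with `subprocess.check_output`, which covers the kind of use `commands.getoutput` used to serve. It runs the current Python interpreter as a stand-in for a real command so that it works on any machine. The helper name `run_and_capture` is hypothetical.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command given as an argument list and return its stdout as text."""
    # Passing a list means no shell is involved in parsing the arguments.
    # check_output raises CalledProcessError on a non-zero exit status.
    return subprocess.check_output(args).decode()


if __name__ == "__main__":
    print(run_and_capture([sys.executable, "-c", "print('hello')"]))
```

Unlike `os.system`, this returns the command's output to the caller and reports failure as an exception instead of a return code that is easy to ignore.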

Use subprocess

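import os
import shlex
import subprocess

# Path to the gsutil (boto) configuration file described above
BOTO_PATH = '/path.to/boto.ini'

cmd = 'gsutil cp s3://bucket/name gs://bucket/name'

# env replaces the child's whole environment, so PATH is passed along
# to let the gsutil executable be found
popen = subprocess.Popen(
    shlex.split(cmd),
    stdout=subprocess.PIPE,
    env={'BOTO_PATH': BOTO_PATH, 'PATH': os.getenv('PATH', '')})

# Wait for gsutil to finish and collect its standard output
output = popen.communicate()[0]

print(output)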

The key point is setting BOTO_PATH. By default, gsutil reads the .boto configuration file in your home directory, but if you set an environment variable such as BOTO_PATH or BOTO_CONFIG, it reads the file specified there instead.

subprocess.Popen accepts environment variables through its `env` argument, so pass a dict with 'BOTO_PATH' as the key and the path of the configuration file as the value. When `env` is specified, it must contain every environment variable the command needs, so 'PATH' is included as well.

If you specify env, you must provide all the variables needed to run the program. On Windows, to run a side-by-side assembly, env must include a valid SystemRoot. [^ 3]
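
One way to avoid listing every required variable by hand is to copy the current environment and add BOTO_PATH to it. This is a sketch of that alternative, not what the code earlier in this article does. `child_env` is a hypothetical helper, and the child process here is the Python interpreter standing in for gsutil.

```python
import os
import subprocess
import sys


def child_env(boto_path):
    """Return a copy of the current environment with BOTO_PATH set."""
    env = os.environ.copy()  # the parent's own environment is left unchanged
    env["BOTO_PATH"] = boto_path
    return env


if __name__ == "__main__":
    # The child prints the BOTO_PATH it received through its environment.
    code = "import os; print(os.environ['BOTO_PATH'])"
    out = subprocess.check_output(
        [sys.executable, "-c", code], env=child_env("/path.to/boto.ini"))
    print(out.decode().strip())
```

The trade-off is that the child inherits everything in the parent's environment, whereas the explicit dict passes only the variables you name.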
