[Explanation with image] Register an account with a free trial of Google Cloud Platform (GCP)
Install and initialize the Google Cloud SDK
Create a project with the Google Cloud SDK
gsutil is the command-line tool for working with Cloud Storage. The commands below create a bucket and upload a local file to it.
$ gsutil mb -l us-central1 gs://Bucket name
$ gsutil cp local file path gs://Bucket name/Directory name/file name
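The object name that Python needs later is just the part of the URI after gs://Bucket name/. As a minimal sketch of that relationship, using only the standard library (split_gs_uri and join_gs_uri are hypothetical helper names for illustration, not part of gsutil or any Google library):

```python
from typing import Tuple


def split_gs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/dir/file' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object name
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, object_name


def join_gs_uri(bucket: str, object_name: str) -> str:
    """Build a gs:// URI from a bucket name and an object name."""
    return f"gs://{bucket}/{object_name.lstrip('/')}"


bucket, object_name = split_gs_uri("gs://anata_no_bucket/path/to/dir/train.csv")
print(bucket)       # anata_no_bucket
print(object_name)  # path/to/dir/train.csv
```

The second value is what a script would pass as the object (blob) name when reading the file back.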
Create a service account / service account key so that you can access GCS from Python.
gcloud iam service-accounts create service account name \
--display-name service account display name
gcloud projects get-iam-policy mypj-id
# bindings:
# - members:
#   - user:[email protected]
#   role: roles/owner
# etag: BwWeTrntoao=
# version: 1
Grant storage administrator privileges
gcloud projects add-iam-policy-binding project ID \
--member serviceAccount:Service account name@Project ID.iam.gserviceaccount.com \
--role roles/storage.admin
https://cloud.google.com/iam/docs/understanding-roles?hl=ja#predefined_roles
gcloud projects get-iam-policy mypj-id
# bindings:
# - members:
#   - user:[email protected]
#   role: roles/owner
# - members:
#   - serviceAccount:[email protected]
#   role: roles/storage.admin
# etag: BwWeTz6vIBY=
# version: 1
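Instead of reading the policy by eye, the same check can be done in Python. This is a minimal sketch that assumes the policy was exported as JSON (for example with gcloud projects get-iam-policy mypj-id --format=json). The has_role helper, the e-mail addresses, and the service account name below are made up for illustration.

```python
import json


def has_role(policy: dict, member: str, role: str) -> bool:
    """Return True if `member` appears in a binding for `role`."""
    for binding in policy.get("bindings", []):
        if binding.get("role") == role and member in binding.get("members", []):
            return True
    return False


# Hypothetical policy with the same shape as the output above
policy = json.loads("""
{
  "bindings": [
    {"members": ["user:owner@example.com"], "role": "roles/owner"},
    {"members": ["serviceAccount:my-sa@mypj-id.iam.gserviceaccount.com"],
     "role": "roles/storage.admin"}
  ],
  "version": 1
}
""")

print(has_role(policy,
               "serviceAccount:my-sa@mypj-id.iam.gserviceaccount.com",
               "roles/storage.admin"))  # prints True
```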
$ gcloud iam service-accounts keys create ./service_account_keys/anata_no_key.json \
--iam-account service account name@Project ID.iam.gserviceaccount.com
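Before wiring the key into Python, it can help to confirm that the downloaded file really is a service account key. The sketch below only inspects the JSON and does not contact GCP. check_key_file is a hypothetical helper, and the list of fields it checks is a small subset chosen for illustration.

```python
import json
from pathlib import Path

# A few of the fields found in a service account key file (not exhaustive)
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path) -> str:
    """Return the service account e-mail if the file looks like a key, else raise."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["type"] != "service_account":
        raise ValueError(f"unexpected key type: {data['type']!r}")
    return data["client_email"]
```

For example, check_key_file("./service_account_keys/anata_no_key.json") should return the service account's e-mail address if the key was created as above.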
.
├── .env
├── service_account_keys/
│   └── anata_no_key.json
└── working/
    └── main.py
.env
This entry sets an environment variable to the path of the service account key created earlier. Note that it is a relative path: it is resolved against the working directory from which the script that calls load_dotenv() is run.
.env
GOOGLE_APPLICATION_CREDENTIALS=./service_account_keys/anata_no_key.json
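Because the value is a relative path, where it points depends on where the script is started. One way to make this less fragile is to turn it into an absolute path anchored at a known directory. This is a minimal sketch under that assumption. resolve_key_path is a hypothetical helper, and /home/me/project is a made-up project root.

```python
import os
from pathlib import Path


def resolve_key_path(value: str, base_dir: str) -> str:
    """Return `value` as a normalized path; relative values are taken from base_dir."""
    path = Path(value)
    if not path.is_absolute():
        path = Path(base_dir) / path
    return os.path.normpath(str(path))


# Hypothetical project root containing .env and service_account_keys/
print(resolve_key_path("./service_account_keys/anata_no_key.json", "/home/me/project"))
```

A script could write the result back to os.environ["GOOGLE_APPLICATION_CREDENTIALS"] after loading .env and before creating the storage client.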
Install google-cloud-storage, python-dotenv, and pandas with pip.
$ pip install google-cloud-storage python-dotenv pandas
main.py
import os
from io import BytesIO
from dotenv import load_dotenv
from google.cloud import storage
import pandas as pd
# Load the contents of .env into environment variables
load_dotenv('./.env')
PROJECT_NAME = 'anata_no_project'
BUCKET_NAME = 'anata_no_bucket'
FILE_NAME = 'path/to/dir/train.csv'  # object path after gs://Bucket name/
# The client authenticates with the key at GOOGLE_APPLICATION_CREDENTIALS
client = storage.Client(PROJECT_NAME)
bucket = client.get_bucket(BUCKET_NAME)
blob = storage.Blob(FILE_NAME, bucket)
# download_as_string() is deprecated; download_as_bytes() returns the same bytes
data = blob.download_as_bytes()
df = pd.read_csv(BytesIO(data))
print(df)
If the contents of the CSV file are printed as a DataFrame, everything is working.
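If nothing is printed and an authentication error appears instead, the usual suspect is the environment variable. The sketch below wraps the same steps in a function and checks the credentials first, so a missing key fails with a clear message. require_credentials and read_csv_from_gcs are hypothetical names, and the third-party imports are placed inside the function so that the check itself runs without them.

```python
import os
from io import BytesIO


def require_credentials() -> str:
    """Return the key path from the environment, or raise with a clear message."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"key file not found: {path}")
    return path


def read_csv_from_gcs(project: str, bucket_name: str, object_name: str):
    """Download one CSV object from GCS and return it as a DataFrame."""
    # Imported here so require_credentials() works without these packages
    from google.cloud import storage
    import pandas as pd

    require_credentials()
    client = storage.Client(project=project)
    blob = client.bucket(bucket_name).blob(object_name)
    return pd.read_csv(BytesIO(blob.download_as_bytes()))
```

With the names used in main.py, the call would be read_csv_from_gcs('anata_no_project', 'anata_no_bucket', 'path/to/dir/train.csv').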