Upload data to AWS S3 from the command line, update it, and delete the used data (work in progress)

Install the latest version of the AWS CLI

`pip3 install awscli --upgrade --user`

The installation appeared to succeed, but running `aws --version` gives `zsh: command not found: aws`.

Following "Add the AWS CLI version 1 executable to the macOS command line path" (https://docs.aws.amazon.com/ja_jp/cli/latest/userguide/install-macos.html#awscli-install-osx-path), I added the aws program to the operating system's PATH environment variable.

→ But I still get `zsh: command not found: aws`...?

`which python` returns `/Users/username/anaconda3/bin/python`.

According to the article above (Install AWS CLI), if you use `--user` for the first installation, the AWS CLI is installed under `.local`, so `~/.local/bin` needs to be on PATH. However, my PATH setting is currently

`export PATH="/Users/username/anaconda3/bin:$PATH"`

and I wonder whether that is the cause...
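One way to narrow down a "command not found" error is to check which directories on PATH actually contain the program. The following is a minimal sketch using only the standard library; the helper name `find_on_path` is hypothetical and not part of any AWS tooling.

```python
import os
import shutil


def find_on_path(program, path=None):
    """Return every directory on PATH that contains an executable named `program`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        # Only count regular files that the current user may execute
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # shutil.which returns the first match, or None when nothing is found
    print("aws resolves to:", shutil.which("aws"))
    print("directories containing aws:", find_on_path("aws"))
    user_bin = os.path.expanduser("~/.local/bin")
    on_path = user_bin in os.environ.get("PATH", "").split(os.pathsep)
    print("~/.local/bin on PATH:", on_path)
```

If the first line prints `None` and `~/.local/bin` is not on PATH, the shell has no way to find a `--user` installation.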

Is the method different when using Anaconda? References: "how to install AWSCLI on a Anaconda python distribution" and "aws codecommit aws: command not found".

→ Conclusion

`conda install -c conda-forge awscli`

After running this, the `aws` command worked without problems.

AWS CLI settings

Reference site

`aws configure`
↓
AWS Access Key ID [None]: 〜〜
AWS Secret Access Key [None]: 〜〜
Default region name [None]: ap-northeast-1
Default output format [None]: json
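For reference, `aws configure` stores these answers in two INI-style files under `~/.aws`: `credentials` for the keys, and `config` for the region and output format. A minimal sketch for reading the default region back with the standard library follows; the helper name `read_default_region` is an illustrative assumption.

```python
import configparser
import os


def read_default_region(config_text):
    """Return the region from the [default] section of AWS config text, or None."""
    # interpolation=None keeps '%' characters in values from being interpreted
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if parser.has_section("default"):
        return parser.get("default", "region", fallback=None)
    return None


if __name__ == "__main__":
    config_path = os.path.expanduser("~/.aws/config")
    if os.path.exists(config_path):
        with open(config_path) as f:
            print("default region:", read_default_region(f.read()))
    else:
        print("no config file found; run `aws configure` first")
```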

Reference: AWS CLI Command List

Confirm that you can see the contents of S3 with `aws s3 ls`.

Copy files from S3 to the local machine

`aws s3 cp s3://{bucket name}/{path} {local path}`

You can copy the S3 path of a file by clicking the file and then clicking "Copy Path". (The following example downloads to the Downloads folder.)

`aws s3 cp s3://~~~ /Users/username/Downloads`
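The `s3://{bucket name}/{path}` form can be taken apart and validated before it is handed to the CLI. The sketch below is an illustration only: the helper names `split_s3_uri` and `build_download_command` and the bucket name in the usage note are hypothetical.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/path/to/file' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


def build_download_command(uri, local_path):
    """Build the argument list for `aws s3 cp <uri> <local_path>`."""
    split_s3_uri(uri)  # raises ValueError if the URI is malformed
    # A list (rather than one string) keeps paths containing spaces intact
    return ["aws", "s3", "cp", uri, local_path]
```

For example, `split_s3_uri("s3://example-bucket/data/points.csv.gz")` returns `("example-bucket", "data/points.csv.gz")`.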

Decompress the file with Python and remove the extra columns

Reference: You can read compressed files with pandas.read_csv. Very convenient!

`python`


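import pandas as pd

# Read the gzip-compressed CSV directly (compression is inferred from the .gz extension)
df = pd.read_csv('file name.csv.gz')

# Delete unused columns
df = df.drop(columns=['A', 'B', 'C'])

# Delete rows with a missing value in column a
df = df.dropna(subset=['a'])

# Write the result as a gzip-compressed CSV (the ./renamed_file folder must already exist)
df.to_csv('./renamed_file/File name after compression.csv.gz', index=False, compression='gzip')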
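Before deciding which columns to drop, it can help to look at the column names without loading the whole file. The following is a small standard-library sketch, not a replacement for `pandas.read_csv`; the helper name `read_gzip_csv_header` is hypothetical, and UTF-8 encoding is an assumption about the data.

```python
import csv
import gzip


def read_gzip_csv_header(path):
    """Return the column names from the first row of a gzip-compressed CSV."""
    # Text mode ("rt") decompresses and decodes on the fly
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as f:
        # An empty file yields an empty list instead of raising StopIteration
        return next(csv.reader(f), [])
```

Only the first row is decompressed and parsed, so this stays quick even for large files.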

Execute commands from Python

[Introduction to Python] Let's execute commands using subprocess!

As a test, list the contents of S3 from Python.

`python`


import subprocess

subprocess.call(["aws","s3","ls"])

→ Success
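`subprocess.call` returns only the exit code and prints the command's output directly. If you want the output as a string and an exception on failure, `subprocess.run` with `check=True` is one option. A minimal sketch follows; the wrapper name `run_command` is hypothetical.

```python
import subprocess


def run_command(args):
    """Run a command given as a list of arguments and return its stdout.

    Raises subprocess.CalledProcessError if the command exits with a
    non-zero code, and FileNotFoundError if the program is not on PATH.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

With the AWS CLI installed and configured, `run_command(["aws", "s3", "ls"])` would return the listing as a string instead of printing it.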

Download each file from S3 to the Downloads/point_data folder, decompress and process it, recompress it into the Downloads/renamed_file folder, and upload it to the specified location in S3.

`python`


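import os
import subprocess

import pandas as pd

# List of S3 paths of the files you want to download (fill in your own)
path_list = []
# S3 location you want to upload to (fill in your own)
upload_path = 'PATH of location on S3 you want to upload'

# Both local folders must already exist
download_dir = '/Users/username/Downloads/point_data'
renamed_dir = '/Users/username/Downloads/renamed_file'

for s3_path in path_list:
    # Download the file from S3
    subprocess.call(['aws', 's3', 'cp', s3_path, download_dir])
    # Assumption: the local file keeps the last component of the S3 path as its name
    file_name = os.path.basename(s3_path)
    df = pd.read_csv(os.path.join(download_dir, file_name))
    # Delete unused columns (the numbers are column positions; these are only an example)
    df = df.drop(columns=df.columns[[1, 2, 3, 4, 5]])
    # Delete rows with a missing value in column A
    df = df.dropna(subset=['A'])
    # Save as a gzip-compressed CSV
    out_path = os.path.join(renamed_dir, file_name)
    df.to_csv(out_path, index=False, compression='gzip')
    # Upload the processed file to S3
    subprocess.call(['aws', 's3', 'cp', out_path, upload_path])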
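The title mentions deleting the used data, which the loop above does not do yet. One possible sketch, offered as an assumption rather than the author's method, is to remove the local files only when the upload command exited with code 0 (the value `subprocess.call` returns). The helper name `remove_local_files` is hypothetical.

```python
import os


def remove_local_files(paths, upload_returncode):
    """Delete local files only if the upload command exited with code 0.

    Returns the list of paths that were actually removed.
    """
    if upload_returncode != 0:
        # Keep the files so that a failed upload can be retried
        return []
    removed = []
    for path in paths:
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Inside the loop, the exit code of the upload call could be passed in together with the downloaded file and the recompressed file.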