Upload data to AWS S3 with a command, update it, and delete the used data (work in progress)

Installing AWS CLI version 1

Install the latest version of the AWS CLI: `pip3 install awscli --upgrade --user`

It seemed to install, but running `aws --version` gave `zsh: command not found: aws`. Following "Add the AWS CLI version 1 executable to the macOS command line path" (https://docs.aws.amazon.com/ja_jp/cli/latest/userguide/install-macos.html#awscli-install-osx-path), I tried to add the `aws` program to the PATH environment variable of the operating system. → But I still got `zsh: command not found: aws` ...?

`which python` returns `/Users/username/anaconda3/bin/python`.

According to the "Install AWS CLI" article above, if you use `--user` for the first installation, the AWS CLI is installed under `.local`, so `~/.local/bin` has to be on PATH. At the moment, however, my PATH is set with

`export PATH="/Users/username/anaconda3/bin:$PATH"` and I wonder if that is the cause ...

Is the method different when using Anaconda? References: "how to install AWSCLI on a Anaconda python distribution" and "aws codecommit aws: command not found".

→ Conclusion

After running `conda install -c conda-forge awscli`, I was able to use the `aws` command without any problems.
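To check which `aws` executable is visible on PATH, the same lookup can be done from Python. This is a minimal sketch using the standard library's `shutil.which`; the helper name `locate` is only for illustration and is not part of the AWS CLI.

```python
import os
import shutil


def locate(command, path=None):
    """Return the full path of `command` if it is on the search path, else None."""
    # shutil.which searches the directories in PATH (or in `path` if given).
    return shutil.which(command, path=path)


if __name__ == "__main__":
    found = locate("aws")
    if found is None:
        # Listing the searched directories shows whether ~/.local/bin
        # or the Anaconda bin directory is actually on PATH.
        print("aws is not on PATH; directories searched:")
        for directory in os.environ.get("PATH", "").split(os.pathsep):
            print("  " + directory)
    else:
        print("aws found at " + found)
```

If the directory that holds `aws` is missing from the printed list, the shell will report `command not found` as well.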

AWS CLI settings

Run `aws configure` and answer the prompts (see the reference site):

AWS Access Key ID [None]: 〜〜
AWS Secret Access Key [None]: 〜〜
Default region name [None]: ap-northeast-1
Default output format [None]: json
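`aws configure` saves the access keys in `~/.aws/credentials` and the region and output format in `~/.aws/config`, both as INI-style text files. As a rough sketch, the settings can be read back with the standard library's `configparser`. The sample text below stands in for the real file, and the function name `read_default_settings` is made up for this example.

```python
import configparser

# Sample contents in the format `aws configure` writes to ~/.aws/config.
SAMPLE_CONFIG = """\
[default]
region = ap-northeast-1
output = json
"""


def read_default_settings(text):
    """Parse AWS CLI config text and return the [default] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["default"])


settings = read_default_settings(SAMPLE_CONFIG)
print(settings["region"], settings["output"])
```

To inspect the real file, pass the text of `os.path.expanduser("~/.aws/config")` instead of the sample string.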

Confirm that you can see the contents of S3 with `aws s3 ls` (see the AWS CLI command list).

Copy the files on s3 locally.

`aws s3 cp s3://{bucket name}/{path} {local path}`

You can copy the path of a file on S3 by clicking the file and then clicking "Copy path". The following example downloads to the Downloads folder: `aws s3 cp s3://~~~ /Users/username/Downloads`

Unzip the file with python and remove the extra columns

Reference: You can read compressed files with pandas.read_csv. Very convenient!

python


import pandas as pd

# read_csv infers gzip compression from the .gz extension
df = pd.read_csv('file name.csv.gz')

# Delete unused columns
df = df.drop(columns=['A', 'B', 'C'])

# Delete rows with a missing value in column a
df = df.dropna(subset=['a'])

df.to_csv('./renamed_file/File name after compression.csv.gz', index=False, compression='gzip')
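To see what this round trip amounts to, here is a sketch of the same idea using only the standard library (`gzip` and `csv`) instead of pandas. It is not the method used in this article: the function name `clean_gzip_csv` and the sample data are invented, and it treats only empty fields as missing, whereas pandas recognizes more markers such as `NaN`.

```python
import csv
import gzip
import os
import tempfile


def clean_gzip_csv(src, dst, drop_columns, required_column):
    """Copy a gzipped CSV, dropping columns and rows with an empty required field."""
    with gzip.open(src, "rt", newline="") as fin, \
            gzip.open(dst, "wt", newline="") as fout:
        reader = csv.DictReader(fin)
        kept = [name for name in reader.fieldnames if name not in drop_columns]
        # extrasaction="ignore" silently skips the dropped columns in each row.
        writer = csv.DictWriter(fout, fieldnames=kept, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            if row[required_column] != "":
                writer.writerow(row)


# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.csv.gz")
    dst = os.path.join(tmp, "out.csv.gz")
    with gzip.open(src, "wt", newline="") as f:
        f.write("a,A,B\r\n1,x,y\r\n,x,y\r\n")
    clean_gzip_csv(src, dst, drop_columns={"A", "B"}, required_column="a")
    with gzip.open(dst, "rt", newline="") as f:
        result = f.read()

print(repr(result))
```

The second data row has an empty value in column `a`, so only the header and the first row are written to the output file.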

Execute commands from Python.

Reference: "[Introduction to Python] Let's execute commands using subprocess!"

As a test, list the contents of S3 from Python

python


import subprocess

subprocess.call(["aws","s3","ls"])

→ Success
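`subprocess.call` only returns the exit status, so a failed command goes unnoticed unless that value is checked. As a sketch, `subprocess.run` with `check=True` raises an exception on failure and can also capture the output. So that the example runs without AWS credentials, it calls the Python interpreter itself as a stand-in command; the helper name `run_command` is made up for this example.

```python
import subprocess
import sys


def run_command(args):
    """Run a command, raise CalledProcessError on a non-zero exit, return stdout."""
    completed = subprocess.run(args, check=True, capture_output=True, text=True)
    return completed.stdout


# Stand-in command so the example runs anywhere; use ["aws", "s3", "ls"] for S3.
output = run_command([sys.executable, "-c", "print('ok')"])
print(output.strip())

try:
    run_command([sys.executable, "-c", "import sys; sys.exit(2)"])
except subprocess.CalledProcessError as error:
    print("command failed with exit status", error.returncode)
```

With the `aws` command, this makes a wrong bucket name or missing credentials stop the script instead of passing silently.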

Download each file from S3 to the Downloads/point_data folder, decompress it, process it, recompress it into the Downloads/renamed_file folder, and upload it to the specified location in S3.

python


import os
import subprocess

import pandas as pd

path_list = []  # List of the S3 paths of the files you want to download
download_dir = '/Users/username/Downloads/point_data/'
renamed_dir = '/Users/username/Downloads/renamed_file/'
upload_path = 'PATH of location on S3 you want to upload'

for s3_path in path_list:
    # The local file keeps the last component of the S3 path as its name
    file_name = os.path.basename(s3_path)
    subprocess.call(["aws", "s3", "cp", s3_path, download_dir])
    df = pd.read_csv(os.path.join(download_dir, file_name))
    # Delete unused columns (the numbers are column positions; these are just an example)
    df = df.drop(columns=df.columns[[1, 2, 3, 4, 5]])
    # Delete rows with a missing value in column A
    df = df.dropna(subset=['A'])
    # Save, recompressed with gzip
    out_path = os.path.join(renamed_dir, file_name)
    df.to_csv(out_path, index=False, compression='gzip')
    subprocess.call(["aws", "s3", "cp", out_path, upload_path])
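Passing the command to `subprocess.call` as a list of arguments, rather than splitting one string on whitespace, keeps paths that contain spaces intact. The sketch below shows one possible way to build those lists and to derive the local file name from an S3 path. The helper names and the bucket name are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def s3_file_name(s3_uri):
    """Return the last path component of an S3 URI such as s3://bucket/dir/file.csv.gz."""
    return posixpath.basename(urlparse(s3_uri).path)


def download_command(s3_uri, local_dir):
    """Argument list for `aws s3 cp` from S3 into a local directory."""
    return ["aws", "s3", "cp", s3_uri, local_dir]


def upload_command(local_path, s3_destination):
    """Argument list for `aws s3 cp` from a local file to S3."""
    return ["aws", "s3", "cp", local_path, s3_destination]


# Hypothetical URI with a space in the path.
uri = "s3://example-bucket/point data/2020-01.csv.gz"
print(s3_file_name(uri))
print(download_command(uri, "/Users/username/Downloads/point_data/"))

# Splitting a single string on whitespace would cut the URI in two.
print(len(("aws s3 cp " + uri + " /tmp/").split()))
```

Each list can be passed to `subprocess.call` as it is, and the space in the path stays inside a single argument.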
    
