Aidemy 2020/10/11
Introduction
Hello, this is Yope! I am a liberal arts student, but I became interested in the possibilities of AI, so I enrolled in the AI-specialized school "Aidemy" to study. I want to share the knowledge I gained there, so I am summarizing it on Qiita. I am very happy that many people have read the previous summary articles. Thank you!
This is the second post on data handling. I hope you find it useful.
- This article summarizes what I learned at "Aidemy" in my own words. It may contain mistakes and misunderstandings, so please keep that in mind.
What to learn this time
- An introduction to the data formats pandas can convert
- Converting between data formats with pandas
- Graphing a CSV file
Data format analysis
File input / output using pandas
- HTML, JSON, CSV, and Excel serve different purposes, such as web pages, web APIs, and data organization. You can convert between these data formats using __pandas__.
HTML scraping with pandas
- Basically, HTML tag elements are scraped with BeautifulSoup, but __table elements__ (<table>) are scraped with pandas.
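As a rough sketch of table scraping with pandas, the example below parses a small HTML string (invented for illustration) with pd.read_html, which returns a list containing one DataFrame per table it finds. It assumes an HTML parser such as lxml is installed; if none is available, the sketch simply falls back to an empty list.

```python
from io import StringIO

import pandas as pd

# A tiny HTML snippet made up for this example; a real page would come from a URL or a file.
html = """
<table>
  <thead><tr><th>name</th><th>price</th></tr></thead>
  <tbody>
    <tr><td>apple</td><td>100</td></tr>
    <tr><td>banana</td><td>80</td></tr>
  </tbody>
</table>
"""

try:
    # read_html returns a list of DataFrames, one per <table> element.
    tables = pd.read_html(StringIO(html))
except ImportError:
    # pandas needs an HTML parser (for example lxml) for read_html; none was found.
    tables = []

for table in tables:
    print(table)
```

Because the result is a list, a page with several tables needs an index such as tables[0] to pick the one you want.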
About JSON
- JSON stands for "JavaScript Object Notation" and is used to exchange data between different programming languages.
- The structure of a JSON file is basically the same as that of a Python dictionary, and data is expressed in the form {key: value}.
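To make the similarity to dictionaries concrete, here is a minimal sketch using made-up sample data. The standard json module turns a JSON object into a Python dict, and pd.read_json turns a JSON list of such objects into a DataFrame.

```python
import json
from io import StringIO

import pandas as pd

# A single JSON object (sample data invented for this example).
json_text = '{"name": "apple", "price": 100}'
record = json.loads(json_text)  # becomes an ordinary Python dict
print(record["price"])

# A JSON array of objects can be read straight into a DataFrame.
records_text = '[{"name": "apple", "price": 100}, {"name": "banana", "price": 80}]'
df = pd.read_json(StringIO(records_text), orient="records")
print(df)
```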
About CSV files
- CSV stands for "Comma-Separated Values". Because its data structure is lightweight and simple, it has long been used for data exchange.
- A CSV file simply lists values separated by commas, such as "a,b,c".
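A small sketch of this structure, using invented values: the first line is treated as the header row and every following line becomes one row of the DataFrame. The text is read from a string here only so the example runs without a file on disk.

```python
from io import StringIO

import pandas as pd

# CSV text made up for this example: a header line followed by two data lines.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

df = pd.read_csv(StringIO(csv_text))  # the first line becomes the column names
print(df)

# Writing back out gives comma-separated text again (index=False omits the row labels).
csv_out = df.to_csv(index=False)
print(csv_out)
```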
About Excel
- Needless to say, Excel is spreadsheet software. Since it is so widely used, being able to read Excel files expands the range of data you can analyze.
- In Excel terminology, the file is called a __"book"__, each table in the file is a __"sheet"__, the vertical lines of a sheet are __"columns"__, the horizontal lines are __"rows"__, and each individual item is a __"cell"__.
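To connect this vocabulary to pandas: pd.read_excel with sheet_name=None returns a dictionary that maps each sheet name to a DataFrame. The sketch below builds such a dictionary by hand instead of reading a real file, so the sheet name "Sales" and its values are purely hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("book.xlsx", sheet_name=None),
# which returns {sheet name: DataFrame} for every sheet in the book.
book = {
    "Sales": pd.DataFrame({"item": ["apple", "banana"], "price": [100, 80]}),
}

sheet = book["Sales"]        # one sheet of the book
column = sheet["price"]      # one column (vertical)
row = sheet.loc[0]           # one row (horizontal)
cell = sheet.at[0, "price"]  # a single cell
print(cell)
```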
Data format conversion
Read the file with DataFrame
- Now let's actually convert the data formats described above. First, read a file with __pd.read_xxx("file name")__, where xxx is the data format. For example, HTML is read with "pd.read_html()" and Excel with "pd.read_excel()".
- Write a file with __df.to_xxx("file name")__. Note that reading is called on "pd", but writing is a method of the DataFrame itself: to write the DataFrame object "df" to an HTML file, use "df.to_html()".
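The read-then-write pattern might look like the following sketch, which uses invented sample data. It reads CSV text into a DataFrame and converts it to HTML and JSON; the to_ methods return strings here because no file name is given, whereas passing a path would write a file instead.

```python
import json
from io import StringIO

import pandas as pd

# Sample CSV text invented for this example.
csv_text = "name,price\napple,100\nbanana,80\n"
df = pd.read_csv(StringIO(csv_text))  # read: a function on pd

# write: methods on the DataFrame itself
html_text = df.to_html(index=False)       # an HTML <table> as a string
json_text = df.to_json(orient="records")  # a JSON array of {column: value} objects

print(html_text)
print(json.loads(json_text))
```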
Graph the data in the CSV file
Graphing procedure
- The procedure is: "read the CSV file (read_csv)", "create the graph with pandas", and "display the graph with matplotlib (plt.show)".
- Of these, "create the graph with pandas" is new. All it takes is __"df.plot()"__.
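import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("data.csv")  # load the CSV file into a DataFrame
data.plot()                     # draw each numeric column as a line
plt.show()                      # display the figure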
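Since "data.csv" itself is not included in this article, here is a self-contained sketch of the same steps using made-up numbers. It assumes matplotlib is installed, uses the non-interactive "Agg" backend, and saves the figure to an in-memory PNG rather than calling plt.show(), so it also runs on machines without a display.

```python
from io import BytesIO, StringIO

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# CSV text invented for this example; "month" is used as the x-axis.
csv_text = "month,sales,cost\n1,10,7\n2,15,9\n3,12,8\n"
df = pd.read_csv(StringIO(csv_text), index_col="month")

ax = df.plot()  # one line per numeric column, labeled with the column name
ax.set_ylabel("amount")

png = BytesIO()
ax.figure.savefig(png, format="png")  # write the figure as PNG bytes
plt.close(ax.figure)
```

In an interactive session you would normally call plt.show() in place of the savefig step.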
Summary
- pandas lets you convert data between various data formats.
- To read another data format into Python or to write one out, use calls such as __"pd.read_csv()"__ and __"df.to_html()"__.
- A CSV file that has been read can be graphed with __df.plot()__.
That's all for this time. Thank you for reading to the end.