HTML, JSON, CSV, and Excel each have a different main use: HTML for web pages, JSON for web APIs, and CSV and Excel for organizing data. With the pandas library you can convert between these formats.
An HTML file is a data format that describes the contents of a web page.
Once you know how to parse HTML, any web page written in HTML can be a target of analysis. Extracting information from HTML files on the web is called scraping.
In Python, you can scrape with the following libraries.
pandas library: scraping table elements in HTML files
Separate libraries such as BeautifulSoup and lxml: scraping elements other than tables
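To make the idea of scraping non-table elements concrete, here is a minimal sketch that collects links from an HTML fragment. It uses Python's standard html.parser module as a stand-in, not BeautifulSoup or lxml, so that it runs without extra installs. The HTML text and the class name LinkCollector are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment used only for this sketch.
html_text = """
<html><body>
  <h1>News</h1>
  <a href="/first">First article</a>
  <a href="/second">Second article</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> element currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only keep text that appears inside an <a> element with an href.
        if self._href is not None:
            self.links.append((self._href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


collector = LinkCollector()
collector.feed(html_text)
print(collector.links)
```

For real pages, a dedicated library such as BeautifulSoup or lxml is usually more convenient than writing a parser class by hand.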
JSON stands for "JavaScript Object Notation". It is a text format originally modeled on the notation of the programming language JavaScript.
The JSON format itself is independent of the JavaScript language. Because most programming languages support reading and writing it, it is often used to exchange data between programs written in different languages.
The structure of a JSON file is basically the same as that of a Python dictionary. Key and value pairs are written inside curly braces {}, separated by commas, with a colon : between each key and its value.
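The similarity to a Python dictionary can be checked with the standard json module. This is a small sketch; the keys and values in the JSON text are invented for illustration.

```python
import json

# Hypothetical JSON text; the keys and values are made up for illustration.
json_text = '{"name": "pandas", "version": 2, "tags": ["csv", "json"]}'

# json.loads() parses JSON text into Python objects (here, a dict).
record = json.loads(json_text)
print(type(record).__name__)
print(record["name"])

# json.dumps() converts the dict back into JSON text.
round_trip = json.dumps(record)
print(round_trip)
```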
CSV stands for "Comma-Separated Values".
Because CSV files are saved as plain text, you can open the data without depending on any specific software.
The data structure is simple, there is no extra metadata, and it is lightweight. It has long been used to exchange data between spreadsheet software and database software.
The structure of a CSV file is very simple: each line is one row, and the values in a line are separated by commas to represent columns. This makes it possible to describe tabular data concisely.
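As a rough illustration of that structure, the sketch below splits a small CSV text into a header and rows with the standard csv module. The column names and values are hypothetical sample data.

```python
import csv
import io

# Hypothetical CSV text: the first line is a header, each later line is one row.
csv_text = "date,price,volume\n2020-01-06,100,5000\n2020-01-07,102,4800\n"

# csv.reader splits every line at the commas and yields one list per row.
rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

print(header)
for record in records:
    print(record)
```

Note that every value comes back as a string here; converting them to numbers or dates is left to the caller (pandas does this automatically when it reads a CSV file).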
Excel is spreadsheet software used all over the world, and many companies, public institutions, and other organizations publish information as Excel files.
Therefore, being able to handle Excel files when collecting and analyzing data with Python greatly expands the range of data analysis.
When you handle Excel files in spreadsheet software, you operate on them graphically and do not have to be very conscious of their structure. When you work with Excel files from a programming language, however, you use the following terms to specify what you want to do, so remember these keywords.
Term | Details |
---|---|
book | An Excel file (workbook) |
sheet | A worksheet in the book |
row | A horizontal line of cells |
column | A vertical line of cells |
cell | A single box where a row and a column meet |
To read HTML, JSON, CSV, and other files with the pandas library, use the functions whose names start with read_.
read_***()
# Use this function to load a file.
# The *** part is different for each file format.
For HTML files, use the read_html() function; for Excel files, use the read_excel() function, and so on.
The pandas library also supports formats other than those listed in the table, and they can likewise be read with a function named read_***(). The loaded file is converted to a DataFrame object, so you can process it with the various features of pandas.
file format | function |
---|---|
HTML | read_html() |
JSON | read_json() |
CSV | read_csv() |
Excel | read_excel() |
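The sketch below shows two of these functions producing DataFrame objects. To stay self-contained it reads from in-memory text via io.StringIO instead of files on disk, and the column names and values are hypothetical sample data.

```python
import io

import pandas as pd

# Hypothetical sample data, held in memory so the sketch needs no files.
csv_text = "item,price\napple,100\norange,80\n"
json_text = '[{"item": "apple", "price": 100}, {"item": "orange", "price": 80}]'

# Both functions accept a path, a URL, or a file-like object.
df_from_csv = pd.read_csv(io.StringIO(csv_text))
df_from_json = pd.read_json(io.StringIO(json_text))

print(df_from_csv)
print(df_from_json)
```

With real files, you would pass the file path in place of the StringIO object.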
For example, to parse an HTML file with the pandas library, use the read_html() function. Pass the path or URL of the HTML file you want to parse as the argument, and pandas generates DataFrame objects from the table elements in the file. The return value is a list containing one DataFrame per table.
import pandas as pd
tables = pd.read_html("HTML file you want to parse")
To write a DataFrame object out as an HTML file, JSON file, CSV file, and so on, use the methods whose names start with to_.
to_***()
# Use this function to export.
# As with read_***(), the *** part is different for each file format.
For HTML, use the to_html() function; for Excel, use the to_excel() function, and so on. The pandas library also supports formats other than those listed in the table, and they can likewise be written with a function named to_***().
file format | function |
---|---|
HTML | to_html() |
JSON | to_json() |
CSV | to_csv() |
Excel | to_excel() |
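As a small sketch of exporting, the code below converts one DataFrame to CSV, JSON, and HTML text, then reads the JSON back to show the round trip. When no file path is given, to_csv(), to_json(), and to_html() return the result as a string, which keeps the example free of files. The DataFrame contents are hypothetical sample data.

```python
import io

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"item": ["apple", "orange"], "price": [100, 80]})

# Without a path argument, these methods return text instead of writing a file.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")
html_text = df.to_html(index=False)

print(csv_text)
print(json_text)

# Reading the JSON text back gives a DataFrame again.
restored = pd.read_json(io.StringIO(json_text))
print(restored)
```

Passing a file name as the first argument, for example df.to_csv("output.csv", index=False), writes to that file instead; the file name here is only an example.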
For example, to output to an Excel file with the pandas library, use the to_excel() function. Specify the name of the Excel file you want to export as the argument, and pandas generates an Excel file from the DataFrame object.
# Output the pandas.DataFrame object `df` to an Excel file
df.to_excel("Excel file name you want to export")
First, read the data.
import pandas as pd
stock_data = pd.read_csv("path to the CSV file")
# Replace the placeholder with the location of the file, such as a path starting with ./
print(stock_data)
In pandas, you can draw a graph by calling the plot() function of a DataFrame object; by default the index is used for the horizontal axis. Assuming you have a DataFrame object df, you can write:
from matplotlib import pyplot as plt
df.plot()
plt.show()
# To plot only specific data (the column name "price" is an example)
df = stock_data["price"]
df.plot()
plt.show()
# To plot all of the data
df = stock_data
df.plot()
plt.show()
# With no column specified, the data is plotted as it is