In this article How to read an Excel file (.xlsx) with Pandas I will write an article about.
From here on, we will proceed using the jupyter lab. If you haven't downloaded it yet, please do so first.
First, import Pandas.
import pandas as pd
Next is reading the Excel file. Excel is similar to reading a CSV file. See the article below for CSV files. How to read CSV files with Pandas --Qiita
df = pd.read_excel('excel/data.xlsx')
df
I read a file called data.xlsx in a folder called excel.
In Excel, the upper line may be blank. For example, if the first two lines are blank, the columns name will be'unnamed' and the first line will be'NaN'. So, let's pass 2 to the argument of "skiprows", skip the first two lines, and read from the third line.
df = pd.read_excel('excel/data.xlsx', skiprows = 2)
If you change the argument of skiprows, you can read from your favorite line.
You can also explicitly specify the header if you have a columns name. For example, when specifying the third line as the columns name,
df = pd.read_excel('excel/data.xlsx', skiprows = 2, header = [2])
will do. It should be noted here that when specifying the third line, do not write'[3]'. The programming array starts with '0', so when you specify the third line, write'[2]'.
By the way, the initial value of the header is 0, so if you do not write anything, the first line will be the columns name.
You can also give two headers. Pass it in list format as an argument, such as [1,2].
df = pd.read_excel('excel/data.xlsx', skiprows = 2, header = [1,2])
For files without a header, if you give None to the header, the columns name will be automatically numbered.
df = pd.read_excel('excel/data02.xlsx', header = None)
** Specify index ** You can also specify an index. Give index_col the number of the column you want to specify. When specifying the first column as index,
df = pd.read_excel('excel/data02.xlsx', index_col = 0)
Now you can specify the first column as index. Again, don't forget that the array starts at 0.
index_col can also be specified by column name instead of column number. For example, if you want to specify the column'date' as index,
df = pd.read_excel('excel/data02.xlsx', index_col = 'date')
will do.
By the way, as it is, the data type is read as Index, but to read as date,
df = pd.read_excel('excel/data02.xlsx', index_col = 'date', parse_dates = True)
Then you can read it as a date. If you check the data type, it will be “DatatimeIndex”.
This time, How to read an Excel file (.xlsx) with Pandas I wrote an article about.
There are a lot of things that I don't understand, but I think I managed to grasp them to some extent.
I think that it will be necessary to automate Excel in the future, so I would like to remember it.
Thank you very much.
This article was written by a programming beginner and may be incorrect. Thank you for your understanding. Also, if you notice any mistakes, we would appreciate it if you could point them out. Thank you.
Recommended Posts