"Loc" can extract rows and columns that meet the conditions in the DataFrame. "Loc" often appears when using pandas, but since there are variations in the data specification method, I would like to summarize that area.
The following data can be specified for loc.
- Single label
- List of labels
- Label slice object
- List of boolean values
- Conditional expression
That is a lot of variations ... (゜ _ ゜) You need care when writing code, of course, but reading code is where it really bites: unless you calmly work out which pattern is in use, you are likely to end up at "???". Below I write sample code for each pattern and check how it behaves.
I created the sample data for these checks myself.
import pandas as pd

# Use the item_name column as the row index so rows can be selected by label
loc_sample_data = pd.read_csv("loc_sample_data.csv", index_col="item_name")
loc_sample_data.head()
The row index consists of item_name and the columns consist of price, stock and producing_area.
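Since loc_sample_data.csv itself is not shown here, the following is a minimal sketch that builds a frame with the same shape in memory. The prices, stock counts, and area names are made-up values, chosen only so that itemC and itemD cost more than 500, as the conditional examples in this post assume.

```python
import pandas as pd

# Hypothetical stand-in for loc_sample_data.csv; every value below is invented.
loc_sample_data = pd.DataFrame(
    {
        "price": [100, 300, 600, 900],
        "stock": [10, 5, 0, 20],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

print(loc_sample_data.head())
```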
Pass a single row index label to extract that row. Here we extract itemC.
loc_sample_data.loc["itemC"]
That worked. The result is a Series.
The example above extracts only a single row, but you can also extract several rows at once. To specify more than one label, pass them as a list. Next, we extract itemA and itemD.
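To see what that Series looks like, here is a small sketch. It rebuilds a hypothetical version of the sample data in memory (the values are assumptions, not the real CSV) and inspects the result of a single-label lookup.

```python
import pandas as pd

# Hypothetical stand-in for loc_sample_data.csv; the values are invented.
loc_sample_data = pd.DataFrame(
    {
        "price": [100, 300, 600, 900],
        "stock": [10, 5, 0, 20],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

row = loc_sample_data.loc["itemC"]

print(type(row))        # the container type of the result
print(row.name)         # the row label that was requested
print(list(row.index))  # the original column names become the Series index
print(row["price"])     # a single field of the extracted row
```

The column names of the DataFrame become the index of the Series, so individual fields can be read with the column name.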
loc_sample_data.loc[["itemA", "itemD"]]
That worked. This time the result is a DataFrame.
You can also extract data by specifying both a row label and a column label. Here we specify row → itemB and column → producing_area.
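The return type follows from the form of the argument, not from how many rows match. As a sketch (again with invented values standing in for the CSV), a list containing just one label still produces a DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for loc_sample_data.csv; the values are invented.
loc_sample_data = pd.DataFrame(
    {
        "price": [100, 300, 600, 900],
        "stock": [10, 5, 0, 20],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

single = loc_sample_data.loc["itemC"]     # bare label: one row as a Series
wrapped = loc_sample_data.loc[["itemC"]]  # one-element list: one-row DataFrame

print(type(single))
print(type(wrapped))
print(wrapped.shape)
```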
loc_sample_data.loc["itemB", "producing_area"]
That worked. The extracted value is a str. It is a str in this example only because producing_area holds strings; the type depends on what is stored in the DataFrame.
You can select a range of rows or columns with a label slice. Unlike ordinary Python slices, a label slice in loc includes both endpoints, so "itemA":"itemB" covers itemA and itemB. Here we use it to extract the prices of itemA and itemB.
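To illustrate how the type depends on the column, the sketch below reads one string cell and one numeric cell from a hypothetical copy of the sample data (invented values). For a numeric column, pandas typically hands back a NumPy scalar such as numpy.int64 rather than a built-in int.

```python
import pandas as pd

# Hypothetical stand-in for loc_sample_data.csv; the values are invented.
loc_sample_data = pd.DataFrame(
    {
        "price": [100, 300, 600, 900],
        "stock": [10, 5, 0, 20],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

area = loc_sample_data.loc["itemB", "producing_area"]  # cell from a string column
price = loc_sample_data.loc["itemB", "price"]          # cell from a numeric column

print(type(area), area)
print(type(price), price)
```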
loc_sample_data.loc["itemA":"itemB","price"]
That worked. Would I actually use this, though ...?
If you pass a list of booleans whose length equals the number of rows in the DataFrame, only the rows marked True are extracted. Here we extract itemB and itemD.
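One detail worth checking when reading slice code: a label slice in loc includes its end label, while a positional slice in iloc excludes its end position. The sketch below compares the two on a hypothetical copy of the sample data (invented values) and also slices the columns by label.

```python
import pandas as pd

# Hypothetical stand-in for loc_sample_data.csv; the values are invented.
loc_sample_data = pd.DataFrame(
    {
        "price": [100, 300, 600, 900],
        "stock": [10, 5, 0, 20],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

# Label slice: both itemA and itemB are included.
by_label = loc_sample_data.loc["itemA":"itemB", "price"]

# Positional slice: position 1 is excluded, so only the first row is returned.
by_position = loc_sample_data.iloc[0:1, 0]

# Rows and columns can both be sliced by label.
block = loc_sample_data.loc["itemB":"itemC", "price":"stock"]

print(by_label)
print(by_position)
print(block)
```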
loc_sample_data.loc[[False, True, False, True]]
That worked. Writing the list by hand like this is unlikely to be useful, but it could be handy if you evaluate the extraction condition for each row in advance and build the list from the results.
This is the pattern you are most likely to use. The conditional expression evaluates to a boolean Series with one value per row, and loc keeps the rows where it is True. Here we extract the rows whose price is greater than 500 (itemC and itemD).
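As a rough sketch of that "judge in advance" idea, the boolean list can be computed from the index instead of being typed out. The data values and the `wanted` set are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for loc_sample_data.csv; the values are invented.
loc_sample_data = pd.DataFrame(
    {
        "price": [100, 300, 600, 900],
        "stock": [10, 5, 0, 20],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

# Decide row by row, ahead of time, whether each item should be kept.
wanted = {"itemB", "itemD"}  # hypothetical selection criteria
mask = [name in wanted for name in loc_sample_data.index]

picked = loc_sample_data.loc[mask]

print(mask)
print(picked)
```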
loc_sample_data.loc[loc_sample_data["price"] > 500]
That worked. Used on its own, this is probably the form you will reach for most often.
You can also name the columns to extract alongside the conditional expression. The condition is the same as before, but this time we extract only the producing_area column.
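It may help to look at what the conditional expression produces before it reaches loc. In the sketch below (hypothetical data, invented values), the comparison yields a boolean Series, and two such conditions are combined with `&`. Each condition is wrapped in parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

# Hypothetical stand-in for loc_sample_data.csv; the values are invented.
loc_sample_data = pd.DataFrame(
    {
        "price": [100, 300, 600, 900],
        "stock": [10, 5, 0, 20],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

# The comparison itself produces one boolean per row.
mask = loc_sample_data["price"] > 500
print(mask)

# Passing the mask to loc keeps only the rows marked True.
expensive = loc_sample_data.loc[mask]
print(expensive)

# Two conditions combined with & (element-wise "and").
expensive_in_stock = loc_sample_data.loc[
    (loc_sample_data["price"] > 500) & (loc_sample_data["stock"] > 0)
]
print(expensive_in_stock)
```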
loc_sample_data.loc[loc_sample_data["price"] > 500, ["producing_area"]]
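Note that the column here is given as a one-element list. As with row labels, that choice affects the return type: a list of columns gives a DataFrame, while a bare column label gives a Series. A small sketch with hypothetical data (invented values):

```python
import pandas as pd

# Hypothetical stand-in for loc_sample_data.csv; the values are invented.
loc_sample_data = pd.DataFrame(
    {
        "price": [100, 300, 600, 900],
        "stock": [10, 5, 0, 20],
        "producing_area": ["Tokyo", "Osaka", "Nagoya", "Fukuoka"],
    },
    index=pd.Index(["itemA", "itemB", "itemC", "itemD"], name="item_name"),
)

condition = loc_sample_data["price"] > 500

as_frame = loc_sample_data.loc[condition, ["producing_area"]]  # list of columns
as_series = loc_sample_data.loc[condition, "producing_area"]   # single column label

print(type(as_frame))
print(type(as_series))
```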
There are many ways to use loc, but the one you should definitely learn is data extraction with conditional expressions. This post has run long and I am getting tired, so I will stop here. See you in the next post!