pandas' MultiIndex is convenient, but I got tripped up by treating it as simply a multidimensional version of Index, so I'm writing this down as a note.
As an example, assume the following table is saved as hoge.csv.
|   |   | val |
|---|---|---|
| 1 | a | b |
| 2 | c | d |
| 3 | a | d |
| 4 | b | c |
| 5 | a | b |
If you read hoge.csv using every column other than val as the index, you get a DataFrame with a MultiIndex.
```python
>>> import pandas as pd
>>> df = pd.read_csv("hoge.csv", index_col=[0, 1])
>>> df
     val
1 a    b
2 c    d
3 a    d
4 b    c
5 a    b
```
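To reproduce this without a file on disk, the snippet below feeds the same data to `read_csv` through `io.StringIO`. It assumes hoge.csv has a header row whose first two cells are empty (`,,val`); that layout is inferred from the table and output above, not stated in the original. Empty header cells used as index columns end up as unnamed levels.

```python
import io

import pandas as pd

# Assumed contents of hoge.csv: two unnamed key columns followed by "val".
# The empty header cells are an assumption inferred from the table above.
csv_text = """,,val
1,a,b
2,c,d
3,a,d
4,b,c
5,a,b
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])
print(df.index.nlevels)      # 2
print(list(df.index.names))  # [None, None]
print(df["val"].tolist())    # ['b', 'd', 'd', 'c', 'b']
```

Because the level names are `None`, pandas later falls back to generated names such as `level_1` when a level is released with `reset_index`.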
Let's filter this sample DataFrame on val.
```python
>>> tmp_df = df.query("val=='b'")
>>> tmp_df.index
MultiIndex([(1, 'a'),
            (5, 'a')],
           )
```
Two of the five rows were extracted.
So if you take level 0 of the levels property of the filtered result, you might expect to get 1 and 5...
```python
>>> tmp_df.index.levels[0]
Int64Index([1, 2, 3, 4, 5], dtype='int64')
```
Regardless of the filter, **all the elements in level 0 of the original DataFrame are returned**. This is a problem, because there are times when you want to filter on the values in the table and then retrieve the values that remain in each level.
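levels is just a list of the unique elements that appear in each level. The MultiIndex itself is built by separately recording, for each row, which of those elements that row combines (the codes property), so filtering only narrows down those per-row records and leaves levels untouched.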
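A small sketch of that levels/codes split, rebuilding the same index in memory with `MultiIndex.from_tuples` instead of reading hoge.csv. After the filter, `levels` still lists every value, while `codes` holds only the two surviving rows as integer positions into `levels`. Mapping codes back through levels (which is what `get_level_values` gives you) recovers the values that actually remain.

```python
import pandas as pd

# Same data as hoge.csv, built directly instead of read from disk
idx = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "c"), (3, "a"), (4, "b"), (5, "a")]
)
df = pd.DataFrame({"val": ["b", "d", "d", "c", "b"]}, index=idx)

tmp_df = df.query("val == 'b'")

# levels keeps every distinct value from the original index
print(list(tmp_df.index.levels[0]))                # [1, 2, 3, 4, 5]
# codes store, per remaining row, positions into each level's list
print([c.tolist() for c in tmp_df.index.codes])    # [[0, 4], [0, 0]]
# Looking the codes up in levels yields the values that survived the filter
print(tmp_df.index.levels[0].take(tmp_df.index.codes[0]).tolist())  # [1, 5]
print(tmp_df.index.get_level_values(0).tolist())   # [1, 5]
```

For completeness, `MultiIndex.remove_unused_levels()` returns a copy whose levels are pruned to the values still referenced by the codes.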
Therefore, release the MultiIndex into a single Index that keeps only the level you ultimately want to retrieve, and then apply the filter.
```python
>>> df.reset_index(level=1)
  level_1 val
1       a   b
2       c   d
3       a   d
4       b   c
5       a   b
>>> tmp_df = df.reset_index(level=1).query("val=='b'")
>>> tmp_df.index
Int64Index([1, 5], dtype='int64')
```
Done this way, the Index contains exactly the rows that passed the filter, so when you want to retrieve a particular level after filtering, you need to go through reset_index as shown above.
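The same idea can be wrapped in a small function that works for any level of a MultiIndex. The name `filtered_level_values` and its parameters are hypothetical, chosen for illustration; this is a minimal sketch assuming levels are given by position and the filter is a `DataFrame.query` expression.

```python
import pandas as pd


def filtered_level_values(df, keep, expr):
    """Hypothetical helper: return the index values of level `keep` (a position)
    for the rows of `df` that match the query expression `expr`."""
    # Move every other index level back into ordinary columns
    others = [lvl for lvl in range(df.index.nlevels) if lvl != keep]
    flat = df.reset_index(level=others)
    # The remaining index is a plain Index holding only level `keep`,
    # so filtering shrinks it to exactly the matching rows
    return flat.query(expr).index


idx = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "c"), (3, "a"), (4, "b"), (5, "a")]
)
df = pd.DataFrame({"val": ["b", "d", "d", "c", "b"]}, index=idx)

print(filtered_level_values(df, 0, "val == 'b'").tolist())  # [1, 5]
print(filtered_level_values(df, 1, "val == 'b'").tolist())  # ['a', 'a']
```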
With reset_index, pass the level to release through the level= argument: use the level's name if the MultiIndex levels are named, and its position number if they are not (a position also works when the levels have names).
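A quick check of both forms of `level=`. The level names `num` and `key` are hypothetical, added only so there is a name to pass; hoge.csv itself produces unnamed levels, which is why the released column shows up as `level_1` in the transcript above.

```python
import pandas as pd

# Same data as hoge.csv; "num" and "key" are hypothetical level names
idx = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "c"), (3, "a"), (4, "b"), (5, "a")], names=["num", "key"]
)
df = pd.DataFrame({"val": ["b", "d", "d", "c", "b"]}, index=idx)

by_name = df.reset_index(level="key")  # level given by name
by_pos = df.reset_index(level=1)       # same level given by position
print(by_name.equals(by_pos))          # True
print(by_name.columns.tolist())        # ['key', 'val']

# With unnamed levels, the released level becomes a column named "level_<n>"
unnamed = pd.DataFrame(df["val"].tolist(), columns=["val"],
                       index=idx.set_names([None, None]))
print(unnamed.reset_index(level=1).columns.tolist())  # ['level_1', 'val']
```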