A pitfall when retrieving level values from a pandas MultiIndex

The pandas MultiIndex is convenient, but I got tripped up by treating it as simply a multidimensional version of Index, so here is a note about it.

Where I got stuck

As an example, assume that the following table exists as hoge.csv.

		val
1	a	b
2	c	d
3	a	d
4	b	c
5	a	b

If you read every column of hoge.csv other than val as the index, the result is a DataFrame with a MultiIndex.

>>> import pandas as pd
>>> df = pd.read_csv("hoge.csv", index_col=[0, 1])

>>> df
    val
1 a   b
2 c   d
3 a   d
4 b   c
5 a   b
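
If hoge.csv is not at hand, the same DataFrame can be built in memory. The following is a minimal sketch that assumes the index levels are unnamed, as in the output above; it constructs the MultiIndex directly instead of reading a file.

```python
import pandas as pd

# Same (number, letter) pairs as the two index columns of hoge.csv.
index = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "c"), (3, "a"), (4, "b"), (5, "a")]
)

# Same values as the val column.
df = pd.DataFrame({"val": ["b", "d", "d", "c", "b"]}, index=index)

print(df.index.nlevels)  # 2
print(len(df))           # 5
```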

Let's filter this DataFrame by val.

>>> tmp_df = df.query("val=='b'")

>>> tmp_df.index
MultiIndex([(1, 'a'),
            (5, 'a')],
           )

Two of the DataFrame's five rows were extracted.

Next, if you read level 0 from the levels property of the filtered result, you might expect to get 1 and 5, but ...

>>> tmp_df.index.levels[0]
Int64Index([1, 2, 3, 4, 5], dtype='int64')

Regardless of the filter, the level-0 elements of the original DataFrame are returned. This is a problem because there are times when you want to filter on the values in the table and then retrieve the values in each level.

Solution

levels is just a list that stores the unique values belonging to each level, and it seems that each row's index entry is represented separately by recording which of those values it combines. Filtering the rows therefore leaves levels unchanged.

Therefore, dissolve the MultiIndex into a single Index that keeps only the level you ultimately want to retrieve, and then apply the filter.
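
To make that structure concrete, here is a small sketch that prints both levels and codes for the filtered result. The codes attribute is not mentioned in the original note; it is the part of a MultiIndex that records, per row, a position into levels. The DataFrame is rebuilt in memory so the block runs on its own.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "c"), (3, "a"), (4, "b"), (5, "a")]
)
df = pd.DataFrame({"val": ["b", "d", "d", "c", "b"]}, index=index)

filtered = df.query("val == 'b'")

# levels: the unique values of each level, untouched by the filter.
print(filtered.index.levels[0].tolist())  # [1, 2, 3, 4, 5]

# codes: one position into levels per remaining row.
print(filtered.index.codes[0].tolist())   # [0, 4]
```

Only the codes shrink to the two remaining rows, which is why levels[0] still lists all five values.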

>>> df.reset_index(level=1)
  level_1 val
1       a   b
2       c   d
3       a   d
4       b   c
5       a   b
>>> tmp_df = df.reset_index(level=1).query("val=='b'")
>>> tmp_df.index
Int64Index([1, 5], dtype='int64')

This way the Index matches the filtered rows, so if you want to retrieve a particular level after filtering, you can handle it with reset_index as described above.

For reset_index, pass the level to release via the level= argument: its name if the MultiIndex levels are named, otherwise its number.
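
As an aside that goes beyond the original note, two MultiIndex methods appear to give the same values without dissolving the MultiIndex: get_level_values and remove_unused_levels. This is a sketch of an alternative, not the approach described above.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "c"), (3, "a"), (4, "b"), (5, "a")]
)
df = pd.DataFrame({"val": ["b", "d", "d", "c", "b"]}, index=index)

filtered = df.query("val == 'b'")

# get_level_values returns one label per remaining row for the given level.
print(filtered.index.get_level_values(0).tolist())  # [1, 5]

# remove_unused_levels rebuilds levels so only labels still in use remain.
pruned = filtered.index.remove_unused_levels()
print(pruned.levels[0].tolist())  # [1, 5]
```

Note that get_level_values yields one entry per row, so it can contain duplicates, whereas levels always holds unique values.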
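
The two ways of specifying level= can be compared in a short sketch. The level names "id" and "group" are hypothetical and chosen only for illustration; the index in hoge.csv is unnamed.

```python
import pandas as pd

# "id" and "group" are made-up names for this example.
index = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "c"), (3, "a"), (4, "b"), (5, "a")],
    names=["id", "group"],
)
df = pd.DataFrame({"val": ["b", "d", "d", "c", "b"]}, index=index)

# Release the second level by name, then filter.
by_name = df.reset_index(level="group").query("val == 'b'")

# Release the same level by position, then filter.
by_number = df.reset_index(level=1).query("val == 'b'")

print(by_name.index.tolist())     # [1, 5]
print(by_name.columns.tolist())   # ['group', 'val']
print(by_name.equals(by_number))  # True
```

When the levels are named, the released level becomes a column with that name instead of level_1.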
