.ix deprecated

.ix is a convenient way to index into a pandas DataFrame, but if you don't understand how it works, you can fall into a trap.

For example, suppose you have the following DataFrame.

from pandas import DataFrame

df = DataFrame(
    [['a0', 'b0'], ['a1', 'b1'], ['a2', 'b2']],
    index=[2, 4, 6],
    columns=['a', 'b'])

	a	b
2	a0	b0
4	a1	b1
6	a2	b2

Suppose you want the 'a' column of row 2 (counting from 0). The expected result is 'a2'. .ix accepts both positions and index labels, so you might write the following:

df.ix[2, 'a']

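The result is 'a0'. When given an integer, .ix is primarily label-based: it looks the integer up in the index first and falls back to positional lookup only when the index is not of integer type. Here the index contains the label 2, so the row labeled 2 is returned rather than the row at position 2.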

Because of this ambiguity, .ix was deprecated in pandas 0.20 (and later removed entirely in pandas 1.0).
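As a small sketch of the two readings that .ix had to choose between, the same integer 2 can be passed to the explicit indexers instead: .loc always treats it as a label, and .iloc always treats it as a position. The variable names by_label and by_position are only for illustration.

```python
from pandas import DataFrame

df = DataFrame(
    [['a0', 'b0'], ['a1', 'b1'], ['a2', 'b2']],
    index=[2, 4, 6],
    columns=['a', 'b'])

# Label-based: 2 is looked up in the index, which selects the first row.
by_label = df.loc[2, 'a']

# Position-based: 2 means the third row (counting from 0); column 0 is 'a'.
by_position = df.iloc[2, 0]

print(by_label)     # a0
print(by_position)  # a2
```

With the explicit indexers, the meaning of the integer no longer depends on what the index happens to contain.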

Referring by position and by label at the same time

It may not be the best solution, but one option is to use .iloc. Since .iloc only accepts positions, combine it with pandas.Index.get_loc, a method that looks up a row label (or column label) and returns its position.

df.iloc[2, df.columns.get_loc('a')]

The expected result, 'a2', is now returned.

The example above looks up a column by name; to look up a row by its label instead, do the following.

df.iloc[df.index.get_loc(6), 0]
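To make the role of get_loc concrete, the sketch below prints the positions it returns for this DataFrame and then feeds them to .iloc. The names row_pos and col_pos are illustrative. Note that get_loc returns a plain integer only when the label is unique in the index; for duplicated labels it can return a slice or a boolean mask, so this pattern assumes unique labels.

```python
from pandas import DataFrame

df = DataFrame(
    [['a0', 'b0'], ['a1', 'b1'], ['a2', 'b2']],
    index=[2, 4, 6],
    columns=['a', 'b'])

# Translate labels into integer positions (assumes the labels are unique).
row_pos = df.index.get_loc(6)      # row label 6 is the third row
col_pos = df.columns.get_loc('a')  # column label 'a' is the first column

print(row_pos, col_pos)            # 2 0

# Both positions can now be passed to .iloc.
print(df.iloc[row_pos, col_pos])   # a2
```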

If you know the row and column names in advance, you can just use .loc normally.

df.loc[6, 'a']

Similarly, .iloc is fine if you know the row and column order in advance.

df.iloc[2, 0]

Summary

If you don't understand how .ix behaves, it will return results you did not intend. Even if you do understand it, you need to know what the index contains, so it is better to avoid .ix wherever possible.
