About pandas

Because pandas is built on top of numpy, numpy operations can be applied to pandas data directly, which is convenient. However, how to extract rows and columns is hard to grasp until you get used to it. I'm still not fluent with it, so I'm writing it down here.

DataFrame and Series

pandas has two data structures, DataFrame and Series. The former holds two-dimensional data and the latter one-dimensional data. I rarely use Series on its own, so this post focuses on DataFrame. Note that selecting a single column from a DataFrame returns a Series.

# DataFrame
   foo  bar
a    0    1
b    2    3
c    4    5

# Series
a    0
b    2
c    4
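A minimal sketch that builds the two objects shown above, assuming the usual import pandas as pd alias; df and s are illustrative names only.

```python
import pandas as pd

# Build the DataFrame shown above: columns 'foo' and 'bar', row labels 'a' to 'c'
df = pd.DataFrame({'foo': [0, 2, 4], 'bar': [1, 3, 5]}, index=['a', 'b', 'c'])

# Selecting one column by name returns a Series that keeps the row labels
s = df['foo']

print(type(df).__name__, df.ndim)  # DataFrame 2
print(type(s).__name__, s.ndim)    # Series 1
print(s)
```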

DataFrame operations

index and column

A DataFrame element can be located in two ways: by position, numpy-style (the nth row and mth column), or by user-defined labels, namely the index (row labels) and the columns (column labels). If you do not specify labels, pandas assigns integer labels 0, 1, 2, ..., but then labels and positions coincide, so in practice you might as well use numpy-style positions. Personally, I'm also unsure whether an index should be numeric at all.
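Default integer labels only match positions until rows are removed or reordered. A small sketch (the data is made up for illustration) showing where label-based loc and position-based iloc start to disagree:

```python
import pandas as pd

# No index given, so pandas assigns integer labels 0, 1, 2 (a RangeIndex)
df = pd.DataFrame({'foo': [0, 2, 4]})
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# After dropping the first row, the remaining labels are 1 and 2
sub = df.iloc[1:]

by_label = sub.loc[1, 'foo']   # the row whose LABEL is 1
by_position = sub.iloc[1, 0]   # the row at POSITION 1 (the second remaining row)
print(by_label, by_position)   # 2 4
```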

Specifying index and columns

To set the row labels (index) and column labels (columns), assign lists to them:

df.columns = ['foo', 'bar']
df.index = ['a', 'b', 'c']

To check a DataFrame's index and column labels:

df.columns
df.index
df.info() # prints index, columns, dtypes, non-null counts and memory usage
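A self-contained sketch of setting and checking labels, starting from a plain nested list (the values are only for illustration). The new list must match the length of its axis, otherwise pandas raises a ValueError and leaves the labels unchanged.

```python
import pandas as pd

# Start from a 3x2 table with default integer labels
df = pd.DataFrame([[0, 1], [2, 3], [4, 5]])

# Overwrite the labels; each list must have one entry per column / row
df.columns = ['foo', 'bar']
df.index = ['a', 'b', 'c']

print(df.columns.tolist())  # ['foo', 'bar']
print(df.index.tolist())    # ['a', 'b', 'c']

# A list of the wrong length is rejected
try:
    df.columns = ['foo']
except ValueError as e:
    print('length mismatch:', e)

df.info()  # summary: index, columns, dtypes, memory usage
```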

Simplest column extraction

For a DataFrame, the key passed to [] (__getitem__) selects columns. Older pandas versions also accepted column numbers here when wrapped in a list, but in current pandas df[[0]] looks for a column labeled 0 and raises a KeyError if there is none; use iloc for positional selection. Rows cannot be selected by label this way (a slice such as df[0:2] is the exception and selects rows). For a Series, [] selects by index, which is natural because there is only one column.

df['foo']                         # single column -> Series
df[['foo']]                       # single column in a list -> DataFrame
df[['foo', 'bar']]                # multiple columns
df.iloc[:, [0]], df.iloc[:, [0, 1]]  # by column number (df[[0]] no longer works)
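A runnable version of the column selections, using the example DataFrame from the top of this post. The try block shows the behavior I'd expect from current pandas, where integers inside [] are treated as labels rather than positions.

```python
import pandas as pd

df = pd.DataFrame({'foo': [0, 2, 4], 'bar': [1, 3, 5]}, index=['a', 'b', 'c'])

one = df['foo']               # Series
one_df = df[['foo']]          # DataFrame with a single column
two = df[['foo', 'bar']]      # DataFrame with two columns
by_pos = df.iloc[:, [0, 1]]   # the same two columns, selected by position

# Integers inside [] are looked up as column labels
try:
    df[[0]]
except KeyError:
    print('df[[0]] raises KeyError: there is no column labeled 0')

# A slice inside [] is the exception: it selects rows
first_rows = df[0:2]
```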

More detailed extraction (ix, iloc, loc)

As mentioned above, a DataFrame element can be located either by position or by label. pandas provides indexers that make explicit which one you mean: iloc accepts only positions, loc accepts only labels, and ix accepted both. ix was deprecated in pandas 0.20 and removed in 1.0, so use loc or iloc instead. With the DataFrame above, the element at [0, 0] (row 'a', column 'foo') can be taken in any of the following ways:

df.iloc[[0], [0]]                  # positions only (was df.ix[[0], [0]])
df.loc[['a'], ['foo']]             # labels only (was df.ix[['a'], ['foo']])
df.loc[df.index[[0]], ['foo']]     # row by position, column by label (was df.ix[[0], ['foo']])
df.loc[['a'], df.columns[[0]]]     # row by label, column by position (was df.ix[['a'], [0]])
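Passing lists, as in the lines above, keeps the result two-dimensional (a 1x1 DataFrame). A short sketch contrasting that with scalar access, which returns the bare value; at and iat are the scalar-only counterparts of loc and iloc.

```python
import pandas as pd

df = pd.DataFrame({'foo': [0, 2, 4], 'bar': [1, 3, 5]}, index=['a', 'b', 'c'])

# Lists inside the indexer -> 1x1 DataFrame
cell_df = df.loc[['a'], ['foo']]

# Scalars inside the indexer -> the value itself
v1 = df.loc['a', 'foo']   # by label
v2 = df.iloc[0, 0]        # by position
v3 = df.at['a', 'foo']    # scalar access by label
v4 = df.iat[0, 0]         # scalar access by position

print(cell_df.shape, v1, v2, v3, v4)  # (1, 1) 0 0 0 0
```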

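To select several rows at once, pass a slice as the row indexer:

df.iloc[:, [0]]            # all rows, first column
df.iloc[1:5, [0]]          # rows at positions 1 to 4 (end excluded), first column
df.loc['b':'c', ['foo']]   # rows labeled 'b' through 'c' (end label included)
df.iloc[:]                 # row indexer only; all columns are returned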
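One difference between the two indexers that is easy to trip over: iloc slices exclude the end position like Python lists, while loc slices include the end label. A small sketch with six illustrative rows:

```python
import pandas as pd

df = pd.DataFrame({'foo': range(6)}, index=list('abcdef'))

pos = df.iloc[1:3]      # positions 1 and 2 (end excluded)
lab = df.loc['b':'c']   # labels 'b' and 'c' (end included)
print(pos.index.tolist(), lab.index.tolist())  # ['b', 'c'] ['b', 'c']

# "The same" endpoints give different lengths
print(len(df.iloc[1:4]), len(df.loc['b':'e']))  # 3 4
```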

Extraction of rows / columns from specified conditions

Extraction of rows

To extract the rows whose value in a given column meets a condition (all columns of those rows are returned):

print(foo.loc[foo['bar'] == condition])  # condition: the value to match
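A runnable sketch with made-up data. The comparison produces a boolean Series aligned to the index, and loc keeps the rows where it is True; several conditions are combined with & and |, with each comparison in parentheses.

```python
import pandas as pd

foo = pd.DataFrame({'bar': [0, 1, 1, 2], 'baz': [3, 4, 5, 6]}, index=list('abcd'))

# Boolean mask from one column
mask = foo['bar'] == 1
matched = foo.loc[mask]     # rows 'b' and 'c', all columns
print(matched)

# Combined conditions and membership tests
both = foo.loc[(foo['bar'] == 1) & (foo['baz'] > 4)]  # only row 'c'
in_list = foo.loc[foo['bar'].isin([0, 2])]            # rows 'a' and 'd'
```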

Column extraction

One indirect way: first turn every element that fails the condition into NaN, then drop every column that contains NaN. What remains are the columns in which all elements meet the condition.

foo = foo[foo == 1]        # elements that do not meet the condition become NaN
foo = foo.dropna(axis=1)   # drop every column that contains a NaN
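A sketch comparing the NaN route above with a direct boolean column mask (my own alternative, not from the original post). The mask version avoids two side effects of the NaN route: columns that already contained NaN get dropped too, and the surviving values may be cast to float.

```python
import pandas as pd

foo = pd.DataFrame({'x': [1, 1], 'y': [1, 0], 'z': [1, 1]})

# Route from the text: non-matching elements -> NaN, then drop those columns
via_nan = foo[foo == 1].dropna(axis=1)

# Direct alternative: keep the columns where every element matches
via_mask = foo.loc[:, (foo == 1).all()]

print(via_nan.columns.tolist(), via_mask.columns.tolist())  # ['x', 'z'] ['x', 'z']
```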

Iterator

To iterate over the rows of a pd.DataFrame, use iterrows():

for index, row in df.iterrows():
    print(index, row)  # index: the row label; row: a pd.Series holding that row's values
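A sketch showing row-wise iteration with iterrows() next to column-wise iteration with items(); the sums are only there to have something to compute. For real arithmetic like this, a vectorized expression such as df['foo'] + df['bar'] is usually much faster than a Python loop.

```python
import pandas as pd

df = pd.DataFrame({'foo': [0, 2, 4], 'bar': [1, 3, 5]}, index=['a', 'b', 'c'])

# iterrows() yields (row label, row as a Series)
row_sums = {}
for index, row in df.iterrows():
    row_sums[index] = row['foo'] + row['bar']

# items() yields (column label, column as a Series)
col_sums = {}
for name, col in df.items():
    col_sums[name] = col.sum()

print(row_sums)
print(col_sums)
```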

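Create / edit a pd.DataFrame

import pandas as pd

# Create an empty container: column names only, no rows
foo = pd.DataFrame(columns=['bar', 'baz'])

# Create a DataFrame from a dict of columns with explicit row labels
foo = pd.DataFrame({'bar': [0, 1, 2],
                    'baz': [3, 4, 5]},
                   index=['a', 'b', 'c'])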
# foo
    bar  baz
a    0    3
b    1    4
c    2    5
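Besides a dict of columns, a DataFrame can also be built from a list of rows. A sketch showing that both routes give the same table as above, plus the shape of the empty container:

```python
import pandas as pd

# Empty container: columns only, zero rows
empty = pd.DataFrame(columns=['bar', 'baz'])
print(empty.shape)  # (0, 2)

# Dict of columns (as in the text)
from_dict = pd.DataFrame({'bar': [0, 1, 2], 'baz': [3, 4, 5]}, index=['a', 'b', 'c'])

# List of rows: the same table written row by row
from_rows = pd.DataFrame([[0, 3], [1, 4], [2, 5]],
                         columns=['bar', 'baz'], index=['a', 'b', 'c'])

print(from_dict.equals(from_rows))  # True
```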

Add column

Adding a new column is easier than adding a row.

foo['qux'] = [6, 7, 8]
# foo
    bar  baz  qux
a    0    3    6
b    1    4    7
c    2    5    8
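Other common ways to add a column, in a short sketch (the column names total, flag and double are illustrative). A list must have one entry per row, a scalar is broadcast to every row, and assign() returns a new DataFrame instead of modifying the original.

```python
import pandas as pd

foo = pd.DataFrame({'bar': [0, 1, 2], 'baz': [3, 4, 5]}, index=['a', 'b', 'c'])

foo['qux'] = [6, 7, 8]                  # list: length must equal the number of rows
foo['total'] = foo['bar'] + foo['baz']  # computed from existing columns
foo['flag'] = 0                         # scalar: broadcast to every row

# assign() leaves foo unchanged and returns a new DataFrame
foo2 = foo.assign(double=foo['bar'] * 2)
print(foo2)
```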

Add a row

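# The labels of the new rows must be given explicitly (here 'd' and 'e').
# DataFrame.append was removed in pandas 2.0, so use pd.concat.
foo = pd.concat([foo, pd.DataFrame({'bar': [6, 7], 'baz': [8, 9]}, index=['d', 'e'])])
# foo (starting from the two-column foo created above)
    bar  baz
a    0    3
b    1    4
c    2    5
d    6    8
e    7    9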
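DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, and pd.concat is its replacement. A sketch of appending rows with concat, adding a single row through loc with a new label, and renumbering with ignore_index=True (the label 'f' and its values are made up):

```python
import pandas as pd

foo = pd.DataFrame({'bar': [0, 1, 2], 'baz': [3, 4, 5]}, index=['a', 'b', 'c'])
new_rows = pd.DataFrame({'bar': [6, 7], 'baz': [8, 9]}, index=['d', 'e'])

# Stack rows; the new rows keep the labels given above
foo = pd.concat([foo, new_rows])

# Assigning to a label that does not exist yet appends one row
foo.loc['f'] = [10, 11]

# ignore_index=True discards the labels and renumbers rows 0..n-1
renumbered = pd.concat([foo, new_rows], ignore_index=True)
print(foo)
```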

Delete rows and columns

foo.drop('e')            # returns a copy without row 'e'; foo itself is unchanged
foo.drop('bar', axis=1)  # returns a copy without column 'bar'
del foo['bar']           # deletes column 'bar' from foo in place (Python's del statement)
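Since drop() returns a new DataFrame, you need to keep or reassign its result. A sketch with the index= and columns= keywords (equivalent to passing axis) and pop(), which removes a column in place and returns it:

```python
import pandas as pd

foo = pd.DataFrame({'bar': [0, 1, 2], 'baz': [3, 4, 5], 'qux': [6, 7, 8]},
                   index=['a', 'b', 'c'])

no_c = foo.drop('c')                # drop a row by label
no_bar = foo.drop(columns='bar')    # same as foo.drop('bar', axis=1)
smaller = foo.drop(index=['a'], columns=['baz', 'qux'])  # rows and columns at once

# pop() removes the column from foo and returns it as a Series
qux = foo.pop('qux')
print(foo.columns.tolist(), qux.tolist())  # ['bar', 'baz'] [6, 7, 8]
```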

Reference URLs:
http://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas
http://sinhrks.hatenablog.com/entry/2014/11/12/233216
