Pandas is a library that provides functions to support data analysis in the Python programming language [^wiki]. I think Pandas is on the complicated side even among Python libraries [^atm]. Still, it is so flexible that analyzing data without Pandas is unthinkable for a data analyst. So I would like to explain up to the point of "if you understand this far, you can do anything (by looking things up on other sites)" [^title].

[^wiki]: See https://ja.wikipedia.org/wiki/Pandas
[^atm]: Though there is an atmosphere in which you are not supposed to call it difficult.
[^title]: Pandas is not a language, but the title fits nicely.
1. Preparation

Become able to use numpy (1D) index references, slicing, Boolean index references, and fancy index references.
Become able to use numpy (2D) index references, slicing, and Boolean index references, and understand the behavior of the `np.ix_` function. (Fancy indexing on a 2D ndarray has a specification that is hard to use, so I personally rarely use it.)
2. Introduction to Pandas

Become able to create Series and DataFrame objects and use index references. (Series is basically an extension of numpy (1D); DataFrame, via df.loc (label names take priority) or df.iloc (positions take priority), is basically an extension of numpy (2D).)
Become able to add, extract, delete, and modify data in a Series or DataFrame. (If the elements or index names of a Series or DataFrame are strings, you can extract and modify them in batch operations. This is convenient, so understand string processing with the Pandas str accessor.)
After that, you should be on track and reach a level where you can investigate things yourself (you should be able to understand groupby and so on smoothly).
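As a small taste of where this leads, here is a minimal `groupby` sketch (the column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B'], 'score': [1, 2, 10]})
# Mean score per team; the result is a Series indexed by team name
means = df.groupby('team')['score'].mean()
print(means['A'], means['B'])  # 1.5 10.0
```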
For example, with Numpy:

```python
arr = np.arange(12)      # arr is a one-dimensional ndarray
arr = arr.reshape(3, 4)  # arr is a two-dimensional ndarray
# In arr[i, j], the first element is the row and the second is the column
arr[:2]             # 2D ndarray
arr[:2, 0]          # 1D ndarray
arr[:, arr[0] > 2]  # 2D ndarray
```

and with Pandas:

```python
pop = {'Nevada' : {2001 : 2.4, 2002 : 2.9},
       'Ohio' : {2000 : 1.5, 2001 : 1.7}}
df = DataFrame(pop)     # DataFrame (2D)
df[df['Nevada'] > 2]    # DataFrame (2D)
df.iloc[-1:]['Nevada']  # Series (1D)
```
"What type does this expression produce?" If you stay aware of that and understand it, you are about halfway there. So let's summarize the behavior of index references on a 2D ndarray, and then move on to Pandas.
import

```python
import numpy as np  # ndarray
# Needed to display matplotlib plots inline in Jupyter
%matplotlib inline
import matplotlib.pyplot as plt
from pandas import Series, DataFrame
import pandas as pd
```
Numpy

Let's take up the two-dimensional ndarray. There are two things to understand here in order to understand Pandas:

Understand Numpy (1D) index references, slicing, Boolean index references, and fancy index references.
Understand two-dimensional Numpy indexing, `arr[<row specification>]` or `arr[<row specification>, <column specification>]`, until you can read it without stumbling.
```python
arr = np.arange(12).reshape(3,4)  # arr is a two-dimensional ndarray (3 rows, 4 columns)
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11]])

# Get a one-dimensional ndarray
arr[1]    # element reference by scalar value
arr[0:2]  # slicing: extract rows 0 and 1 (row 2 is not extracted)
# For each element of row 1, return the Boolean value of (> 2)
arr[1] > 2  # array([ True,  True,  True,  True], dtype=bool)

# Get a two-dimensional ndarray
arr > 2  # Boolean index reference
arr[np.array([True, False, True])]  # extract rows 0 and 2 by Boolean index reference
# arr[[True, False, True]]  # Warning
arr[[0,2,1]]  # fancy index reference: index with an array of integers; extracts rows 0, 2, 1 in that order
```
In Numpy (2D), indexing is `arr[· (first argument), · (second argument)]`: the first argument is the row and the second is the column. It is basically the same as for a 1D ndarray, but note the pitfalls that are easy to fall into:

```python
# When you want to specify only the second argument, the first argument cannot be omitted.
# In that case, use the slice `:` as the first argument.
arr[:, 1]

# If you specify a fancy index for both the first and second arguments, the behavior is a little unintuitive.
# (Also note that the result is a one-dimensional ndarray!)
# Equivalent to np.array([arr[i,j] for i,j in zip([1,2], [0,1])])  # array([4, 9])
arr[[1,2], [0,1]]

# To get a 2D ndarray extracting the region of rows 1, 2 and columns 0, 1, do the following:
arr[np.ix_([1,2], [0,1])]
# array([[4, 5],
#        [8, 9]])
```
The chances of using the following in Numpy itself are extremely low, but since the idea becomes important when using Pandas, I describe it below (feel free to skip it for now):

```python
# Column 1 as a Boolean array
arr[:,1] > 2  # array([False,  True,  True], dtype=bool)
# Extract the rows whose column-1 element is (> 2)
arr[arr[:,1] > 2]  # same as arr[np.array([False, True, True])] (I don't use this much personally)
# Also the same as arr[arr[:, 1] > 2, :]

arr[1] > 5
arr[:, arr[1] > 5]  # arr[1] > 5 is array([False, False,  True,  True], dtype=bool)
# arr[:, np.array([False, False, True, True])]  # same as above
```
In summary, the behavior of the index-reference types on a 2D ndarray looks like this [^summary]:

[^summary]: This is a little forced. The "none" column for the second argument refers to `arr[·]`. The parentheses, as in (1d), mean that you don't use that combination much. 1d stands for a 1D `ndarray` and 2d for a 2D `ndarray`.

| first arg \ second arg | none | scalar | slicing | Boolean index | fancy index |
|---|---|---|---|---|---|
| none | - | ❌ | ❌ | ❌ | ❌ |
| scalar | 1d | 0d | 1d | 1d | 1d |
| slicing | 2d | 2d | 2d | 2d | 2d |
| Boolean index | 2d | 1d | 2d | (2d) | (1d) |
| fancy index | 2d | 1d | 2d | (1d) | (1d) |
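A quick check of a few cells of the table (using `ndim` to observe the dimensionality of each result):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
print(arr[1, 2].ndim)              # 0  (scalar, scalar)  -> 0d
print(arr[0:2, [1, 3]].ndim)       # 2  (slicing, fancy)  -> 2d
print(arr[arr[:, 0] > 0, 1].ndim)  # 1  (Boolean, scalar) -> 1d
```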
list (slightly different)

```python
# A trap where you can mistake one for the other. I want to multiply each element of arr by 4.
>>> arr = [0,1,2,3]
>>> arr*4
[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
>>> np.arange(4)*4
array([ 0,  4,  8, 12])
# To do the same without converting to numpy, use a comprehension.
>>> [i*4 for i in range(4)]
[0, 4, 8, 12]
```
In Numpy, 1D and 2D arrays are both treated as the same ndarray, but in Pandas they are split: 1D => Series, 2D => DataFrame. So, although the names differ, DataFrame and Series cannot be treated separately, because you constantly go back and forth between 2D <=> 1D.

For example, you can extract a one-dimensional Series by selecting a single row or column of a DataFrame. Conversely, you can create a DataFrame by passing a list or dict of Series (1D) as an argument to the DataFrame (2D) constructor.

So it is important to keep track of whether a variable is one-dimensional or two-dimensional, even though the name changes between Series and DataFrame.
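A minimal sketch of that 1D <=> 2D round trip (column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
col = df['A']                   # DataFrame (2D) -> Series (1D)
df2 = pd.DataFrame({'A': col})  # Series (1D) -> DataFrame (2D)
print(type(col).__name__, type(df2).__name__)  # Series DataFrame
```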
Basically, I often pass a dict or list to the constructor. A dict becomes a Series with an index.

```python
# Example of passing a dict
dic = {'word' : 470, 'camera' : 78}
Series(dic)
# The zip + dict combination technique is often used to generate a Series:
Series(dict(zip(words, frequency)))
```
For index references, a Series is an extension of the 1D ndarray. The difference is that index names can also be used as the index argument.

```python
ser = Series(np.random.randn(5), index = list('ABCDE'))
# A    1.700973
# B    1.061330
# C    0.695804
# D   -0.435989
# E   -0.332942
# dtype: float64

ser[1]          # extract row 1, i.e. row 'B', as a 0-dimensional value (float64)
ser['A']        # extract row 'A' (float)
ser[1:3]        # slicing: extract rows 1 and 2 (Series, 1D)
ser[-1:]        # extract the last row
ser[:-1]        # extract all rows except the last
ser[[1,2]]      # extract rows 1 and 2 (fancy index)
ser[['A','B']]  # the (fancy) index can also be given as strings
ser > 0         # Series (1D) whose elements are Boolean values
ser[ser > 0]    # element reference with the Boolean index (ser > 0)
# Since both reads and writes are possible, you can also assign only to the
# matching elements, as below. This technique of putting a condition on the
# left-hand side is often used with DataFrame.
ser[ser > 0] = 0
```
As long as the outside is a list or dict, it doesn't matter what the inside is (list, Series, dict, and tuple all work):

```python
# When both the outside and the inside are dicts
pop = {'Nevada' : {2001 : 2.4, 2002 : 2.9},
       'Ohio' : {2000 : 1.5, 2001 : 1.7}}
df2 = DataFrame(pop)
#       Nevada  Ohio
# 2000     NaN   1.5
# 2001     2.4   1.7
# 2002     2.9   NaN

# When the outside is a dict and the inside is a Series
# df1 and df2 are DataFrames (so df1['name'] and df2['address'] are Series)
## column names are ['typeA', 'typeB'], index names are [0,1,2,3]
dfA = DataFrame({'typeA' : df1['name'], 'typeB' : df2['address']})
## index names are [0,1,2,3], column names are ['name', 'address'] (the attribute T transposes)
dfB = DataFrame([df1['name'], df2['address']]).T
```

We often use the dict + built-in zip function to create a DataFrame:

```python
dict(zip([1,2,3], [4,5,6,7]))  # {1: 4, 2: 5, 3: 6} => cannot be converted to a DataFrame
list(zip([1,2,3], [4,5,6,7]))  # [(1, 4), (2, 5), (3, 6)] => can be converted (outside: list, inside: tuples)
pd.DataFrame(list(zip([1,2,3], [4,5,6,7])))  # => OK!
```
You can also create a DataFrame by passing in a 2D ndarray:

```python
df = DataFrame(np.arange(12).reshape(3,4), columns = list('ABCD'))
print(df)
#    A  B   C   D
# 0  0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11
```

You can also create a DataFrame by combining the constructor with a Series:

```python
DataFrame(Series({'word' : 470, 'camera' : 78}), columns = ['frequency'])
```

Creating a DataFrame from a Series will be discussed in detail in the data-addition section of the beginner's edition.
In Pandas, index references can be made with df[·], with df.loc[<row specification>] or df.loc[<row specification>, <column specification>], or with df.iloc[<row specification>] or df.iloc[<row specification>, <column specification>]. df[·] behaves rather confusingly, as follows:

```python
# Often used
# dfA[1]  # runtime error!! a column cannot be retrieved by an integer value here
dfA['typeA']             # extract column 'typeA' as a Series (1D)
dfA[['typeB', 'typeA']]  # extract columns typeB, typeA (in that order) as a DataFrame (2D)
dfA['typeA'] > 3         # Series (1D) whose elements are Boolean values

# A little confusing (I use it a lot personally)
dfA[dfA['typeA'] > 3]  # extract the rows where dfA's 'typeA' column is greater than 3
# dfA.loc[dfA['typeA'] > 3]  # if you are worried, use this

# The following is quite confusing, so I don't use it much
dfA[1:]  # extract rows 1 onward as a DataFrame (2D) (note that this is a ROW extraction)
# Rather than dfA[1:], I would write:
dfA.loc[1:]  # makes it clear that rows are being specified; or dfA.loc[1:, :]
```

df.loc is a version of Numpy indexing where you can also specify label names. So basically, you can write it in the same spirit as Numpy index references.
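One difference from Numpy worth knowing: label slices with `df.loc` include the end point, while positional slices do not:

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 3]}, index=list('ABCD'))
print(len(df.loc['A':'C']))  # 3 -- with labels, the end point 'C' is included
print(len(df.iloc[0:2]))     # 2 -- positional slicing excludes the end point
```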
df.loc

However, there are two things to keep in mind when dealing with df.loc (quite important and easy to get stuck on).

One is that df.loc gives priority to label names, so even when an integer value is specified, the positional index number is not referenced: the row whose label name matches is extracted. For example, when you sort and then want to extract the first row, it is quite easy to have an accident:

```python
dic = list(zip([0,3,5,6], list('ADCB')))
dfA = DataFrame(dic, columns = ['typeA', 'typeB'])
#    typeA typeB
# 0      0     A
# 1      3     D
# 2      5     C
# 3      6     B
dfA = dfA.sort_values(by = 'typeB')
#    typeA typeB
# 0      0     A
# 3      6     B
# 2      5     C
# 1      3     D
dfA.loc[1]  # I want to extract row 1 (i.e. the second row), but loc extracts the row whose index LABEL is 1:
# typeA    3
# typeB    D
# Name: 1, dtype: object

## To prevent such a tragedy, use df.iloc: the row number takes priority.
## Extracts (# 3    6    B)
dfA.iloc[1]
```

iloc is often used after an extraction. (If the index is not in numerical order, rows cannot be referenced with loc[number].)

```python
df = df[df['A'] == name]
df.iloc[0]['B']  # feels a little awkward...
```
The other is a trap that is easy to fall into when dealing with an integer index: if you want to extract the last row, referencing -1 with df.loc will fail. Since label names take priority, you are told that there is no label -1. Again, use df.iloc to make it explicit that the extraction is by row number.

```python
# dfA.loc[-1]  # NG
dfA.iloc[-1]   # OK (the last row is extracted as a Series (1D))
dfA.iloc[-1:]  # OK (the last row is extracted as a DataFrame (2D))
```

Conversely, df.iloc can only take numbers, so if you want to specify rows by row number and columns by label name, write as follows:

```python
df.iloc[i]['A']  # it is fine to write it like this
# iloc can only specify columns by number:
# Location based indexing can only have
# [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
# res *= df.iloc[i, 'A']  # error
```
To summarize DataFrame index references:

If you feel uneasy, use df.loc[<row specification>] or df.loc[<row specification>, <column specification>] rather than df[·].
However, if you want to emphasize extraction by row number, use df.iloc[<row specification>] or df.iloc[<row specification>, <column specification>].

If you keep only these two points in mind, you can extract data as pleasantly as with Numpy index references. It would be really easy if the single df.loc form were all you had to remember, but integer indexes are so common in practice that you can't avoid using df.iloc :sweat:

Note that with loc and iloc, an lvalue must not use an indexer twice, as below; you would be writing to a copy, and the value you want to modify in the original DataFrame is not modified:

```python
# "A value is trying to be set on a copy of a slice from a DataFrame"
df.loc[5]['colA']  # cannot be used as an lvalue
# No problem (because it is a single reference):
df.loc[k, 'non_view_rate'] *= mult
```
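A runnable sketch of the safe pattern (the column name is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 3]})
# The chained form df.loc[1]['colA'] = 99 would write into a temporary copy.
# A single .loc call selects the row and column at once and writes through:
df.loc[1, 'colA'] = 99
print(df.loc[1, 'colA'])  # 99
```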
So far we've looked at Pandas index references. Maybe we're over the mountain, but there are still some confusing parts, such as additions and modifications to a DataFrame. For the basic usage of each function, see [Python for Data Analysis --- Data Wrangling with NumPy and pandas](https://www.amazon.co.jp/dp/4873116554) [1]; here I would like to summarize things in a reverse-lookup way.

```python
ser = Series([1,2,3], index = list('ABC'))
# A    1
# B    2
# C    3
# dtype: int64
```

This will be written as Series (3*1). The index names are all the same (['A', 'B', 'C']). Let's see how to concatenate various patterns of data.
(Series(3*1) <- Series(3*1)) -> DataFrame

DataFrame (2*3):

```python
DataFrame([s1, s2])  # using the constructor
```

DataFrame (3*2):

```python
df = DataFrame([s1, s2], index = list('AB')).T
pd.concat([s1, s2], axis = 1)  # to line Series up as columns, use concat(..., axis = 1)
```

DataFrame (6*1):

```python
serA.append(serB)
# or
pd.concat([serA, serB])
# If you want the index to be a serial number from 0:
s1.append(s2).reset_index(drop = True)  # renumber the index
```

DataFrame (1*6):

```python
df1 = DataFrame(serA)
df2 = DataFrame(serB)
ndf = df1.join(df2, how = 'outer', lsuffix = 'A', rsuffix = 'B')  # hmm...
# join as used here can only connect two.
ndf = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
#    0A  1A  2A  0B  1B  2B
# 0   1   2   3   4   5   6
```
(DataFrame(n*3) <- Series(3*1)) => DataFrame

DataFrame ((n+1)*3):

```python
# "Can only append a Series if ignore_index=True or if the Series has a name"
df.append(serA, ignore_index = True)

cols = ['colA', 'colB', 'colC']
res_df = DataFrame(columns = cols)
res_df = res_df.append(Series([1,2,3], cols).T, ignore_index = True)
...
```

(Series(3*1) + Series(3*1) + Series(3*1) + Series(3*1)) -> DataFrame (4*3):

```python
df = DataFrame([serA, serB, serC, serD])
# If you want a DataFrame (3*4), just add .T
df = DataFrame([serA, serB, serC, serD]).T
```
```python
# Add one row
df.loc['newrow'] = 0
df.append(serA, ignore_index = True)
# Add multiple rows
df1.append(df2)
# Add multiple DataFrames (pass a list)
df1.append([df2, df3, df4])
# or
pd.concat([df1, df2, df3, df4])

# Add one column
df['newcol'] = 0

# Non-matching indexes become an outer join (filled with NaN values)
df1.join(df2, how = 'outer')
df1.join([df2, df3], how = 'outer')
# merge allows finer control, but is limited to merging two DataFrames:
df1.merge(df2, how = 'outer')
```

For df1.merge(df2), please refer to [1]. For other details, the official page http://pandas.pydata.org/pandas-docs/version/0.19.1/generated/pandas.DataFrame.merge.html or http://sinhrks.hatenablog.com/entry/2015/01/28/073327 is good, I think.

The latter explains

+ Simple vertical concatenation: DataFrame.append
+ Flexible concatenation: pd.concat
+ Join by column value: pd.merge
+ Join by index: DataFrame.join (an easy-to-use version of merge)

with figures and examples, so it is easy to understand.
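A minimal side-by-side of vertical concatenation versus a column-value join (data and column names made up for illustration; note that in current pandas, `DataFrame.append` has been removed in favor of `pd.concat`):

```python
import pandas as pd

df1 = pd.DataFrame({'k': [1, 2], 'a': ['x', 'y']})
df2 = pd.DataFrame({'k': [2, 3], 'b': ['p', 'q']})

stacked = pd.concat([df1, df2])                   # vertical concatenation: 4 rows, NaN where columns differ
merged = pd.merge(df1, df2, on='k', how='outer')  # outer join on column 'k': keys 1, 2, 3
print(len(stacked), len(merged))  # 4 3
```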
```python
# Rename the index
df.index = ['one', 'two', 'three']
# Reassign index numbers
df.reset_index(drop = True)  # renumber the index (from 0)

# Rename columns
## After creating/editing a table, the columns may not be in the expected order,
## so it is safer to specify the column order explicitly.
df = df[['old_a', 'old_b', 'old_c']]
df.columns = ['new_a', 'new_b', 'new_c']

# Or use df.rename
df = df[['old_a', 'old_b', 'old_c']]  # either way, this can be skipped if you don't care about column order
# Since rename is not a destructive method, the result must be assigned back.
# Specify the columns parameter (note that it is not an axis parameter).
df = df.rename(columns = {'old_a' : 'new_a', 'old_b' : 'new_b', 'old_c' : 'new_c'})
```

Note 1) You can also use df.rename when you want to change only some index or column names. (Specify a dict (as a before => after correspondence table) in the index or columns parameter. Note that there is no axis parameter, and that it is columns (with an s), not column.)

Note 2) reindex is a rearrangement of the existing index positions, not an index renaming. set_index creates a new object using one or more specific columns as the index, as in df.set_index(['c1','c0']); note that this is not a method for renaming the index either. reset_index converts a (hierarchical) index back into columns. They are simply in the relationship set_index <=> reset_index.
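A small round-trip sketch of that set_index <=> reset_index relationship (column names made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'c0': [1, 2], 'c1': ['a', 'b'], 'v': [10, 20]})
indexed = df.set_index(['c1', 'c0'])  # move columns into a hierarchical index
restored = indexed.reset_index()      # and back: the index levels become columns again
print(list(restored.columns))  # ['c1', 'c0', 'v']
```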
```python
# Focus on column 'A': select the rows where it is 'wrong', and in those rows change column 'B' to 'sth'
df.loc[df['A'] == 'wrong', 'B'] = 'sth'
```

For sorting there are sort_index (sorts by the index) and sort_values (sort key given with by); both can sort in ascending or descending order (for descending order, specify ascending = False).

For the pitfalls of df[·], see http://naotoogawa.hatenablog.jp/entry/2015/09/12/PandasのDataFrameの嵌りどころ

```python
# Enclose each Boolean index in parentheses
df = df[(df['A'] > 0) | (df['B'] > 0)]
```

When a df[·] element should be among multiple candidates:

```python
# apply takes as its first argument a function (e.g. a lambda) whose argument is a Series
# map takes as its first argument a function (e.g. a lambda) whose argument is an element
df = df[df['A'].map(lambda d : d in listA)]
```
Delete rows and columns with df.drop (non-destructive). By specifying axis you can delete either rows or columns.

```python
df = df.drop("A", axis=1)
# When the columns are 'A', 'B', .., 'F' and you want to delete columns 'C' through 'F',
# it is more common to do the following:
df = df[['A', 'B']]
```

See http://nekoyukimmm.hatenablog.com/entry/2015/02/25/222414.
(For dropping rows that are entirely NA, there is how = 'all'.)

```python
# Returns a DataFrame
df.apply(lambda ser: ser % 2 == 0)
df.applymap(lambda x: x % 2 == 0)
df.isin([1,2])
df = df[~df.index.duplicated()]  # remove duplicated indexes (drops the data appearing from the second occurrence on)

# Returns a Series
df['goal'] == 0
df.apply(lambda ser : (ser > 0).any())
df['A'].map(lambda x : x > -1)
serA > serB  # Series type
~bool_ser    # flip each element of a Boolean index

# The second argument applies only to the elements that are False for the first argument
df['A'].where(df['A'] > 0, -df['A'])  # Series version of abs (where the condition fails, a minus sign is applied; i.e. a negative becomes positive)
(df['goal'] == 0).all()  # True if every element meets the condition
df.apply(lambda ser: ser % 2 == 0)
(df['cdf(%)'] < 90).sum()  # count the number meeting the condition
df.where(df % 3 == 0, -df)
```
It's quite common to get stuck around NA, so make a note of where NA values may be generated.

```python
dic = dict(zip(list('ABCD'), [3,4,6,2]))  # generate a dict
ser = Series(dic, index = list('ABCDE'))
# 'E', which is not in dic, becomes NaN
# A    3.0
# B    4.0
# C    6.0
# D    2.0
# E    NaN
# dtype: float64
```
(Example)

```python
pop = {'Nevada' : {2001 : 2.4, 2002 : 2.9},
       'Ohio' : {2000 : 1.5, 2001 : 1.7}}
df = DataFrame(pop)
#       Nevada  Ohio
# 2000     NaN   1.5
# 2001     2.4   1.7
# 2002     2.9   NaN
```

When df.reindex is given something in the index parameter that is not in df's index. (Example omitted.)

When loc is given labels that do not exist:

```python
df.loc[[2002, 2001, 1999], ['Alaska', 'Nevada']]
#       Alaska  Nevada
# 2002     NaN     2.9
# 2001     NaN     2.4
# 1999     NaN     NaN
```

(Note: in current pandas, passing missing labels in a list to .loc raises a KeyError; the NaN-filling behavior shown here is from older versions.)
Note) df['non_exists'] and df.loc[:, 'non_exists'] (specifying a name that is not among the columns) are errors.

Other places where NA values appear:

+ Addition between DataFrames (NaN for the elements of index or columns that do not correspond)
+ merge and join when how = 'outer' is specified
+ append and concat (where the indexes do not correspond)
Ways to handle NA values:

+ df.dropna (parameters how and axis), df.fillna(0) (set NA values to 0 uniformly)
+ Specify the fill_value parameter or the method parameter of df.reindex
+ combine_first: only the NA values of old_df are filled in from the argument. (Note that elements that are 0 in old_df are not subject to patching!)

```python
# df's index has gaps (index.name : B_idx, columns = ['A']) => I want a serial-number index (0~89) with the interpolated values set to 0
old_df = DataFrame(index = range(90), columns = ['A'])
new_df = old_df.combine_first(df).fillna(0)  # index.name disappears
```
The string operations of Series deserve a special mention, because they are quietly used all the time. They can be used not only on elements but also on index names and column names!

See http://pandas.pydata.org/pandas-docs/stable/text.html (especially the bottom) for more information. If you want to manipulate strings in a DataFrame, you can usually solve it by looking there.

You can use them, for example, when you want to extract only the rows that match a certain regular expression:

```python
# Update df by extracting only the rows whose column 'A' starts with a lowercase letter
r = '^[a-z]'
df = df[df['A'].str.match(r)]  # df['A'].str.match(r) is a Boolean index
```
[1] [Python for Data Analysis --- Data Wrangling with NumPy and pandas](https://www.amazon.co.jp/dp/4873116554), Wes McKinney
[2] http://sinhrks.hatenablog.com/entry/2015/01/28/073327
[3] Documentation: http://pandas.pydata.org/pandas-docs/stable/api.html