In this article, I work through the basic Pandas operations described in Kame (@usdatascientist)'s blog (https://datawokagaku.com/python_for_ds_summary/) by actually running the code in JupyterLab.
Summary of basic Pandas operations
Part 10
import pandas as pd
import numpy as np
Series
data = {'name':'John', 'sex':'male', 'age': 22}
john_s = pd.Series(data)
print(john_s)
name John
sex male
age 22
dtype: object
array = np.array([10,20,30])
pd.Series(array)
0 10
1 20
2 30
dtype: int64
array = np.array([10,20,30])
labels = ['a','b','c']
pd.Series(array, labels)
a 10
b 20
c 30
dtype: int64
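A Series pairs an array of values with an index of labels, so an element can be read either by its label or by its position. A minimal sketch reusing the values above (the variable names are my own):

```python
import numpy as np
import pandas as pd

# Same construction as above: values from an ndarray, labels passed as the index
s = pd.Series(np.array([10, 20, 30]), index=['a', 'b', 'c'])

by_label = s.loc['b']       # select by label
by_position = s.iloc[1]     # select by position (0-based)
labels = list(s.index)      # the index labels
values = s.to_numpy()       # the underlying NumPy array
print(by_label, by_position, labels, values)  # 20 20 ['a', 'b', 'c'] [10 20 30]
```

Using .loc and .iloc explicitly avoids ambiguity when the labels themselves are integers.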
Part 11
Access a Series value by its label
data = {'name':'John', 'sex':'male', 'age': 22}
john_s = pd.Series(data)
print(john_s)
print(john_s['age'])
name John
sex male
age 22
dtype: object
22
How to make a DataFrame
Make from an ndarray
ndarray = np.random.randint(5, size=(5,4))  # random integers from 0 to 4; the values change on every run
pd.DataFrame(data=ndarray)
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 |
| 1 | 4 | 1 | 0 | 0 |
| 2 | 3 | 2 | 1 | 0 |
| 3 | 3 | 1 | 1 | 3 |
| 4 | 4 | 0 | 1 | 3 |
columns = ['a','b','c','d']
index = np.arange(0,50,10)
pd.DataFrame(data=ndarray, index=index, columns=columns)
| | a | b | c | d |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 |
| 10 | 4 | 1 | 0 | 0 |
| 20 | 3 | 2 | 1 | 0 |
| 30 | 3 | 1 | 1 | 3 |
| 40 | 4 | 0 | 1 | 3 |
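A small sketch of the same construction with a seeded generator, so the numbers are reproducible (the seed and the use of np.random.default_rng are my choice, not from the blog). It also shows that the index and columns must match the array's shape:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)           # seeded generator, so reruns give the same values
ndarray = rng.integers(5, size=(5, 4))   # 5 x 4 integers in [0, 5)
df = pd.DataFrame(data=ndarray, index=np.arange(0, 50, 10), columns=['a', 'b', 'c', 'd'])
print(df.shape)        # (5, 4)
print(list(df.index))  # [0, 10, 20, 30, 40]

# The number of labels must match the array's shape, otherwise pandas raises ValueError
mismatch_error = None
try:
    pd.DataFrame(data=ndarray, columns=['a', 'b', 'c'])  # 3 labels for 4 columns
except ValueError as err:
    mismatch_error = err
```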
Make from a list of dictionaries
data1 = {
'name':'John',
'sex':'male',
'age':22
}
data2 = {
'name':'Zack',
'sex':'male',
'age':30
}
data3 ={
'name':'Emily',
'sex':'female',
'age':32
}
pd.DataFrame([data1, data2, data3])
| | name | sex | age |
|---|---|---|---|
| 0 | John | male | 22 |
| 1 | Zack | male | 30 |
| 2 | Emily | female | 32 |
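When the dictionaries do not all have the same keys, the columns are the union of the keys and the gaps become NaN. A minimal sketch with hypothetical records (the missing 'age' for Emily is made up for illustration):

```python
import pandas as pd

# Hypothetical records: the third one has no 'age' key
records = [
    {'name': 'John', 'sex': 'male', 'age': 22},
    {'name': 'Zack', 'sex': 'male', 'age': 30},
    {'name': 'Emily', 'sex': 'female'},
]
df = pd.DataFrame(records)
print(df)
# Keys become columns; the missing 'age' becomes NaN,
# which also turns the 'age' column into float64

# A DataFrame can also be built column by column from a dict of lists
df_cols = pd.DataFrame({'name': ['John', 'Zack'], 'age': [22, 30]})
```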
Make from a CSV file with pd.read_csv() (train.csv is the Titanic training data from Kaggle)
df = pd.read_csv('train.csv')
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
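Since train.csv may not be at hand, here is a sketch that feeds read_csv a tiny in-memory CSV instead (the rows are a trimmed, hypothetical stand-in for the real file). read_csv accepts any file-like object, and empty fields are read as NaN:

```python
import io
import pandas as pd

# A tiny in-memory stand-in for train.csv (a few rows and columns only)
csv_text = """PassengerId,Survived,Pclass,Age,Cabin
1,0,3,22,
2,1,1,38,C85
3,1,3,26,
"""
df = pd.read_csv(io.StringIO(csv_text))  # file paths and file-like objects both work
print(df)
# Empty Cabin fields are read as NaN

# usecols loads only the listed columns
df_small = pd.read_csv(io.StringIO(csv_text), usecols=['PassengerId', 'Survived'])
```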
Part 12
Display the first five rows with .head()
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Check summary statistics with .describe() (only numeric columns are included by default)
df.describe()
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
type(df.describe())  # the result is a DataFrame
pandas.core.frame.DataFrame
Show the list of column names with .columns
df.columns
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
type(df.columns)  # the type is Index
pandas.core.indexes.base.Index
df.index  # the row labels are available as .index
RangeIndex(start=0, stop=891, step=1)
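Because describe() returns an ordinary DataFrame, a single statistic can be picked out with .loc. A sketch on a hypothetical mini-frame (not the Titanic data) that also shows how missing values and text columns are handled:

```python
import pandas as pd

# Hypothetical mini-frame: one numeric column with a missing value, one text column
df = pd.DataFrame({
    'Age': [22.0, 38.0, 26.0, None],
    'Sex': ['male', 'female', 'female', 'male'],
})
stats = df.describe()                    # numeric columns only by default
count_age = stats.loc['count', 'Age']    # count excludes missing values
mean_age = stats.loc['mean', 'Age']      # the result is a DataFrame, so .loc works on it
all_stats = df.describe(include='all')   # also summarizes the text column
print(list(df.columns), df.index)
```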
Get a single column as a Series by putting its name in brackets []
df['Age'].head()
0 22.0
1 38.0
2 26.0
3 35.0
4 35.0
Name: Age, dtype: float64
type(df['Age'])
pandas.core.series.Series
Put a list of column names in the brackets [] to extract multiple columns at once (the result is a DataFrame)
df[['Age','Parch','Fare']].head()
| | Age | Parch | Fare |
|---|---|---|---|
| 0 | 22.0 | 0 | 7.2500 |
| 1 | 38.0 | 0 | 71.2833 |
| 2 | 26.0 | 0 | 7.9250 |
| 3 | 35.0 | 0 | 53.1000 |
| 4 | 35.0 | 0 | 8.0500 |
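The return type depends on what goes inside the brackets, not on how many columns come back. A quick sketch on a two-row frame made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Age': [22.0, 38.0], 'Parch': [0, 0], 'Fare': [7.25, 71.2833]})
age = df['Age']                  # a single name -> Series
age_frame = df[['Age']]          # a list (even with one name) -> DataFrame
subset = df[['Fare', 'Age']]     # columns come back in the order listed
print(type(age).__name__, type(age_frame).__name__)  # Series DataFrame
```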
Get a specific row as a Series with .iloc[int]
df.iloc[888] #index location
PassengerId 889
Survived 0
Pclass 3
Name Johnston, Miss. Catherine Helen "Carrie"
Sex female
Age NaN
SibSp 1
Parch 2
Ticket W./C. 6607
Fare 23.45
Cabin NaN
Embarked S
Name: 888, dtype: object
df.iloc[888]['Age']
nan
np.isnan(df.iloc[888]['Age'])
True
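np.isnan works here because Age is a float, but it raises an error on text values, and `== np.nan` never detects NaN. A sketch of pd.isna as a safer check, using a row modeled on df.iloc[888] (built by hand, not read from the file):

```python
import numpy as np
import pandas as pd

# A row similar to df.iloc[888]: text and missing values mixed in one Series
row = pd.Series({'Name': 'Johnston, Miss. Catherine Helen "Carrie"', 'Age': np.nan, 'Cabin': np.nan})

age_missing = pd.isna(row['Age'])     # True
name_missing = pd.isna(row['Name'])   # False; pd.isna also accepts strings
equals_nan = row['Age'] == np.nan     # False: NaN never compares equal, so == cannot detect it

numpy_rejects_str = False
try:
    np.isnan(row['Name'])             # np.isnan only works on numbers
except TypeError:
    numpy_rejects_str = True

missing_mask = row.isna()             # element-wise check for the whole row
```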
np.random.seed(1)
ndarray = np.random.randint(10, size=(5,5))
columns = [0,1,2,3,4]
index = ['a','b','c','d','e']
df_1 = pd.DataFrame(data=ndarray, index=index, columns=columns)
df_1
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| a | 5 | 8 | 9 | 5 | 0 |
| b | 0 | 1 | 7 | 6 | 9 |
| c | 2 | 4 | 5 | 2 | 4 |
| d | 2 | 4 | 7 | 7 | 9 |
| e | 1 | 7 | 0 | 6 | 9 |
df_1[0]
a 5
b 0
c 2
d 2
e 1
Name: 0, dtype: int64
df_1.loc['c']  # when the row labels are strings, select a row by label with .loc['label']
0 2
1 4
2 5
3 2
4 4
Name: c, dtype: int64
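With df_1 the column labels are integers, which makes the difference between brackets, .loc and .iloc easy to trip over. A sketch rebuilding df_1 the same way as above:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df_1 = pd.DataFrame(np.random.randint(10, size=(5, 5)),
                    index=['a', 'b', 'c', 'd', 'e'], columns=[0, 1, 2, 3, 4])

row_by_label = df_1.loc['c']      # .loc: by label
row_by_position = df_1.iloc[2]    # .iloc: by integer position -> the same row here
cell = df_1.loc['c', 0]           # row label 'c', column label 0
same_cell = df_1.iloc[2, 0]       # third row, first column

first_col = df_1[0]               # plain brackets: the COLUMN labeled 0
first_row = df_1.iloc[0]          # .iloc[0]: the first ROW (label 'a')
```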
Drop specific rows and columns with .drop()
Drop the row whose index label is 0 (the first row)
df.drop(0).head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
Drop the 'Age' column (axis=1 means columns)
df.drop('Age', axis=1).head()
| | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | NaN | S |
To drop multiple columns, pass a list to .drop(). Note that .drop() returns a new DataFrame; it does not change the original df.
df.drop(['Age','PassengerId'], axis=1).head()
| | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | NaN | S |
df.head()  # .drop() did not change the original df
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
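.drop() can also name its target with the columns= and index= keywords instead of axis=. A sketch on a made-up two-row frame showing that both spellings give the same result and the original stays intact:

```python
import pandas as pd

df = pd.DataFrame({'PassengerId': [1, 2], 'Age': [22.0, 38.0], 'Fare': [7.25, 71.2833]})

by_axis = df.drop(['Age', 'PassengerId'], axis=1)
by_keyword = df.drop(columns=['Age', 'PassengerId'])   # same result without axis=1
without_row = df.drop(index=0)                          # same as df.drop(0)

print(list(df.columns))  # ['PassengerId', 'Age', 'Fare']: the original is untouched
```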
There are two ways to update df itself. The first is to pass inplace=True, which modifies the original DataFrame:
df = pd.read_csv('train.csv')
df.drop(['Age', 'Cabin'], axis=1, inplace=True)
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | S |
The second is to reassign the returned DataFrame to df:
df = pd.read_csv('train.csv')
df = df.drop(['Age', 'Cabin'], axis=1)
id(df)
140285150057616
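One pitfall is mixing the two styles: with inplace=True, .drop() returns None, so `df = df.drop(..., inplace=True)` throws the data away. A sketch using a hypothetical helper, make_frame(), as a stand-in for reading train.csv:

```python
import pandas as pd

def make_frame():
    # hypothetical stand-in for pd.read_csv('train.csv')
    return pd.DataFrame({'Age': [22.0, 38.0], 'Cabin': [None, 'C85'], 'Fare': [7.25, 71.2833]})

df = make_frame()
returned = df.drop(['Age', 'Cabin'], axis=1, inplace=True)  # modifies df, returns None

df2 = make_frame()
df2 = df2.drop(['Age', 'Cabin'], axis=1)  # returns a new DataFrame that we reassign

# Pitfall: combining both leaves you with None
df3 = make_frame()
df3 = df3.drop(['Age', 'Cabin'], axis=1, inplace=True)
```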
Get multiple rows with slicing (.iloc[5:10] returns the rows at positions 5 to 9)
df.iloc[5:10]
| | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | 0 | 0 | 330877 | 8.4583 | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 0 | 0 | 17463 | 51.8625 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 3 | 1 | 349909 | 21.0750 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 0 | 2 | 347742 | 11.1333 | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 1 | 0 | 237736 | 30.0708 | C |
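Positional slices with .iloc exclude the end, like Python lists, while label slices with .loc include it. A sketch on a hypothetical 15-row frame with the default 0-based index:

```python
import pandas as pd

# Hypothetical 15-row frame with the default RangeIndex 0..14
df = pd.DataFrame({'PassengerId': range(1, 16)})

by_position = df.iloc[5:10]     # positions 5-9: the end is excluded, 5 rows
by_label = df.loc[5:10]         # labels 5-10: the end is included, 6 rows
last_three = df.iloc[-3:]       # negative positions count from the end
every_other = df.iloc[0:10:2]   # a step works as with Python lists
```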
Part 13
Filter the DataFrame by specific conditions
df = pd.read_csv('train.csv')
(df['Survived'] == 1).head()  # Boolean Series: True for survivors (df itself is not overwritten)
0 False
1 True
2 True
3 True
4 False
Name: Survived, dtype: bool
is_survivor = df['Survived'] == 1  # put the condition in a variable (a name like filter would shadow a built-in)
df[is_survivor].head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
df[df['Survived'] == 1].head()  # writing the condition inline is the more common style
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
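The same Boolean mask also works with .loc, which lets you filter rows and choose a column in one step. A sketch on a made-up five-row frame shaped like the first rows of train.csv:

```python
import pandas as pd

df = pd.DataFrame({'Survived': [0, 1, 1, 1, 0], 'Age': [22.0, 38.0, 26.0, 35.0, 35.0]})

is_survivor = df['Survived'] == 1           # Boolean Series aligned with df's index
survivors = df[is_survivor]                 # rows where the mask is True
survivor_ages = df.loc[is_survivor, 'Age']  # filter rows and pick a column in one step
n_survivors = is_survivor.sum()             # True counts as 1
```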
df[df['Survived'] == 1].describe()  # statistics for survivors only
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 342.000000 | 342.0 | 342.000000 | 290.000000 | 342.000000 | 342.000000 | 342.000000 |
| mean | 444.368421 | 1.0 | 1.950292 | 28.343690 | 0.473684 | 0.464912 | 48.395408 |
| std | 252.358840 | 0.0 | 0.863321 | 14.950952 | 0.708688 | 0.771712 | 66.596998 |
| min | 2.000000 | 1.0 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 250.750000 | 1.0 | 1.000000 | 19.000000 | 0.000000 | 0.000000 | 12.475000 |
| 50% | 439.500000 | 1.0 | 2.000000 | 28.000000 | 0.000000 | 0.000000 | 26.000000 |
| 75% | 651.500000 | 1.0 | 3.000000 | 36.000000 | 1.000000 | 1.000000 | 57.000000 |
| max | 890.000000 | 1.0 | 3.000000 | 80.000000 | 4.000000 | 5.000000 | 512.329200 |
df.describe()  # all passengers, unfiltered
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
df[df['Age'] >= 60].describe()  # passengers aged 60 or older only
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 |
| mean | 455.807692 | 0.269231 | 1.538462 | 65.096154 | 0.230769 | 0.307692 | 43.467950 |
| std | 240.078490 | 0.452344 | 0.811456 | 5.110811 | 0.429669 | 0.837579 | 51.269998 |
| min | 34.000000 | 0.000000 | 1.000000 | 60.000000 | 0.000000 | 0.000000 | 6.237500 |
| 25% | 277.250000 | 0.000000 | 1.000000 | 61.250000 | 0.000000 | 0.000000 | 10.500000 |
| 50% | 489.000000 | 0.000000 | 1.000000 | 63.500000 | 0.000000 | 0.000000 | 28.275000 |
| 75% | 629.750000 | 0.750000 | 2.000000 | 69.000000 | 0.000000 | 0.000000 | 58.860450 |
| max | 852.000000 | 1.000000 | 3.000000 | 80.000000 | 1.000000 | 4.000000 | 263.000000 |
df[(df['Age'] >= 60) & (df['Sex'] == 'female')]  # women aged 60 or older only
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 275 | 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
| 366 | 367 | 1 | 1 | Warren, Mrs. Frank Manley (Anna Sophia Atkinson) | female | 60.0 | 1 | 0 | 110813 | 75.2500 | D37 | C |
| 483 | 484 | 1 | 3 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 | NaN | S |
| 829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN |
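Conditions are combined with & and |, and each one needs its own parentheses because those operators bind more tightly than comparisons. Python's `and` does not work on a Series. A sketch on a made-up four-row frame, with .query() as an alternative spelling:

```python
import pandas as pd

df = pd.DataFrame({'Age': [63.0, 60.0, 45.0, 70.0],
                   'Sex': ['female', 'female', 'female', 'male']})

# & and | work element-wise; each condition needs its own parentheses
mask = (df['Age'] >= 60) & (df['Sex'] == 'female')

# Python's `and` tries to reduce a whole Series to one bool and raises ValueError
and_failed = False
try:
    df[(df['Age'] >= 60) and (df['Sex'] == 'female')]
except ValueError:
    and_failed = True

# .query() expresses the same filter as a string
queried = df.query("Age >= 60 and Sex == 'female'")
```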
df[(df['Pclass'] == 1) | (df['Age'] < 10)].head()  # 1st class or under 10 years old only
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
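Missing ages quietly drop out of such filters: any comparison with NaN is False, so a passenger with no Age is never "under 10". A sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Pclass': [1, 3, 3, 3], 'Age': [38.0, np.nan, 2.0, 26.0]})

under_ten = df['Age'] < 10                  # NaN < 10 is False
mask = (df['Pclass'] == 1) | under_ten
# Row 1 (missing Age) is not selected; use df['Age'].isna() to target such rows explicitly
```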
Prefixing a condition with ~ (tilde) negates it, so you can filter with a NOT operation.
data =[{'Name':'John', 'Survived':True},
{'Name':'Emily', 'Survived':False},
{'Name':'Ben', 'Survived':True}]
df = pd.DataFrame(data)
df
| | Name | Survived |
|---|---|---|
| 0 | John | True |
| 1 | Emily | False |
| 2 | Ben | True |
This is often used when filtering by a column whose values are Booleans.
df[df['Survived']==True]
| | Name | Survived |
|---|---|---|
| 0 | John | True |
| 2 | Ben | True |
Since df['Survived'] is already a Boolean Series, you don't need == True; you can pass it to the brackets as it is:
df[df['Survived']]
| | Name | Survived |
|---|---|---|
| 0 | John | True |
| 2 | Ben | True |
To narrow down to Survived == False, you can write the following instead of df[df['Survived'] == False]:
df[~df['Survived']]
| | Name | Survived |
|---|---|---|
| 1 | Emily | False |
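One caution: ~ is a logical NOT only for bool columns. On an integer 0/1 column, such as Survived in train.csv, it is a bitwise NOT. A sketch using the same three made-up rows:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Emily', 'Ben'], 'Survived': [True, False, True]})
not_survived = df[~df['Survived']]           # bool column: ~ means NOT

as_int = df['Survived'].astype(int)          # the same flags stored as 0/1
print((~as_int).tolist())                    # [-2, -1, -2]: bitwise NOT, not a mask

# With 0/1 integers, build a Boolean condition first and negate that
not_survived_int = df[~(as_int == 1)]
```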
Change index
Renumber the index with .reset_index()
df = pd.read_csv('train.csv')
df = df[df['Sex']=='male']
df.head()  # the index now has gaps (0, 4, 5, 6, 7)
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
Reset the index so it runs 0, 1, 2, ... again
As with .drop(), the original df is not overwritten; to update df, pass inplace=True or reassign with df = df.reset_index(). The old index is kept as a new 'index' column; pass drop=True to discard it instead.
df.reset_index().head()
| | index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 2 | 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 3 | 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 4 | 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
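A sketch of the difference between keeping and discarding the old index, on a made-up five-row frame filtered to males the same way as above:

```python
import pandas as pd

df = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male', 'male'],
                   'Age': [22.0, 38.0, 26.0, 35.0, None]})
males = df[df['Sex'] == 'male']           # index 0, 3, 4 (gaps)

kept = males.reset_index()                # old index kept as an 'index' column
dropped = males.reset_index(drop=True)    # old index discarded
```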
Use .set_index() to make a specific column the index
Set the index to 'Name'
As with .reset_index(), you can overwrite the original df with inplace=True.
df.set_index('Name').head()
| Name | PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Braund, Mr. Owen Harris | 1 | 0 | 3 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| Allen, Mr. William Henry | 5 | 0 | 3 | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| Moran, Mr. James | 6 | 0 | 3 | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| McCarthy, Mr. Timothy J | 7 | 0 | 1 | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| Palsson, Master. Gosta Leonard | 8 | 0 | 3 | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
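Once 'Name' is the index, .loc can look a passenger up by name, and .reset_index() turns the index back into a column. A sketch on two rows copied by hand from the table above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris', 'Allen, Mr. William Henry'],
                   'Age': [22.0, 35.0]})

by_name = df.set_index('Name')                         # 'Name' leaves the columns
age = by_name.loc['Allen, Mr. William Henry', 'Age']   # look up a row by its name
restored = by_name.reset_index()                       # 'Name' becomes a column again
```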