This and that about pd.DataFrame

Introduction

A memo about pandas.DataFrame (pd.DataFrame).

DataFrame initialization

import pandas as pd

#Empty DataFrame
df = pd.DataFrame(columns=[List of column names])

#Get from csv file
df = pd.read_csv([File path])
df = pd.read_csv([File path], names=[List of column names])  #When the file has no header row
df = pd.read_csv([File path], sep=',')  #When specifying the delimiter
df = pd.read_csv([File path], sep=r'\s+')  #When separated by whitespace (delim_whitespace=True is deprecated since pandas 2.2)
df = pd.read_csv([File path], comment='#')  #When the file contains comments (text after '#' is ignored)
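A small runnable sketch combining several of the options above. The column names `a` and `b` and the file contents are made up for illustration, and `io.StringIO` stands in for a real file path so the example needs no file on disk.

```python
import io

import pandas as pd

# Empty DataFrame with hypothetical column names
empty = pd.DataFrame(columns=['a', 'b'])

# Hypothetical file contents; io.StringIO stands in for a real file path
text = """# this whole line is a comment and is skipped
1 2
3   4
"""

# No header row, so names= supplies the column names;
# sep=r'\s+' splits on any run of whitespace, comment='#' skips the first line
df = pd.read_csv(io.StringIO(text), names=['a', 'b'], sep=r'\s+', comment='#')
print(df)
```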

Reference: Read csv / tsv file with pandas (read_csv, read_table)

Add a dictionary to a DataFrame as a row

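df = pd.concat([df, pd.DataFrame([dictionary])], ignore_index=True)

Note that `DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0, so `pd.concat()` is used instead. Unlike `list.append()`, neither of them updates the DataFrame in place: they return a new DataFrame, so the result must be assigned back to `df`.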
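A minimal runnable sketch of adding one dictionary as a new row with `pd.concat`, the replacement for the removed `DataFrame.append`. The starting data and the `row` dictionary are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2]})
row = {'a': 3, 'b': 4}  # hypothetical dictionary to add

# Wrap the dict in a one-row DataFrame, then concatenate.
# pd.concat returns a new object, so the result is assigned back to df;
# ignore_index=True renumbers the rows 0, 1, ...
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
print(df)
```

When adding many rows, collecting the dictionaries in a list and building the DataFrame once with `pd.DataFrame(list_of_dicts)` is usually much faster than concatenating inside a loop.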

Extract elements from DataFrame

#     'a' 'b'
# 0 |  1   2
# 1 |  3   4

#Get element
df.loc[0,'a'] # -> 1

#Get row
dict(df.loc[0,:]) # -> {'a':1, 'b':2}

#Get column
list(df.loc[:,'a']) # -> [1, 3]

Reference: Get / change the value of any position with pandas at, iat, loc, iloc
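A runnable sketch that builds the same 2x2 sample table and shows the label-based accessors (`loc`, `at`) next to the position-based ones (`iloc`, `iat`) mentioned in the reference. The variable names are only for illustration.

```python
import pandas as pd

# Same 2x2 sample table as in the comments above
df = pd.DataFrame({'a': [1, 3], 'b': [2, 4]})

elem = df.loc[0, 'a']            # label-based: row label 0, column 'a'
same = df.at[0, 'a']             # at: fast label-based access to a single value
by_pos = df.iat[1, 1]            # iat: position-based single value (2nd row, 2nd column)
row = dict(df.loc[0, :])         # one row as a dict keyed by column name
col = df['a'].tolist()           # one column as a plain Python list
first_row = df.iloc[0].tolist()  # iloc: position-based row selection
```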

Extract rows that meet conditions from a DataFrame

#Simple conditions
df = df[df['num']>0]
df = df[df['str']=='Yes']
df = df[df['str'].isin(['Yes', 'No'])]  #Matches any of several values

#String conditions (if the column contains missing values (NaN), add na=False to the options)
df = df[df['str'].str.startswith('Y')]  #Starts with the string
df = df[df['str'].str.contains('e')]  #Contains the string
df = df[df['str'].str.endswith('s')]  #Ends with the string

#Multiple conditions (each condition must be wrapped in parentheses)
df = df[(df['num']>0) & (df['str']=='Yes')]  #Use & instead of and
df = df[(df['num']>0) | (df['str']=='Yes')]  #Use | instead of or
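A runnable sketch of the conditions above on made-up data that includes a missing value. On an object-dtype column, `.str.startswith('Y')` without `na=False` yields NaN for the missing entry, and indexing with such a mask typically raises a `ValueError`; `na=False` treats the missing value as "no match".

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing string
df = pd.DataFrame({'num': [1, -2, 3, 4],
                   'str': ['Yes', 'No', np.nan, 'Maybe']})

candidates = df[df['str'].isin(['Yes', 'No'])]
starts_y = df[df['str'].str.startswith('Y', na=False)]

# Parentheses are needed because & and | bind more tightly than > and ==
both = df[(df['num'] > 0) & df['str'].str.contains('e', na=False)]
either = df[(df['num'] < 0) | (df['str'] == 'Maybe')]
```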

Reference: query to extract rows of pandas.DataFrame by condition
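The reference above covers `DataFrame.query`, which writes the same condition as a string expression. A small sketch comparing it with the boolean-mask form; the column name `answer` is hypothetical and chosen instead of `str` to avoid any clash with the Python builtin inside the expression.

```python
import pandas as pd

df = pd.DataFrame({'num': [1, -2, 3], 'answer': ['Yes', 'Yes', 'No']})

# Boolean-mask version
by_mask = df[(df['num'] > 0) & (df['answer'] == 'Yes')]

# Equivalent query() version: inside the string, 'and' / 'or'
# can be used in place of & / |
by_query = df.query("num > 0 and answer == 'Yes'")
```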

Other

#Sort according to the specified column
df = df.sort_values('a', ascending=True)

#Reset the index to 0, 1, 2, ... (drop=True discards the old index instead of keeping it as a column)
df = df.reset_index(drop=True)

#Save DataFrame to csv file
df.to_csv([File path], index=False)
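A runnable sketch chaining the three operations above on made-up data, with `io.StringIO` standing in for a real file path. It shows why `reset_index` is often useful after sorting: `sort_values` keeps the original row labels.

```python
import io

import pandas as pd

df = pd.DataFrame({'a': [3, 1, 2], 'b': ['z', 'x', 'y']})

df = df.sort_values('a', ascending=True)  # rows reordered; index labels are now 1, 2, 0
df = df.reset_index(drop=True)            # index renumbered 0, 1, 2; old labels discarded

buf = io.StringIO()                       # stands in for a real file path
df.to_csv(buf, index=False)               # index=False: the index is not written as a column
csv_text = buf.getvalue()

# Reading the text back gives the same table
restored = pd.read_csv(io.StringIO(csv_text))
```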