A memo about pandas.DataFrame (pd.DataFrame).
import pandas as pd

#Create an empty DataFrame
df = pd.DataFrame(columns=[List of column names])
#Read from a csv file
df = pd.read_csv([File path])
df = pd.read_csv([File path], names=[List of column names])  #When the file has no header row
df = pd.read_csv([File path], sep=',')  #When specifying the delimiter
df = pd.read_csv([File path], sep=r'\s+')  #When separated by whitespace (delim_whitespace=True is deprecated since pandas 2.2)
df = pd.read_csv([File path], comment='#')  #When the file contains comment text
Reference: Read csv / tsv file with pandas (read_csv, read_table)
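As a small illustration of how the options above combine, here is a sketch that reads whitespace-separated data without a header and with a comment line. The in-memory text and the column names `a` and `b` are made up for the example; a real file path can be passed in the same position.

```python
import io

import pandas as pd

# Hypothetical file contents held in memory instead of a real file.
text = "# measurement log\n1 2\n3 4\n"

df = pd.read_csv(
    io.StringIO(text),
    names=["a", "b"],  # the data has no header row, so supply the names
    sep=r"\s+",        # split on one or more whitespace characters
    comment="#",       # skip text that follows '#'
)
print(df)
```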
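#Add a row ([dictionary] is a placeholder such as {'a': 1, 'b': 2})
#df.append() was removed in pandas 2.0, so use pd.concat() instead
df = pd.concat([df, pd.DataFrame([dictionary], index=[0])], ignore_index=True)
Note that, unlike `list.append()`, neither `pd.concat()` nor the old `df.append()` updates the DataFrame in place; the result must be assigned back to `df`.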
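In pandas 2.0 and later `df.append()` is no longer available, so the following sketch shows the same row addition with `pd.concat()`. It also shows that the original DataFrame stays unchanged unless the result is assigned back. The sample values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]})
new_row = {"a": 3, "b": 4}

# The result is discarded here, so df itself is not changed.
pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
rows_before = len(df)

# Assigning the result back is what actually adds the row.
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
rows_after = len(df)

print(rows_before, rows_after)
```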
#     'a' 'b'
# 0 |  1   2
# 1 |  3   4
#Get element
df.loc[0,'a'] # -> 1
#Get row
dict(df.loc[0,:]) # -> {'a':1, 'b':2}
#Get column
list(df.loc[:,'a']) # -> [1, 3]
Reference: Get / change the value of any position with pandas at, iat, loc, iloc
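The lookups above can be tried on the two-row sample table. This is a minimal sketch that uses `Series.to_dict()` and `Series.tolist()`, which are equivalent here to wrapping the result in `dict()` and `list()`.

```python
import pandas as pd

# The sample table from the memo: columns 'a' and 'b', index 0 and 1.
df = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

element = df.loc[0, "a"]          # single value at row 0, column 'a'
row = df.loc[0, :].to_dict()      # row 0 as a dictionary keyed by column name
column = df.loc[:, "a"].tolist()  # column 'a' as a plain list

print(element, row, column)
```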
#Simple conditions
df = df[df['num']>0]
df = df[df['str']=='Yes']
df = df[df['str'].isin(['Yes', 'No'])]  #When there are multiple candidates
#String conditions (if the column contains missing values (NaN), add the option na=False)
df = df[df['str'].str.startswith('Y')]  #Starts with the string
df = df[df['str'].str.contains('e')]  #Contains the string
df = df[df['str'].str.endswith('s')]  #Ends with the string
#Multiple conditions
df = df[(df['num']>0) & (df['str']=='Yes')]  #Use & instead of and
df = df[(df['num']>0) | (df['str']=='Yes')]  #Use | instead of or
Reference: query to extract rows of pandas.DataFrame by condition
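The following sketch applies the string and multiple conditions to a small made-up table with one missing value. The `na=False` option makes the missing entry count as "no match", so the mask can be used for row selection.

```python
import pandas as pd

# Hypothetical data; row 2 has a missing value in the 'str' column.
df = pd.DataFrame({"num": [1, -1, 2, 0], "str": ["Yes", "No", None, "Yes"]})

# na=False treats the missing value as False; without it the mask may
# contain missing values, and selecting rows with it can fail.
starts_with_y = df[df["str"].str.startswith("Y", na=False)]

# Each condition needs its own parentheses because & and | bind tightly.
both = df[(df["num"] > 0) & (df["str"] == "Yes")]
either = df[(df["num"] > 0) | (df["str"] == "Yes")]

print(starts_with_y.index.tolist(), both.index.tolist(), either.index.tolist())
```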
#Sort according to the specified column
df = df.sort_values('a', ascending=True)
#Reset the index to 0, 1, 2, ...
df = df.reset_index(drop=True)
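Sorting keeps the original index labels attached to each row, which is why the index is usually reset afterwards. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})

# After sorting, the rows are reordered but keep their old index labels.
sorted_df = df.sort_values("a", ascending=True)
labels_after_sort = sorted_df.index.tolist()

# drop=True discards the old labels instead of keeping them as a column.
renumbered = sorted_df.reset_index(drop=True)
labels_after_reset = renumbered.index.tolist()

print(labels_after_sort, labels_after_reset)
```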
#Save DataFrame to csv file
df.to_csv([File path], index=False)
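To see what `index=False` does, this sketch writes the sample table to an in-memory buffer (standing in for a real file path) and reads it back. Without `index=False`, the index would be written as an extra unnamed first column.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()

# Read the same text back to confirm the round trip.
restored = pd.read_csv(io.StringIO(buffer.getvalue()))

print(csv_lines)
```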