A memo about pandas.DataFrame (pd.DataFrame).
#Empty DataFrame
df = pd.DataFrame(columns=[List of column names])
#Get from csv file
df = pd.read_csv([File path])
df = pd.read_csv([File path], names=[List of column names]) #Without header
df = pd.read_csv([File path], sep=',') #When specifying the delimiter
df = pd.read_csv([File path], sep=r'\s+') #When separated by whitespace (delim_whitespace=True is deprecated since pandas 2.2)
df = pd.read_csv([File path], comment='#') #When the file contains comment lines
Reference: Read csv / tsv file with pandas (read_csv, read_table)
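The read_csv options above can be tried without a file on disk. The following is a minimal sketch that assumes hypothetical in-memory text (`csv_text`, `ws_text`) in place of a real file path; the data values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file: no header row,
# and a leading comment line that should be skipped.
csv_text = "# sample data\n1,2\n3,4\n"

# names= supplies the column names; comment='#' drops the comment line.
df = pd.read_csv(io.StringIO(csv_text), names=["a", "b"], comment="#")

# Hypothetical whitespace-separated content with a header row.
ws_text = "a b\n1 2\n3 4\n"

# sep=r"\s+" is a regular expression matching one or more whitespace characters.
df_ws = pd.read_csv(io.StringIO(ws_text), sep=r"\s+")

print(df)
print(df_ws)
```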
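#Add a row
df = df.append([dictionary], ignore_index=True)
Note that, unlike `list.append`, `df.append()` does not modify the DataFrame in place. It returns a new DataFrame, so the result must be assigned back as shown. `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; `pd.concat` is the replacement there.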
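Since `DataFrame.append` is not available in pandas 2.0 and later, adding a row can be sketched with `pd.concat` instead. The data and the name `new_row` below are hypothetical; like `append`, `pd.concat` returns a new DataFrame, so the result is assigned back.

```python
import pandas as pd

# Start from a small DataFrame (values are illustrative only).
df = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

# Hypothetical row to add, given as a dictionary.
new_row = {"a": 5, "b": 6}

# Wrap the dictionary in a one-row DataFrame and concatenate.
# ignore_index=True renumbers the index 0, 1, 2, ...
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

print(df)
```

When many rows are added in a loop, collecting the dictionaries in a list and building the DataFrame once at the end is usually cheaper than concatenating repeatedly.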
# 'a' 'b'
# 0 | 1 2
# 1 | 3 4
#Get element
df.loc[0,'a'] # -> 1
#Get row
dict(df.loc[0,:]) # -> {'a':1, 'b':2}
#Get column
list(df.loc[:,'a']) # -> [1, 3]
Reference: Get / change the value of any position with pandas at, iat, loc, iloc
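The lookups above can be reproduced on the two-row example DataFrame. This sketch uses `to_dict()` and `tolist()` as alternatives to `dict()` and `list()`, and adds `at` and `iloc` from the reference title for comparison; the variable names are only illustrative.

```python
import pandas as pd

# The example DataFrame from the comment table above.
df = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

# Single element by row label and column label.
element = df.loc[0, "a"]

# One row as a dictionary keyed by column name.
row = df.loc[0, :].to_dict()

# One column as a plain Python list.
column = df.loc[:, "a"].tolist()

# at is a scalar-only accessor by label; iloc selects by integer position.
element_at = df.at[0, "a"]
element_iloc = df.iloc[0, 0]

print(element, row, column)
```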
#Simple conditions
df = df[df['num']>0]
df = df[df['str']=='Yes']
df = df[df['str'].isin(['Yes', 'No'])] #When there are multiple candidates
#String conditions (if the column contains missing values (NaN), add the option na=False)
df = df[df['str'].str.startswith('Y')] #Starts with the string
df = df[df['str'].str.contains('e')] #Contains the string
df = df[df['str'].str.endswith('s')] #Ends with the string
#Multiple conditions
df = df[(df['num']>0) & (df['str']=='Yes')] #Use & instead of and
df = df[(df['num']>0) | (df['str']=='Yes')] #Use | instead of or
Reference: query to extract rows of pandas.DataFrame by condition
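The effect of `na=False` and of combining conditions may be easier to see on concrete data. The sketch below assumes a hypothetical DataFrame with one missing value in the `str` column. Each condition is wrapped in parentheses because `&` and `|` bind more tightly than comparison operators.

```python
import pandas as pd

# Hypothetical data; the third row has a missing string.
df = pd.DataFrame({
    "num": [1, -2, 3, 0],
    "str": ["Yes", "No", None, "Yes"],
})

# na=False makes the missing value count as "no match" so the mask is purely boolean.
starts_with_y = df[df["str"].str.startswith("Y", na=False)]

# Both conditions must hold.
both = df[(df["num"] > 0) & (df["str"] == "Yes")]

# At least one condition must hold.
either = df[(df["num"] > 0) | (df["str"] == "Yes")]

print(starts_with_y)
print(both)
print(either)
```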
#Sort according to the specified column
df = df.sort_values('a', ascending=True)
#Reindex
df = df.reset_index(drop=True)
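Sorting keeps each row's original index label, which is why a reindex step usually follows. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical unsorted data.
df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})

# Rows are reordered by column 'a', but the old index labels travel with them.
sorted_df = df.sort_values("a", ascending=True)

# drop=True discards the old labels instead of keeping them as a new column.
reindexed = sorted_df.reset_index(drop=True)

print(sorted_df)
print(reindexed)
```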
#Save DataFrame to csv file
df.to_csv([File path], index=False)
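To check what `index=False` does, the output can be written to an in-memory buffer and read back. This sketch assumes a hypothetical `io.StringIO` buffer in place of a file path.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

# Write to an in-memory buffer; index=False omits the index column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the text back to confirm that the round trip preserves the data.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored.equals(df))
```

Without `index=False`, the index is written as an extra unnamed first column, which then shows up as a column such as `Unnamed: 0` when the file is read back.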