This time we will use `pandas` and `re` (the module for regular expressions).
```python
import pandas as pd
import re

df = pd.read_csv("filename.csv")
```
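If you want to try the steps below without a real `filename.csv`, `read_csv` also accepts any file-like object. This is a minimal sketch; the CSV text and its `code` column are hypothetical sample data, not part of the original article.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for filename.csv
csv_text = """name,code
I have a pen.,A-1
She has a pen.,B-22
"""

# read_csv accepts a file-like object, so an in-memory buffer behaves like a file path
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 2)
print(df.columns.tolist())  # ['name', 'code']
```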
Delete unnecessary characters from an entire column:
```python
df['Column name'] = df['Column name'].str.replace(r'\d', '', regex=True)  # Delete digits (regex=True is required for \d)
df['Column name'] = df['Column name'].str.replace('-', '', regex=False)  # Delete hyphens
df['Column name'] = df['Column name'].str.replace('word', '', regex=False)  # Delete a word
df['Column name'] = df['Column name'].str.strip()  # Remove whitespace at the beginning and end

# These steps can also be chained into a single line
df['Column name'] = df['Column name'].str.replace(r'\d', '', regex=True).str.replace('-', '', regex=False).str.replace('word', '', regex=False).str.strip()
```
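Here is a self-contained sketch of the cleaning steps on a small hypothetical column, where "word" stands in for whatever word you want to delete. It also shows an equivalent single regular expression. Note that since pandas 2.0, `Series.str.replace` defaults to `regex=False`, so a pattern like `\d` is only treated as a regular expression when `regex=True` is passed.

```python
import pandas as pd

# Hypothetical sample column; "word" stands in for the word to delete
df = pd.DataFrame({"Column name": [" 12-word apple ", "3-banana4", "cherry word"]})

# Step by step, as in the article
cleaned = (
    df["Column name"]
    .str.replace(r"\d", "", regex=True)    # delete digits
    .str.replace("-", "", regex=False)     # delete hyphens
    .str.replace("word", "", regex=False)  # delete the word "word"
    .str.strip()                           # trim leading/trailing whitespace
)

# Equivalent single pattern: a digit OR a hyphen OR "word"
combined = df["Column name"].str.replace(r"\d|-|word", "", regex=True).str.strip()

df["Column name"] = cleaned
print(df["Column name"].tolist())  # ['apple', 'banana', 'cherry']
```

Plain substring replacement also removes "word" inside longer words such as "swordfish"; a pattern like `r"\bword\b"` with `regex=True` restricts it to whole words.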
Suppose each element of the `name` column consists of multiple words. Example:

```python
df.loc[0, 'name']  # "I have a pen."
df.loc[1, 'name']  # "She has a pen."
```
We extract the first word of each element, collect the results in a list, and store that list in a new column called `subject`. Example:

```python
df.loc[0, 'subject']  # "I"
df.loc[1, 'subject']  # "She"
```
```python
temp = df['name'].str.split()  # Split each element into a list of words
subject = []  # Empty list to collect the first word of each row
for item in temp:
    subject.append(item[0])  # Store the first word of this row
df['subject'] = subject  # Add the list to the DataFrame as a new column named subject
```
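The same result can be obtained without an explicit loop by using pandas' vectorized string accessor. This is an alternative to the loop above, not the article's own method; the two-row DataFrame is hypothetical sample data.

```python
import pandas as pd

df = pd.DataFrame({"name": ["I have a pen.", "She has a pen."]})

# .str.split() turns each element into a list of words;
# .str[0] then picks the first element of each list
df["subject"] = df["name"].str.split().str[0]
print(df["subject"].tolist())  # ['I', 'She']
```

One practical difference: for an empty string, `.str[0]` yields `NaN`, whereas `item[0]` in the loop raises an `IndexError`.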
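You can read or overwrite a specific cell with `.at[]`, which takes a row label and a column name:

```python
df.at['Row label', 'Column name'] = "This is a test"
df.at[0, 'Column name'] = "This is a test"  # With the default index, the row labels are 0, 1, 2, ...
```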
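A short sketch of how `.at` behaves with the default integer index and with string row labels, plus `.iat` for purely positional access. The DataFrame and the labels `row_a` and `row_b` are hypothetical examples.

```python
import pandas as pd

df = pd.DataFrame({"name": ["I have a pen.", "She has a pen."]})

# With the default RangeIndex, the row label happens to equal the row number
df.at[1, "name"] = "This is a test"

# With string row labels, .at uses those labels instead
df2 = df.copy()
df2.index = ["row_a", "row_b"]
df2.at["row_a", "name"] = "Another test"

# .iat addresses a cell by integer position (row, column), regardless of labels
print(df2.iat[0, 0])  # Another test
```

Because `.at` is label-based, `df.at[0, ...]` refers to the row labeled `0`, which is not necessarily the first row once the DataFrame has been sorted or filtered; `.iat` is the positional counterpart.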
## 5. csv output
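Finally, write the edited DataFrame to a CSV file. Adding `encoding='utf_8_sig'` writes a UTF-8 byte order mark (BOM) at the start of the file, which prevents garbled characters when the file is opened in programs such as Excel.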
```python
df.to_csv("filename_v2.csv", encoding='utf_8_sig')
```
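To check what `utf_8_sig` actually does, this sketch writes a small hypothetical DataFrame to a temporary file and inspects the first bytes. The `index=False` argument is an optional addition not used in the article; it omits the row-index column from the output.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["café", "pen"]})

# Write to a temporary directory so the example does not touch your files
path = os.path.join(tempfile.mkdtemp(), "filename_v2.csv")
df.to_csv(path, encoding="utf_8_sig", index=False)

with open(path, "rb") as f:
    raw = f.read()

# utf_8_sig prefixes the file with the UTF-8 BOM bytes EF BB BF
print(raw[:3] == b"\xef\xbb\xbf")  # True

# Reading back with the same encoding strips the BOM again
print(pd.read_csv(path, encoding="utf_8_sig")["name"].tolist())  # ['café', 'pen']
```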