How to normalize data in which katakana, symbols, letters, and numbers appear in a mix of full-width and half-width forms.
reference:
https://qiita.com/shakechi/items/d12641d6cad01479785f
Doing this by hand is tedious, so I wrote a function that applies the full-width to half-width conversion column by column to a CSV file opened with pandas. Just put the column names in the columns = [] list and you are done.
What it does: converts katakana, symbols (spaces, etc.), letters, and numbers to half-width.
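To make concrete what "half-width" means for letters, digits, and symbols, here is a minimal standard-library sketch. It is not how jaconv is implemented; it only relies on the Unicode layout, where the full-width forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from ASCII "!" to "~", and the ideographic space U+3000 corresponds to an ordinary space. The names FULL_TO_HALF and to_half_width_ascii are made up for this illustration, and katakana is deliberately not covered, which is one reason to use jaconv instead.

```python
# Map full-width ASCII variants (U+FF01..U+FF5E) to ASCII (U+0021..U+007E).
FULL_TO_HALF = {code: code - 0xFEE0 for code in range(0xFF01, 0xFF5F)}
# The ideographic (full-width) space lies outside that range, so add it by hand.
FULL_TO_HALF[0x3000] = 0x20


def to_half_width_ascii(text: str) -> str:
    """Convert full-width letters, digits, symbols, and spaces to half-width.

    Katakana is left untouched; this sketch does not handle it.
    """
    return text.translate(FULL_TO_HALF)


print(to_half_width_ascii("ＡＢＣ　１２３！"))  # ABC 123!
print(to_half_width_ascii("カタカナ"))          # カタカナ (unchanged)
```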
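# Install jaconv beforehand by running "pip install jaconv" in a terminal or command prompt.
import jaconv

# Assumes df is a DataFrame that has already been loaded from the CSV with pandas.
def shori(column):
    values = df[column].values.tolist()
    new_values = []
    for value in values:
        # Convert full-width digits, ASCII characters, and katakana to half-width.
        value = jaconv.z2h(value, digit=True, ascii=True, kana=True)
        new_values.append(value)
    df[column] = new_values
    return df[column]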
# Put the names of the columns you want to process in this list.
columns = []

# Loop over the columns and convert each one.
for column in columns:
    shori(column)