"Batch conversion method of characters contained in data" https://qiita.com/wellwell3176/questions/1345ab14964d2a050b5a
I wrote a program based on the answers to the question above. I am posting it as a separate article because it was too long to fit on the question page.
Using the conversion table in Table 2, the symbols in the raw data of Table 1 are separated out, converted, and stored in new columns. The conversion table in Table 2 is small here, but the actual data has about 30 kinds of symbols and about 10 column labels, so I wanted to keep the whole conversion table in a single Excel file as far as possible.
Table 1 Raw data
No. | Horse name |
---|---|
1 | Koshihikari ○ ③ |
2 | Sasanishiki ◎ |
3 | Yomogi Dango ✕② |
4 | Tanaka Katarou ① |
Table 2 Conversion table
Symbol | Personal forecast | Newspaper forecast |
---|---|---|
① | No. 1 | |
② | No. 2 | |
③ | No. 3 | |
◎ | | favorite |
○ | | Counter |
✕ | | Large hole |
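The core of the conversion can be tried on a single column label first. The following is a minimal sketch, not the finished program: it pulls one of the personal-forecast symbols out of each row with `str.extract` and converts it with `Series.map`. The variable names `horses` and `personal` are made up for illustration, and the rows are the sample data from Table 1.

```python
import pandas as pd

# Sample rows from Table 1
horses = pd.Series([
    "Koshihikari ○ ③",
    "Sasanishiki ◎",
    "Yomogi Dango ✕②",
    "Tanaka Katarou ①",
])

# One column of the conversion table: symbol -> converted value
personal = {"①": "No. 1", "②": "No. 2", "③": "No. 3"}

# Build the pattern "(①|②|③)" and extract the first matching symbol per row
pattern = "(" + "|".join(personal) + ")"
symbols = horses.str.extract(pattern, expand=False)

# Convert each symbol; rows without a symbol become an empty string
converted = symbols.map(personal).fillna("")
print(converted.tolist())
```

Rows that contain none of the symbols come back as missing values from `str.extract`, which is why the result is passed through `fillna("")`.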
Finished product
import pandas as pd

# Reuse the general-purpose function presented in the answer to the question
def tagging(df, column, trdict):  # original DataFrame, column used for classification, dictionary for classification
    for key, d in trdict.items():
        # Process each 1st-level key of the dictionary (personal forecast, newspaper forecast)
        # Extract the 2nd-level key (◎, ✕, etc.) from the source column
        df_ = df[column].str.extract(f'({"|".join(d.keys())})')
        for k, v in d.items():
            # Convert the extracted 2nd-level key to its value
            df_ = df_.replace(k, v)
            # Delete the 2nd-level key from the source column
            df[column] = df[column].str.replace(k, "")
        # Store the converted values in a new column named after the 1st-level key
        df[key] = df_.fillna("")
    return df
df = pd.DataFrame(  # Create sample data; the real data is read from Excel
    data=[{
        'No.': 1,
        'horse': 'Koshihikari ○ ③',
    }, {
        'No.': 2,
        'horse': 'Sasanishiki ◎',
    }, {
        'No.': 3,
        'horse': 'Yomogi Dango ✕ ②',
    }, {
        'No.': 4,
        'horse': 'Tanaka Katarou ①',
    }])
dict_raw = pd.read_excel("hogehoge.xlsx", index_col=0)
# hogehoge.xlsx contains the conversion table shown in Table 2
dict_process = dict_raw.to_dict(orient='dict')
# to_dict converts it to a nested dictionary: {column label: {symbol: value}}
list_key = list(dict_process.keys())
# List the column labels for the loop below (iterating over dict_process directly would also work)
dict_comp = dict()  # update() is used below, so start from an empty dictionary
# Drop entries whose value is NaN (for example, the personal forecast column holds NaN for "◎", which gets in the way of the conversion)
for i in list_key:
    # pd.notna is used instead of comparing against np.nan, so numpy does not need to be imported
    output_dict = dict(filter(lambda item: pd.notna(item[1]), dict_process[i].items()))
    dict_comp.update({i: output_dict})
tagging(df, "horse", dict_comp)
<Output result>
(index) | No. | horse | Personal forecast | Newspaper forecast |
---|---|---|---|---|
0 | 1 | Koshihikari | No. 3 | Counter |
1 | 2 | Sasanishiki | | favorite |
2 | 3 | Yomogi Dango | No. 2 | Large hole |
3 | 4 | Tanaka Katarou | No. 1 | |
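With about 30 kinds of symbols in the real data, some of them may be regular-expression metacharacters such as `+` or `*`, and the source column keeps the spaces that surrounded the removed symbols. The sketch below is a hypothetical variant, named `tagging_escaped` here for illustration and not part of the original answer, that escapes each symbol with `re.escape` and trims the leftover spaces. It assumes the NaN entries have already been removed from the dictionary.

```python
import re
import pandas as pd


def tagging_escaped(df, column, trdict):
    """Hypothetical variant of tagging(): escapes symbols and trims spaces."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for key, d in trdict.items():
        # Escape every symbol so that characters like "+" are matched literally
        pattern = "|".join(re.escape(k) for k in d)
        # Extract the first matching symbol and convert it through the dictionary
        out[key] = out[column].str.extract(f"({pattern})", expand=False).map(d).fillna("")
        # Remove the symbols from the source column
        out[column] = out[column].str.replace(pattern, "", regex=True)
    # Trim the spaces left behind by the removed symbols
    out[column] = out[column].str.strip()
    return out


df = pd.DataFrame({
    "No.": [1, 2, 3, 4],
    "horse": ["Koshihikari ○ ③", "Sasanishiki ◎", "Yomogi Dango ✕ ②", "Tanaka Katarou ①"],
})
trdict = {
    "Personal forecast": {"①": "No. 1", "②": "No. 2", "③": "No. 3"},
    "Newspaper forecast": {"◎": "favorite", "○": "Counter", "✕": "Large hole"},
}
result = tagging_escaped(df, "horse", trdict)
print(result)
```

Unlike the function above, this variant works on a copy, so the original `df` still holds the unconverted horse names afterwards.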
Reference source for dictionary filtering: https://tombomemo.com/python-dict-filter/
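As an alternative to filtering each dictionary with `filter` and a lambda, the empty cells can be dropped per column with `Series.dropna()` before converting to a dictionary. This is a sketch under the assumption that empty Excel cells arrive as NaN; the conversion table is rebuilt in code as a stand-in for `pd.read_excel("hogehoge.xlsx", index_col=0)` so the example runs without the file.

```python
import pandas as pd

NAN = float("nan")

# Stand-in for the Excel conversion table (Table 2); empty cells are NaN
dict_raw = pd.DataFrame(
    {
        "Personal forecast": ["No. 1", "No. 2", "No. 3", NAN, NAN, NAN],
        "Newspaper forecast": [NAN, NAN, NAN, "favorite", "Counter", "Large hole"],
    },
    index=["①", "②", "③", "◎", "○", "✕"],
)

# For each column, drop the NaN cells, then turn what is left into {symbol: value}
dict_comp = {label: dict_raw[label].dropna().to_dict() for label in dict_raw.columns}
print(dict_comp)
```

The result has the same shape as the dictionary built by the loop in the finished program, so it can be passed to `tagging` in the same way.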