Overview

"Batch conversion method of characters contained in data" https://qiita.com/wellwell3176/questions/1345ab14964d2a050b5a

I wrote a program based on the answers to the question above. I am posting it as a separate article because it was too long to put on the question page.

Specification

Using the conversion table in Table 2, separate the symbols from the raw data in Table 1, convert them, and store the results in new columns. The conversion table in this example is small, but the actual data has about 30 kinds of symbols and about 10 column labels, so I wanted to keep the whole conversion table in a single table in one Excel file as far as possible.
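The core idea can be sketched for a single string in plain Python before bringing in pandas. This is a minimal illustration, not the program below: `TABLE` and `split_symbols` are hypothetical names, and the table contents are assumed to match Table 2.

```python
# Hypothetical in-memory copy of Table 2: {new column label: {symbol: converted value}}
TABLE = {
    "Personal forecast": {"①": "No. 1", "②": "No. 2", "③": "No. 3"},
    "Newspaper forecast": {"◎": "favorite", "○": "Counter", "✕": "Large hole"},
}


def split_symbols(text, table):
    """Return (text with the symbols removed, {label: converted value})."""
    result = {}
    for label, mapping in table.items():
        # Symbols of this category found in the text, with their positions
        hits = [(text.find(symbol), symbol) for symbol in mapping if symbol in text]
        # Convert the earliest one; "" if the category has no symbol in the text
        result[label] = mapping[min(hits)[1]] if hits else ""
        # Remove every symbol of this category from the text
        for symbol in mapping:
            text = text.replace(symbol, "")
    # Trim the spaces left where the symbols were
    return text.strip(), result


print(split_symbols("Koshihikari ○ ③", TABLE))
```

Like `str.extract`, this picks the earliest symbol of each category; if a name could carry two symbols from the same column, that choice would need revisiting.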

With a table only as large as Table 2, it would certainly be easier to create and manage a separate "Personal forecast conversion table" and "Newspaper forecast conversion table" on different tabs of the Excel file.
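For the per-tab layout, `pd.read_excel(path, sheet_name=None)` returns a dictionary of `{sheet name: DataFrame}`. The sketch below assumes a hypothetical layout in which each sheet holds the symbols in its first column and the converted values in its second, and turns that into the same `{label: {symbol: value}}` shape the program uses. `sheets_to_table` is a hypothetical helper, and the sheets are built in memory so the example runs without a file.

```python
import pandas as pd


def sheets_to_table(sheets):
    """Convert {sheet name: DataFrame} into {sheet name: {symbol: value}}.

    Assumes (hypothetical layout) column 0 holds the symbols and
    column 1 holds the converted values.
    """
    table = {}
    for sheet_name, frame in sheets.items():
        symbols = frame.iloc[:, 0]
        values = frame.iloc[:, 1]
        table[sheet_name] = dict(zip(symbols, values))
    return table


# In practice: sheets = pd.read_excel("hogehoge.xlsx", sheet_name=None)
sheets = {
    "Personal forecast": pd.DataFrame(
        {"symbol": ["①", "②", "③"], "value": ["No. 1", "No. 2", "No. 3"]}
    ),
    "Newspaper forecast": pd.DataFrame(
        {"symbol": ["◎", "○", "✕"], "value": ["favorite", "Counter", "Large hole"]}
    ),
}
print(sheets_to_table(sheets))
```

Because a per-tab table has no blank cells, the NaN filtering needed for the single-table layout would not be necessary here.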

Table 1 Raw data

No.	Horse name
1	Koshihikari ○ ③
2	Sasanishiki ◎
3	Yomogi Dango ✕②
4	Tanaka Katarou ①

Table 2 Conversion table

	Personal forecast	Newspaper forecast
①	No. 1
②	No. 2
③	No. 3
◎		favorite
○		Counter
✕		Large hole
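To see what the program works with, Table 2 can be rebuilt as an in-memory DataFrame (symbols as the index, one column per label) and converted with `to_dict`. This is a sketch assuming `read_excel(..., index_col=0)` produces an equivalent frame; the blank cells show up as missing values, which is why they have to be filtered out before the dictionary is used for conversion.

```python
import pandas as pd

# Table 2 rebuilt in memory; blank cells are None here
# (read_excel would give NaN, and pd.notna treats both as missing)
conv = pd.DataFrame(
    {
        "Personal forecast": ["No. 1", "No. 2", "No. 3", None, None, None],
        "Newspaper forecast": [None, None, None, "favorite", "Counter", "Large hole"],
    },
    index=["①", "②", "③", "◎", "○", "✕"],
)

# {column label: {symbol: value}}, including the blank cells
nested = conv.to_dict(orient="dict")

# Keep only the cells that actually hold a value
trdict = {
    label: {symbol: value for symbol, value in mapping.items() if pd.notna(value)}
    for label, mapping in nested.items()
}
print(trdict)
```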

Implemented program

Finished product


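import re

import pandas as pd

# Adapted from the general-purpose function presented in the Q&A
# df: original DataFrame, column: column used for classification,
# trdict: dictionary for classification ({new column label: {symbol: value}})
def tagging(df, column, trdict):
    for key, d in trdict.items():
        # Process each 1st-level key (Personal forecast, Newspaper forecast) in turn.
        # Extract the 2nd-level key (◎, ✕, etc.) into a new column; re.escape
        # protects against symbols that are regex metacharacters.
        pattern = "(" + "|".join(map(re.escape, d.keys())) + ")"
        df_ = df[column].str.extract(pattern, expand=False)
        for k, v in d.items():
            # Convert the 2nd-level key to its value
            df_ = df_.replace(k, v)
            # Delete the 2nd-level key from the source column
            df[column] = df[column].str.replace(k, "", regex=False)
        # Rows without a symbol get "" instead of NaN
        df[key] = df_.fillna("")
    # Trim the spaces left where the symbols were removed
    df[column] = df[column].str.strip()
    return df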

df = pd.DataFrame(  # Sample data (Table 1); in practice this is read from Excel
    data=[{
        'No.': 1,
        'horse': 'Koshihikari ○ ③',
    }, {
        'No.': 2,
        'horse': 'Sasanishiki ◎',
    }, {
        'No.': 3,
        'horse': 'Yomogi Dango ✕ ②',
    }, {
        'No.': 4,
        'horse': 'Tanaka Katarou ①',
    }])

dict_raw = pd.read_excel("hogehoge.xlsx", index_col=0)
# hogehoge.xlsx holds the same data as Table 2 (symbols in the first column)
dict_process = dict_raw.to_dict(orient="dict")
# to_dict converts it to {column label: {symbol: value}}

dict_comp = dict()  # Finished dictionary, filled in with update() below

# Delete the keys that have no value from each inner dictionary (here, for example,
# "Personal forecast": "◎" holds NaN from the blank cell and gets in the way).
# Dictionaries keep insertion order (Python 3.7+), so they can be looped over directly.
for i in dict_process:
    output_dict = dict(filter(lambda item: pd.notna(item[1]), dict_process[i].items()))
    dict_comp.update({i: output_dict})

print(tagging(df, "horse", dict_comp))

<Output result>
	No.	horse	Personal forecast	Newspaper forecast
0	1	Koshihikari	No. 3	Counter
1	2	Sasanishiki		favorite
2	3	Yomogi Dango	No. 2	Large hole
3	4	Tanaka Katarou	No. 1
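For reference, here is a self-contained variant that runs without `hogehoge.xlsx`, with Tables 1 and 2 written inline. It is a sketch rather than the program above: `tag_symbols` is a hypothetical name, and it swaps the per-symbol `replace` loop for `Series.map` with the dictionary and removes each label's symbols with a single regex `str.replace`, which should give the same result for this data.

```python
import re

import pandas as pd


def tag_symbols(df, column, trdict):
    """Move the symbols in df[column] into one new column per label in trdict."""
    for label, mapping in trdict.items():
        # Regex alternation of this label's symbols, escaped in case any are metacharacters
        pattern = "(" + "|".join(map(re.escape, mapping)) + ")"
        # First matching symbol per row (NaN if none), converted via the mapping
        df[label] = df[column].str.extract(pattern, expand=False).map(mapping).fillna("")
        # Remove all of this label's symbols from the source column
        df[column] = df[column].str.replace(pattern, "", regex=True)
    # Trim the spaces left where the symbols were
    df[column] = df[column].str.strip()
    return df


df = pd.DataFrame({
    "No.": [1, 2, 3, 4],
    "horse": ["Koshihikari ○ ③", "Sasanishiki ◎", "Yomogi Dango ✕ ②", "Tanaka Katarou ①"],
})
trdict = {
    "Personal forecast": {"①": "No. 1", "②": "No. 2", "③": "No. 3"},
    "Newspaper forecast": {"◎": "favorite", "○": "Counter", "✕": "Large hole"},
}
print(tag_symbols(df, "horse", trdict))
```

`tag_symbols` modifies `df` in place and also returns it, so `df` itself holds the new columns after the call.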

Reference link

Reference source for dictionary filtering: https://tombomemo.com/python-dict-filter/

Note) Batch conversion of specific symbols contained in a character string with a dictionary
