Note: Batch conversion of specific symbols in a string using a dictionary

Overview

"Batch conversion method of characters contained in data" https://qiita.com/wellwell3176/questions/1345ab14964d2a050b5a

I wrote a program based on the answers to the question above. It was too long to put on the question page, so I am posting it separately here.

Specification

For the raw data in Table 1, use the conversion table in Table 2 to separate the symbols from the name, convert them, and store the results in new columns. The conversion table in Table 2 is small, but the actual data has about 30 kinds of symbols and about 10 column labels, so I wanted to keep the whole conversion table in a single Excel file as far as possible.

Table 1 Raw data

No. Horse name
1 Koshihikari ○ ③
2 Sasanishiki ◎
3 Yomogi Dango ✕②
4 Tanaka Katarou ①

Table 2 Conversion table

Symbol  Personal forecast  Newspaper forecast
①       No. 1
②       No. 2
③       No. 3
◎                          favorite
○                          Opposition
✕                          Large hole
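
As a rough illustration of the specification, the conversion table can be pictured as a nested dictionary keyed first by column label and then by symbol. The sketch below uses plain Python rather than pandas; the names CONVERSION and convert_name are hypothetical, and the symbol-to-value pairs are the ones implied by the raw data and the output shown in this note.

```python
import re

# Hypothetical in-memory form of Table 2: {column label: {symbol: value}}
CONVERSION = {
    "Personal forecast": {"①": "No. 1", "②": "No. 2", "③": "No. 3"},
    "Newspaper forecast": {"◎": "favorite", "○": "Opposition", "✕": "Large hole"},
}


def convert_name(text, conversion=CONVERSION):
    """Split one raw string into the bare name and one value per column label."""
    result = {}
    for label, mapping in conversion.items():
        # Build an alternation of the symbols that belong to this column label
        pattern = "|".join(re.escape(symbol) for symbol in mapping)
        # Convert the first symbol of this group found in the string, if any
        match = re.search(pattern, text)
        result[label] = mapping[match.group()] if match else ""
        # Remove every symbol of this group from the name
        text = re.sub(pattern, "", text)
    result["name"] = text.strip()
    return result


print(convert_name("Koshihikari ○ ③"))
```

A string without any symbol of a group simply gets an empty value for that column label, which mirrors the blank cells in the output table of this note.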

Implemented program

Finished product


import re

import pandas as pd

# Adapted from the general-purpose function presented in the answers to the question
def tagging(df, column, trdict):
    # df: source DataFrame, column: column to classify, trdict: {new column label: {symbol: value}}
    for key, d in trdict.items():
        # Process one new column label (personal forecast, newspaper forecast) at a time.
        # Extract the first symbol of this group found in each row (NaN if none is found).
        pattern = "|".join(re.escape(k) for k in d)
        df_ = df[column].str.extract(f"({pattern})", expand=False)
        for k, v in d.items():
            # Convert the extracted symbol (◎, ✕, etc.) to its value
            df_ = df_.replace(k, v)
            # Delete the symbol from the source column
            df[column] = df[column].str.replace(k, "", regex=False)
        # Store the converted values in a new column; rows without a symbol get ""
        df[key] = df_.fillna("")
    # Remove the whitespace left behind where the symbols were deleted
    df[column] = df[column].str.strip()
    return df

df = pd.DataFrame( # Create sample data. The real data is read from an Excel file
    data=[{
        'No.': 1,
        'horse': 'Koshihikari ○ ③',
    }, {
        'No.': 2,
        'horse': 'Sasanishiki ◎',
    }, {
        'No.': 3,
        'horse': 'Yomogi Dango ✕ ②',
    }, {
        'No.': 4,
        'horse': 'Tanaka Katarou ①',
    }])

dict_raw = pd.read_excel("hogehoge.xlsx", index_col=0)
# hogehoge.xlsx holds the same data as Table 2
dict_process = dict_raw.to_dict(orient='dict')
# to_dict converts the table to a nested dictionary: {column label: {symbol: value}}
list_key = list(dict_process.keys())
# List the keys for the for statement below (iterating over the dictionary directly would also work).

dict_comp = dict()  # update() is used below, so the result is created first as an empty dictionary.

# Delete the entries that have no value from each dictionary (for example, "Personal forecast" has NaN stored under "◎", which gets in the way)
for i in list_key:
    output_dict = dict(filter(lambda item: pd.notna(item[1]), dict_process[i].items()))
    dict_comp.update({i: output_dict})

tagging(df,"horse",dict_comp)
<Output result>
   No.  horse           Personal forecast  Newspaper forecast
0  1    Koshihikari     No. 3              Opposition
1  2    Sasanishiki                        favorite
2  3    Yomogi Dango    No. 2              Large hole
3  4    Tanaka Katarou  No. 1
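
The program above needs hogehoge.xlsx to run. For trying it out without the file, the following sketch builds a stand-in for Table 2 in memory and derives the same kind of nested dictionary. It is only an assumption about what the Excel sheet looks like (symbols in the first column, one column per label), and it swaps the filter-based cleanup for pandas' dropna(), which is a different technique with the same goal.

```python
import pandas as pd

# Stand-in for pd.read_excel("hogehoge.xlsx", index_col=0): Table 2 built in memory.
# Empty cells are written as None here; pandas treats them as missing values.
dict_raw = pd.DataFrame(
    {
        "Personal forecast": ["No. 1", "No. 2", "No. 3", None, None, None],
        "Newspaper forecast": [None, None, None, "favorite", "Opposition", "Large hole"],
    },
    index=["①", "②", "③", "◎", "○", "✕"],
)

# dropna() discards the empty cells of each column before it becomes a dictionary,
# so no separate filtering step is needed afterwards.
dict_comp = {label: dict_raw[label].dropna().to_dict() for label in dict_raw.columns}

print(dict_comp)
```

The resulting dict_comp has the {column label: {symbol: value}} shape that tagging() expects as its third argument.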

Reference link

Reference source for dictionary filtering: https://tombomemo.com/python-dict-filter/
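
One point about dictionary filtering is worth illustrating: an identity test against a single NaN object only works when every missing value is that exact object, whereas a value-based test also catches NaN values created elsewhere. The sketch below uses only the standard library; drop_missing is a hypothetical helper name, not something from the referenced page.

```python
import math


def drop_missing(mapping):
    """Return a copy of mapping without None or NaN values (hypothetical helper)."""
    return {
        key: value
        for key, value in mapping.items()
        if value is not None
        and not (isinstance(value, float) and math.isnan(value))
    }


# Two separately created NaN objects are not identical, so an "is" check can miss one
nan_a = float("nan")
nan_b = float("nan")
print(nan_a is nan_b)  # False

raw = {"①": "No. 1", "②": "No. 2", "◎": nan_a, "○": nan_b, "✕": None}
print(drop_missing(raw))
```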
