This article describes how to make IQ Bot field logic more efficient with a processing loop, following on from the exclusion processing loop.
Exclusion is a kind of replacement, so the idea is basically the same, but replacement is probably a little harder to code, so I have split it out here.
When processing replacements in a loop, the case you will meet most often is "half-width kana alignment". ABBYY, one of the OCR engines built into IQ Bot, tends to be a little weak at reading half-width kana.
If you only want to convert half-width characters to full-width (or vice versa), you can fix that on the RPA side with the Japanese half-width ⇔ full-width conversion action from the Bot Store. However, there are cases where "ga" is read as "ka" followed by a double quotation mark, and I think it is faster to fix these in Python than to handle them properly on the RPA side.
(Although you can also write Python on the RPA side.)
In terms of field logic, the difference between exclusion and replacement is as follows.
Difference between exclusion and replacement
# This is exclusion
field_value = field_value.replace("Character string to exclude", "")
# This is replacement
field_value = field_value.replace("Character string before replacement", "Character string after replacement")
The basic syntax is "replacement"; exclusion is achieved by changing the "character string after replacement" to "" (= empty string).
With exclusion, the character string after replacement was always fixed at "", so the only element that varied was the "character string to be excluded" (= character string before replacement).
So a one-dimensional sequence was enough for the loop.
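As a reminder, the exclusion loop over a one-dimensional sequence can be sketched roughly as follows. The sample value and the strings in `exclude_list` are made up for illustration, and `field_value` is set by hand here only so that the snippet runs on its own.

```python
# Hypothetical sample value, set by hand so the sketch runs on its own.
field_value = "Total: 1,234 yen"

# One-dimensional sequence: only the strings to exclude vary.
exclude_list = ("Total: ", ",", " yen")

for s in exclude_list:
    # Replacing with the empty string removes every occurrence of s.
    field_value = field_value.replace(s, "")

print(field_value)
```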
With replacement, on the other hand, both the before and after strings vary, so each before/after pair has to go into the sequence as a set. That makes it a two-dimensional structure.
This is where the difficulty goes up.
But don't worry. I will explain it properly.
Replacement processing can be done with the following structure.
Concept of replacement processing
replace_list = (("Character string before replacement 1", "Character string after replacement 1"),
                ("Character string before replacement 2", "Character string after replacement 2"),
                ("Character string before replacement 3", "Character string after replacement 3"))
for i in replace_list:
    # i[0] is the string before replacement, i[1] the string after
    field_value = field_value.replace(i[0], i[1])
`replace_list` has tuples inside a tuple, that is, it has a two-dimensional structure.
In the for loop, `i` receives one inner tuple at a time: ("character string before replacement 1", "character string after replacement 1") on the first pass, ("character string before replacement 2", "character string after replacement 2") on the second pass, and so on.
The elements at index 0 (`i[0]`, before replacement) and index 1 (`i[1]`, after replacement) are taken out of `i` and passed as the arguments of `replace`.
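The same loop can also be written with tuple unpacking, which gives the two elements names instead of indexes. This is only a sketch with made-up sample data (typical OCR confusions between letters and digits); the names `before` and `after` are my own, not part of IQ Bot.

```python
# Hypothetical sample value, set by hand so the sketch runs on its own.
field_value = "lO2S"

# Each inner tuple is (string before replacement, string after replacement).
replace_list = (("O", "0"),
                ("l", "1"),
                ("S", "5"))

# Unpacking is equivalent to reading i[0] and i[1] from each inner tuple.
for before, after in replace_list:
    field_value = field_value.replace(before, after)

print(field_value)
```

Note that the pairs are applied in order, so an earlier replacement can affect what a later one matches.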
By the way, the half-width kana correction mentioned at the beginning looks something like this.
Replacement process for half-width kana
replace_list = (('ｶ"','ガ'),('ｷ"','ギ'),('ｸ"','グ'),('ｹ"','ゲ'),('ｺ"','ゴ'),
                ('ｻ"','ザ'),('ｼ"','ジ'),('ｽ"','ズ'),('ｾ"','ゼ'),('ｿ"','ゾ'),
                ('ﾀ"','ダ'),('ﾁ"','ヂ'),('ﾂ"','ヅ'),('ﾃ"','デ'),('ﾄ"','ド'),
                ('ﾊ"','バ'),('ﾋ"','ビ'),('ﾌ"','ブ'),('ﾍ"','ベ'),('ﾎ"','ボ'),
                ('ﾊ°','パ'),('ﾋ°','ピ'),('ﾌ°','プ'),('ﾍ°','ペ'),('ﾎ°','ポ'))
for i in replace_list:
    field_value = field_value.replace(i[0],i[1])
This process aligns the text to full-width kana, assuming that the voiced sound mark is read as " (double quotation mark) and the semi-voiced sound mark is read as ° (degree sign).
In reality, the OCR readings of the voiced and semi-voiced sound marks often vary a little more than this.
It is efficient to look at the OCR results, first unify the readings of the two marks with ordinary replacements, and then apply this process.
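The two-step idea (unify the marks first, then convert the kana) could look roughly like this. The variants in `mark_list` are my own guesses at what an OCR engine might return, not a list taken from IQ Bot or ABBYY, so adjust them to what you actually see. The kana table is shortened to four entries to keep the sketch small.

```python
# Hypothetical OCR output: the marks were read in four different ways.
field_value = "ｶ\uFF9Eｷ\u201Dﾊ\uFF9Fﾋ\u00BA"

# Step 1: unify the variants (assumed examples) into " and °.
mark_list = (("\uFF9E", '"'),   # half-width voiced sound mark
             ("\u309B", '"'),   # full-width voiced sound mark
             ("\u201D", '"'),   # right double quotation mark
             ("\uFF9F", "°"),   # half-width semi-voiced sound mark
             ("\u309C", "°"),   # full-width semi-voiced sound mark
             ("\u00BA", "°"))   # masculine ordinal indicator
for before, after in mark_list:
    field_value = field_value.replace(before, after)

# Step 2: apply the kana table (shortened here).
kana_list = (('ｶ"', 'ガ'), ('ｷ"', 'ギ'), ('ﾊ°', 'パ'), ('ﾋ°', 'ピ'))
for before, after in kana_list:
    field_value = field_value.replace(before, after)

print(field_value)
```

Keeping the mark variants in their own list means the kana table itself never has to grow when a new misreading turns up.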
For a table, you only need to change the body of the for loop to the table version, so the following should work.
Replacement process (for table)
replace_list = (("Character string before replacement 1", "Character string after replacement 1"),
                ("Character string before replacement 2", "Character string after replacement 2"),
                ("Character string before replacement 3", "Character string after replacement 3"))
for i in replace_list:
    # regex=False treats i[0] as a literal string rather than a regular expression
    df['Column name'] = df['Column name'].str.replace(i[0], i[1], regex=False)
But the above isn't very elegant... or is it just me?
I would do it like this.
Replacement process (for table)
replace_list = (("Character string before replacement 1", "Character string after replacement 1"),
                ("Character string before replacement 2", "Character string after replacement 2"),
                ("Character string before replacement 3", "Character string after replacement 3"))
def table_replace(x, y):
    # x is one cell value, y is the sequence of (before, after) pairs
    for i in y:
        x = x.replace(i[0], i[1])
    return x
df['Column name'] = df['Column name'].apply(table_replace, y=replace_list)
If you do this, you can respond flexibly to cases such as:
- When you want to apply the same combination of replacement strings to different columns
- When you want to change the combination of replacement strings depending on the column
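Because `apply` calls the function once per cell and passes the extra keyword argument `y` through, `table_replace` can be tried on plain strings without a DataFrame. The following is only a sketch with made-up cell values.

```python
def table_replace(x, y):
    # x is one cell value, y is a sequence of (before, after) pairs.
    for before, after in y:
        x = x.replace(before, after)
    return x

# Hypothetical replacement pairs and cell values.
replace_list = (("O", "0"), ("l", "1"))
cells = ["lO", "2O", "abc"]

# This mimics what apply does: one call per cell with y=replace_list.
fixed = [table_replace(c, y=replace_list) for c in cells]
print(fixed)
```

One thing to watch: the function assumes every cell is a string. A cell that is not a string (for example a missing value) has no `replace` method, so it would need to be handled before calling this.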
Replacement process (for table)
replace_listA = (("Character string before replacement 1", "Character string after replacement 1"),
                 ("Character string before replacement 2", "Character string after replacement 2"),
                 ("Character string before replacement 3", "Character string after replacement 3"))
replace_listB = (("Character string before replacement a", "Character string after replacement a"),
                 ("Character string before replacement b", "Character string after replacement b"),
                 ("Character string before replacement c", "Character string after replacement c"))
def table_replace(x, y):
    # x is one cell value, y is the sequence of (before, after) pairs
    for i in y:
        x = x.replace(i[0], i[1])
    return x
df['Column name 1'] = df['Column name 1'].apply(table_replace, y=replace_listA)
df['Column name 1'] = df['Column name 1'].apply(table_replace, y=replace_listB)
df['Column name 2'] = df['Column name 2'].apply(table_replace, y=replace_listA)
df['Column name 3'] = df['Column name 3'].apply(table_replace, y=replace_listB)
The point of this example is that both replace_listA and replace_listB are applied to column name 1, only replace_listA is applied to column name 2, and only replace_listB is applied to column name 3.
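If the number of columns grows, the column-to-list assignments can themselves be written as data. Below is a minimal sketch of that idea; a plain dict of lists stands in for the DataFrame so that it runs without pandas, and the names `table` and `plan` as well as the sample values are hypothetical.

```python
def table_replace(x, y):
    # x is one cell value, y is a sequence of (before, after) pairs.
    for before, after in y:
        x = x.replace(before, after)
    return x

# Hypothetical, shortened replacement lists.
replace_listA = (("O", "0"),)
replace_listB = (("l", "1"),)

# Stand-in for the table: column name -> list of cell values.
table = {"Column name 1": ["lO"],
         "Column name 2": ["lO"],
         "Column name 3": ["lO"]}

# Which lists to apply to which column, in order.
plan = {"Column name 1": (replace_listA, replace_listB),
        "Column name 2": (replace_listA,),
        "Column name 3": (replace_listB,)}

for column, lists in plan.items():
    for replace_list in lists:
        table[column] = [table_replace(v, replace_list) for v in table[column]]

print(table)
```

With a DataFrame, the innermost line would become the same `apply(table_replace, y=replace_list)` call used in the article.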
Hmm, I hope that wasn't too difficult.
If you have any questions, please leave a comment on this article or contact us via DM on Twitter.