I want to reuse the string matched by a Python regular expression in its replacement. I forget how to do it every time, so I am writing it down.
Suppose you have a list like this from analyzing English text.
A list of English words
sentence = ['During', 'this', 'time', ',', 'many', 'chatterbots', 'were', 'written', 'including',
'PARRY', ',', 'Racter', ',', 'and', 'Jabberwacky', '.']
When you want to turn this into a single string (the whole sentence), the first thing that comes to mind is join.
Try joining with a space as the delimiter
' '.join(sentence)
#During this time , many chatterbots were written including PARRY , Racter , and Jabberwacky .
Yes... As you might have guessed, there is a half-width (ordinary ASCII) space in front of each "." and ",".
That was a problem.
So what to do... That's right: replace with a regular expression!
But how do you remove the space while keeping whichever "," or "." was matched?
Let's just try it.
Fixing the English text
import re
bad = ' '.join(sentence)
fixed = re.sub(r' ([,.])', r'\1', bad)
# r' ([,.])' : a half-width space followed by either ',' or '.'; the () captures the punctuation as group 1
print(fixed)
output
During this time, many chatterbots were written including PARRY, Racter, and Jabberwacky.
Complete!
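As a small sketch, the join and the replacement can be wrapped in one helper. The name join_tokens is made up for this note and is not part of any library; it only handles "," and ".", just like the pattern above.

```python
import re


def join_tokens(tokens):
    """Join word tokens with spaces, then drop the space before ',' and '.'.

    Hypothetical helper for illustration only.
    """
    joined = ' '.join(tokens)
    # ' ([,.])' captures the punctuation mark; r'\1' writes it back without the space.
    return re.sub(r' ([,.])', r'\1', joined)


sentence = ['During', 'this', 'time', ',', 'many', 'chatterbots', 'were', 'written', 'including',
            'PARRY', ',', 'Racter', ',', 'and', 'Jabberwacky', '.']
result = join_tokens(sentence)
print(result)
```

Other punctuation such as "!" or "?" would need to be added to the character class.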
When using re.sub, enclose the part of the pattern (the first argument) that you want to reuse in (). Then, in the replacement string (the second argument), refer to that part by its group number: \1 for the first (), \2 for the second, and so on. With that in mind, here are some examples.
check.py
bad = 'including PARRY , Racter , and Jabberwacky .'
#The one from earlier
re.sub(r' ([,.])', r'\1', bad)
Out[4]: 'including PARRY, Racter, and Jabberwacky.'
# Put the opening ( in a different place, so the group also contains the space
re.sub(r'( [,.])', r'\1', bad)
Out[5]: 'including PARRY , Racter , and Jabberwacky .' # == bad
# Enclose the space in () as well, so there are two groups
re.sub(r'( )([,.])', r'\1', bad)
Out[6]: 'including PARRY  Racter  and Jabberwacky '
#Change the number to use
re.sub(r'( )([,.])', r'\2', bad)
Out[7]: 'including PARRY, Racter, and Jabberwacky.'
#Use both groups side by side
re.sub(r'( )([,.])', r'\1\2', bad)
Out[8]: 'including PARRY , Racter , and Jabberwacky .' # == bad
#How about the opposite order
re.sub(r'( )([,.])', r'\2\1', bad)
Out[9]: 'including PARRY,  Racter,  and Jabberwacky. ' # != bad
#Play around: insert text between the two groups
re.sub(r'( )([,.])', r'\1Hoge\2', bad)
Out[12]: 'including PARRY Hoge, Racter Hoge, and Jabberwacky Hoge.'
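To see which number belongs to which (), it can help to look at a match object directly. The sketch below is my own addition: it uses re.search to inspect the groups of the first match, and shows two other ways the standard re module lets you refer to a group, the \g<2> form and a replacement function. The function name keep_punctuation is made up for this example.

```python
import re

bad = 'including PARRY , Racter , and Jabberwacky .'

# Inspect the first match: group(0) is the whole match, then groups are numbered left to right.
m = re.search(r'( )([,.])', bad)
first_match = (m.group(0), m.group(1), m.group(2))
print(first_match)

# r'\g<2>' means the same as r'\2'; it is handy when a digit follows the reference.
with_g = re.sub(r'( )([,.])', r'\g<2>', bad)
print(with_g)


def keep_punctuation(match):
    """Replacement function: return only the punctuation (group 2)."""
    return match.group(2)


# re.sub also accepts a function that receives each match object.
with_func = re.sub(r'( )([,.])', keep_punctuation, bad)
print(with_func)
```

All three approaches pick out the same groups; the plain \1 and \2 form is usually the shortest.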
Be sure to prefix '\num' with r.
Additional notes
Good example: r'\1'
Bad example: '\1'
If you add one more backslash ('\\1'), it works without the r.
Thank you to shiracamus for pointing this out in the comments.
With '\num' the r prefix is required, but with '\\num' it is not.
The r stands for raw: a raw string does not treat backslashes as escape characters.
>>> import re
>>> bad = 'including PARRY , Racter , and Jabberwacky .'
>>> re.sub(' ([,.])', '\\1', bad)
'including PARRY, Racter, and Jabberwacky.'
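To make the difference concrete, here is a small sketch (my own example, using a shortened input string) comparing the three spellings. Without the r, Python itself turns '\1' into a single control character before re.sub ever sees it, so no group reference is left.

```python
import re

plain = '\1'      # one character: the control character with code 1 (octal escape)
raw = r'\1'       # two characters: a backslash followed by '1'
doubled = '\\1'   # the same two characters, written with an escaped backslash

print(len(plain), len(raw), len(doubled))

bad = 'PARRY , Racter .'
# The replacement here is the control character, not a reference to group 1.
wrong = re.sub(r' ([,.])', '\1', bad)
# The replacement here is backslash + '1', which re.sub reads as group 1.
right = re.sub(r' ([,.])', r'\1', bad)
print(repr(wrong))
print(repr(right))
```

Printing with repr makes the stray control character visible as \x01.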
If you have any other questions, please let us know in the comments.