As a prerequisite, the examples below assume that the re library has been imported:
import re
All of the following notations extract only the text part; the condition that must hold at the time of extraction is written in a group (). A positive assertion extracts the text when the condition in the group also matches. A negative assertion extracts the text when the condition in the group does not match.
Name | Syntax | Overview |
---|---|---|
Positive look-ahead | text(?=xxx) | Get the text part when it is followed by a match for xxx |
Negative look-ahead | text(?!xxx) | Get the text part when it is not followed by a match for xxx |
Positive look-behind | (?<=xxx)text | Get the text part when it is preceded by a match for xxx |
Negative look-behind | (?<!xxx)text | Get the text part when it is not preceded by a match for xxx |
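As a minimal sketch of what "extract only the text part" means in practice, the snippet below compares a look-ahead with an ordinary pattern. The condition in the group is checked but is not included in the match. The variable names are illustrative and are not part of the original article.

```python
import re

# The look-ahead condition "CD" is checked but not consumed,
# so the match covers only "AB".
m = re.search(r'AB(?=CD)', 'ABCDEF')
matched_text = m.group()
matched_span = m.span()
print(matched_text, matched_span)  # AB (0, 2)

# Without the look-around, the condition becomes part of the match.
plain = re.search(r'ABCD', 'ABCDEF').group()
print(plain)  # ABCD
```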
In positive look-ahead, write (?=xxx) after the text you want to extract. The text part is extracted when it matches and the xxx part that follows it also matches.
re.findall('AB(?=CDEF)', 'ABCDEF') #['AB']
re.findall('AB(?=DEF)', 'ABCDEF') #[]
re.findall('AB(?=CD)', 'ABCDEF') #['AB']
re.findall('.+(?=CD)', 'ABCDEF') #['AB']
re.findall('AB(?=[A-Z]{2,3})', 'ABCDEF') #['AB']
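As a slightly more practical illustration, positive look-ahead can pick out numbers that carry a particular unit. The sample string and the unit kg are assumptions made for this sketch.

```python
import re

# Hypothetical sample data: only some of the numbers are weights in kg.
sample = '10kg 20g 30kg'

# \d+ is extracted only when it is immediately followed by "kg";
# the unit itself is not part of the result.
weights = re.findall(r'\d+(?=kg)', sample)
print(weights)  # ['10', '30']
```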
In negative look-ahead, write (?!xxx) after the text you want to extract. The text part is extracted when it matches and the xxx part that follows it does not match.
re.findall('AB(?!CDEF)', 'ABCDEF') #[]
re.findall('AB(?!DEF)', 'ABCDEF') #['AB']
re.findall('AB(?!CD)', 'ABCDEF') #[]
Because the condition is negated, the results are the opposite of those for positive look-ahead.
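One point to watch with negative look-ahead is that a variable-length text part can backtrack until the condition no longer applies. The following sketch, using a hypothetical sample string, shows the effect and one possible way to avoid it.

```python
import re

# Hypothetical sample data.
sample = '10kg 20g'

# Pitfall: \d+ first tries "10", which is followed by "kg", then backtracks
# to "1", which is followed by "0" and therefore passes the negative check.
naive = re.findall(r'\d+(?!kg)', sample)
print(naive)  # ['1', '20']

# Also rejecting a following digit prevents the partial match.
strict = re.findall(r'\d+(?!kg|\d)', sample)
print(strict)  # ['20']
```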
In positive look-behind, write the text you want to extract after (?<=xxx). The text part is extracted when it matches and the xxx part immediately before it also matches.
re.findall('(?<=ABCD)EF', 'ABCDEF') #['EF']
re.findall('(?<=BC)EF', 'ABCDEF') #[]
re.findall('(?<=)EF', 'ABCDEF') #['EF'] (an empty condition always matches)
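A typical use of positive look-behind is to extract a value that comes right after a marker without including the marker. The sketch below assumes a hypothetical price list in which only some numbers are prefixed with a dollar sign.

```python
import re

# Hypothetical sample data: "80" has no currency symbol.
sample = 'apple $120, orange 80, melon $980'

# \d+ is extracted only when it is immediately preceded by "$".
# The "$" must be escaped because it is a regex metacharacter.
prices = re.findall(r'(?<=\$)\d+', sample)
print(prices)  # ['120', '980']
```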
In negative look-behind, write the text you want to extract after (?<!xxx). The text part is extracted when it matches and the xxx part does not appear immediately before it.
re.findall('(?<!ABCD)EF', 'ABCDEF') #[]
re.findall('(?<!BC)EF', 'ABCDEF') #['EF']
re.findall('(?<!)EF', 'ABCDEF') #[] (an empty condition always matches, so its negation never does)
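Both look-behind forms in Python's re module require the condition to have a fixed width, which is worth knowing before writing more complex conditions. The helper `compiles` below is a hypothetical name introduced only for this sketch.

```python
import re

def compiles(pattern):
    """Return True if the re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Alternatives of the same width are accepted in a look-behind.
fixed_ok = compiles(r'(?<!AB|CD)EF')
# A quantifier such as + makes the width variable, which re rejects.
variable_ok = compiles(r'(?<!A+)EF')
print(fixed_ok, variable_ok)  # True False
```

Look-ahead conditions have no such restriction, as the [A-Z]{2,3} example earlier shows.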
text = 'Python python PYTHON'
re.findall('py(?=thon)', text) #['py']
re.findall('Py|py(?=thon)', text) #['Py', 'py']
re.findall('py(?=thon)', text, re.IGNORECASE) #['Py', 'py', 'PY']
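Note that in a pattern such as Py|py(?=thon), the | operator has the lowest precedence, so the look-ahead applies only to the py alternative. The sketch below uses a hypothetical sample string to show the difference, and wraps the alternatives in a non-capturing group so that the condition applies to both.

```python
import re

# Hypothetical sample: "Pyramid" starts with "Py" but is not "Python".
sample = 'Pyramid python Python'

# Ungrouped: "Py" matches unconditionally, only "py" is checked for "thon".
ungrouped = re.findall(r'Py|py(?=thon)', sample)
print(ungrouped)  # ['Py', 'py', 'Py']

# Grouped: the look-ahead applies to both alternatives.
grouped = re.findall(r'(?:Py|py)(?=thon)', sample)
print(grouped)  # ['py', 'Py']
```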
text = 'Street address:Shinjuku-ku, Tokyo 〇〇〇〇 Tower\r Click here for map'
re.findall('(?<=:).*(?=\r)', text)[0] #Shinjuku-ku, Tokyo 〇〇〇〇 Tower
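Indexing the result of re.findall with [0] raises IndexError when nothing matches. As one possible alternative, the sketch below uses re.search with a non-greedy .*? and falls back to None. The function name `extract_address` is hypothetical, and the capture-group variant is shown only for comparison.

```python
import re

def extract_address(s):
    """Return the text between ':' and the first carriage return, or None."""
    m = re.search(r'(?<=:).*?(?=\r)', s)
    return m.group() if m else None

text = 'Street address:Shinjuku-ku, Tokyo 〇〇〇〇 Tower\r Click here for map'
address = extract_address(text)
print(address)  # Shinjuku-ku, Tokyo 〇〇〇〇 Tower

# No ':' or carriage return: the function returns None instead of raising.
missing = extract_address('no delimiter here')

# The same extraction with a capture group instead of look-arounds.
m2 = re.search(r':(.*?)\r', text)
address_group = m2.group(1) if m2 else None
```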
[Reference site] Master the look-ahead and look-behind of regular expressions!