Regular expression look-ahead and look-behind

As a prerequisite, the examples below assume that the re library has been imported.

import re

Every notation below extracts only the text part; the condition that must hold at extraction time is written inside the group (). With the positive forms, the text is extracted only when the condition in the group also matches. With the negative forms, the text is extracted only when the condition in the group does not match.

| Name | Notation | Overview |
|---|---|---|
| Positive look-ahead | text(?=xxx) | Get the text part when xxx matches right after it |
| Negative look-ahead | text(?!xxx) | Get the text part when xxx does not match right after it |
| Positive look-behind | (?<=xxx)text | Get the text part when xxx matches right before it |
| Negative look-behind | (?<!xxx)text | Get the text part when xxx does not match right before it |
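All four forms are zero-width: the group is checked but never becomes part of the match. The following minimal sketch illustrates this with the same 'ABCDEF' string used in this article; the variable names are chosen only for illustration.

```python
import re

# The look-ahead condition is checked but not consumed,
# so the match covers only 'AB'.
m = re.search('AB(?=CD)', 'ABCDEF')
span = m.span()      # (0, 2)
matched = m.group()  # 'AB'

# Without the look-ahead, 'CD' becomes part of the match.
plain = re.search('ABCD', 'ABCDEF').group()  # 'ABCD'

# Because 'CD' was not consumed, a later match can still start on it.
pieces = re.findall('AB(?=CD)|CD', 'ABCDEF')  # ['AB', 'CD']
```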

Positive look-ahead

In positive look-ahead, write (?=xxx) after the text you want to extract. The text part is extracted when it matches and the xxx part that follows it also matches.

re.findall('AB(?=CDEF)', 'ABCDEF') #['AB']
re.findall('AB(?=DEF)', 'ABCDEF') #[]
re.findall('AB(?=CD)', 'ABCDEF') #['AB']
re.findall('.+(?=CD)', 'ABCDEF') #['AB']
re.findall('AB(?=[A-Z]{2,3})', 'ABCDEF') #['AB']
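As a slightly more practical sketch of the same idea, the example below pulls out only the numbers that are followed by a unit. The sample string and the unit "yen" are made up for illustration.

```python
import re

# Hypothetical sample data: amounts with different units.
prices = '100yen 200usd 300yen'

# Keep only the digits that are immediately followed by 'yen';
# the unit itself is not included in the result.
yen_amounts = re.findall(r'\d+(?=yen)', prices)  # ['100', '300']
```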

Negative look-ahead

In negative look-ahead, write (?!xxx) after the text you want to extract. The text part is extracted when it matches and the xxx part that follows it does not match.

re.findall('AB(?!CDEF)', 'ABCDEF') #[]
re.findall('AB(?!DEF)', 'ABCDEF') #['AB']
re.findall('AB(?!CD)', 'ABCDEF') #[]

Because the condition is negated, the results are the opposite of those for positive look-ahead.
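One point to watch with negative look-ahead is backtracking: when the text part is a repetition such as \d+, the engine can shorten the match until the condition passes. The sketch below uses a made-up string to show the effect and one possible way to avoid it.

```python
import re

# Hypothetical sample data.
prices = '100yen 200usd'

# '100' is followed by 'yen', so the engine backtracks to '10',
# which is followed by '0' and therefore passes the condition.
naive = re.findall(r'\d+(?!yen)', prices)      # ['10', '200']

# Also rejecting a following digit stops the match from being shortened.
strict = re.findall(r'\d+(?!\d|yen)', prices)  # ['200']
```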

Positive look-behind

In positive look-behind, write the text you want to extract after (?<=xxx). The text part is extracted when it matches and the xxx part immediately before it also matches.

re.findall('(?<=ABCD)EF', 'ABCDEF') #['EF']
re.findall('(?<=BC)EF', 'ABCDEF') #[]
re.findall('(?<=)EF', 'ABCDEF') #['EF']
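Note that Python's standard re module only accepts look-behind patterns of fixed width. The sketch below uses a made-up string; the first call works because \$ always has width 1, while compiling a look-behind that contains + raises re.error.

```python
import re

# Fixed-width look-behind: digits immediately preceded by '$'.
fixed = re.findall(r'(?<=\$)\d+', 'cost $30 or 40 euros')  # ['30']

# A variable-width look-behind such as (?<=\$+) is rejected by re.
try:
    re.compile(r'(?<=\$+)\d+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False
```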

Negative look-behind

In negative look-behind, write the text you want to extract after (?<!xxx). The text part is extracted when it matches and the xxx part does not appear immediately before it.

re.findall('(?<!ABCD)EF', 'ABCDEF') #[]
re.findall('(?<!BC)EF', 'ABCDEF') #['EF']
re.findall('(?<!)EF', 'ABCDEF') #[]
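With negative look-behind, a match can also start in the middle of a token that was meant to be excluded. The sketch below, on a made-up string, shows this and one possible fix using the word boundary \b.

```python
import re

# Hypothetical sample data.
s = 'cost $30 or 40 euros'

# '3' is preceded by '$' and is rejected, but the '0' after it is
# preceded by '3', so it passes the condition on its own.
naive = re.findall(r'(?<!\$)\d+', s)      # ['0', '40']

# Requiring a word boundary forces the match to start at the
# beginning of the number.
strict = re.findall(r'(?<!\$)\b\d+', s)   # ['40']
```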

Applying positive look-ahead

text = 'Python python PYTHON'

re.findall('py(?=thon)', text) #['py']
re.findall('Py|py(?=thon)', text) #['Py', 'py']
re.findall('py(?=thon)', text, re.IGNORECASE) #['Py', 'py', 'PY']
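In a pattern like 'Py|py(?=thon)', the look-ahead belongs only to the second alternative, because | has the lowest precedence. The sketch below uses a made-up string to show the difference; wrapping the alternatives in a non-capturing group (?:...) is one way to apply the condition to both.

```python
import re

# Hypothetical sample data: 'Pyramid' starts with 'Py' but is not 'Python'.
sample = 'Pyramid python'

# The look-ahead applies only to 'py', so 'Py' in 'Pyramid' still matches.
ungrouped = re.findall('Py|py(?=thon)', sample)     # ['Py', 'py']

# Grouping the alternatives applies the look-ahead to both of them.
grouped = re.findall('(?:Py|py)(?=thon)', sample)   # ['py']
```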

A usage likely to come up in scraping

text = 'Street address:Shinjuku-ku, Tokyo 〇〇〇〇 Tower\r Click here for map'
re.findall('(?<=:).*(?=\r)', text)[0] #Shinjuku-ku, Tokyo 〇〇〇〇 Tower
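Indexing the findall result with [0] raises IndexError when nothing matches, which is common with scraped text. The following is a minimal sketch of a more defensive variant; the helper name extract_after_colon is hypothetical, and stopping at either \r or \n is an assumption about how the scraped lines end.

```python
import re

def extract_after_colon(text):
    """Return the text between the first ':' and the end of that line, or None."""
    # [^\r\n]* stops at either kind of line break, so no look-ahead is needed.
    m = re.search(r'(?<=:)[^\r\n]*', text)
    return m.group(0) if m else None

text = 'Street address:Shinjuku-ku, Tokyo 〇〇〇〇 Tower\r Click here for map'
address = extract_after_colon(text)         # 'Shinjuku-ku, Tokyo 〇〇〇〇 Tower'
missing = extract_after_colon('no label')   # None
```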

[Reference site] Master the look-ahead and look-behind of regular expressions!
