Regular expression features I often use in Python

I keep forgetting these, and the official documentation is too long to check every time, so this is a memo for myself. I plan to add more as needed.

import re

Incidentally, note that after from sympy import *, the name re refers to SymPy's function that returns the real part, not to the regular expression module.

Match object

To extract the matched parts ...

filename = 'Back_in_the_U.S.S.R.m4a'
m = re.match(r'([\w\.-]+?)\.(\w+)$', filename)
print(m.group(0))  # the whole match (same as m.group())
print(m.group(1))  # first group
print(m.group(2))  # second group
print(m.groups())  # all groups as a tuple


('Back_in_the_U.S.S.R', 'm4a')
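
One thing to keep in mind with the pattern above: re.match returns None when the string does not match, so calling m.group() directly would raise AttributeError. A minimal sketch of a guard follows; split_ext is a hypothetical helper name made up for this memo, not part of the re module.

```python
import re

def split_ext(filename):
    """Return (basename, ext), or None if the name has no extension."""
    # split_ext is a hypothetical helper for illustration only.
    m = re.match(r'([\w\.-]+?)\.(\w+)$', filename)
    if m is None:  # re.match returns None when nothing matches
        return None
    return m.groups()

print(split_ext('Back_in_the_U.S.S.R.m4a'))
print(split_ext('README'))
```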

Name a group with (?P<groupname>...) and access it by that name.

filename = 'Back_in_the_U.S.S.R.m4a'
m = re.match(r'(?P<basename>[\w\.]+?)\.(?P<ext>\w+)$', filename)
print(m.group('basename'))  # string matched by (?P<basename>...)
print(m.group('ext'))       # string matched by (?P<ext>...)
print(m.groupdict())        # all named groups as a dictionary


{'basename': 'Back_in_the_U.S.S.R', 'ext': 'm4a'}

Use the matched string for replacement

For example, to replace \ruby{base}{reading} in $\LaTeX$ with <ruby>base<rt>reading</rt></ruby> ...

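print(re.sub(r'\\ruby\{(.+?)\}\{(.+?)\}', r'<ruby>\1<rt>\2</rt></ruby>',  # \1 and \2 are the first and second groups
             r'\ruby{Reductio ad absurdum}{View Plaza}'))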


<ruby>Reductio ad absurdum<rt>View Plaza</rt></ruby>

To refer to a group named with (?P<groupname>...) in the replacement string, use \g<groupname>.

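print(re.sub(r'\\ruby\{(?P<rb>.+?)\}\{(?P<rt>.+?)\}', r'<ruby>\g<rb><rt>\g<rt></rt></ruby>',  # \g<rb> and \g<rt> are the named groups
             r'\ruby{Reductio ad absurdum}{View Plaza}'))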


<ruby>Reductio ad absurdum<rt>View Plaza</rt></ruby>

The \g<...> syntax is easy to confuse with the HTML tags, but the result is the same.
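
The \g<...> form also works with group numbers, which helps when a digit follows the reference: \g<1>0 means group 1 followed by a literal 0, whereas \10 would be read as group 10. A small sketch with made-up sample strings:

```python
import re

# \g<2> and \g<1> refer to numbered groups; this swaps the two fields.
swapped = re.sub(r'(\w+)=(\w+)', r'\g<2>=\g<1>', 'key=value')
print(swapped)

# \g<1>0 appends a literal 0 to group 1; r'\10' would mean group 10 instead.
times_ten = re.sub(r'(\d+)cm', r'\g<1>0mm', '27cm')
print(times_ten)
```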

List all matched parts

To retrieve the contents of every HTML em or strong element ...

src = r'<em>Axiom of choice</em>Assuming, to any set<strong>You can put the order</strong>.'
print(re.findall(r'<(em|strong)>(.*?)</\1>', src))  # \1 makes the closing tag match the opening one


[('em', 'Axiom of choice'), ('strong', 'You can put the order')]
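
When the pattern has more than one group, re.findall returns a list of tuples, as above. If the match positions are needed as well, re.finditer yields match objects instead. A minimal sketch with the same pattern; the sample text is shortened and made up for this example.

```python
import re

src = '<em>Axiom of choice</em> and <strong>well-ordering</strong>'
results = []
for m in re.finditer(r'<(em|strong)>(.*?)</\1>', src):
    # Each m is a match object, so group() and span() are both available.
    results.append((m.group(1), m.group(2), m.span()))
print(results)
```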

Replace by passing the matched part to the function

To convert the cm values in a text to m ...

def cm2m(m):  # a function that takes a match object as its argument
    value = m.group(1)
    return str(float(value)/100) + 'm'
print(re.sub(r'(\d+)cm', cm2m, '271cm + 314cm is 585cm.'))


2.71m + 3.14m is 5.85m.

For simple processing that does not warrant defining a named function, a lambda is probably fine.

print(re.sub(r'(\d+)cm', lambda m: str(float(m.group(1))/100) + 'm', '271cm + 314cm is 585cm.'))


2.71m + 3.14m is 5.85m.
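
The callback can do more than arithmetic on a single group. As a sketch, the following converts several units in one pass by looking up a divisor in a dictionary. The TO_METERS table and the to_m function are names made up for this example.

```python
import re

# Hypothetical conversion table: unit -> divisor that yields meters.
TO_METERS = {'mm': 1000, 'cm': 100}

def to_m(m):
    # Group 1 is the number, group 2 is the unit that matched.
    value, unit = m.group(1), m.group(2)
    return str(float(value) / TO_METERS[unit]) + 'm'

converted = re.sub(r'(\d+)(mm|cm)', to_m, '271cm + 3140mm')
print(converted)
```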

Nested parentheses

Use recursion. The standard re module does not support it, so use the third-party regex module instead. If it is not installed, install it with pip install regex or similar. The following (probably) matches \frac{}{} in $\LaTeX$.

import regex
pattern_frac = r'\\frac(?<rec>\{(?:[^{}]+|(?&rec))*\}){2}'  # (?&rec) recurses into the group named rec
m = regex.search(pattern_frac, r'1 + \frac{\int_{a}^{b} f(x)\,dx }{\sum_{k=1}^{n}a_{k}}')
print(m.group(0))


\frac{\int_{a}^{b} f(x)\,dx }{\sum_{k=1}^{n}a_{k}}
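
If installing regex is not an option, balanced braces can also be handled without a regular expression. Note that this is a different technique from the recursive pattern above: a plain depth counter over the string. The function name read_braced is made up for this sketch, and it assumes braces are never escaped.

```python
def read_braced(text, start):
    """Return the index just past the brace group that opens at text[start]."""
    # Hypothetical helper; assumes text[start] == '{' and no escaped braces.
    depth = 0
    for i in range(start, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError('unbalanced braces')

src = r'1 + \frac{\int_{a}^{b} f(x)\,dx }{\sum_{k=1}^{n}a_{k}}'
begin = src.index(r'\frac')
end = read_braced(src, begin + len(r'\frac'))  # skip the numerator group
end = read_braced(src, end)                    # skip the denominator group
print(src[begin:end])
```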

