I often forget how to do these things, but the official documentation is too long to check every time, so this is a memo to myself. I plan to add more as needed.
import re
By the way, note that running from sympy import * afterwards rebinds the name re to SymPy's function that returns the real part.
To extract the matched part ...
filename = 'Back_in_the_U.S.S.R.m4a'
m = re.match(r'([\w\.-]+?)\.(\w+)$', filename)
print(m.group(0))  # The whole match (same as m.group())
print(m.group(1))  # First group
print(m.group(2))  # Second group
print(m.groups())  # All groups as a tuple
out
Back_in_the_U.S.S.R.m4a
Back_in_the_U.S.S.R
m4a
('Back_in_the_U.S.S.R', 'm4a')
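One thing worth keeping in mind with the example above: re.match returns None when the pattern does not match, so calling m.group() on the result would raise AttributeError. Below is a minimal sketch of guarding against that. The helper name split_ext is hypothetical and only used for this illustration.

```python
import re

def split_ext(filename):
    """Return (basename, ext), or None when the pattern does not match.

    split_ext is a hypothetical helper name, not part of the re module.
    """
    m = re.match(r'([\w\.-]+?)\.(\w+)$', filename)
    if m is None:  # re.match returns None when nothing matches
        return None
    return m.groups()

print(split_ext('Back_in_the_U.S.S.R.m4a'))
print(split_ext('README'))  # no dot, so there is no match
```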
Name a group with (?P<groupname>...) and access it by that name.
filename = 'Back_in_the_U.S.S.R.m4a'
m = re.match(r'(?P<basename>[\w\.]+?)\.(?P<ext>\w+)$', filename)
print(m.group('basename'))  # String matched by (?P<basename>...)
print(m.group('ext'))  # String matched by (?P<ext>...)
print(m.groupdict())  # Named groups as a dictionary
out
Back_in_the_U.S.S.R
m4a
{'basename': 'Back_in_the_U.S.S.R', 'ext': 'm4a'}
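As a small addition to the named-group example, here is a sketch of two other ways to read the same groups: subscript access on the match object (available from Python 3.6) and unpacking groupdict() into a format string. The output format 'name (ext)' is just an arbitrary choice for the illustration.

```python
import re

filename = 'Back_in_the_U.S.S.R.m4a'
m = re.match(r'(?P<basename>[\w\.]+?)\.(?P<ext>\w+)$', filename)

# Subscript access is shorthand for m.group(...) (Python 3.6+)
print(m['basename'])

# A named group still keeps its positional number
print(m.group(2) == m['ext'])

# The dictionary from groupdict() can be unpacked into str.format
label = '{basename} ({ext})'.format(**m.groupdict())
print(label)
```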
For example, to replace \ruby{reductio}{reductio} in LaTeX with <ruby>reductio<rt>reductio</rt></ruby> ...
# [^{}]+ is used instead of \w+ so that arguments containing spaces also match
print(re.sub(r'\\ruby\{([^{}]+)\}\{([^{}]+)\}',
             r'<ruby>\1<rt>\2</rt></ruby>',
             r'\ruby{Reductio ad absurdum}{View Plaza}'))
out
<ruby>Reductio ad absurdum<rt>View Plaza</rt></ruby>
To refer to a group named with (?P<groupname>...) in the replacement string, use \g<groupname>.
# [^{}]+ is used instead of \w+ so that arguments containing spaces also match
print(re.sub(r'\\ruby\{(?P<rb>[^{}]+)\}\{(?P<rt>[^{}]+)\}',
             r'<ruby>\g<rb><rt>\g<rt></rt></ruby>',
             r'\ruby{Reductio ad absurdum}{View Plaza}'))
out
<ruby>Reductio ad absurdum<rt>View Plaza</rt></ruby>
The group references are easy to confuse with the HTML tags, but the result is the same.
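The \g<...> syntax also accepts group numbers, which helps when a group reference is immediately followed by a digit. The sketch below uses a made-up cm-to-mm string purely to illustrate that: r'\10' is read as a reference to group 10, while r'\g<1>0' means group 1 followed by a literal 0.

```python
import re

# Group 1 followed by a literal "0": 27cm becomes 270mm
converted = re.sub(r'(\d+)cm', r'\g<1>0mm', '27cm')
print(converted)

# The same thing written as \10 refers to a non-existent group 10
try:
    re.sub(r'(\d+)cm', r'\10mm', '27cm')
    failed = False
except re.error as e:
    failed = True
    print('error:', e)
```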
To retrieve all the contents of the HTML em and strong elements ...
src = r'Assuming the <em>axiom of choice</em>, <strong>every set can be well-ordered</strong>.'
print(re.findall(r'<(em|strong)>(.*?)</\1>', src))
out
[('em', 'axiom of choice'), ('strong', 'every set can be well-ordered')]
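When a pattern contains groups, re.findall returns tuples of the groups instead of match objects. If named access is preferable, re.finditer yields match objects instead. The following is a minimal sketch of the same extraction; the group names tag and body are my own choice for the illustration.

```python
import re

src = r'Assuming the <em>axiom of choice</em>, <strong>every set can be well-ordered</strong>.'

# (?P=tag) is a backreference to the named group "tag"
pattern = r'<(?P<tag>em|strong)>(?P<body>.*?)</(?P=tag)>'

# finditer yields one match object per hit, so groupdict() is available
found = [m.groupdict() for m in re.finditer(pattern, src)]
for item in found:
    print(item['tag'], '->', item['body'])
```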
To convert cm to m within a text ...
def cm2m(m):  # A function that takes a match object as its argument
    value = m.group(1)
    return str(float(value)/100) + 'm'
print(re.sub(r'(\d+)cm', cm2m, '271cm + 314cm is 585cm.'))
out
2.71m + 3.14m is 5.85m.
For simple processing that does not warrant defining a named function, a lambda is probably fine.
print(re.sub(r'(\d+)cm', lambda m: str(float(m.group(1))/100) + 'm', '271cm + 314cm is 585cm.'))
out
2.71m + 3.14m is 5.85m.
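The pattern (\d+)cm only matches integers written directly against the unit. As a possible extension, the sketch below also accepts decimal values and optional whitespace before cm. The function name cm_to_m and the :g formatting are my own choices, not something from the re documentation.

```python
import re

def cm_to_m(text):
    """Convert every '<number>cm' or '<number> cm' in text to metres.

    cm_to_m is a hypothetical helper written for this illustration.
    """
    def repl(m):
        # :g drops trailing zeros, e.g. 2.710000 -> 2.71
        return f'{float(m.group(1)) / 100:g}m'
    # \s* allows a space before the unit, \b keeps "cm" from matching inside a longer word
    return re.sub(r'(\d+(?:\.\d+)?)\s*cm\b', repl, text)

print(cm_to_m('271cm + 314 cm is 585 cm.'))
print(cm_to_m('12.5cm'))
```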
Use recursion (for example, to match nested braces). The standard re module does not support it, so use the third-party regex module instead. If it is not installed, install it with pip install regex or similar. The following (probably) matches \frac{}{} in LaTeX.
import regex
pattern_frac = r'\\frac(?<rec>\{(?:[^{}]+|(?&rec))*\}){2}'
m = regex.search(pattern_frac, r'1 + \frac{\int_{a}^{b} f(x)\,dx }{\sum_{k=1}^{n}a_{k}}')
print(m.group())
out
\frac{\int_{a}^{b} f(x)\,dx }{\sum_{k=1}^{n}a_{k}}
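If installing regex is not an option, one alternative is to drop recursion and count braces by hand with only the standard library. This is a different technique from the recursive pattern above, shown as a rough sketch. The names read_braced and find_frac are hypothetical, and escaped braces such as \{ are not handled.

```python
import re

def read_braced(text, start):
    """Return the index just past the balanced {...} group beginning at start,
    or -1 if text[start] is not '{' or the braces never balance."""
    if start >= len(text) or text[start] != '{':
        return -1
    depth = 0
    for i in range(start, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                return i + 1
    return -1

def find_frac(text):
    """Return the first frac command with two braced arguments, or None."""
    for m in re.finditer(r'\\frac', text):
        end_num = read_braced(text, m.end())   # numerator
        if end_num == -1:
            continue
        end_den = read_braced(text, end_num)   # denominator
        if end_den != -1:
            return text[m.start():end_den]
    return None

print(find_frac(r'1 + \frac{\int_{a}^{b} f(x)\,dx }{\sum_{k=1}^{n}a_{k}}'))
```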