A regular expression that finds N or more consecutive substrings of the same character.
import re
def nchars(s, n):
"""Find a substring of the same character n or more consecutive in the string s
"""
assert n > 0
reg = re.compile("(.)\\1{%d,}" % (n - 1)) #If you take the comma, you get exactly n
while True:
m = reg.search(s)
if not m:
break
yield m.group(0)
s = s[m.end():]
print(list(nchars('a good uuu ee', 2)))
print(list(nchars('aa Ii Uuu Uu e ooo', 3)))
This is the execution result.
['Good', 'uuu', 'ee']
['Good', 'Uuuuuuu', 'ooo']
Recommended Posts