If you only need the matched strings, re.findall is enough.
python
import re
target_text = 'So far, seven types of "coronavirus" that infect humans have been found, and one of them is the so-called "new coronavirus (SARS-CoV2)" that has been a problem since December last year. Of these, four types of viruses account for 10 to 15% of common colds (35% during the epidemic), and most are mild. The remaining two viruses are "Severe Acute Respiratory Syndrome (SARS)" that occurred in 2002 and "Middle East Respiratory Syndrome (MERS)" that has occurred since 2012. Coronaviruses infect all animals, but rarely infect other animals of different species. In addition, it is known that the infectivity is lost by alcohol disinfection (70%).'
keyword = "([0-9]+)"
results = re.findall(keyword, target_text)
# ['2', '10', '15', '35', '2002', '2012', '70']
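One detail that matters for the pattern above: what re.findall returns depends on how many capturing groups the pattern has. The short sketch below uses a made-up sample string (not the article's full text) to show the three cases.

```python
import re

sample = "10 to 15% of common colds (35% during the epidemic)"

# No capturing group: the whole match is returned.
no_group = re.findall(r"[0-9]+", sample)
print(no_group)    # ['10', '15', '35']

# One capturing group: only the group's text is returned.
one_group = re.findall(r"([0-9]+)%", sample)
print(one_group)   # ['15', '35']

# Two or more groups: each match becomes a tuple of group texts.
two_groups = re.findall(r"([0-9]+)(%)", sample)
print(two_groups)  # [('15', '%'), ('35', '%')]
```

In every case the results are plain strings (or tuples of strings), so positions such as start() and end() are not available.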
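However, re.findall returns only strings, not the match objects that re.search() gives you. (The standard library does provide re.finditer, which yields a match object for every match.) As an exercise, here is a function that recursively searches the entire string and collects every match object.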
python
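import re


def search_all(regex, target, search_start_index=0, matches=None):
    """Recursively collect every match object for regex in target."""
    if matches is None:
        matches = []
    if search_start_index > len(target):
        return matches
    # Searching from a position (instead of slicing the string) keeps
    # match.start() / match.end() relative to the whole target.
    match = re.compile(regex).search(target, search_start_index)
    if match is None:
        return matches
    matches.append(match)
    # Resume right after the match; skip one extra character only after an
    # empty match, so the recursion always advances.
    next_index = match.end() + (1 if match.end() == match.start() else 0)
    return search_all(regex, target, next_index, matches)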
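A caveat with the recursive approach: every match adds one level of recursion, so a text with more matches than the interpreter's recursion limit (1000 by default in CPython) raises RecursionError. A loop avoids that. The following is only a sketch; the name search_all_iter is made up for this example and is not part of the original post.

```python
import re


def search_all_iter(regex, target):
    """Collect every non-overlapping match object using a loop."""
    pattern = re.compile(regex)
    matches = []
    pos = 0
    while pos <= len(target):
        # Search from pos so that spans refer to the whole target string.
        match = pattern.search(target, pos)
        if match is None:
            break
        matches.append(match)
        # Step past the match; move one extra character after an empty
        # match so the loop always advances.
        pos = match.end() + (1 if match.end() == match.start() else 0)
    return matches


# 2000 repetitions give 6000 matches, far beyond the default recursion limit.
found = search_all_iter("[0-9]+", "a1b22c333" * 2000)
print(len(found))       # 6000
print(found[1].span())  # (3, 5)
```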
python
target_text = 'So far, seven types of "coronavirus" that infect humans have been found, and one of them is the so-called "new coronavirus (SARS-CoV2)" that has been a problem since December last year. Of these, four types of viruses account for 10 to 15% of common colds (35% during the epidemic), and most are mild. The remaining two viruses are "Severe Acute Respiratory Syndrome (SARS)" that occurred in 2002 and "Middle East Respiratory Syndrome (MERS)" that has occurred since 2012. Coronaviruses infect all animals, but rarely infect other animals of different species. In addition, it is known that the infectivity is lost by alcohol disinfection (70%).'
keyword = "([0-9]+)"
search_result_groups = search_all(keyword, target_text)
for item in search_result_groups:
    print(item.group())
# 2
# 10
# 15
# 35
# 2002
# 2012
# 70
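For comparison, the standard library's re.finditer returns an iterator of match objects for all non-overlapping matches, so the same result is available without a hand-written helper. A minimal sketch on a shortened sample string (chosen here for illustration):

```python
import re

text = "SARS occurred in 2002 and MERS since 2012."

# re.finditer yields one match object per non-overlapping match,
# scanning the string from left to right.
matches = list(re.finditer(r"[0-9]+", text))

for m in matches:
    # group() is the matched text; span() is its (start, end) position
    # in the original string.
    print(m.group(), m.span())
# 2002 (17, 21)
# 2012 (37, 41)
```

Because each item is a full match object, methods such as start(), end(), and groups() are all available, which re.findall cannot offer.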
--Following a suggestion in the comments, I added the part about re.findall() and edited the post. Thank you very much.
--It was pointed out in the comments that writing the default value as matches = [] is an anti-pattern, so I corrected it. Thank you very much.