Summary of regular expressions often used when scraping with Selenium Basic
More entries will be added from time to time.
Example: search for "Shinjuku" on the tabelog search screen and get the number of hits. The results page shows "Search results: 1 to 20 are displayed / 4664 in total", and I want to extract the "4664" from it.
vba
Dim mozi As String
mozi = "Search results: 1 to 20 are displayed / 4664 in total" 'Target sentence
mozi = WorksheetFunction.Clean(mozi) 'Remove line feeds and other non-printable characters
mozi = Replace(mozi, " ", "") 'Remove half-width spaces
mozi = Replace(mozi, ChrW(&H3000), "") 'Remove full-width (double-byte) spaces
Debug.Print mozi 'Searchresults:1to20aredisplayed/4664intotal
'↑ The unneeded white space has been removed.
vba
'Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim re As RegExp
Set re = New RegExp
Dim pattern As String: pattern = "/(\d+)intotal" 'Regular expression pattern
Dim m As Match
Dim Matches As MatchCollection
'Regular expression settings
With re
    .pattern = pattern
    .IgnoreCase = False 'False: case-sensitive, True: ignore case
    .Global = True 'True: find every match in the string, False: stop at the first match
End With
Set Matches = re.Execute(mozi) 'Run the match against the string prepared above
If Matches.Count > 0 Then
    Debug.Print Matches.Item(0) '/4664intotal
    '↑ At this point we have the whole match, "/4664intotal".
    'From it, take only the digits captured by the parentheses () in the pattern.
    Set m = Matches.Item(0)
    Debug.Print m.SubMatches(0) '4664
End If
Environment: Windows, Python 3.8.3
python
import re
mozi = "Search results: 1 to 20 are displayed / 4664 in total"
pattern = r"/(\d+)intotal"
mozi = mozi.replace(" ", "")  # Remove half-width spaces
mozi = mozi.replace("\u3000", "")  # Remove full-width (double-byte) spaces
ptn = re.compile(pattern)  # Compile the pattern; a Pattern object is returned
if result := ptn.search(mozi):  # Search: a Match object if it matches, None otherwise
    print(result.group(0))  # /4664intotal: the whole matched string
    print(result.group(1))  # 4664: the part enclosed in ()
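As a rough Python counterpart to setting `.Global = True` in VBA, `re.findall` returns every match at once. The sketch below pulls all numbers out of the sample sentence; it assumes the line always contains exactly three numbers (first item, last item, total), which may not hold for the real page.

```python
import re

text = "Search results: 1 to 20 are displayed / 4664 in total"

# re.findall returns every non-overlapping match as a list of strings
numbers = [int(n) for n in re.findall(r"\d+", text)]

# Assumption: the sentence holds exactly three numbers, in this order
first, last, total = numbers
print(first, last, total)  # 1 20 4664
```

No whitespace removal is needed here because `\d+` only looks at the digits.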
Extracting the total from a "NN in total" string comes up often. I will add other frequently used regular expressions as needed. Please point it out if there is an easier way.
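One possibly simpler variant, as a minimal sketch: in Python 3, `\s` in a `str` pattern also matches line feeds and the full-width space (U+3000), so all white space can be stripped in a single `re.sub` call. The function name `extract_total` and the pattern are assumptions made for the sample sentence, not taken from the actual tabelog markup.

```python
import re
from typing import Optional

# Hypothetical pattern for the sample sentence after white space is removed
TOTAL_PATTERN = re.compile(r"/(\d+)intotal")


def extract_total(text: str) -> Optional[int]:
    """Return the total hit count found in text, or None if there is none."""
    # Strip half-width spaces, full-width spaces and line feeds in one pass
    compact = re.sub(r"\s+", "", text)
    match = TOTAL_PATTERN.search(compact)
    return int(match.group(1)) if match else None


sample = "Search results: 1 to 20 are displayed /\u3000 4664 in total\n"
print(extract_total(sample))           # 4664
print(extract_total("no count here"))  # None
```

Returning `None` instead of raising keeps the caller's check as short as the `if result := ...` form used above.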