Instead of right-clicking and choosing to view the page source,
use the HTML displayed by the browser's developer tools.
<dt>price<span class="tax">(tax included)</span></dt>
To extract the text of a dt tag that has a span tag embedded in it, like the one above:
from bs4 import BeautifulSoup
source = '<dt>price<span class="tax">(tax included)</span></dt>'
soup = BeautifulSoup(source, "html.parser")
soup.text  # 'price(tax included)'
As shown, the text can be extracted by specifying .text.
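If only the span's own text is needed rather than everything inside the dt tag, one option is to look up the span first and read .text from it. A minimal sketch, reusing the source string from above; the variable names whole_text and tax_text are only for illustration:

```python
from bs4 import BeautifulSoup

source = '<dt>price<span class="tax">(tax included)</span></dt>'
soup = BeautifulSoup(source, "html.parser")

# .text on the whole document concatenates every text node
whole_text = soup.text

# .text on the span alone returns only the span's contents
tax_text = soup.find("span", class_="tax").text

print(whole_text)  # price(tax included)
print(tax_text)    # (tax included)
```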
<dt>
price
<span class="tax">(tax included)</span>
</dt>
When the tag contains whitespace and line breaks like this:
def remove_whitespace(text):
    # split() with no arguments splits on any run of whitespace
    return ''.join(text.split())
source = '''<dt>
price
<span class="tax">(tax included)</span>
</dt>'''
soup = BeautifulSoup(source, "html.parser")
remove_whitespace(soup.text)  # 'price(taxincluded)'
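With this, the text can be extracted with the whitespace removed.
strip() only removes leading and trailing whitespace, so it cannot delete whitespace in the middle of the string.
Instead, split() splits the string on whitespace and ''.join() joins the pieces back together.
Note that this also removes spaces inside the text itself, such as the one in "(tax included)".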
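If the spaces inside the text should be kept, two alternatives may be worth trying: get_text(strip=True), which strips each text fragment before concatenating them, or joining the split pieces with a single space. A sketch, assuming the multi-line markup shown earlier; normalize_whitespace is a hypothetical helper name, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

source = """<dt>
  price
  <span class="tax">(tax included)</span>
</dt>"""
soup = BeautifulSoup(source, "html.parser")

def normalize_whitespace(text):
    # Hypothetical helper: collapse each run of whitespace into one space
    return " ".join(text.split())

# Strip every text fragment, drop the empty ones, then concatenate
stripped = soup.get_text(strip=True)

# Keep word boundaries but remove line breaks and indentation
normalized = normalize_whitespace(soup.text)

print(stripped)    # price(tax included)
print(normalized)  # price (tax included)
```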
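Elements can be searched for by class, id, or tag name. find returns the first match and find_all returns every match:
soup.find(class_='hoge')      # first element with class "hoge"
soup.find_all(class_='hoge')  # all elements with class "hoge"
soup.find(id='hoge')          # first element with id "hoge"
soup.find_all(id='hoge')      # all elements with id "hoge"
soup.find('hoge')             # first <hoge> tag
soup.find_all('hoge')         # all <hoge> tags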
Multiple conditions can also be specified at the same time:
soup.find('hoge', class_='fuga')
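To see these lookups side by side, here is a small sketch against made-up markup. The HTML, the class name item, and the id total are assumptions for illustration only:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration
html = """
<div>
  <p class="item">apple</p>
  <p class="item" id="total">banana</p>
  <span class="item">cherry</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# By class: first match, then all matches
first_item = soup.find(class_="item").text
all_items = [tag.text for tag in soup.find_all(class_="item")]

# By id
by_id = soup.find(id="total").text

# By tag name
paragraphs = soup.find_all("p")

# Tag name and class together
span_item = soup.find("span", class_="item").text

# find returns None when nothing matches
missing = soup.find("table")

print(first_item)       # apple
print(all_items)        # ['apple', 'banana', 'cherry']
print(by_id)            # banana
print(len(paragraphs))  # 2
print(span_item)        # cherry
print(missing)          # None
```

Because find returns None when there is no match, checking the result before reading .text avoids an AttributeError.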