[Python] A memo on Beautiful Soup 4

Introduction

Notes on searching for HTML tags with beautifulsoup4.

Environment

python: 3.7
beautifulsoup4: 4.8.2

Basic search

# soup is a BeautifulSoup object built from the page's HTML

# All p tags
soup.find_all("p")

# Only the first p tag found
soup.find("p")

# a tags whose href starts with hogehoge
import re
soup.find_all("a", href=re.compile("^hogehoge"))
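To try the searches above end to end, here is a minimal sketch. The sample HTML is made up for illustration, and it uses the built-in "html.parser" instead of lxml so that nothing beyond beautifulsoup4 needs to be installed; both of those are assumptions, not part of the original notes.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document, made up for illustration.
html = """
<html><body>
  <p>first</p>
  <p>second</p>
  <a href="hogehoge/page1">match</a>
  <a href="other/page2">no match</a>
</body></html>
"""

# "html.parser" is an assumption here; any parser Beautiful Soup supports works.
soup = BeautifulSoup(html, "html.parser")

all_p = soup.find_all("p")    # list-like result holding every <p> tag
first_p = soup.find("p")      # the first <p> tag, or None if there is none
hoge_links = soup.find_all("a", href=re.compile("^hogehoge"))

print([p.get_text() for p in all_p])
print(first_p.get_text())
print([a["href"] for a in hoge_links])
```

Note that find() returns None when nothing matches, so check the result before calling methods on it.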

Search using CSS selectors

# Descendant relationship (loose): p anywhere inside a div inside body
soup.select('body div p')

# Direct parent-child relationship (strict)
soup.select('body > div > p')

# Class name
soup.select('.myclass')

# id
soup.select('#myid')

# AND condition: elements that have both classes
soup.select('.myclass1.myclass2')
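The difference between these selectors is easier to see on a small document. The sketch below uses made-up HTML and a hypothetical helper named texts(); "html.parser" is again an assumption chosen to avoid extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, made up for illustration.
html = """
<html><body>
  <div>
    <p id="myid">direct child</p>
    <section><p class="myclass">nested</p></section>
  </div>
  <p class="myclass1 myclass2">both classes</p>
  <p class="myclass1">one class</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")


def texts(selector):
    """Hypothetical helper: text of every element matched by the selector."""
    return [tag.get_text() for tag in soup.select(selector)]


descendants = texts("body div p")      # any <p> below a <div>, at any depth
children = texts("body > div > p")     # only <p> that is a direct child of <div>
by_class = texts(".myclass")           # elements with class "myclass"
by_id = texts("#myid")                 # the element with id "myid"
both = texts(".myclass1.myclass2")     # elements carrying both classes

print(descendants, children, by_class, by_id, both)
```

The loose form matches the nested paragraph as well, while the strict form with > matches only the paragraph directly under the div.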

nth tag

# Search for the third <li> tag in the HTML below
# <html>
# <body>
#   <ul>
#     <li>Not specified</li>
#     <li>Not specified</li>
#     <li>It is specified</li>
#     <li>Not specified</li>
#   </ul>
# </body>
# </html>

soup.select('body > ul > li:nth-of-type(3)')
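As a runnable version of the example above, this sketch parses the same list and picks the third li. The use of "html.parser" and of select_one() are additions for illustration, not part of the original notes.

```python
from bs4 import BeautifulSoup

# The same list as in the comment above.
html = """
<html>
<body>
  <ul>
    <li>Not specified</li>
    <li>Not specified</li>
    <li>It is specified</li>
    <li>Not specified</li>
  </ul>
</body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

# select() always returns a list, even when only one element matches.
matches = soup.select("body > ul > li:nth-of-type(3)")

# select_one() returns the first match directly, or None if nothing matches.
third = soup.select_one("body > ul > li:nth-of-type(3)")

print(len(matches))
print(third.get_text())
```

nth-of-type counts from 1, so the index 3 refers to the third li among its siblings.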

What to do when nth-of-type() does not work

In my case nth-of-type() did not work because the HTML of the site being scraped contained a start tag with no matching closing tag. The solution is to remove that tag. (Chrome's developer tools displayed a closing tag, so I didn't notice the problem until I looked at the raw page source.)

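import requests
from bs4 import BeautifulSoup

url = "http://hogehoge/"
response = requests.get(url)  # assuming the page is fetched with requests
soup = BeautifulSoup(response.text, "lxml")

# Remove every dd tag, because the dd tags have no closing tag
for tag in soup.find_all('dd'):
    tag.unwrap()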

This removes every <dd> tag. Note that .decompose() deletes a tag together with its contents, so the elements after <dd> would disappear as well. Use .unwrap() to remove only the tag itself.
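The difference between the two methods can be checked on a small tree. The markup below is hypothetical: it stands in for a parsed document in which the content following dd has ended up nested inside it, which is an assumption about how the parser handled the missing closing tag. "html.parser" is used only to keep the sketch free of extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the parsed tree: the content that
# follows <dd> sits inside it.
html = "<dl><dt>term</dt><dd>description <span>following content</span></dd></dl>"

# decompose() destroys the tag together with everything inside it.
soup_a = BeautifulSoup(html, "html.parser")
for tag in soup_a.find_all("dd"):
    tag.decompose()
after_decompose = str(soup_a)

# unwrap() removes only the tag itself and leaves its contents in place.
soup_b = BeautifulSoup(html, "html.parser")
for tag in soup_b.find_all("dd"):
    tag.unwrap()
after_unwrap = str(soup_b)

print(after_decompose)
print(after_unwrap)
```

After decompose() the span is gone along with the dd, while after unwrap() the text and the span remain directly under dl.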

References

https://www.sukerou.com/2019/01/python3-beautifulsoup4web.html
