I'd like to introduce gazpacho, a Python module I recently learned about.
gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with zero dependencies. (https://pypi.org/project/gazpacho/)
It has only about 400 GitHub stars so far, so I'd suggest keeping it to personal projects for now.
- You can both fetch and parse HTML with this one library. With BeautifulSoup and similar libraries, you first had to fetch the HTML separately, for example with requests.
- There are fewer methods to remember.
- Parsing is done with a single find method.
- It does not depend on any other modules.
First, install the module.
pip install gazpacho
As an example, I'll scrape the following site, which is used in the gazpacho tutorial, and print the title of each book.
https://scrape.world/books
from gazpacho import get, Soup
# Fetch the HTML from the specified URL
html = get('https://scrape.world/books')
# Create a Soup instance for parsing
soup = Soup(html)
# Find the elements you need. If several elements match, a list of Soup
# objects is returned; if exactly one matches, a single Soup is returned.
# 1st argument: the HTML tag
# 2nd argument: a dict of attributes such as id or class
# partial: whether partial matches of attribute values are allowed
# Here the class "book-" also matches values such as "book-early"
books = soup.find('div', {'class': 'book-'}, partial=True)
# Normalize the result so the loop also works when one element (or none) matches
if books is None:
    books = []
elif not isinstance(books, list):
    books = [books]
for book in books:
    name_header = book.find('h4')
    # .text holds the contents of the tag
    name = name_header.text
    print(name)
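The partial=True option can be modeled using only the standard library. The sketch below is not gazpacho's implementation. It assumes that a partial match means the given string appears somewhere in the attribute value. The class name PartialClassTitles and the sample HTML are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class PartialClassTitles(HTMLParser):
    """Hypothetical helper: collect the text of <h4> tags inside <div>s whose
    class attribute contains a given substring (a rough model of partial=True,
    not gazpacho's actual code)."""

    def __init__(self, class_fragment):
        super().__init__()
        self.class_fragment = class_fragment
        self.div_depth = 0  # div nesting depth inside a matching div (0 = outside)
        self.in_h4 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1  # nested div inside a matching div
            elif self.class_fragment in (dict(attrs).get("class") or ""):
                self.div_depth = 1  # substring ("partial") match on the class value
        elif tag == "h4" and self.div_depth:
            self.in_h4 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "h4":
            self.in_h4 = False

    def handle_data(self, data):
        if self.in_h4:
            self.titles[-1] += data


html = (
    '<div class="book-early"><h4>Book A</h4></div>'
    '<div class="book-late"><h4>Book B</h4></div>'
    '<div class="sidebar"><h4>Ignored</h4></div>'
)
parser = PartialClassTitles("book-")
parser.feed(html)
print(parser.titles)
```

Tracking the div depth by hand is the kind of bookkeeping that a single find call saves you from writing.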
Personally, I decide when to use it as follows.
gazpacho itself is a simple module, so I'm planning to find time to read its source code.
I'd be glad if this article leads more people to try gazpacho!