For work reasons I suddenly needed to do some web scraping, so I hurriedly tried out Beautiful Soup.
sc.py
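import urllib.request
import bs4

url = 'http://www.XXXXXX.jp'
# Fetch the page and parse it with Python's built-in HTML parser
with urllib.request.urlopen(url) as html:
    soup = bs4.BeautifulSoup(html, 'html.parser')

# Titles: <dt> elements nested under the .lxl-inCateList block (CSS selector)
title = soup.select('.lxl-inCateList ul li a dl dt')
# Prices: <dd> elements with class="l-price"
price = soup.find_all("dd", class_="l-price")

for i in title:
    a = i.string  # just the text inside the tag, without the HTML tags
    print(a)
for i in price:
    b = i.string
    print(b)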
The code isn't pretty, but by writing
a = i.string
I could strip out the unnecessary HTML tags and keep only the text.
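To see what .string actually returns, here is a minimal sketch that parses a small inline HTML snippet instead of a live page. The markup is made up for illustration and only loosely modeled on the dd class="l-price" elements above. One caveat worth knowing: .string returns None when a tag contains more than one child node, whereas get_text() joins all the text inside the tag.

```python
import bs4

# Hypothetical markup, loosely modeled on the elements scraped in sc.py
html = """
<dl>
  <dt>Sample item</dt>
  <dd class="l-price">1,000 yen</dd>
  <dd class="l-price"><span>2,000</span> yen</dd>
</dl>
"""

soup = bs4.BeautifulSoup(html, 'html.parser')

dt = soup.find('dt')
print(dt)         # the whole tag: <dt>Sample item</dt>
print(dt.string)  # only the text: Sample item

for dd in soup.find_all('dd', class_='l-price'):
    # .string is None for the second <dd>, because it has two children
    # (<span>2,000</span> and " yen"); get_text() still returns "2,000 yen"
    print(dd.string, '|', dd.get_text())
```

If a site wraps part of a value in extra tags (a span around the number, for example), switching from .string to get_text() avoids getting None back.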
soup.find_all("dd", class_="l-price")
Being able to match elements by their class (not just by tag name) like this is really convenient. I wish I'd known about it sooner... When a sudden request comes in to "gather this and that from a site into a document," the job gets much easier.
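As a sketch of that "gather it into a document" workflow, the example below runs both lookup styles (find_all with class_, and a CSS selector via select) against a small made-up snippet, then pairs titles with prices and writes them out as CSV with the standard csv module. The markup, the column names, and the CSV output format are assumptions for illustration; only the class names come from sc.py. Pairing with zip also assumes every title has exactly one price, in the same order.

```python
import csv
import io

import bs4

# Hypothetical listing markup; only the class names mirror those in sc.py
html = """
<div class="lxl-inCateList">
  <ul>
    <li><a href="#"><dl><dt>Item A</dt><dd class="l-price">500 yen</dd></dl></a></li>
    <li><a href="#"><dl><dt>Item B</dt><dd class="l-price">800 yen</dd></dl></a></li>
  </ul>
</div>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')

# find_all filters by tag name plus class; select does the same with a CSS selector
by_find_all = soup.find_all('dd', class_='l-price')
by_select = soup.select('dd.l-price')
print([t.string for t in by_find_all] == [t.string for t in by_select])  # True

# get_text(strip=True) avoids the None that .string gives for nested tags
titles = [dt.get_text(strip=True) for dt in soup.select('.lxl-inCateList ul li a dl dt')]
prices = [dd.get_text(strip=True) for dd in by_find_all]

# Write title/price pairs as CSV into an in-memory buffer
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['title', 'price'])
writer.writerows(zip(titles, prices))
print(buf.getvalue())
```

To save to a real file instead, replace io.StringIO() with something like open('items.csv', 'w', newline='', encoding='utf-8'), where items.csv is just a placeholder name.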