Documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/
BeautifulSoup
from bs4 import BeautifulSoup
soup = BeautifulSoup(raw, 'html.parser')  # raw is the downloaded page content (HTML); naming the parser avoids a warning
# find_all (older name: findAll): get a list of all elements that match the tag
# Below: get every ul whose class is image-items
ul_items = soup.find_all('ul', class_='image-items')
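As a minimal, self-contained sketch of the call above (find_all is the current name for findAll): the HTML string here is made up for illustration, and html.parser is Python's built-in parser, chosen only so the example runs without extra packages.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from a real page.
raw = """
<ul class="image-items"><li><a href="/a.html">A</a></li></ul>
<ul class="other"><li><a href="/b.html">B</a></li></ul>
<ul class="image-items"><li><a href="/c.html">C</a></li></ul>
"""

soup = BeautifulSoup(raw, "html.parser")

# find_all returns a list-like result containing every matching tag.
ul_items = soup.find_all("ul", class_="image-items")

# Loop over the matches and take the first link inside each one.
hrefs = [item.find("a")["href"] for item in ul_items]
print(len(ul_items), hrefs)
```

The keyword is spelled class_ with a trailing underscore because class is a reserved word in Python.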
# find: get only the first element that matches the tag
# (item is assumed to be one element of ul_items, e.g. inside `for item in ul_items:`)
a = item.find('a')
# To look an element up by its id, do this
sample = soup.find(id='template-embed-sample')
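One thing worth knowing when looking up by id is that find returns None if nothing matches. The following is a small sketch with made-up markup; the id no-such-id is hypothetical and exists only to show the miss case.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
raw = '<div id="template-embed-sample"><p>sample</p></div>'
soup = BeautifulSoup(raw, "html.parser")

sample = soup.find(id="template-embed-sample")  # the matching div
missing = soup.find(id="no-such-id")            # None, because nothing matches

# Guard against None before touching the result.
sample_text = sample.get_text() if sample is not None else None
print(sample_text, missing)
```

Checking for None first avoids an AttributeError when the page does not contain the element you expected.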
# Get an attribute value
# Here: the link target (href) of the a tag
link = a.attrs['href']
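For comparison, here is a rough sketch of the different ways to read an attribute. The markup is invented for the example. Indexing the tag directly is shorthand for going through attrs, and get is the safer choice when the attribute may be absent.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one a tag with an href, one without.
raw = '<a href="/next.html" class="btn primary">next</a><a>no link</a>'
soup = BeautifulSoup(raw, "html.parser")

first, second = soup.find_all("a")

link_via_attrs = first.attrs["href"]  # dictionary of all attributes
link_via_index = first["href"]        # shorthand for the same lookup
no_link = second.get("href")          # None instead of KeyError when missing
classes = first["class"]              # class is multi-valued, so this is a list
print(link_via_attrs, link_via_index, no_link, classes)
```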
The object returned by the find method also holds the information of the child elements it contains, so you can search inside it in the same way. For example, given the following HTML:
<div><span>hogehoge</span></div>
you can get hogehoge like this:
div = soup.find('div')
span = div.find('span')  # find the span inside the div
print(span.text)