Scraping with Beautiful Soup in Jupyter Notebook
In [1] Import Beautiful Soup
In[1]
from bs4 import BeautifulSoup
In [2] Store the HTML of the article you want to scrape in the variable kiji
In[2]
kiji = """<html>
<head>
<title>I posted it on Qiita</title>
</head>
<body>
<p class="title">
<b>Challenge Qiita for output.</b>
</p>
<p class="article">
<b>I will do my best to write an article.</b>
</p>
</body>
</html>"""
Write the HTML you want to store between """ and """.
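As a small sketch of the triple-quote syntax (the variable name sample is made up for illustration), a string written between """ and """ may span several lines, and its line breaks are kept:

```python
# A triple-quoted string can span several lines; the line breaks are kept.
# "sample" is a hypothetical name used only for this illustration.
sample = """<p>
first line
</p>"""

lines = sample.splitlines()
print(len(lines))  # 3
print(lines[0])    # <p>
```

This is why the whole html document can be pasted into kiji as it is, without joining the lines by hand.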
In [3] Load the HTML stored in the variable kiji into BeautifulSoup.
In[3]
soup = BeautifulSoup(kiji,"html.parser")
Write BeautifulSoup(variable holding the HTML, "name of the parser to use"). This time it is BeautifulSoup(kiji, "html.parser"). Be careful not to forget the quotation marks around the parser name, and not to misspell it, for example as htmlparser.
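The following is a minimal sketch of what that mistake looks like in practice. It assumes a bs4 version in which an unknown parser name raises bs4.FeatureNotFound, which is the behaviour I am aware of. The one-line markup is made up for illustration.

```python
from bs4 import BeautifulSoup, FeatureNotFound

html = "<p>hello</p>"  # hypothetical markup, just for this illustration

# Correct: the parser name is a quoted string, spelled with the dot.
soup = BeautifulSoup(html, "html.parser")
print(soup.p)  # <p>hello</p>

# Misspelled parser name (no dot): bs4 cannot find a matching parser.
try:
    BeautifulSoup(html, "htmlparser")
    error_raised = False
except FeatureNotFound as exc:
    error_raised = True
    print("Parser not found:", exc)
```

Leaving the quotation marks out entirely is a different failure: Python then tries to evaluate html.parser as a variable name before BeautifulSoup is even called.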
In [4] Use prettify on soup to make it easier to read.
In[4]
print(soup.prettify())
With prettify(), each tag and each piece of text is printed on its own line and indented by nesting level, which makes the structure easy to read.
In [4] Output result
In[4]
<html>
 <head>
  <title>
   I posted it on Qiita
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    Challenge Qiita for output.
   </b>
  </p>
  <p class="article">
   <b>
    I will do my best to write an article.
   </b>
  </p>
 </body>
</html>
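To see what prettify() changes, the sketch below compares it with the plain string form of the same soup. The short markup and the names compact and pretty are assumptions made for this illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line markup, just to compare the two string forms.
soup = BeautifulSoup("<p><b>hi</b></p>", "html.parser")

compact = str(soup)       # markup as a single line, as it was given
pretty = soup.prettify()  # a new string, split into lines and indented

print(compact)  # <p><b>hi</b></p>
print(pretty)
```

prettify() only returns a formatted string. The soup object itself is not modified, so it can still be searched afterwards.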
In [5] Display the title
In[5]
print(soup.html.head.title)
In [5] Output result
<title>I posted it on Qiita</title>
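The output above is the whole title tag. As a hedged sketch of two related lookups that go slightly beyond the cells above, soup.title is a shortcut for the same tag, and .string returns only the text inside it. The find() call with class_ is one possible way to reach the first paragraph. The html is a shortened copy of kiji.

```python
from bs4 import BeautifulSoup

# Shortened copy of the kiji html from In [2].
kiji = """<html>
<head>
<title>I posted it on Qiita</title>
</head>
<body>
<p class="title">
<b>Challenge Qiita for output.</b>
</p>
</body>
</html>"""

soup = BeautifulSoup(kiji, "html.parser")

title_tag = soup.title          # same tag as soup.html.head.title
title_text = soup.title.string  # only the text inside the tag
print(title_tag)   # <title>I posted it on Qiita</title>
print(title_text)  # I posted it on Qiita

# Find the <p> whose class is "title" and take its text without whitespace.
heading = soup.find("p", class_="title").get_text(strip=True)
print(heading)  # Challenge Qiita for output.
```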