I referred to the following article.
Introduction to Python Web Scraping Practice BeautifulSoup4 cheat sheet (selector, etc.)
The target data this time is the list of trending posts on the top page of Qiita.
The trend JSON data is embedded in the page in the following form:
<div data-hyperapp-app='Trend' data-hyperapp-props='trend json data'>
so the goal is to extract it.
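Because the JSON sits inside an HTML attribute, its quotes may appear as entities such as `&quot;` in the raw page source. An HTML parser decodes those entities when it reads the attribute, after which the value is ordinary JSON text. The following is a minimal sketch of that two-step decoding using only the standard library; the attribute value and its `items` / `title` keys are hypothetical and do not reflect the real Qiita payload.

```python
import html
import json

# Hypothetical attribute value as it might appear in raw HTML source.
# The real Qiita payload has a different and much larger structure.
raw_attr = '{&quot;items&quot;: [{&quot;title&quot;: &quot;Sample post&quot;}]}'

# An HTML parser unescapes entities when it reads an attribute value;
# html.unescape performs the same step by hand here.
props_text = html.unescape(raw_attr)

# The unescaped text is plain JSON, so json.loads turns it into Python objects.
props = json.loads(props_text)
print(props["items"][0]["title"])
```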
We use a library called Beautiful Soup.
python
import urllib.request
from bs4 import BeautifulSoup
import json

QIITA_TOP_URL = 'https://qiita.com/'


def get_trend_items():
    # Fetch the raw HTML of the Qiita top page
    req = urllib.request.Request(QIITA_TOP_URL)
    with urllib.request.urlopen(req) as res:
        body = res.read()

    # Locate the div that carries the trend data in its attribute
    soup = BeautifulSoup(body, "html.parser")
    target_div = soup.select('div[data-hyperapp-app="Trend"]')[0]

    # The attribute value is a JSON string; decode it into Python objects
    trend_items = json.loads(target_div.get('data-hyperapp-props'))
    return trend_items
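The function above needs network access, and the `[0]` index raises `IndexError` if the page no longer contains the expected div. As a rough sketch, the extraction step can be separated from the download so that it can be tried offline against a sample string. The helper name `extract_trend_props`, the `SAMPLE_HTML` string, and its `items` / `title` keys are all hypothetical; only the selector and attribute names come from the code above.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real payload is far larger.
SAMPLE_HTML = """
<html><body>
<div data-hyperapp-app="Trend"
     data-hyperapp-props='{"items": [{"title": "Sample post"}]}'></div>
</body></html>
"""


def extract_trend_props(html_text):
    """Return the decoded props of the Trend div, or None if the div is absent."""
    soup = BeautifulSoup(html_text, "html.parser")
    # select_one returns the first match or None, so no IndexError is raised
    target_div = soup.select_one('div[data-hyperapp-app="Trend"]')
    if target_div is None:
        return None
    return json.loads(target_div["data-hyperapp-props"])


props = extract_trend_props(SAMPLE_HTML)
print(props)
print(extract_trend_props("<html></html>"))
```

Returning `None` for a missing div lets the caller decide how to react if the page layout changes, instead of failing inside the parsing step.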