import lxml.etree
# Read the file
tree = lxml.etree.parse('rss2.xml')
# getroot() returns the root element of the XML document
root = tree.getroot()
# xpath() returns a list of the elements that match the XPath expression.
# Specify the elements hierarchically; 'channel/item' is an example path for RSS 2.0.
for item in root.xpath('channel/item'):
    # Pick out the data you want, for example each item's link and title
    print(item.findtext('link'), item.findtext('title'))
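To see why the feed format matters when you parse by hand, here is a minimal sketch that reads the same post from an RSS 2.0 feed and from an Atom feed. It uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra installs. The two inline feeds, the example.com URLs, and the helper names rss2_items and atom_items are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal feeds, inlined so the sketch runs without any files.
RSS2 = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>https://example.com/1</link></item>
</channel></rss>"""

ATOM = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>First post</title><link href="https://example.com/1"/></entry>
</feed>"""

# Atom elements live in a namespace, so every path needs a prefix mapping.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def rss2_items(xml_text):
    """Return (link, title) pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("link"), item.findtext("title"))
            for item in root.findall("channel/item")]


def atom_items(xml_text):
    """Return (link, title) pairs from an Atom document."""
    root = ET.fromstring(xml_text)
    # In Atom the URL is the href attribute of <link>, not its text.
    return [(entry.find("atom:link", ATOM_NS).get("href"),
             entry.findtext("atom:title", namespaces=ATOM_NS))
            for entry in root.findall("atom:entry", ATOM_NS)]


print(rss2_items(RSS2))
print(atom_items(ATOM))
```

Both calls return the same link and title pair, but each format needs its own element paths, and Atom also needs namespace handling.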
With feedparser, you can scrape a feed without worrying about its format. Feeds come in several formats, such as RSS 1.0, RSS 2.0, and Atom, and handling each one yourself is a hassle, so I recommend feedparser.
import feedparser
# parse() fetches the feed and returns the parsed result
d = feedparser.parse('https://qiita.com/tags/docker/feed')
# Print the link and title of every entry
for entry in d.entries:
    print(entry.link, entry.title)
It's that easy. Run it on a schedule and you have your own RSS reader.
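As a rough sketch of the "run it regularly" idea, the code below polls a feed and prints only the entries it has not printed before. The helper names new_entries and poll, the 600-second interval, and the choice of entry.link as the key for spotting duplicates are all assumptions made for illustration. It also assumes every entry has a link.

```python
import time


def new_entries(entries, seen_links):
    """Return the entries whose link is not in seen_links, and record them."""
    fresh = []
    for entry in entries:
        if entry.link not in seen_links:
            seen_links.add(entry.link)
            fresh.append(entry)
    return fresh


def poll(url, interval_seconds=600, rounds=3):
    """Fetch the feed `rounds` times, printing only entries not seen before."""
    import feedparser  # imported here so new_entries() can be used without it

    seen_links = set()
    for round_number in range(rounds):
        if round_number:
            # Wait between fetches, but not before the first one.
            time.sleep(interval_seconds)
        d = feedparser.parse(url)
        for entry in new_entries(d.entries, seen_links):
            print(entry.link, entry.title)
```

Calling poll('https://qiita.com/tags/docker/feed') would fetch the feed three times, ten minutes apart. Because seen_links is kept only in memory, everything counts as new again after a restart; saving it to a file would be the next step for a real reader.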