-[Python beginners tried data science] Data acquisition from API [day1]
This series continues for now. In future posts I plan to write about analyzing the data acquired here.
$ python yahoo_news_rss.py
Running this gives:
Yahoo!News topics-Top
http://dailynews.yahoo.co.jp/fc/world/nuclear_weapons/?id=6160968
NPT broke document could not be adopted and closed
http://dailynews.yahoo.co.jp/fc/domestic/injury_case/?id=6160975
Family cut off, 4 dead and injured man arrested
http://dailynews.yahoo.co.jp/fc/economy/coffee_chain/?id=6160962
Over 1,000 people at Starbucks Tottori Open
http://dailynews.yahoo.co.jp/fc/economy/apparel/?id=6160961
Burberry is doing well, ending the contract in June
http://dailynews.yahoo.co.jp/fc/science/life_on_earth/?id=6160969
Is the concept of "zero" recognized by monkeys?
http://dailynews.yahoo.co.jp/fc/sports/prowrestling/?id=6160973
First Tiger Mask Emergency Surgery
http://dailynews.yahoo.co.jp/fc/entertainment/broad_casting/?id=6160945
Repulsion to live broadcast Yuko Ando confused
http://dailynews.yahoo.co.jp/fc/domestic/obituary/?id=6160936
Maruyama, an idol of fighting illness, dies
Here I explain how to use RSS, which is as simple as it looks.
RSS is a general term for several document formats for easily compiling and distributing updated information on various websites such as news and blogs.
(From Wikipedia.) In other words, scraping the actual pages is a hassle, whereas RSS delivers the same information already organized for easy retrieval.
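To make the format concrete, here is a minimal sketch of what an RSS 2.0 document looks like and how its fields can be read using only Python's standard library. The sample feed (`SAMPLE_RSS`, with example.com URLs) is made up for illustration and is not Yahoo's actual feed.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document for illustration only (not real Yahoo! data).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/articles/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/articles/2</link>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE_RSS)
channel = root.find("channel")

# Feed-level metadata lives directly under <channel>.
feed_title = channel.findtext("title")

# Each article is an <item> with its own <title> and <link>.
items = [
    {"title": item.findtext("title"), "link": item.findtext("link")}
    for item in channel.findall("item")
]

print(feed_title)
for item in items:
    print(item["link"])
    print(item["title"])
```

A dedicated feed library is still the more convenient choice in practice, since real feeds come in several formats and are not always this tidy.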
Use feedparser to parse RSS feeds.
For installing feedparser, [this post](http://otiai10.hatenablog.com/entry/2012/05/04/180950) was a helpful reference.
As the documentation shows, the two keys to keep in mind are the following.
entries
A list of dictionaries. Each dictionary contains data from a different entry. Entries are listed in the order in which they appear in the original feed.
feed
A dictionary of data about the feed.
For the full list of keys inside each of these, refer to the documentation.
Yahoo provides RSS feeds, which are [listed here](http://headlines.yahoo.co.jp/rss/list). The following code fetches a feed URL and parses the returned XML into a dictionary.
yahoo_news_rss.py
import feedparser
RSS_URL = "http://rss.dailynews.yahoo.co.jp/fc/rss.xml"
yahoo_news_dic = feedparser.parse(RSS_URL)
The news is now stored in a dictionary. See the feedparser documentation for details of its structure.
The code below prints the title of the feed as a whole, followed by the link and title of each article.
yahoo_news_rss.py
print(yahoo_news_dic.feed.title)
for entry in yahoo_news_dic.entries:
    title = entry.title
    link = entry.link
    print(link)
    print(title)
And, just as shown at the beginning,
Yahoo!News topics-Top
http://dailynews.yahoo.co.jp/fc/world/nuclear_weapons/?id=6160968
NPT broke document could not be adopted and closed
http://dailynews.yahoo.co.jp/fc/domestic/injury_case/?id=6160975
Family cut off, 4 dead and injured man arrested
http://dailynews.yahoo.co.jp/fc/economy/coffee_chain/?id=6160962
Over 1,000 people at Starbucks Tottori Open
http://dailynews.yahoo.co.jp/fc/economy/apparel/?id=6160961
Burberry is doing well, ending the contract in June
http://dailynews.yahoo.co.jp/fc/science/life_on_earth/?id=6160969
Is the concept of "zero" recognized by monkeys?
http://dailynews.yahoo.co.jp/fc/sports/prowrestling/?id=6160973
First Tiger Mask Emergency Surgery
http://dailynews.yahoo.co.jp/fc/entertainment/broad_casting/?id=6160945
Repulsion to live broadcast Yuko Ando confused
http://dailynews.yahoo.co.jp/fc/domestic/obituary/?id=6160936
Maruyama, an idol of fighting illness, dies
is what gets printed.
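Since the plan is to analyze the acquired data later, one possible next step is to flatten the entries into rows and save them as CSV. This is only a sketch: `entries_to_rows` and `write_rows_csv` are hypothetical helper names, and plain dicts stand in for feedparser entries (which are dictionary-like), with values copied from the sample output above.

```python
import csv
import io


def entries_to_rows(entries):
    """Return one {'title', 'link'} dict per entry; missing fields become ''."""
    return [
        {"title": entry.get("title", ""), "link": entry.get("link", "")}
        for entry in entries
    ]


def write_rows_csv(rows, stream):
    """Write rows to an open text stream as CSV, with a header line."""
    writer = csv.DictWriter(
        stream, fieldnames=["title", "link"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Plain dicts standing in for feedparser entries (assumption for this sketch).
sample_entries = [
    {
        "title": "Over 1,000 people at Starbucks Tottori Open",
        "link": "http://dailynews.yahoo.co.jp/fc/economy/coffee_chain/?id=6160962",
    },
    {
        "title": "First Tiger Mask Emergency Surgery",
        "link": "http://dailynews.yahoo.co.jp/fc/sports/prowrestling/?id=6160973",
    },
]

buffer = io.StringIO()
write_rows_csv(entries_to_rows(sample_entries), buffer)
print(buffer.getvalue())
```

With a real feed, `yahoo_news_dic.entries` could be passed in place of `sample_entries`, and a file opened with `newline=""` could replace the in-memory buffer.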
The source code shown above is also available here.