Extracting a specific part of a web page
Python3
import requests
r = requests.get('https://nikkei225jp.com/chart/')
text = r.text  # the response body (the page's HTML) as a str
date = text.split('<div class=wtimeT>')[1].split('</div>')[0]
nikkei = text.split('<div class=if_cur>')[1].split('</div>')[0].replace(',', '')
dau = text.split('<div class=if_cur>')[2].split('</div>')[0].replace(',', '')
kawase = text.split('<div class=if_cur>')[3].split('</div>')[0].replace(',', '')
print('Today is', date)
print('Nikkei Stock Average is', nikkei, 'yen')
print('Dow Jones Industrial Average is', dau, 'yen')
print('Exchange dollar is', kawase, 'yen')
with open('shares.csv', 'w') as f:  # the with block closes the file automatically
    f.write('Date and time,Nikkei Stock Average,Dow Jones Industrial Average,Currency dollar\n')
    f.write(date + ',' + nikkei + ',' + dau + ',' + kawase + '\n')
Result (command line)
Today is 2019/03/23
Nikkei Stock Average is 21627.34 yen
Dow Jones Industrial Average is 25502.32 yen
Exchange dollar is 109.93 yen
It should print something like this.
Result (shares.csv)
Date and time,Nikkei Stock Average,Dow Jones Industrial Average,Currency dollar
2019/03/23,21627.34,25502.32,109.93
I confirmed that a file like this was created.
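As a side note, the CSV above is built by joining strings with commas. A minimal sketch using Python's standard `csv` module, which would quote a value automatically if it ever contained a comma, might look like this. The helper name `save_row` and the `encoding='utf-8'` choice are assumptions; the column names are the ones from the header above.

```python
import csv


def save_row(path, date, nikkei, dau, kawase):
    """Write the header and one row of scraped values to a CSV file (hypothetical helper)."""
    # The csv docs recommend newline='' so the writer controls line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Date and time', 'Nikkei Stock Average',
                         'Dow Jones Industrial Average', 'Currency dollar'])
        # Values containing commas would be quoted automatically
        writer.writerow([date, nikkei, dau, kawase])
```

Calling `save_row('shares.csv', date, nikkei, dau, kawase)` would then take the place of the file-writing lines in the script.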
What the program does
From the Nikkei Stock Average page, it extracted information such as the ***date and time, Nikkei Stock Average, Dow Jones Industrial Average, and dollar exchange rate***, then printed and saved it.
Source: https://nikkei225jp.com/chart/ (I am extracting the information from this part of that page.)
Web scraping can also be done conveniently with libraries such as ***Beautiful Soup*** and ***Selenium***,
but this time I used the primitive approach of using only ***requests*** ~
Here is the flow:
r = requests.get('URL of the page you want to scrape')
The response (page information) returned by this request is stored in the variable ***r***.
text = r.text
The body of the response ***r*** (the HTML of the page) is retrieved as a string and stored in ***text***.
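In practice it can help to add a timeout and a status check to this step. The sketch below is one way to do that, not the author's code: `fetch_html` and its `get` parameter are hypothetical (the parameter exists only so the function can be tried without network access), and `timeout=10` is an arbitrary choice. The encoding branch assumes the page might not declare its charset, in which case requests falls back to ISO-8859-1 for text content and Japanese text could come out garbled.

```python
def fetch_html(url, get=None, timeout=10):
    """Fetch url and return the response body (HTML) as a str.

    `get` defaults to requests.get; passing another callable is only a
    hypothetical hook for trying the function offline.
    """
    if get is None:
        import requests  # imported here so the module loads even without requests
        get = requests.get
    r = get(url, timeout=timeout)  # give up after `timeout` seconds instead of hanging
    r.raise_for_status()           # raise on 4xx/5xx instead of parsing an error page
    if r.encoding is None or r.encoding.lower() == 'iso-8859-1':
        # Probably no charset was declared: guess the encoding from the bytes instead
        r.encoding = r.apparent_encoding
    return r.text
```

With it, `text = fetch_html('https://nikkei225jp.com/chart/')` would replace the `requests.get` and `r.text` lines.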
For example, suppose you want to extract the ***Nikkei Stock Average*** value ***21,627.34***.
Right-click that value on the page and choose ***Inspect*** (or open ***View Page Source***) to find it in the HTML:
`<div class="if_cur">21,627.34</div>`
As you can see, the value is sandwiched inside a div with the class ***if_cur***. The string passed to split must match the raw HTML that requests receives exactly. The snippet above shows `class="if_cur"` with quotes while the code searches for `<div class=if_cur>` without them, so check which form the raw source actually uses.
nikkei = text.split('<div class=if_cur>')[1].split('</div>')[0].replace(',','')
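In this one line, `split('<div class=if_cur>')[1]` takes everything after the first occurrence of the opening tag, `.split('</div>')[0]` cuts that off at the next closing tag, and the result is stored in ***nikkei***.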
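Because the same split chain appears several times, it can be wrapped in a small helper. This is a hypothetical function, not part of the original code. It behaves like `text.split(start)[index].split(end)[0]`, but it raises a readable error instead of an IndexError when the tag is missing, for example after the site changes its HTML. The sample HTML in the test is made up for illustration.

```python
def extract_between(text, start, end, index=1):
    """Return the text between the index-th occurrence of `start` and the next `end`.

    Same result as text.split(start)[index].split(end)[0], but with a clear
    error message when `start` occurs fewer than `index` times.
    """
    parts = text.split(start)  # len(parts) == number of occurrences + 1
    if index >= len(parts):
        raise ValueError(f'{start!r} occurs fewer than {index} times in the page')
    return parts[index].split(end)[0]
```

With it, the Nikkei line would read `nikkei = extract_between(text, '<div class=if_cur>', '</div>', 1).replace(',', '')`.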
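Example 2) The date and time are extracted the same way: the value sits inside `<div class=wtimeT>`, so `text.split('<div class=wtimeT>')[1].split('</div>')[0]` is stored in ***date***.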
`.replace(',', '')` just removes the commas (,), because a comma inside a number gets in the way. (If you later want to convert the value to a number, for example with float() since these values have decimals, and calculate with it, the conversion fails while commas remain.)
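To see why the commas matter, here is a small sketch (the helper name `to_number` is hypothetical). Since these figures have a decimal part, `float` is the appropriate conversion; `int('21627.34')` raises a ValueError even after the commas are removed. The division at the end is only an illustration of doing arithmetic with the scraped values.

```python
def to_number(s):
    """Convert a scraped figure such as '21,627.34' to a float (hypothetical helper)."""
    # float('21,627.34') raises ValueError, so the commas are removed first
    return float(s.replace(',', ''))


nikkei = to_number('21,627.34')
kawase = to_number('109.93')
# Once converted, arithmetic works, e.g. the Nikkei figure divided by the yen-per-dollar rate
nikkei_in_dollars = nikkei / kawase
```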
The ***Dow Jones Industrial Average*** and the ***dollar exchange rate*** are extracted the same way; indexes [2] and [3] select the second and third if_cur blocks.
It is the most primitive method, but it can handle a wide variety of patterns.
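Since the Nikkei, Dow, and dollar values all sit in if_cur blocks, they could also be collected in one pass. The sketch below swaps in a regular expression from the standard `re` module, which is not the method used in this article. It assumes the raw source spells the tag exactly as `<div class=if_cur>`, as the split code does, and the function name is hypothetical.

```python
import re

# Non-greedy (.*?) stops at the first </div> after each opening tag,
# mirroring .split('</div>')[0]; DOTALL lets a value span line breaks
IF_CUR = re.compile(r'<div class=if_cur>(.*?)</div>', re.DOTALL)


def extract_all_if_cur(text):
    """Return every if_cur value in page order, with commas removed."""
    return [value.replace(',', '') for value in IF_CUR.findall(text)]
```

The first three items would correspond to `nikkei`, `dau` and `kawase`, as long as the page keeps that order.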
Next, let's ***extract all the links (URLs) on the Nikkei Stock Average page***.
Writing code to extract each link individually as before would be clumsy,
so here is a pattern that extracts all ***a*** tags (the tags that carry URLs) at once; find_all collects them in a list.
***Beautiful Soup*** is used here, so install it first:
$ pip install beautifulsoup4
I just added the ***a*** tag extraction code to the previous code.
import requests
from bs4 import BeautifulSoup
r = requests.get('https://nikkei225jp.com/chart/')
text = r.text  # the response body (the page's HTML) as a str
date = text.split('<div class=wtimeT>')[1].split('</div>')[0]
nikkei = text.split('<div class=if_cur>')[1].split('</div>')[0].replace(',', '')
dau = text.split('<div class=if_cur>')[2].split('</div>')[0].replace(',', '')
kawase = text.split('<div class=if_cur>')[3].split('</div>')[0].replace(',', '')
print('Today is', date)
print('Nikkei Stock Average is', nikkei, 'yen')
print('Dow Jones Industrial Average is', dau, 'yen')
print('Exchange dollar is', kawase, 'yen')
with open('shares.csv', 'w') as f:  # the with block closes the file automatically
    f.write('Date and time,Nikkei Stock Average,Dow Jones Industrial Average,Currency dollar\n')
    f.write(date + ',' + nikkei + ',' + dau + ',' + kawase + '\n')
# Below is the added code
soup = BeautifulSoup(r.text, 'html.parser')
for a in soup.find_all('a'):
    href = a.get('href')  # None if the a tag has no href attribute
    if href and 'http' in href:  # this time, limited to a tags whose URL contains http
        #print(a.text) #Contents of a tag (title)
        print(href)  # URL
Result (command line)
Today is 2019/03/23
Nikkei Stock Average is 21627.34 yen
Dow Jones Industrial Average is 25502.32 yen
Exchange dollar is 109.93 yen
http://xn--u9jt60g57a227ciso.com/
http://quote.jpx.co.jp/jpx/template/quote.cgi?F=tmp/real_index&QCODE=155
http://klug-fx.jp/holiday/
https://jp.investing.com/holiday-calendar/
https://db.225225.jp/
https://nikkei225jp.com/chart/
https://nikkei225jp.com/nasdaq/
https://nikkei225jp.com/fx/
https://ch225.com/
https://225225.jp/
https://nikkei225jp.com/cme/
https://adr-stock.com/
http://fx.minkabu.jp/indicators/calendar
http://jp.reuters.com/investing/news/economic
http://www3.nhk.or.jp/news/html/20190323/k10011858101000.html
http://moneyzine.jp/article/detail/215915
http://feeds.reuters.com/~r/reuters/JPBusinessNews/~3/19cDqM88PGE/graphics-frb-idJPKCN1R30VK
http://feeds.reuters.com/~r/reuters/JPBusinessNews/~3/wqkbZgbeMMA/asia-companies-outlook-analysis-idJPKCN1R30Y2
http://www.asahi.com/articles/ASM3D3S9TM3DULFA00N.html?ref=rss
http://diamond.jp/articles/-/197806
http://www.asahi.com/articles/ASM3R1SPZM3RUHBI003.html?ref=rss
https://zai.diamond.jp/list/fxnews/detail?id=312805&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
https://zai.diamond.jp/list/fxnews/detail?id=312804&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
https://zai.diamond.jp/list/fxnews/detail?id=312803&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
https://zai.diamond.jp/list/fxnews/detail?id=312802&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
https://zai.diamond.jp/list/fxnews/detail?id=312801&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
http://www3.nhk.or.jp/news/html/20190322/k10011857501000.html
http://diamond.jp/articles/-/197800
http://feeds.reuters.com/~r/reuters/JPMarketNews/~3/k7hVYUD0Rlw/usa-trump-russia-idJPL3N21949Q
http://feeds.reuters.com/~r/reuters/JPBusinessNews/~3/7SX2E12xQqA/ny-market-summary-0322-idJPKCN1R32TP
https://www.nikkei.com/article/DGXLASM7IAA05_T20C19A3000000/
http://feeds.reuters.com/~r/reuters/JPCompanyNews/~3/vP8IyPxDb_w/EU-HUAWEI-TECH--idJPL3N21946T
http://feeds.reuters.com/~r/reuters/JPBusinessNews/~3/dhPXy0bfxg8/ny-stx-us-idJPKCN1R32SJ
http://feeds.reuters.com/~r/reuters/JPBusinessNews/~3/f5sMkCorXO8/ny-forex-idJPKCN1R32SB
http://feeds.reuters.com/~r/reuters/JPMarketNews/~3/fYZ-Sat0U3Y/ny-markets-summary-idJPL3N2194BF
http://feeds.reuters.com/~r/reuters/JPCompanyNews/~3/unMMYgBSv38/ny-stx-us-idJPL3N21946R
http://feeds.reuters.com/~r/reuters/JPCompanyNews/~3/FIxyRRMByHY/pinterest-ipo-idJPL3N21949V
https://zai.diamond.jp/list/fxnews/detail?id=312800&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
https://www.nikkei.com/article/DGXLASH2ICE01_T20C19A3000000/
http://www.traders.co.jp/foreign_stocks/market_s.asp#today
http://www.gaitame.com/market/yosoku.html
http://market.fisco.co.jp/update/index.jsp
http://www.traderswebfx.jp/news/default.aspx?ID=7#newslist
http://kabuyoho.ifis.co.jp/
http://www.tokyoipo.com/top/iposche/index.php?j_e=J
http://klug-fx.jp/holiday/
https://jp.investing.com/holiday-calendar/
http://world.honda.com/worldclock/
https://news.yahoo.co.jp/search?p=%E6%97%A5%E7%B5%8C%E5%B9%B3%E5%9D%87&ei=utf-8&fr=news_sw
https://www.google.co.jp/search?hl=ja&gl=jp&tbm=nws&authuser=0&q=%E6%97%A5%E7%B5%8C%E5%B9%B3%E5%9D%87&oq=%E6%97%A5%E7%B5%8C%E5%B9%B3%E5%9D%87&gs_l=news-cc.1.0.43j43i53.2284.2284.0.5545.1.1.0.0.0.0.56.56.1.1.0...0.0...1ac.1.oMorwBF68ss#q=%E6%97%A5%E7%B5%8C%E5%B9%B3%E5%9D%87&hl=ja&gl=jp&authuser=0&tbm=nws&tbs=sbd:1
http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi
http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi
https://www.dukascopy.jp/
You can see that the links on the page have been extracted.
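Some URLs appear more than once in this list (for example `http://klug-fx.jp/holiday/` and `http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi`). If each link is wanted only once, a sketch like the following could post-process the collected `href` values. It is an addition rather than part of the original code: the function name is hypothetical, and when fed every `href` on the page it also resolves relative links such as `/fx/` with `urllib.parse.urljoin`, links that the `http` filter in the loop above would typically skip.

```python
from urllib.parse import urljoin


def unique_links(hrefs, base_url):
    """Resolve relative links against base_url and drop duplicates, keeping page order."""
    seen = set()
    result = []
    for href in hrefs:
        url = urljoin(base_url, href)  # '/fx/' becomes 'https://nikkei225jp.com/fx/'
        # Keep only web links, and only the first occurrence of each
        if url.startswith(('http://', 'https://')) and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```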
#print(a.text) #Contents of a tag (title)
If you uncomment this line (remove the leading #), the title (link text) of each ***a*** tag is printed as well ~
Today is 2019/03/23
Nikkei Stock Average is 21627.34 yen
Dow Jones Industrial Average is 25502.32 yen
Exchange dollar is 109.93 yen
World stock prices.com
http://xn--u9jt60g57a227ciso.com/
TSE
http://quote.jpx.co.jp/jpx/template/quote.cgi?F=tmp/real_index&QCODE=155
[Klug]
http://klug-fx.jp/holiday/
[Investing]
https://jp.investing.com/holiday-calendar/
Real-time market conditions Parts
https://db.225225.jp/
Nikkei Stock Average
https://nikkei225jp.com/chart/
Dow Jones Industrial Average
https://nikkei225jp.com/nasdaq/
Exchange dollar yen
https://nikkei225jp.com/fx/
World stock price
https://ch225.com/
Mobile phone
https://225225.jp/
CME
https://nikkei225jp.com/cme/
ADR
https://adr-stock.com/
Everyone's exchange
http://fx.minkabu.jp/indicators/calendar
Reuters
http://jp.reuters.com/investing/news/economic
Movement to strengthen life support services by expanding acceptance of foreign human resources
http://www3.nhk.or.jp/news/html/20190323/k10011858101000.html
Bank deposits, bank earnings cycle after the introduction of negative interest rates, which exceeded the same month last year for 149 consecutive months ...
http://moneyzine.jp/article/detail/215915
Angle:Fed dovish shift, positive impact on US households
http://feeds.reuters.com/~r/reuters/JPBusinessNews/~3/19cDqM88PGE/graphics-frb-idJPKCN1R30VK
focus:Capital investment by Asian companies to decline for the first time in 3 years due to slowdown in China
http://feeds.reuters.com/~r/reuters/JPBusinessNews/~3/wqkbZgbeMMA/asia-companies-outlook-analysis-idJPKCN1R30Y2
Subsidy system for nuclear power support Ministry of Economy, Trade and Industry aims to establish in 2020
http://www.asahi.com/articles/ASM3D3S9TM3DULFA00N.html?ref=rss
NY market fell sharply on 22nd-Latest stock news
http://diamond.jp/articles/-/197806
NY Dow plunges, 460 dollars depreciation fears of slowing global economy
http://www.asahi.com/articles/ASM3R1SPZM3RUHBI003.html?ref=rss
Risk-averse funds flow in from continued growth, low stock prices and high bond prices
https://zai.diamond.jp/list/fxnews/detail?id=312805&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
NY gold futures continue to grow, risk-averse funds flow in from stock prices and bond prices
https://zai.diamond.jp/list/fxnews/detail?id=312804&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
Concerns that NY crude oil futures will continue to fall and the deterioration of the world economy will worsen
https://zai.diamond.jp/list/fxnews/detail?id=312803&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
NY market trends(End of transaction):Dow 460.19 dollars cheap(Breaking news), Crude oil futures 0.94 dollars cheap
https://zai.diamond.jp/list/fxnews/detail?id=312802&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
Yen against world currencies:Against dollar 0.81%High, against Euro 1.43%High
https://zai.diamond.jp/list/fxnews/detail?id=312801&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
Creating a text for acquiring a new status of residence for foreign human resources Industry group of restaurant companies
http://www3.nhk.or.jp/news/html/20190322/k10011857501000.html
ECB has no intention of issuing digital currency=Board member Mersch [Fisco Bitcoin New ...
http://diamond.jp/articles/-/197800
UPDATE 1-U.S. Special Prosecutor Submits Russian Suspicion Investigation Report No Further Prosecution Proposal
http://feeds.reuters.com/~r/reuters/JPMarketNews/~3/k7hVYUD0Rlw/usa-trump-russia-idJPL3N21949Q
NY Market Summary(The 22nd)
http://feeds.reuters.com/~r/reuters/JPBusinessNews/~3/7SX2E12xQqA/ny-market-summary-0322-idJPKCN1R32TP
NY yen rebounds, ends at 1 dollar=109.90-110.00 yen, the yen strengthens for the first time in a month
https://www.nikkei.com/article/DGXLASM7IAA05_T20C19A3000000/
resend-EXCLUSIVE-European Commission to make data sharing proposal without eliminating Huawei from 5G=Seki ...
http://feeds.reuters.com/~r/reuters/JPCompanyNews/~3/vP8IyPxDb_w/EU-HUAWEI-TECH--idJPL3N21946T
US stock market plunges, global economic downturn intensifies
http://feeds.reuters.com/~r/reuters/JPBusinessNews/~3/dhPXy0bfxg8/ny-stx-us-idJPKCN1R32SJ
The dollar fell against the yen, and economic concerns increased due to the reversal of US long-term interest rates=NY market
http://feeds.reuters.com/~r/reuters/JPBusinessNews/~3/f5sMkCorXO8/ny-forex-idJPKCN1R32SB
NY Market Summary(The 22nd)
http://feeds.reuters.com/~r/reuters/JPMarketNews/~3/fYZ-Sat0U3Y/ny-markets-summary-idJPL3N2194BF
U.S. stock market=Sudden fall, global economic downturn Anxiety grows stronger
http://feeds.reuters.com/~r/reuters/JPCompanyNews/~3/unMMYgBSv38/ny-stx-us-idJPL3N21946R
UPDATE 1-US Image Sharing Pinterest Apply for IPO, $ 100 Million?
http://feeds.reuters.com/~r/reuters/JPCompanyNews/~3/FIxyRRMByHY/pinterest-ipo-idJPL3N21949V
NY Market Digest ・ 22nd Stocks fell sharply ・ Euro fell ・ Lira plunged
https://zai.diamond.jp/list/fxnews/detail?id=312800&utm_source=zaifxrss&utm_medium=rss&utm_term=zaifxnews&utm_campaign=zaifxrss
Chicago Japanese Equity Futures Overview 22nd
https://www.nikkei.com/article/DGXLASH2ICE01_T20C19A3000000/
Schedule
http://www.traders.co.jp/foreign_stocks/market_s.asp#today
Economic indicator schedule
http://www.gaitame.com/market/yosoku.html
Strength materials / notes
http://market.fisco.co.jp/update/index.jsp
VIP remarks
http://www.traderswebfx.jp/news/default.aspx?ID=7#newslist
Settlement schedule
http://kabuyoho.ifis.co.jp/
IPO schedule
http://www.tokyoipo.com/top/iposche/index.php?j_e=J
Market holiday
http://klug-fx.jp/holiday/
Market holiday
https://jp.investing.com/holiday-calendar/
World clock
http://world.honda.com/worldclock/
Yahoo!News "Nikkei 225"
https://news.yahoo.co.jp/search?p=%E6%97%A5%E7%B5%8C%E5%B9%B3%E5%9D%87&ei=utf-8&fr=news_sw
Google News "Nikkei 225"
https://www.google.co.jp/search?hl=ja&gl=jp&tbm=nws&authuser=0&q=%E6%97%A5%E7%B5%8C%E5%B9%B3%E5%9D%87&oq=%E6%97%A5%E7%B5%8C%E5%B9%B3%E5%9D%87&gs_l=news-cc.1.0.43j43i53.2284.2284.0.5545.1.1.0.0.0.0.56.56.1.1.0...0.0...1ac.1.oMorwBF68ss#q=%E6%97%A5%E7%B5%8C%E5%B9%B3%E5%9D%87&hl=ja&gl=jp&authuser=0&tbm=nws&tbs=sbd:1
FISCO
http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi
http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi
https://www.dukascopy.jp/
This extracts both the URL and the title of each link.
Some links have an ***http*** URL but no title text.
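Because some links have no title, pairing each URL with its text and substituting a placeholder keeps the output aligned. The sketch below swaps Beautiful Soup for the standard library's `html.parser.HTMLParser` so it runs without extra packages; it only approximates `soup.find_all('a')` (for example, it does not handle nested `a` tags), and the class name and the `'(no title)'` placeholder are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (title, href) pairs for <a> tags whose href contains 'http'."""

    def __init__(self):
        super().__init__()
        self.links = []    # collected (title, href) pairs, in page order
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            if 'http' in self._href:
                # Links without text (e.g. image links) get a placeholder title
                title = ''.join(self._text).strip() or '(no title)'
                self.links.append((title, self._href))
            self._href = None
```

After `parser.feed(text)`, looping over `parser.links` and printing the title and then the URL gives output in the same shape as above, with `(no title)` wherever a link has no text.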