I used Jsoup to scrape the stock chart from Yahoo! Finance. The chart I scraped is Alibaba's, simply because I want the stock to grow.
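import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class YahooFinanceController {

    private static final String YAHOO_FINANCE_URL = "https://stocks.finance.yahoo.co.jp/us/detail/BABA";

    @RequestMapping("/")
    public String index(Model model) {
        Document YahooDoc = null;
        String imgSrc = null;
        try {
            // Fetch the page and parse it into a Document
            YahooDoc = Jsoup.connect(YAHOO_FINANCE_URL).get();
            // Select the chart image inside div.styleChart
            Elements img = YahooDoc.select("div.styleChart img");
            // Read the image URL from the src attribute
            imgSrc = img.attr("src");
        } catch (IOException e) {
            // imgSrc stays null if the page could not be fetched
            e.printStackTrace();
        }
        // Pass the image URL to the Thymeleaf template
        model.addAttribute("imgSrc", imgSrc);
        return "index";
    }
}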
<body>
<img th:src="${imgSrc}" title="Alibaba stock price!" />
</body>
Jsoup.connect() takes a URL, and get() fetches the page and returns its HTML parsed into a Document object. In this case, the Document is obtained as follows.
Document YahooDoc = Jsoup.connect(YAHOO_FINANCE_URL).get();
Printing this to the console shows the entire HTML of the specified page. It's amazing!
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="ja">
<head>
<title>Alibaba Group Holding [BABA]: Stocks/Stock price- Yahoo!finance</title>
<meta charset="utf-8">
<meta name="description" content="Stock price of Alibaba Group Holding [BABA]. Covers stock prices, charts, performance, etc. of all stocks listed on NYSE (New York Stock Exchange) and NASDAQ. ADR Japanese stocks and rankings are also substantial.">
<meta name="keywords" content="Stock price,Dow,Nasdaq,America,Ranking">
...(omitted)...
<meta property="og:description" content="Stock price of Alibaba Group Holding [BABA]. Covers stock prices, charts, performance, etc. of all stocks listed on NYSE (New York Stock Exchange) and NASDAQ. ADR Japanese stocks and rankings are also substantial.">
<meta property="og:title" content="Alibaba Group Holding [BABA]: Stocks/Stock price- Yahoo!finance">
...(rest omitted)
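As a rough sketch of the Document step, the class below shows two ways to obtain a Document: over the network with a few connection options, and from an HTML string already in memory. The class name DocumentSketch, the user agent string, the 10-second timeout, and the sample HTML are all assumptions made for illustration; they are not part of the original controller.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class DocumentSketch {

    // Fetch a page over the network. The user agent and the timeout are
    // illustrative values (assumptions), not requirements of the site.
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0")
                .timeout(10_000) // milliseconds
                .get();
    }

    // Parse HTML that is already in memory; no network access is involved.
    static Document parse(String html) {
        return Jsoup.parse(html);
    }

    public static void main(String[] args) {
        // Hypothetical sample page, standing in for the real one.
        String html = "<html><head><title>Sample page</title></head>"
                + "<body><p>Hello</p></body></html>";
        Document doc = parse(html);
        System.out.println(doc.title());     // text of the <title> element
        System.out.println(doc.outerHtml()); // the whole document as HTML
    }
}
```

Parsing from a string like this is handy for trying out selectors without hitting the real site on every run.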
select() takes a CSS selector and returns the matching elements from the Document obtained above as an Elements object. In this case, the elements are obtained as follows.
Elements img = YahooDoc.select("div.styleChart img");
This selects the img tag inside the div tag with the styleChart class. Printing it to the console gives:
<img src="https://chart.yahoo.co.jp/?code=BABA&tm=1d&size=e" alt="Chart image">
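To see what the selector does and does not match, here is a small sketch that runs the same "div.styleChart img" query against a made-up HTML fragment. The class name SelectSketch, the sample HTML, and its image paths are assumptions for illustration only. It also shows selectFirst(), which returns a single Element, or null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class SelectSketch {

    // Hypothetical stand-in for the real page: one chart image, one unrelated image.
    static final String SAMPLE_HTML =
            "<div class=\"styleChart\"><img src=\"/chart.png\" alt=\"Chart image\"></div>"
            + "<div class=\"other\"><img src=\"/banner.png\" alt=\"Banner\"></div>";

    // All img elements that are descendants of a div with class styleChart.
    static Elements chartImages(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div.styleChart img");
    }

    // Only the first match, or null if the selector matches nothing.
    static Element firstChartImage(String html) {
        return Jsoup.parse(html).selectFirst("div.styleChart img");
    }

    public static void main(String[] args) {
        Elements imgs = chartImages(SAMPLE_HTML);
        System.out.println(imgs.size()); // number of matched elements
        System.out.println(imgs);        // outer HTML of the matches
        System.out.println(firstChartImage("<p>no chart here</p>"));
    }
}
```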
attr() returns the value of the specified attribute from the Elements object. Here, the src attribute of the img tag is stored in a String variable called imgSrc.
String imgSrc = img.attr("src");
Printing this to the console gives:
https://chart.yahoo.co.jp/?code=BABA&tm=1d&size=e
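Two details of attr() are worth a quick sketch. On an Elements object it reads the attribute from the first matched element that has it, and it returns an empty string (not null) when nothing matches. If a page uses a relative src, absUrl() can resolve it against a base URI. The class name AttrSketch, the sample HTML, and the example.com base URI below are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class AttrSketch {

    // Same selector as in the article; returns "" when no element matches.
    static String src(String html) {
        Elements img = Jsoup.parse(html).select("div.styleChart img");
        return img.attr("src");
    }

    // Resolve a possibly relative src against the given base URI.
    static String absoluteSrc(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element img = doc.selectFirst("div.styleChart img");
        return img == null ? "" : img.absUrl("src");
    }

    public static void main(String[] args) {
        // Hypothetical fragment with a relative image path.
        String html = "<div class=\"styleChart\"><img src=\"/chart.png\"></div>";
        System.out.println(src(html));
        System.out.println(absoluteSrc(html, "https://example.com/us/detail/BABA"));
        System.out.println(src("<p>nothing to match</p>").isEmpty());
    }
}
```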
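Finally, the image URL is added to the model under the name imgSrc, and the view name index is returned so that the template can display it.
model.addAttribute("imgSrc", imgSrc);
return "index";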
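One thing to watch: in the controller, imgSrc is still null if the fetch throws an IOException, and it is an empty string if the selector matches nothing, so the template can end up with a broken image. A minimal sketch of a guard, assuming a hypothetical helper class ImgSrcGuard and a hypothetical placeholder image path (neither exists in the original project):

```java
class ImgSrcGuard {

    // Hypothetical placeholder image; adjust to whatever the project serves.
    static final String FALLBACK_SRC = "/images/no-chart.png";

    // Return the scraped URL, or the fallback when scraping produced nothing usable.
    static String orFallback(String scrapedSrc) {
        if (scrapedSrc == null || scrapedSrc.trim().isEmpty()) {
            return FALLBACK_SRC;
        }
        return scrapedSrc;
    }

    public static void main(String[] args) {
        System.out.println(orFallback(null));
        System.out.println(orFallback(""));
        System.out.println(orFallback("https://chart.yahoo.co.jp/?code=BABA&tm=1d&size=e"));
    }
}
```

With this in place, the controller could call model.addAttribute("imgSrc", ImgSrcGuard.orFallback(imgSrc)) instead of passing the raw value.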
I found that scraping was surprisingly easy. That's all. Thank you for reading to the end.