I tried scraping a stock chart using Java (Jsoup)

What I did

I used Jsoup to scrape a stock chart from Yahoo! Finance. The chart I scraped is Alibaba's. I chose Alibaba simply because I want it to grow.

The result looks like this:

Screenshot 2020-05-31 20.06.36.png

Code

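import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class YahooFinanceController {
    private static final String YAHOO_FINANCE_URL = "https://stocks.finance.yahoo.co.jp/us/detail/BABA";

    @RequestMapping("/")
    public String index(Model model) {
        Document YahooDoc = null;
        String imgSrc = null;
        try {
            // Fetch the stock detail page and parse it into a Document
            YahooDoc = Jsoup.connect(YAHOO_FINANCE_URL).get();
            // Select the chart image inside <div class="styleChart">
            Elements img = YahooDoc.select("div.styleChart img");
            // Read the image URL from its src attribute
            imgSrc = img.attr("src");
        } catch (IOException e) {
            // imgSrc stays null if the page could not be fetched
            e.printStackTrace();
        }
        // Hand the image URL to the "index" template
        model.addAttribute("imgSrc", imgSrc);
        return "index";
    }
}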
<body>
	<img th:src="${imgSrc}" title="Alibaba stock price!" />
</body>

Commentary

What is Document?

A Document holds the parsed HTML of the page at the given URL. Jsoup.connect() takes the URL, and get() fetches the page and returns it as a Document. In this case, the Document is obtained as follows.

Document YahooDoc = Jsoup.connect(YAHOO_FINANCE_URL).get();

Printing this to the console shows the entire HTML of the specified page. It's amazing.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="ja"> 
 <head> 
  <title>Alibaba Group Holding [BABA]: Stocks/Stock price- Yahoo!finance</title> 
  <meta charset="utf-8"> 
  <meta name="description" content="Stock price of Alibaba Group Holding [BABA]. Covers stock prices, charts, performance, etc. of all stocks listed on NYSE (New York Stock Exchange) and NASDAQ. ADR Japanese stocks and rankings are also substantial."> 
  <meta name="keywords" content="Stock price,Dow,Nasdaq,America,Ranking"> 
...(abridged)...
  <meta property="og:description" content="Stock price of Alibaba Group Holding [BABA]. Covers stock prices, charts, performance, etc. of all stocks listed on NYSE (New York Stock Exchange) and NASDAQ. ADR Japanese stocks and rankings are also substantial."> 
  <meta property="og:title" content="Alibaba Group Holding [BABA]: Stocks/Stock price- Yahoo!finance"> 
...(rest omitted)
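
To experiment with Document without hitting the network, a page can also be parsed from a string. The following is a minimal sketch, assuming jsoup is on the classpath; the class name DocumentSketch and the sample HTML are made up for illustration and are not part of the original code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DocumentSketch {
    // Hypothetical stand-in for the HTML that connect(...).get() would download.
    static final String SAMPLE_HTML =
        "<html lang=\"ja\"><head><title>Sample [BABA]</title></head>"
        + "<body><p>hello</p></body></html>";

    // Parse an HTML string into the same Document type that get() returns.
    static Document parse(String html) {
        return Jsoup.parse(html);
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE_HTML);
        // Text of the <title> element
        System.out.println(doc.title());
        // The whole document serialized back to HTML
        System.out.println(doc.outerHtml());
    }
}
```

Because the Document type is the same either way, anything tried on the parsed string should carry over to a page fetched with connect().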

What is Elements?

Elements holds the elements selected from the Document object obtained above. The method is select(), which takes a CSS selector. In this case, it is used as follows.

Elements img = YahooDoc.select("div.styleChart img");

This selects the img tag inside the div tag whose class is styleChart. Printing it to the console gives:

<img src="https://chart.yahoo.co.jp/?code=BABA&amp;tm=1d&amp;size=e" alt="Chart image">
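
A selector can match zero, one, or many elements, so it can help to check what select() actually found. The sketch below is a small illustration, assuming jsoup is on the classpath; the class name SelectSketch, its helper methods, and the sample markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectSketch {
    // Hypothetical fragment shaped like the chart markup, plus an unrelated image.
    static final String SAMPLE_HTML =
        "<div class=\"styleChart\"><img src=\"/chart.png\" alt=\"Chart image\"></div>"
        + "<div class=\"other\"><img src=\"/banner.png\" alt=\"Banner\"></div>";

    // Count how many elements match a CSS selector.
    static int countMatches(String html, String cssQuery) {
        Elements found = Jsoup.parse(html).select(cssQuery);
        return found.size();
    }

    // Return the alt text of the first match, or null if nothing matched.
    static String firstAlt(String html, String cssQuery) {
        Element first = Jsoup.parse(html).selectFirst(cssQuery);
        return first == null ? null : first.attr("alt");
    }

    public static void main(String[] args) {
        // "img" matches both images; the scoped selector matches only the chart.
        System.out.println(countMatches(SAMPLE_HTML, "img"));
        System.out.println(countMatches(SAMPLE_HTML, "div.styleChart img"));
        System.out.println(firstAlt(SAMPLE_HTML, "div.styleChart img"));
    }
}
```

Scoping the selector to div.styleChart is what keeps other images on the page out of the result.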

How do I get only the src attribute of the fetched img tag?

attr() returns the value of the named attribute from the Elements object (when several elements match, from the first one that has the attribute). Here, the src attribute of the img tag is stored in a String variable called imgSrc.

String imgSrc = img.attr("src");

Printing this to the console gives:

https://chart.yahoo.co.jp/?code=BABA&tm=1d&size=e
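
Two details of attr() are easy to miss: it returns an empty string when nothing matches, and a relative src can be resolved to an absolute URL with the "abs:" prefix when the Document has a base URI. The following sketch illustrates both, assuming jsoup is on the classpath; the class name AttrSketch, the base URI https://example.com/us/detail/, and the sample markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class AttrSketch {
    // Hypothetical base URL used to resolve relative links.
    static final String BASE_URI = "https://example.com/us/detail/";

    // Hypothetical markup with a relative src and an escaped ampersand.
    static final String SAMPLE_HTML =
        "<div class=\"styleChart\">"
        + "<img src=\"/chart?code=BABA&amp;tm=1d\" alt=\"Chart image\"></div>";

    // The src value as written in the page (entities decoded).
    static String rawSrc(String html) {
        Elements img = Jsoup.parse(html, BASE_URI).select("div.styleChart img");
        return img.attr("src");
    }

    // The src value resolved against the document's base URI.
    static String absoluteSrc(String html) {
        Elements img = Jsoup.parse(html, BASE_URI).select("div.styleChart img");
        return img.attr("abs:src");
    }

    public static void main(String[] args) {
        System.out.println(rawSrc(SAMPLE_HTML));
        System.out.println(absoluteSrc(SAMPLE_HTML));
        // No matching element: attr() yields an empty string, not null.
        System.out.println(rawSrc("<p>no image here</p>").isEmpty());
    }
}
```

The chart URL in this article is already absolute, so the "abs:" form matters mainly for pages that use relative image paths.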

Finally, the controller passes the value to the view and returns the page.

model.addAttribute("imgSrc", imgSrc);
return "index";
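
In the controller, imgSrc stays null when the page cannot be fetched, and attr() yields an empty string when nothing matches, so the view can receive an unusable value. One possible guard is sketched below. The class name ChartModelSketch, the fallback path /images/no-chart.png, and the plain Map standing in for Spring's Model are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChartModelSketch {
    // Hypothetical placeholder image shown when scraping fails.
    static final String FALLBACK_SRC = "/images/no-chart.png";

    // Return the scraped src if it is usable, otherwise the fallback.
    static String srcOrFallback(String scrapedSrc) {
        if (scrapedSrc == null || scrapedSrc.trim().isEmpty()) {
            return FALLBACK_SRC;
        }
        return scrapedSrc;
    }

    // A Map stands in for Spring's Model so the sketch runs without Spring.
    static Map<String, Object> buildModel(String scrapedSrc) {
        Map<String, Object> model = new LinkedHashMap<>();
        model.put("imgSrc", srcOrFallback(scrapedSrc));
        return model;
    }

    public static void main(String[] args) {
        System.out.println(buildModel("https://chart.yahoo.co.jp/?code=BABA&tm=1d&size=e"));
        System.out.println(buildModel(null));
    }
}
```

In the real controller the same check could be applied just before model.addAttribute("imgSrc", ...).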

I found that scraping was surprisingly easy. That's all. Thank you for reading to the end.

