[JAVA] Scraping and writing specific elements to a file

What to do

Scrape a web page with the jsoup library and write specific elements from it to a file in JSON format. (Developed and run in Eclipse.)

Background

While looking for service partners, I (fortunately?) came across a service that has a large number of partners. Copying each of those service names by hand into the JSON format {"Name":"Service_name","Connectivity":"1"} would have been tedious, so I decided to scrape them instead. The target this time is Highrise's list of partner integrations.

Library to use

Use jsoup, a library that can parse and scrape HTML.
Official page: http://jsoup.org/
Download page: http://jsoup.org/download

Check which tag on the web page has the information you want

Looking at the page's HTML, I confirmed that the information I want (the service name) is in the a tag inside the h4 tag of each element with the app class. As a CSS selector, that is .app h4 a.
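Before hitting the real site, the selector can be checked offline by parsing an HTML string with Jsoup.parse. This is a minimal sketch: the markup below is made up to mimic the structure described above (the real page's HTML is not reproduced here), and the class name SelectorDemo is hypothetical. It also shows attr() for reading an attribute instead of the text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class SelectorDemo {

    // Returns the text of every <a> inside an <h4> inside an element with class "app".
    static List<String> serviceNames(String html) {
        Document document = Jsoup.parse(html);
        Elements links = document.select(".app h4 a");
        List<String> names = new ArrayList<>();
        for (Element link : links) {
            names.add(link.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical markup modeled on the structure described above;
        // the real page may differ.
        String html = "<div class=\"app\"><h4><a href=\"/extras/foo\">Foo CRM</a></h4></div>"
                + "<div class=\"app\"><h4><a href=\"/extras/bar\">Bar Mail</a></h4></div>"
                + "<div class=\"other\"><h4><a href=\"/x\">Not an app</a></h4></div>";

        System.out.println(serviceNames(html)); // [Foo CRM, Bar Mail]

        // attr() reads an attribute value instead of the element's text
        Element first = Jsoup.parse(html).selectFirst(".app h4 a");
        System.out.println(first.attr("href")); // /extras/foo
    }
}
```

The third div is skipped because it lacks the app class, which is exactly what the descendant selector .app h4 a is meant to filter.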

Add the downloaded .jar file to Eclipse

Right-click the package and choose [Build Path] → [Configure Build Path]; the build path dialog opens. Click [Add External JARs...], select the jsoup .jar file you downloaded earlier, then click [Apply and Close]. This completes the addition. Make sure that [Referenced Libraries] in the project now contains the added .jar file.

Writing code

Import jsoup

Main.java


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Add the imports above to the top of Main.java.

Write file operations

Don't forget to handle exceptions.

Main.java


import java.io.FileWriter;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {

    public static void main(String[] args) {
        FileWriter fw = null;
        try {
            // Writing a file (this part is filled in next)
        } catch (IOException e) {
            System.out.println("File write error");
        } finally {
            // Close the file if it was opened
            if (fw != null) {
                try {
                    fw.close();
                } catch (IOException e2) {
                    // Ignore errors while closing
                }
            }
        }
    }
}
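Since Java 7, try-with-resources closes the writer automatically, which removes the need for the finally block and the null check. The sketch below swaps FileWriter for Files.newBufferedWriter with an explicit UTF-8 charset (FileWriter uses the platform default charset, which is UTF-8 by default only from Java 18). It assumes Java 11+ for Path.of and List.of; the class name AppendLines and the file name output.json are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class AppendLines {

    // Appends each line to the file (creating it if needed), like FileWriter(path, true).
    // try-with-resources closes the writer even if an exception is thrown.
    static void appendLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(
                file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine(); // writes the system line separator
            }
        }
    }

    public static void main(String[] args) {
        Path file = Path.of("output.json"); // hypothetical output path
        try {
            appendLines(file, List.of("{\"Name\":\"Example\",\"Connectivity\":\"1\"}"));
        } catch (IOException e) {
            System.out.println("File write error: " + e.getMessage());
        }
    }
}
```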

Next, write the code for the file-writing part of the try block.

Main.java


            // Open the file (true = append to the end instead of overwriting)
            fw = new FileWriter("[File path to write]", true);
            // Fetch and parse the page, then select every service-name link
            Document document = Jsoup.connect("https://highrisehq.com/extras/").get();
            Elements elements = document.select(".app h4 a");
            // Write one JSON object per service name
            for (Element element : elements) {
                String name = element.text();
                fw.write("{\"Name\":\"" + name + "\",\"Connectivity\":\"1\"}\n");
                System.out.println(name);
            }
            fw.flush();

Jsoup.connect("URL").get(); loads the website's HTML as a Document, and document.select("selector"); finds the HTML elements you need. Since there are multiple matches in this case, all of them are returned in elements, and the for loop takes them out one by one. element.text(); gets the text of each element. (For an attribute, use attr("attribute name") instead of text().) Because the output is JSON, each name is written in the form {"Name":"Service_name","Connectivity":"1"}, one object per line.
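Concatenating the name straight into the string produces invalid JSON if a service name ever contains a double quote or a backslash. A minimal sketch of escaping per the JSON spec (RFC 8259: quotes, backslashes and control characters must be escaped) is below; it also shows one way to wrap all objects in an array so the whole file is a single JSON document rather than one object per line. The class and method names (JsonOutput, escape, toObject, toArray) are hypothetical, and a JSON library would be a reasonable alternative.

```java
import java.util.List;
import java.util.StringJoiner;

class JsonOutput {

    // Escapes a string for use inside a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters become a four-digit hex escape
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds one object in the format used in this article.
    static String toObject(String name) {
        return "{\"Name\":\"" + escape(name) + "\",\"Connectivity\":\"1\"}";
    }

    // Wraps all objects in a JSON array so the file is one valid JSON document.
    static String toArray(List<String> names) {
        StringJoiner joiner = new StringJoiner(",\n", "[\n", "\n]\n");
        for (String name : names) {
            joiner.add(toObject(name));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toObject("Say \"Hi\""));
        // prints {"Name":"Say \"Hi\"","Connectivity":"1"}
        System.out.print(toArray(List.of("Foo", "Bar")));
    }
}
```

In the loop above, fw.write(JsonOutput.toObject(name) + "\n"); would keep the one-object-per-line layout while escaping each name.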

The service names were written to the file successfully. I'm happy.

References

Thank you very much.
jsoup usage notes: https://qiita.com/opengl-8080/items/d4864bbc335d1e99a2d7
Let's scrape with Java!!: https://qiita.com/takahiroSakamoto/items/c2b269c07e15a04f5861
[Java] [Html Parser] [jsoup] How to use the Java library "jsoup", which can manipulate HTML like jQuery: http://d.hatena.ne.jp/it-tech-dm/20110123/1295774869
