Try scraping in about 30 lines of Java (CSV output)

What is scraping?

Scraping is a technique for extracting information from websites with software. It involves various legal gray areas, so be careful. Reference: Scraping and Law

Purpose

First, what can you do with scraping? Here are some concrete examples:

- Data collection for machine learning
- Data analysis for corporate marketing
- Data collection for use in an app

There are many other uses as well.

Source code

scraping.java


import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class scraping {
    public static void main(String[] args) {
        // Prepare the output file; try-with-resources closes the writer even if an exception occurs
        try (PrintWriter p = new PrintWriter(new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream("Output destination path/sample.csv"), "Shift-JIS")))) {
            // Write the header row.
            // Change the column names freely; to add a column, add a comma and a column name as a set.
            p.print("Column 1");
            p.print(",");
            p.print("Column 2");
            p.println();
            // Sequential number used as the key column
            int num = 1;
            Document document = Jsoup.connect("Target url").get();
            // The selector can be a tag name, a class name, an id, and so on
            Elements elements = document.select("Target element");
            // Write one row per matching element
            for (Element element : elements) {
                p.print(num);
                p.print(",");
                p.print(element.text());
                p.println();    // end of row
                num++;
            }
            System.out.println("File output completed!");
        } catch (IOException e) {
            // Covers a bad output path, an unsupported encoding, and a failed page fetch
            System.out.println(e);
        }
    }
}
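
One thing the program above does not handle: text taken from a page can itself contain commas, double quotes, or line breaks, any of which would break a CSV row. The following is a minimal sketch of quoting such fields, assuming the common convention of wrapping the field in double quotes and doubling any embedded quotes. The class name CsvField and the method escape are hypothetical names for illustration, not part of the original program.

```java
public class CsvField {
    // Hypothetical helper: quote a field only when it contains a character that would break the row.
    static String escape(String field) {
        boolean needsQuoting = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuoting) {
            return field;
        }
        // Double any embedded quotes, then wrap the whole field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("plain text"));    // plain text
        System.out.println(escape("Tokyo, Japan"));  // "Tokyo, Japan"
        System.out.println(escape("say \"hi\""));    // "say ""hi"""
    }
}
```

With a helper like this, the line p.print(element.text()); in the loop could become p.print(CsvField.escape(element.text()));.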

Method

document.select

Elements elements = document.select("target elements");

The target element can be specified by:

- Tag name, such as body or p
- Class name, such as .class
- ID, such as #id

These can also be combined; for example, p.name selects p tags whose class is name.
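
To see how these selector forms behave, here is a small sketch that runs them against an inline HTML string instead of a live site. The HTML, the class name SelectorDemo, and the helper count are made up for illustration; Jsoup.parse and select are the jsoup calls being demonstrated.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Made-up HTML for illustration only
    static final String HTML =
            "<div id=\"main\">"
          + "<p class=\"name\">Alice</p>"
          + "<p class=\"name\">Bob</p>"
          + "<p>No class here</p>"
          + "<span class=\"name\">Carol</span>"
          + "</div>";

    // Count the elements that a CSS selector matches in the sample HTML
    static int count(String selector) {
        Document document = Jsoup.parse(HTML);
        return document.select(selector).size();
    }

    public static void main(String[] args) {
        System.out.println(count("p"));       // 3: every p tag
        System.out.println(count(".name"));   // 3: two p tags and one span with class "name"
        System.out.println(count("#main"));   // 1: the div with id "main"
        System.out.println(count("p.name"));  // 2: only p tags with class "name"
    }
}
```

Parsing a string this way is also a convenient way to try out a selector before pointing the program at a real page.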

element.text

for (Element element : elements) {
    // Set the contents
    p.print(num);
    p.print(",");
    p.print(element.text());
    p.println();    // Line break
    num++;
}

The text method retrieves the text content of each element.
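
As a small illustration of what text() returns, the sketch below parses an inline HTML fragment and prints the text of the first match. The HTML, the class name TextDemo, and the helper firstText are hypothetical; the point is that text() returns the combined text of the element and its children with the tags removed.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class TextDemo {
    // Return the text of the first element matching the selector, or "" if nothing matches
    static String firstText(String html, String selector) {
        Element element = Jsoup.parse(html).select(selector).first();
        return element == null ? "" : element.text();
    }

    public static void main(String[] args) {
        // Made-up HTML: the nested <b> tag is dropped, its text is kept
        String html = "<p class=\"name\">Hello <b>jsoup</b> world</p>";
        System.out.println(firstText(html, "p.name"));  // Hello jsoup world
    }
}
```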

Summary

As a result, I was able to implement scraping in Java in about 30 lines. I personally like Java, so I gave it a try.
