Web scraping is a software technique for extracting information from websites. Scraping involves a number of legal gray areas, so be careful. Reference: Scraping and Law
Here is what you can do with scraping. Some concrete examples:
- Data collection for machine learning
- Data analysis for corporate marketing
- Data collection for use in an app
There are many other uses as well.
scraping.java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class scraping {
    public static void main(String[] args) {
        // Prepare the output file. try-with-resources closes the writer automatically,
        // which avoids a NullPointerException in finally if the file cannot be opened.
        try (PrintWriter p = new PrintWriter(new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream("Output destination path/sample.csv"), "Shift-JIS")))) {
            // Write the header row. Change the column names freely;
            // to add a column, print another "," followed by the new column name.
            p.print("Column 1");
            p.print(",");
            p.print("Column 2");
            p.println();

            // Running number, used here as the key column
            int num = 1;

            // Fetch and parse the page
            Document document = Jsoup.connect("Target url").get();

            // The selector can be a class name, an ID, a tag name, and so on
            Elements elements = document.select("Target element");

            // Write one row per matching element
            for (Element element : elements) {
                p.print(num);
                p.print(",");
                p.print(element.text());
                p.println(); // end of row
                num++;
            }
            // Reached only if fetching and writing did not throw
            System.out.println("File output completed!");
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}
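One thing to watch for: the listing writes element.text() straight into the CSV, so text that contains a comma would shift the columns. The following is a minimal sketch of one way to quote such fields, following the usual CSV convention of wrapping the field in double quotes and doubling any embedded quotes. The class name CsvField and the method escape are hypothetical helpers, not part of the original program or of Jsoup.

```java
class CsvField {
    // Hypothetical helper: quote a field when it contains a comma,
    // a double quote, or a line break; otherwise return it unchanged.
    static String escape(String value) {
        boolean needsQuoting = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuoting) {
            return value;
        }
        // Embedded double quotes are doubled, then the whole field is wrapped in quotes
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("plain"));        // plain
        System.out.println(escape("a,b"));          // "a,b"
        System.out.println(escape("say \"hi\""));   // "say ""hi"""
    }
}
```

With a helper like this, the row-writing line would become p.print(CsvField.escape(element.text())).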
document.select
Elements elements = document.select("Target element");
The target element can be specified as:
- A tag name, such as body or p
- A class name, such as .class
- An ID, such as #id
These can also be combined. For example, p.name selects p tags that have the class name.
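To make the selector forms concrete, here is a small sketch that runs them against an inline HTML string instead of a live page, so no network access is needed. The markup, the class name SelectorDemo, and the method sample are made up for illustration; Jsoup.parse and select are the regular Jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class SelectorDemo {
    // Hypothetical sample markup, used only for illustration
    static final String HTML =
            "<html><body>"
            + "<p class=\"name\">Alice</p>"
            + "<p>plain paragraph</p>"
            + "<div class=\"name\">Bob</div>"
            + "<span id=\"total\">2 users</span>"
            + "</body></html>";

    static Document sample() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document document = sample();

        Elements byTag = document.select("p");            // every p tag
        Elements byClass = document.select(".name");      // any tag with class "name"
        Elements byId = document.select("#total");        // the tag with id "total"
        Elements tagAndClass = document.select("p.name"); // only p tags with class "name"

        System.out.println(byTag.size());        // 2
        System.out.println(byClass.size());      // 2
        System.out.println(byId.size());         // 1
        System.out.println(tagAndClass.text());  // Alice
    }
}
```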
element.text
for (Element element : elements) {
    // Write one row per element
    p.print(num);
    p.print(",");
    p.print(element.text());
    p.println(); // end of row
    num++;
}
The text method returns the text content of an element.
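As a rough illustration of what text returns, the sketch below parses a small inline HTML string. The markup, the class name TextDemo, and the method firstLink are hypothetical. The attr call is not used in the article; it is shown only to contrast visible text with an attribute value.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TextDemo {
    // Hypothetical sample markup, used only for illustration
    static final String HTML =
            "<div id=\"post\"><a href=\"/posts/1\">Hello <b>world</b></a></div>";

    static Element firstLink() {
        Document document = Jsoup.parse(HTML);
        // first() returns the first match, or null when nothing matches
        return document.select("#post a").first();
    }

    public static void main(String[] args) {
        Element link = firstLink();
        // text() combines the text of the element and its children, without tags
        System.out.println(link.text());        // Hello world
        // attr() reads an attribute value instead of the visible text
        System.out.println(link.attr("href"));  // /posts/1
    }
}
```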
As a result, I was able to implement scraping in Java in about 30. I personally like Java, so I gave it a try.