This post scrapes a web page with the jsoup library and writes specific elements to a file in JSON format. (Developed in Eclipse.)
While looking for services to integrate with, I fortunately (?) came across a service that has a large number of integration partners. Copying all of those service names by hand and putting them into JSON as {"Name":"Service_name","Connectivity":"1"} looked tedious, so I scraped them instead.
The target this time is the list of Highrise integration partners.
Use **jsoup**, a library for scraping HTML.
Official page: http://jsoup.org/
Download page: http://jsoup.org/download
Looking at the HTML, I confirmed that the information I want (the service name) is in the **a tag inside the h4 tag inside an element with the app class**.
Right-click [package] → [Build Path] → click [Configure Build Path]. In the dialog that opens, click [Add External JARs] and select the jsoup .jar file you downloaded earlier, then click [Apply and Close]. This completes the addition. Confirm that "Referenced Libraries" now appears in the project and contains the added .jar file.
Main.java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
Add the import statements above.
Both the file I/O and the HTTP request can throw IOException, so don't forget to handle exceptions.
Main.java
import java.io.FileWriter;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Main {
    public static void main(String[] args) {
        FileWriter fw = null;
        try {
            // Writing a file (the scraping and writing code goes here)
        } catch (IOException e) {
            System.out.println("File write error");
        } finally {
            // Close the file if it was opened
            if (fw != null) {
                try {
                    fw.close();
                } catch (IOException e2) {
                    // Nothing more can be done if closing fails
                }
            }
        }
    }
}
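As an alternative to the explicit finally block, Java 7 and later offer try-with-resources, which closes the writer automatically. The following is only a minimal sketch: the class name `TryWithResourcesSketch`, the helper `writeLines`, and the file name `services.json` are hypothetical and not part of the original program.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class TryWithResourcesSketch {
    // Hypothetical helper: appends each entry to the file at `path`, one per line.
    static void writeLines(String path, List<String> lines) throws IOException {
        // The writer is closed automatically when the try block exits,
        // whether it finishes normally or throws.
        try (FileWriter fw = new FileWriter(path, true)) {
            for (String line : lines) {
                fw.write(line + "\n");
            }
        }
    }

    public static void main(String[] args) {
        try {
            // "services.json" is a placeholder output path.
            writeLines("services.json",
                    Arrays.asList("{\"Name\":\"Example\",\"Connectivity\":\"1\"}"));
        } catch (IOException e) {
            System.out.println("File write error: " + e.getMessage());
        }
    }
}
```

With this form there is no need for a null check or a nested try around close().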
Main.java
// Open the file in append mode
fw = new FileWriter("[File path to write]", true);
// Fetch the page and select the service-name links
Document document = Jsoup.connect("https://highrisehq.com/extras/").get();
Elements elements = document.select(".app h4 a");
for (Element element : elements) {
    String name = element.text();
    // Write one JSON object per line
    fw.write("{\"Name\":\"" + name + "\",\"Connectivity\":\"1\"}\n");
    System.out.println(name);
}
fw.flush();
Load the website's HTML with `Jsoup.connect("URL").get();`.
Find the HTML elements you need with `document.select("CSS selector");`. In this case several elements match, so all of them end up in elements.
Take them out one at a time with the for loop.
Get the element's text with `element.text();`. (To read an attribute instead, use `attr("attribute name")` in place of text.)
The file is written in JSON format, so each line is built to match {"Name":"Service_name","Connectivity":"1"}.
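One caveat with building the JSON by string concatenation: a service name containing a double quote or a backslash would produce an invalid line. The following is a minimal sketch of escaping the name first, using only the standard library. The class `JsonLineSketch` and the methods `escapeJson` and `toJsonLine` are hypothetical names introduced for illustration, and the sample service name is made up.

```java
class JsonLineSketch {
    // Escapes characters that must not appear raw inside a JSON string.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become a 4-digit hex escape
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds one line in the same shape used in this post.
    static String toJsonLine(String name) {
        return "{\"Name\":\"" + escapeJson(name) + "\",\"Connectivity\":\"1\"}";
    }

    public static void main(String[] args) {
        // Made-up service name containing quotes
        System.out.println(toJsonLine("Acme \"Pro\" Sync"));
        // prints {"Name":"Acme \"Pro\" Sync","Connectivity":"1"}
    }
}
```

In the loop, `fw.write(toJsonLine(name) + "\n");` could then replace the hand-built string. A JSON library would do the same job more robustly.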
The service names were written out to the file as intended. I'm happy.
References (thank you very much):
jsoup usage notes: https://qiita.com/opengl-8080/items/d4864bbc335d1e99a2d7
Let's scrape with Java!!: https://qiita.com/takahiroSakamoto/items/c2b269c07e15a04f5861
[Java][Html Parser][jsoup] How to use the Java library "jsoup", which can manipulate HTML like jQuery: http://d.hatena.ne.jp/it-tech-dm/20110123/1295774869