The procedure when inserting Eijiro's dictionary data into an Oracle table is described. This time, I will extract the word information from the HTML file created last time and insert it in the DB.
If you simply write what you want to do
WordInserter.java
package word;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.jsoup.Jsoup;
import word.sql.WordDictionarySql;
public class WordInserter {
private static String WORD_DICTIONARY =
"C:/Users/*****/Desktop/puisan_102.txt";
private static int LIMMIT = 10000;
public static void main(String args[])throws Exception{
long start = System.currentTimeMillis();
System.out.println("List creation process starts");
List <Object[]>list =getDictionarySource()
.stream().collect(Collectors.toList());
long end = System.currentTimeMillis();
System.out.println("List creation process is completed");
System.out.println(String.format("processing time%d seconds",(end-start)/1000));
start = System.currentTimeMillis();
System.out.println("Start DB processing");
OperateDB.createConnection();
operateList(list);
end = System.currentTimeMillis();
System.out.println("DB creation process is complete");
System.out.println(String.format("processing time%d seconds",(end-start)/1000));
OperateDB.closeConnection();
}
//A function that uses Jsoup to parse HTML tags and list the results
public static List<Object[]> getDictionarySource(){
try{
return Jsoup.parse(new File(WORD_DICTIONARY),"UTF-8")
//Get all child elements belonging to the tag tr
.getElementsByTag("tr")
.stream()
//Gets the text of the child element belonging to the tag tr and
//Data structures that can be inserted into the DB using the map function
//(In this case Object[])Convert to
.map(s->new DictionaryWord(
s.getAllElements().eachText()).params)
//List so that it can be inserted by batch of QueryLinner<Object[]>
//Aggregate to
.collect(Collectors.toList());
}catch(Exception e){
e.printStackTrace();
return new ArrayList<Object[]>();
}
}
//Execute Insert processing in units of 10,000 for the List passed as an argument
public static void operateList(List<Object[]> word){
Stream.iterate(1, s->s+1)
.limit((int)Math.ceil(word.size()/LIMMIT))
.forEach(i->executeSQL(word,i));
}
public static void executeSQL(List<Object[]>list,int from){
if(list.size()<((from+1)*LIMMIT))
WordDictionarySql.insertToWordDictionary(
list.subList(from*LIMMIT, list.size()));
else
WordDictionarySql.insertToWordDictionary(
list.subList((from-1)*LIMMIT, from*LIMMIT));
}
public static class DictionaryWord{
public long wordIndex;
public int wordLevel;
public String word;
public String searchWord;
public String meaning;
public String shortVer;
public Object[] params=new Object[6];
public DictionaryWord(List<String>list){
this.wordIndex = Long.parseLong(list.get(1));
params[0] = this.wordIndex;
this.wordLevel = Integer.parseInt(list.get(3).replaceAll("■", ""));
params[1] = this.wordLevel;
this.word = list.get(2);
params[2] = this.word;
this.searchWord = list.get(2).toLowerCase();
params[3] = this.word;
this.meaning = list.get(4);
params[4] = this.meaning;
this.shortVer = list.get(4).length()>1000?
list.get(4).substring(0,1000):list.get(4);
params[5] = this.shortVer;
}
public String toString(){
String format = "wordIndex=%d wordLevel=%d "
+ "word=%s meaning=%s shortVer=%s ";
return String.format(format,this.wordIndex,this.wordLevel,
this.word,this.meaning,this.shortVer);
}
}
}
I am surprised that the DOM created as a result of Jsoup's analysis and the Stream function of Java 8 are too compatible. It wouldn't be great without the Stream function, but maybe I didn't want to implement it in Java (In that case, I think I was using ruby)
Just pass the list of List \ <Ojbect [] > given as an argument to OperateDB # insert created in Previous article.
WordDictionarySql.java
package word.sql;
import java.util.List;
import word.OperateDB;
public class WordDictionarySql {
public static String INSERT_SQL = "insert into WORD_DICTIONAY "
+ "(WORD_INDEX,WORD_LEVEL,WORD,SEARCH_WORD,MEANING,SHORT_VER) values(?,?,?,?,?,?)";
public static void insertToWordDictionary(List<Object[]> word){
try{
OperateDB.insert(word,INSERT_SQL);
}catch(Exception e){
e.printStackTrace();
throw new RuntimeException("An error occurred during DB insert processing");
}
}
}
I put about 330,000 data into the Oracle table, but the execution time was about 102 seconds. By the way, when autocommit was enabled, the execution time was about twice as long as 230 seconds. From the top of the numbers, I was able to realize that enabling autocommit will reduce performance.
Recommended Posts