I wanted to create a large number of CSV files in order to load data in file
a large amount of data to MySQL. I investigated there.
Select from CSV libraries that can process Java objects with annotations one by one.
Maven Repository: com.orangesignal » orangesignal-csv » 2.2.1 Maven Repository: com.github.mygreen » super-csv-annotation » 2.2 Maven Repository: com.univocity » univocity-parsers » 2.8.1
Among them, Super CSV Annotation was considerably slower (about 3 times slower than the others). OrangeSignal CSV was the fastest.
If you don't need escaping, it was the fastest to make your own without escaping. ↓ A self-made CSV library that does not escape. NonEscapedCsvWriter/src/com/github/momosetkn/csv at master · momosetkn/NonEscapedCsvWriter
Created 100,000 CSVs, no enclosing characters to reduce data volume. Created on the condition that it is separated by commas and has a header.
Execution environment
NonEscapedCsvWriter/src/csv/test at master · momosetkn/NonEscapedCsvWriter
booleanField,localDateField,localDateTimeField,bigDecimalField,stringField,integerField,longField
true,2018-12-25,2018-12-25T09:30:10,100000.000001,1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456,2147483647,9223372036854775807
true,2018-12-25,2018-12-25T09:30:10,100000.000001,1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456,2147483647,9223372036854775807
(All the data is the same, but 100,000 cases continue ...)
CSV library | processing time |
---|---|
OrangeSignal CSV | 451.670,801ms |
Self-made CSV library that does not escape | 389.026,236ms |
SuperCsvAnnotation | 1,488.254,510ms |
univocity-parsers | 523.735,942ms |
Hmm ... Data editing is not so heavy compared to disk access, I thought that none of the libraries would change to an order of magnitude level, SuperCsvAnnotation takes three times as long as other CSV libraries ... I'm curious for some reason ...
↓ Code NonEscapedCsvWriter/NonEscapedCsvWriter.java at master · momosetkn/NonEscapedCsvWriter
Changed to concatenate strings using StringBuilder. ↓ Reference materials [String type vs StringBuilder] Difference in processing speed in string concatenation -Qiita How fast to write strings in Java? -Qiita
Enabled to specify capacity
of StringBuilder first.
It can be set by the library user according to the expected amount of data.
It is not synchronized because it does not assume a multithreaded environment.
Although it has nothing to do with speeding up, I made it easy to write the conversion logic to a character string with lambda.
Or rather sample code
try(NonEscapedCsvWriter<ExampleBean> exampleBeanCsvWriter = new NonEscapedCsvWriter<>(ExampleBean.class,
Files.newBufferedWriter(new File("/home/momose/Documents/test1_p.csv").toPath(), Charsets.UTF_8))
.charsCapacity(334)
.convert((input)-> {//Conversion logic to string
if (input instanceof LocalDateTime) {
return ((LocalDateTime) input).format(dateTimeFormatter);
} else if (input instanceof Boolean) {
return Boolean.TRUE.equals(input) ? "1" : "0";
} else if (input == null) {
return "null";
}
return input.toString();
})){
exampleBeanCsvWriter.init();
for (int i = 0; i < MAX_RECORD_COUNT; i++) {
ExampleBean exampleBean = new ExampleBean();
//Data editing process ...
exampleBeanCsvWriter.write(exampleBean);
}
};
The rest is imported with this command.
LOAD DATA INFILE 'example.csv' INTO TABLE scheme.table FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;
Recommended Posts