Why this article?

I wanted to create a large number of CSV files in order to load data in file a large amount of data to MySQL. I investigated there.

Premise

Select from CSV libraries that can process Java objects with annotations one by one.

result

Maven Repository: com.orangesignal » orangesignal-csv » 2.2.1 Maven Repository: com.github.mygreen » super-csv-annotation » 2.2 Maven Repository: com.univocity » univocity-parsers » 2.8.1

Among them, Super CSV Annotation was considerably slower (about 3 times slower than the others). OrangeSignal CSV was the fastest.

If you don't need escaping, it was the fastest to make your own without escaping. ↓ A self-made CSV library that does not escape. NonEscapedCsvWriter/src/com/github/momosetkn/csv at master · momosetkn/NonEscapedCsvWriter

Method of verification

Created 100,000 CSVs, no enclosing characters to reduce data volume. Created on the condition that it is separated by commas and has a header.

Execution environment

Ubuntu18.04LTS
Core i7-4770K
RAM12GB --SSD (330 Series SSD SC2CT120A3K5) that has been used for about 5 years and seems to be a little damaged --Run on IntelliJ IDEA

Verification code (see Github)

NonEscapedCsvWriter/src/csv/test at master · momosetkn/NonEscapedCsvWriter

Write data

booleanField,localDateField,localDateTimeField,bigDecimalField,stringField,integerField,longField
true,2018-12-25,2018-12-25T09:30:10,100000.000001,1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456,2147483647,9223372036854775807
true,2018-12-25,2018-12-25T09:30:10,100000.000001,1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456,2147483647,9223372036854775807
(All the data is the same, but 100,000 cases continue ...)

result

CSV library	processing time
OrangeSignal CSV	451.670,801ms
Self-made CSV library that does not escape	389.026,236ms
SuperCsvAnnotation	1,488.254,510ms
univocity-parsers	523.735,942ms

Hmm ... Data editing is not so heavy compared to disk access, I thought that none of the libraries would change to an order of magnitude level, SuperCsvAnnotation takes three times as long as other CSV libraries ... I'm curious for some reason ...

Ingenuity of CSV library that does not escape processing

↓ Code NonEscapedCsvWriter/NonEscapedCsvWriter.java at master · momosetkn/NonEscapedCsvWriter

Changed to concatenate strings using StringBuilder. ↓ Reference materials [String type vs StringBuilder] Difference in processing speed in string concatenation -Qiita How fast to write strings in Java? -Qiita

Enabled to specify capacity of StringBuilder first. It can be set by the library user according to the expected amount of data.

It is not synchronized because it does not assume a multithreaded environment.

Although it has nothing to do with speeding up, I made it easy to write the conversion logic to a character string with lambda.

How to use a CSV library that does not escape

Or rather sample code

try(NonEscapedCsvWriter<ExampleBean> exampleBeanCsvWriter = new NonEscapedCsvWriter<>(ExampleBean.class,
                Files.newBufferedWriter(new File("/home/momose/Documents/test1_p.csv").toPath(), Charsets.UTF_8))
.charsCapacity(334)
.convert((input)-> {//Conversion logic to string
    if (input instanceof LocalDateTime) {
        return ((LocalDateTime) input).format(dateTimeFormatter);
    } else if (input instanceof Boolean) {
        return Boolean.TRUE.equals(input) ? "1" : "0";
    } else if (input == null) {
        return "null";
    }
    return input.toString();
})){
    exampleBeanCsvWriter.init();
    for (int i = 0; i < MAX_RECORD_COUNT; i++) {
        ExampleBean exampleBean = new ExampleBean();
        //Data editing process ...
        exampleBeanCsvWriter.write(exampleBean);
    }
};

The rest is imported with this command.

LOAD DATA INFILE 'example.csv' INTO TABLE scheme.table FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;

Review and implementation of CSV library for loading large amounts of data into MySQL (Java)

Why this article?

Premise

result

Method of verification

Verification code (see Github)

Write data

result

Ingenuity of CSV library that does not escape processing

How to use a CSV library that does not escape