Review and implementation of CSV library for loading large amounts of data into MySQL (Java)

Why this article?

I wanted to generate large CSV files so that I could load a large amount of data from them into MySQL, so I investigated the available CSV libraries.

Premise

I selected from CSV libraries that can process annotated Java objects one record at a time.

Result

Maven Repository: com.orangesignal » orangesignal-csv » 2.2.1
Maven Repository: com.github.mygreen » super-csv-annotation » 2.2
Maven Repository: com.univocity » univocity-parsers » 2.8.1

Among them, Super CSV Annotation was considerably slower (about 3 times slower than the others), and OrangeSignal CSV was the fastest.

If you don't need escaping, the fastest option was to write your own library that skips escaping entirely. ↓ A self-made CSV library that does not escape: NonEscapedCsvWriter/src/com/github/momosetkn/csv at master · momosetkn/NonEscapedCsvWriter
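
For a sense of what "processing annotated Java objects one by one" looks like, here is a minimal sketch with univocity-parsers, one of the compared libraries. This is not the benchmark code: the bean, column selection, and file name are illustrative assumptions, and only the standard @Parsed annotation and BeanWriterProcessor API are used.

import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.common.processor.BeanWriterProcessor;
import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;

public class UnivocityWriteExample {

    // Hypothetical bean for illustration; only two of the article's columns are shown.
    public static class Row {
        @Parsed(field = "stringField")
        public String stringField;

        @Parsed(field = "integerField")
        public Integer integerField;
    }

    public static void main(String[] args) throws Exception {
        CsvWriterSettings settings = new CsvWriterSettings();
        settings.setRowWriterProcessor(new BeanWriterProcessor<>(Row.class));
        settings.setHeaders("stringField", "integerField");

        try (Writer out = Files.newBufferedWriter(Paths.get("example.csv"))) {
            CsvWriter writer = new CsvWriter(out, settings);
            writer.writeHeaders();     // header row
            Row row = new Row();
            row.stringField = "abc";
            row.integerField = 1;
            writer.processRecord(row); // bean -> CSV record via the annotations
            writer.close();
        }
    }
}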

Method of verification

I wrote 100,000 CSV records with no enclosing (quote) characters, to keep the data volume down. The output is comma-separated and has a header row.

Execution environment

Verification code (see GitHub)

NonEscapedCsvWriter/src/csv/test at master · momosetkn/NonEscapedCsvWriter

Write data

booleanField,localDateField,localDateTimeField,bigDecimalField,stringField,integerField,longField
true,2018-12-25,2018-12-25T09:30:10,100000.000001,1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456,2147483647,9223372036854775807
true,2018-12-25,2018-12-25T09:30:10,100000.000001,1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456,2147483647,9223372036854775807
(Every row contains the same data; this continues for 100,000 rows ...)
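
For reference, a bean matching this header might look like the sketch below. The field names and types are inferred from the header row and sample values above; the actual ExampleBean in the verification code may be declared differently (e.g. with accessors or annotations).

import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.LocalDateTime;

// Sketch of a bean matching the header above (assumed, not the actual ExampleBean).
public class ExampleBean {
    private Boolean booleanField;
    private LocalDate localDateField;
    private LocalDateTime localDateTimeField;
    private BigDecimal bigDecimalField;
    private String stringField;
    private Integer integerField;
    private Long longField;

    // getters and setters omitted
}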

Result

CSV library                                    Processing time
OrangeSignal CSV                               451.670,801 ms
Self-made CSV library that does not escape     389.026,236 ms
SuperCsvAnnotation                             1,488.254,510 ms
univocity-parsers                              523.735,942 ms

Hmm ... Since editing the data should be light compared to disk access, I expected that none of the libraries would differ by anything like an order of magnitude, yet SuperCsvAnnotation takes about three times as long as the other CSV libraries ... I wonder why ...

Design points of the self-made CSV library that does not escape

↓ Code NonEscapedCsvWriter/NonEscapedCsvWriter.java at master · momosetkn/NonEscapedCsvWriter

String concatenation is done with StringBuilder. ↓ Reference materials: [String type vs StringBuilder] Difference in processing speed in string concatenation - Qiita / How fast to write strings in Java? - Qiita

The initial capacity of the StringBuilder can be specified up front, so the library user can set it according to the expected amount of data per record.

It is not synchronized because it does not assume a multithreaded environment.

Although it has nothing to do with speeding things up, I also made it possible to write the string-conversion logic as a lambda.
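
Putting these points together, the core write path of such a writer might look roughly like the sketch below. This is a simplified illustration, not the actual NonEscapedCsvWriter source; the class name, method signature, and the way column values are passed in are assumptions made for explanation.

import java.io.IOException;
import java.io.Writer;
import java.util.List;
import java.util.function.Function;

// Illustrative non-escaping writer: values are converted by a user-supplied lambda
// and joined with a pre-sized, unsynchronized StringBuilder.
public class SimpleNonEscapingWriter {
    private final Writer writer;
    private final int charsCapacity;
    private final Function<Object, String> convert;

    public SimpleNonEscapingWriter(Writer writer, int charsCapacity,
            Function<Object, String> convert) {
        this.writer = writer;
        this.charsCapacity = charsCapacity;
        this.convert = convert;
    }

    // Writes one record; the caller supplies the column values in order.
    public void write(List<Object> values) throws IOException {
        // StringBuilder rather than StringBuffer (no synchronization needed),
        // sized up front so the internal array is not reallocated per record.
        StringBuilder sb = new StringBuilder(charsCapacity);
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(convert.apply(values.get(i))); // no quoting, no escaping
        }
        sb.append('\n');
        writer.write(sb.toString());
    }
}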

How to use a CSV library that does not escape

Or rather, here is the sample code.

// dateTimeFormatter and MAX_RECORD_COUNT are assumed to be defined elsewhere in the verification code.
try (NonEscapedCsvWriter<ExampleBean> exampleBeanCsvWriter = new NonEscapedCsvWriter<>(
        ExampleBean.class,
        Files.newBufferedWriter(new File("/home/momose/Documents/test1_p.csv").toPath(), Charsets.UTF_8))
        .charsCapacity(334)
        .convert(input -> { // conversion logic to string
            if (input instanceof LocalDateTime) {
                return ((LocalDateTime) input).format(dateTimeFormatter);
            } else if (input instanceof Boolean) {
                return Boolean.TRUE.equals(input) ? "1" : "0";
            } else if (input == null) {
                return "null";
            }
            return input.toString();
        })) {
    exampleBeanCsvWriter.init();
    for (int i = 0; i < MAX_RECORD_COUNT; i++) {
        ExampleBean exampleBean = new ExampleBean();
        // data editing process ...
        exampleBeanCsvWriter.write(exampleBean);
    }
}

The rest is just importing the file with this command.

LOAD DATA INFILE 'example.csv' INTO TABLE scheme.table FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;
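
If you also want to run the import from Java, plain JDBC is enough. The sketch below is only an illustration: the connection URL, credentials, and table name are placeholders, MySQL Connector/J is assumed to be on the classpath, and server-side LOAD DATA INFILE additionally requires the FILE privilege and a suitable secure_file_priv setting.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LoadDataExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust to your environment.
        String url = "jdbc:mysql://localhost:3306/scheme";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            // Same statement as above; 'scheme.table' is the article's placeholder name.
            stmt.execute("LOAD DATA INFILE 'example.csv' INTO TABLE scheme.table"
                    + " FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"
                    + " IGNORE 1 LINES");
        }
    }
}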
