I want to write quickly from java to sqlite

Overview

I used to save tweet data collection in sqlite using python, A memo that I researched variously because I wanted to do it at high speed while multithreading with java.

Library

When dealing with sqlite sqlite-jdbc is easy to use, so I fetch it from maven and use it.

I was addicted to

Initially, I was collecting data on the HDD,

** Anyway slow. ** **

I wondered why it was so slow, and it took a lot of time to add DBs rather than collecting them.

Solution

I searched for the cause.

  1. When adding data, it was added one by one with ʻexecuteUpdate`
  2. ʻINSERT OR UPDATE` to avoid duplication of data (tweets)
  3. In the default setting, it is written to the db-jornal file and then added to the DB file, so extra disk access occurs. Also, it is slow because it has a synchronization flag with DB.

Solve one by one

  1. to the appropriate library can be reserved or to insert in advance what kind of things as data in the prepareStatement
  2. Regarding this, it seems that ʻUPDATE takes a long time, so change it to ʻINSERT OR IGNORE.
  3. Play around with the pragma mode. Specifically, set journal_mode to MEMORY and set sync_mode to ʻOFF`.

Implementation example

Example of saving the tweet that was said first

DataBase.java


    /**
     *Driver registration
     */
    public static void dbInit() {
        try {
            Class.forName("org.sqlite.JDBC");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }
    /**
     *pragma settings
     *Here 3.Jornal_mode and sync_mode is set
     */
    public static Properties getProperties() {
        Properties prop = new Properties();
        prop.put("journal_mode", "MEMORY");
        prop.put("sync_mode", "OFF");
        return prop;
    }
    /**
     *Process to add to database
     *If you do not put synchronized, deadlock will occur when writing hard with multithread
     */
    public static synchronized void putTweet2SQL(File dbFile, List<Status> tweet) {
        Statement stmt;
        String dbHeader = "jdbc:sqlite:" + dbFile.getAbsolutePath();
        PreparedStatement pstmt;
        dbInit();
        try (Connection conn = DriverManager.getConnection(dbHeader, getProperties())) { //try-with-resources
            conn.setAutoCommit(false);
            stmt = conn.createStatement();
            //Create if you don't have a database
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS tweets (tweet_id INTEGER PRIMARY KEY, user_id INTEGER, user_screen_name TEXT,tweet_text TEXT)");
            //Tweet ID, user ID, screen name, tweet text
            //2.Preparing to add data in bulk
            pstmt = conn.prepareStatement("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)");
            for (Status status : tweet) {
                place = status.getPlace().getFullName();
                pstmt.setLong(1, status.getId());
                pstmt.setLong(2, status.getUser().getId());
                pstmt.setString(3, status.getUser().getScreenName());
                pstmt.setString(4, status.getText());
                pstmt.addBatch();//1.Add to processing
            }
            pstmt.executeBatch();//1.The actual database addition process is done here
            conn.commit();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

bonus

Even when reading from the DB, setFetchSize will call the specified number at once, so it will be comfortable to some extent.

DataBase.java


    /**
     *Get Tweets
     *If you pull huge data, the heap will be insufficient, so in that case rewrite appropriately
     * @return Tweet list
     */
    public static List<Status> getTweetsFromSQL(File dbFile) {
        String dbHeader = "jdbc:sqlite:" + dbFile.getAbsolutePath();
        PreparedStatement pstmt;
        List<Status> userDetails = new ArrayList<>();//Appropriate list
        dbInit();
        try (Connection conn = DriverManager.getConnection(dbHeader, getProperties())) {
            pstmt = conn.prepareStatement("SELECT * FROM tweets");
            pstmt.setFetchSize(1000);//If you set 5000 trillion or something, you will get it all at once
            ResultSet rs = pstmt.executeQuery();
            while (rs.next()) {//Tweet ID, user ID, screen name, tweet text
                Status status = new status();
                status.setId(rs.getLong(1));
                status.setUserId(rs.getLong(2));
                status.setScreenName(rs.getString(3));
                status.setTweetText(rs.getString(4));
                
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return userDetails;
    }

Impressions

It's much faster and I'm happy because I could buy more SSDs

Recommended Posts

I want to write quickly from java to sqlite
Run R from Java I want to run rJava
[Java] I want to calculate the difference from the date
[JDBC] I tried to access the SQLite3 database from Java.
I want to write a nice build.gradle
I want to write a unit test!
I want to stop Java updates altogether
I just want to write Java using Eclipse on my Mac
Changes from Java 8 to Java 11
Sum from Java_1 to 100
I want to send an email in Java.
I want to graduate from npm install properly [2020]
I want to use java8 forEach with index
I want to simply write a repeating string
rsync4j --I want to touch rsync in Java.
I want to do something like "cls" in Java
How to write Scala from the perspective of Java
6 features I missed after returning to Java from Scala
I want to transition screens with kotlin and java!
I want to get along with Map [Java beginner]
I want to redirect sound from Ubuntu with xrdp
I want to connect to Heroku MySQL from a client
Migration from Cobol to JAVA
I want to convert characters ...
New features from Java7 to Java8
How to write java comments
Connect from Java to PostgreSQL
Change from SQLite3 to PostgreSQL
From Ineffective Java to Effective Java
I want to build Java Applet without using an IDE
I want to use the Java 8 DateTime API slowly (now)
How to write and notes when migrating from VB to JAVA
I want to implement various functions with kotlin and java!
[Java] I want to test standard input & standard output with JUnit
I want to simplify the conditional if-else statement in Java
In Java, I want to trim multiple specified characters from only the beginning and end.
I want to write a loop that references an index with Java 8's Stream API
[Java] I want to write asynchronous processing using Promise in Java-Trial of Promise-like grammar of JavaScript-
I want to return a type different from the input element with Java8 StreamAPI reduce ()
Studying Java # 6 (How to write blocks)
I want to use FormObject well
I want to convert InputStream to String
I tried to interact with Java
Java to be involved from today
From Java to VB.NET-Writing Contrast Memo-
How to write Java variable declaration
I tried to summarize Java learning (1)
The road from JavaScript to Java
I tried to summarize Java 8 now
[Java] Conversion from array to List
I want to be able to think and write regular expressions myself. ..
I want to return to the previous screen with kotlin and java!
If you want to change the Java development environment from Eclipse
[Android] I want to get the listener from the button in ListView
What I thought about when I started migrating from Java to Kotlin
I want to develop a web application!
I want to eliminate duplicate error messages
I want to make an ios.android app
I tried to summarize Java lambda expressions
I want to display images with REST Controller of Java and Spring!
I want to use DBViewer with Eclipse 2018-12! !!