While investigating the performance of batch processing with JPA (Hibernate), I found a note on the Batch processing page of the Hibernate documentation saying that when GenerationType.IDENTITY is used as the primary key generation strategy, insert batching is disabled at the JDBC level (the relevant part is quoted below). So I decided to measure how much difference the choice of generation type actually makes.
> Chapter 15. Batch processing
> Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
As a reference, I also measured the same batch processing performed with plain JDBC instead of JPA (using Spring's JdbcTemplate).
** Environment **
** References **
Performance was verified by measuring the execution time of JUnit tests. The test code was written in a Spring Boot and Spring Data JPA environment, against local PostgreSQL and MySQL databases. (Note that this verification is not about which database is faster.)
** For PostgreSQL **
application.yml
spring:
datasource:
url: jdbc:postgresql://localhost:5432/sample_db
username: test_user
password: test_user
driverClassName: org.postgresql.Driver
** For MySQL **
application.yml
spring:
datasource:
url: jdbc:mysql://localhost:3306/sample_db
username: test_user
password: test_user
driverClassName: com.mysql.jdbc.Driver
@Configuration
@EnableTransactionManagement
@EnableJpaRepositories(
basePackages = {"com.example.domain.repository"},
entityManagerFactoryRef = "entityManagerFactory",
transactionManagerRef = "transactionManager"
)
public class DataSourceConfigure {
@ConfigurationProperties(prefix = "spring.datasource")
@Bean
public DataSource datasource() {
DataSource dataSource = DataSourceBuilder.create().build();
return dataSource;
}
@Bean
public LocalContainerEntityManagerFactoryBean entityManagerFactory(
EntityManagerFactoryBuilder builder) {
LocalContainerEntityManagerFactoryBean factory = builder
.dataSource(datasource())
.persistenceUnit("default")
.packages("com.example.domain.entity")
.build();
return factory;
}
@Bean
public PlatformTransactionManager transactionManager(
EntityManagerFactory entityManagerFactory) {
JpaTransactionManager tm = new JpaTransactionManager();
tm.setEntityManagerFactory(entityManagerFactory);
tm.setDefaultTimeout(300);
tm.afterPropertiesSet();
return tm;
}
}
First, 100,000 rows of dummy data were generated and used for the insert processing, and the time from the start of the test to normal completion was measured using JUnit's Stopwatch rule.
@RunWith(SpringRunner.class)
@SpringBootTest
@ContextConfiguration(classes = {
TestApplication.class, DataSourceConfigure.class})
public class MemoWithEntityManagerTest {
@Rule
public final Stopwatch stopwatch = StopwatchRule.create();
@PersistenceContext
private EntityManager entityManager;
@Value("${spring.jpa.properties.hibernate.jdbc.batch_size:30}")
private int batchSize;
private List<Memo> lists;
@Before
public void setup() {
lists = new ArrayList<>();
for (int i=0; i<100_000; i++) {
String title = UUID.randomUUID().toString();
String description = i + " : " + title;
lists.add(Memo.of(title, description));
}
}
@Test
@Transactional
public void testPersist() {
for (int i=0; i<lists.size(); i++) {
entityManager.persist(lists.get(i));
if (i > 0 && i % batchSize == 0) {
entityManager.flush();
entityManager.clear();
}
}
entityManager.flush();
entityManager.clear();
}
}
@RunWith(SpringRunner.class)
@SpringBootTest
@ContextConfiguration(classes = {
TestApplication.class, DataSourceConfigure.class})
public class MemoWithJdbcTemplateTest {
@Rule
public final Stopwatch stopwatch = StopwatchRule.create();
@Autowired
private JdbcTemplate jdbcTemplate;
private List<Memo> lists;
@Before
public void setup() {
lists = new ArrayList<>();
for (int i=0; i<100_000; i++) {
String title = UUID.randomUUID().toString();
String description = i + " : " + title;
lists.add(Memo.of(title, description));
}
}
@Test
@Transactional
public void testBatchUpdate() {
String sql = "insert into memo (title, description, done, updated) values (?, ?, ?, ?)";
BatchPreparedStatementSetter setter = new BatchPreparedStatementSetter() {
@Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
Memo memo = lists.get(i);
ps.setString(1, memo.getTitle());
ps.setString(2, memo.getDescription());
ps.setBoolean(3, memo.getDone());
ps.setTimestamp(4, new java.sql.Timestamp(memo.getUpdated().getTime()));
}
@Override
public int getBatchSize() {
return lists.size();
}
};
jdbcTemplate.batchUpdate(sql, setter);
}
}
public class StopwatchRule {
public static Stopwatch create() {
return new Stopwatch() {
@Override
public long runtime(TimeUnit unit) {
return super.runtime(unit);
}
@Override
protected void succeeded(long nanos, Description description) {
System.out.println(description.getMethodName() + " succeeded, time taken " + toSeconds(nanos));
}
@Override
protected void failed(long nanos, Throwable e, Description description) {
super.failed(nanos, e, description);
}
@Override
protected void skipped(long nanos, AssumptionViolatedException e, Description description) {
super.skipped(nanos, e, description);
}
@Override
protected void finished(long nanos, Description description) {
System.out.println(description.getMethodName() + " finished, time taken " + toSeconds(nanos));
}
private double toSeconds(long nanos) {
return (double) nanos / 1000000000.0;
}
};
}
}
** Measurement pattern 1: GenerationType.SEQUENCE (PostgreSQL) **
This pattern is executed as a JDBC batch when a batch size is specified.
** Table definition **
Use sequence to generate the primary key.
CREATE TABLE memo (
id bigint NOT NULL,
title varchar(255) NOT NULL,
description text NOT NULL,
done boolean DEFAULT false NOT NULL,
updated timestamp without time zone DEFAULT current_timestamp NOT NULL,
CONSTRAINT memo_pkey PRIMARY KEY(id)
);
CREATE SEQUENCE memo_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1
NO CYCLE;
** Entity class **
package com.example.domain.entity;
@Entity
@Table(name="memo")
public class Memo implements Serializable {
private static final long serialVersionUID = -7888970423872473471L;
@Id
@SequenceGenerator(name = "memo_id_seq", sequenceName = "memo_id_seq", allocationSize = 1)
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "memo_id_seq")
private Long id;
@Column(name="title", nullable = false)
private String title;
@Column(name="description", nullable = false)
private String description;
@Column(name="done", nullable = false)
private Boolean done;
@Column(name="updated", nullable = false)
@Temporal(TemporalType.TIMESTAMP)
private Date updated;
public static Memo of(String title, String description) {
Memo memo = new Memo();
memo.title = title;
memo.description = description;
memo.done = false;
memo.updated = new Date();
return memo;
}
// getter/setter omitted
}
application.yml
spring:
jpa:
properties:
hibernate:
generate_statistics: true
The hibernate.jdbc.batch_size setting was measured with two values: 30 and 50.
application.yml
spring:
jpa:
properties:
hibernate:
generate_statistics: true
jdbc:
batch_size: 30
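As a related aside, Hibernate batching is often combined with statement ordering so that inserts for the same entity type are grouped into the same batch. The following is a sketch of such a configuration; hibernate.order_inserts and hibernate.order_updates are standard Hibernate properties, but they were not part of this measurement:

```yaml
spring:
  jpa:
    properties:
      hibernate:
        generate_statistics: true
        # Group statements by entity type so they can be batched together.
        # Shown only as a related option; not used in this measurement.
        order_inserts: true
        order_updates: true
        jdbc:
          batch_size: 30
```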
The measurements were taken twice each.
No | Batch size | Processing time (seconds), 1st run | Processing time (seconds), 2nd run | Remarks
---|---|---|---|---
1 | Not set | 43.603787094 | 43.645273886 | Flush and clear via EntityManager every 30 rows
2 | 30 | 35.651785986 | 35.052646148 | Flush and clear via EntityManager every 30 rows
3 | 50 | 35.159041777 | 34.690860229 | Flush and clear via EntityManager every 50 rows
Whether the inserts were actually executed as a JDBC batch was determined from the statistics that Hibernate outputs; this output is enabled by setting the generate_statistics property to true.
** No.1, 1st run stats **
The reason the JDBC statement count is 200,000 for 100,000 inserted rows is that the primary key value is fetched with select nextval('memo_id_seq') once per row.
StatisticalLoggingSessionEventListener : Session Metrics {
1535888 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
454642612 nanoseconds spent preparing 200000 JDBC statements;
38436776034 nanoseconds spent executing 200000 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
24964533986 nanoseconds spent executing 3334 flushes (flushing a total of 100000 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
** No.2, 1st run stats **
StatisticalLoggingSessionEventListener : Session Metrics {
2525719 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
289639147 nanoseconds spent preparing 103334 JDBC statements;
16091539786 nanoseconds spent executing 100000 JDBC statements;
15787659766 nanoseconds spent executing 3335 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
16543820527 nanoseconds spent executing 3334 flushes (flushing a total of 100000 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
** No.3, 1st run stats **
StatisticalLoggingSessionEventListener : Session Metrics {
1430965 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
308382598 nanoseconds spent preparing 102000 JDBC statements;
16036841752 nanoseconds spent executing 100000 JDBC statements;
15643983778 nanoseconds spent executing 2001 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
16309985446 nanoseconds spent executing 2000 flushes (flushing a total of 100000 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
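The counts in these session metrics can be cross-checked with simple arithmetic. Below is a minimal sketch (the class and method names are made up for illustration), assuming one select nextval per row because allocationSize = 1, one prepared insert per flush, and a flush/clear every batchSize rows as in the testPersist() loop shown earlier:

```java
// Cross-check of the session metrics above (assumptions stated in the lead-in).
public class BatchMath {

    // Flushes: one inside the loop at i = batchSize, 2*batchSize, ..., plus the final flush.
    public static int flushes(int rows, int batchSize) {
        return (rows - 1) / batchSize + 1;
    }

    // Prepared statements: one nextval per row, plus one insert prepare per flush.
    public static int preparedStatements(int rows, int batchSize) {
        return rows + flushes(rows, batchSize);
    }

    public static void main(String[] args) {
        int rows = 100_000;
        // No.1 (no batching): every nextval and every insert is executed individually.
        System.out.println("No.1 executed statements: " + 2 * rows);            // 200000
        // No.2 (batch_size 30): matches "103334 JDBC statements prepared" and "3334 flushes".
        System.out.println("No.2: " + preparedStatements(rows, 30) + " prepared, "
                + flushes(rows, 30) + " flushes");
        // No.3 (batch_size 50): matches "102000 JDBC statements prepared" and "2000 flushes".
        System.out.println("No.3: " + preparedStatements(rows, 50) + " prepared, "
                + flushes(rows, 50) + " flushes");
    }
}
```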
** Measurement pattern 2: GenerationType.IDENTITY (PostgreSQL) **
This pattern is not executed as a JDBC batch even when a batch size is specified: with GenerationType.IDENTITY, Hibernate must execute each insert immediately to obtain the generated key, so it cannot defer statements into a batch. To generate the primary key with GenerationType.IDENTITY, the bigserial type is used for the table's primary key column.
** Table definition **
Use the bigserial type to generate the primary key.
CREATE TABLE memo (
id bigserial NOT NULL,
title varchar(255) NOT NULL,
description text NOT NULL,
done boolean DEFAULT false NOT NULL,
updated timestamp without time zone DEFAULT current_timestamp NOT NULL,
CONSTRAINT memo_pkey PRIMARY KEY(id)
);
** Entity class **
package com.example.domain.entity;
@Entity
@Table(name="memo")
public class Memo implements Serializable {
private static final long serialVersionUID = -7888970423872473471L;
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(name="title", nullable = false)
private String title;
@Column(name="description", nullable = false)
private String description;
@Column(name="done", nullable = false)
private Boolean done;
@Column(name="updated", nullable = false)
@Temporal(TemporalType.TIMESTAMP)
private Date updated;
public static Memo of(String title, String description) {
Memo memo = new Memo();
memo.title = title;
memo.description = description;
memo.done = false;
memo.updated = new Date();
return memo;
}
// getter/setter omitted
}
The settings are the same as in measurement pattern 1. Since the batch size setting is ignored with GenerationType.IDENTITY, no measurement was taken with a batch size configured.
No | Batch size | Processing time (seconds), 1st run | Processing time (seconds), 2nd run | Remarks
---|---|---|---|---
1 | Not set | 33.262405883 | 33.097327786 | Flush and clear via EntityManager every 30 rows
2 | Not set | 16.062690625 | 15.846104684 | Reference measurement using JdbcTemplate
** Measurement pattern 3: GenerationType.IDENTITY (MySQL) **
Like pattern 2, this pattern is not executed as a JDBC batch. Since MySQL has no equivalent of PostgreSQL's sequences, GenerationType.SEQUENCE could not be verified.
** Table definition **
CREATE TABLE IF NOT EXISTS memo (
id BIGINT NOT NULL AUTO_INCREMENT,
title VARCHAR(255) NOT NULL,
description TEXT NOT NULL,
done BOOLEAN DEFAULT FALSE NOT NULL,
updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
PRIMARY KEY (id)
) ENGINE = INNODB, CHARACTER SET = utf8mb4, COLLATE utf8mb4_general_ci;
** Entity class **
package com.example.domain.entity;
@Entity
@Table(name="memo")
public class Memo implements Serializable {
private static final long serialVersionUID = -7888970423872473471L;
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(name="title", nullable = false)
private String title;
@Column(name="description", nullable = false)
private String description;
@Column(name="done", nullable = false)
private Boolean done;
@Column(name="updated", nullable = false)
@Temporal(TemporalType.TIMESTAMP)
private Date updated;
public static Memo of(String title, String description) {
Memo memo = new Memo();
memo.title = title;
memo.description = description;
memo.done = false;
memo.updated = new Date();
return memo;
}
// getter/setter omitted
}
application.yml
spring:
jpa:
properties:
hibernate:
generate_statistics: true
Since the batch size setting is ignored with GenerationType.IDENTITY, no measurement was taken with a batch size configured.
No | Batch size | Processing time (seconds), 1st run | Processing time (seconds), 2nd run | Remarks
---|---|---|---|---
1 | Not set | 28.461122403 | 27.934141586 | Flush and clear via EntityManager every 30 rows
2 | Not set | 16.115392204 | 15.964329785 | Reference measurement using JdbcTemplate
** Measurement pattern 4: rewriteBatchedStatements (MySQL) **
This pattern adds rewriteBatchedStatements to the conditions of pattern 3. It is a MySQL-specific JDBC URL parameter, and performance may improve when it is enabled.
** Table definition **
Same as measurement pattern 3
** Entity class **
Same as measurement pattern 3
** application.yml **
Same as measurement pattern 3
** rewriteBatchedStatements **
Add it as a parameter to the JDBC URL as shown below.
spring:
datasource:
url: jdbc:mysql://localhost:3306/sample_db?rewriteBatchedStatements=true
username: test_user
password: test_user
driverClassName: com.mysql.jdbc.Driver
When this parameter is enabled, the insert statement will be rewritten as shown in the example below.
** Example **
Original insert statement
INSERT INTO tbl_name (a,b,c) VALUES (1,2,3);
INSERT INTO tbl_name (a,b,c) VALUES (4,5,6);
INSERT INTO tbl_name (a,b,c) VALUES (7,8,9);
Insert statement after rewriting
INSERT INTO tbl_name (a,b,c) VALUES (1,2,3),(4,5,6),(7,8,9);
Runtime performance improves because multiple insert statements are grouped into one, as in this example. However, this method has some restrictions; one is that the rewritten SQL statement must not exceed the size specified by max_allowed_packet.
This setting does not appear to affect the behavior of JPA (Hibernate): there was no performance improvement when using EntityManager.
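The rewriting shown above can be sketched as follows. This is illustrative only, not the actual MySQL Connector/J implementation; the class and method names are made up, and the packet-size guard stands in for the real max_allowed_packet restriction:

```java
import java.util.List;

// Illustrative sketch of rewriteBatchedStatements-style insert rewriting.
public class RewriteSketch {

    public static String rewrite(String insertPrefix, List<String> rowTuples, long maxAllowedPacket) {
        StringBuilder sb = new StringBuilder(insertPrefix);
        for (int i = 0; i < rowTuples.size(); i++) {
            if (i > 0) sb.append(',');       // append each row's tuple, comma-separated
            sb.append(rowTuples.get(i));
        }
        String sql = sb.toString();
        // Stand-in for the real restriction: the rewritten statement
        // must fit within max_allowed_packet.
        if (sql.length() > maxAllowedPacket) {
            throw new IllegalStateException("rewritten statement exceeds max_allowed_packet");
        }
        return sql;
    }

    public static void main(String[] args) {
        String sql = rewrite("INSERT INTO tbl_name (a,b,c) VALUES ",
                List.of("(1,2,3)", "(4,5,6)", "(7,8,9)"),
                4L * 1024 * 1024);
        System.out.println(sql);
        // INSERT INTO tbl_name (a,b,c) VALUES (1,2,3),(4,5,6),(7,8,9)
    }
}
```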
No | Batch size | Processing time (seconds), 1st run | Processing time (seconds), 2nd run | Remarks
---|---|---|---|---
1 | Not set | 28.258041058 | 28.514153595 | Flush and clear via EntityManager every 30 rows
2 | Not set | 3.553322863 | 3.49220706 | Reference measurement using JdbcTemplate
It was confirmed that setting the batch size improves performance when the primary key generation strategy is GenerationType.SEQUENCE (measurement pattern 1), but my overall impression is that batch processing with JPA takes time. For processing that handles large amounts of data, such as batch jobs, it may be better to implement it with something other than JPA.