Personal notes when using SXSSF Workbook (It is safe to avoid line-by-line access units / You cannot access lines that have already been written in an existing xlsx file)

Overview

When using XSSF Workbook with Apache POI, it is a memory problem that raises my head. XSSF Workbook expands all read and written data into memory. Therefore, if you are not careful when creating or reading large size Excel, it is easy to get OutOfMemoryError.

To address this XSSF Workbook memory overload problem, Apache POI provides an API called SXSSFWorkbook that saves memory consumption by writing all data to a temporary file instead of expanding it into memory. Since it implements the same Workbook interface as the XSSF Workbook,

Workbook book = new XSSFWorkbook();
// ↓
Workbook book = new SXSSFWorkbook();

In many cases, it seems that memory consumption has been saved by replacing only the implementation class, but there are some precautions for using SXXF Workbook.

1. It is safer to avoid line-by-line access units

The memory-saving logic of SXSSFWorkbook, that is, the write logic to a temporary file, "keeps only the windowSize line in memory, and at the moment when you try to make more lines, all the lines before it are written to the temporary file. That is. And you can't access the previous line that was written to the temporary file.

This can be confirmed in the following source code.

try (Workbook book = new SXSSFWorkbook()) {
    Sheet sheet = book.createSheet();

    //Write from the 2nd line to the 1000th line.
    for (int i = 1; i < 1000; i++) {
        sheet.createRow(i).createCell(0).setCellValue(String.valueOf(i));
    }

    //I forgot to write in the second row, so even if I try to take the Row in the second row
    //Cannot write because the return value will be null
    sheet.getRow(1); // => null

    //I forgot to write in the first row, so even if I try to create a Row in the first row
    //exception(*)Will occur and the application will be terminated instead of being unable to write.
    sheet.createRow(0);
} catch (IOException e) {
    e.printStackTrace();
}

The exceptions that occur in the above (*) are as follows.

Exception in thread "main" java.lang.IllegalArgumentException: Attempting to write a row[0] in the range [0,899] that is already written to disk.
	at org.apache.poi.xssf.streaming.SXSSFSheet.createRow(SXSSFSheet.java:131)
	at org.apache.poi.xssf.streaming.SXSSFSheet.createRow(SXSSFSheet.java:65)
	at poi.Main.main(Main.java:25)

For the time being, random access is possible for the rows held in memory. The windowSize that determines the rows to keep in memory can also be changed with the constructor and setter, but I think that the control tends to be complicated and it may create bugs associated with it. Personally, when using the SXSSF Workbook, I think it is better to avoid random access on a line-by-line basis and access sequentially from the top line to the bottom line.

2. Inaccessible lines written in existing xlsx file

You may want to write data to an existing xlsx file using SXSSF Workbook. At this time, it is important to note that __ "Lines written in the existing xlsx file are written to the temporary file and cannot be accessed by the SXSSF Workbook". __

For example, suppose you have an excel file 2-1000.xlsx that has been written from line 2 to line 1000. Read this and try to access the written line.

try (Workbook book = new SXSSFWorkbook(new XSSFWorkbook("2-1000.xlsx"))) {
    Sheet sheet = book.getSheetAt(0);

    //I forgot to write in the second row, so even if I try to take the Row in the second row
    //Cannot write because the return value will be null
    sheet.getRow(1); // => null

    //I forgot to write in the first row, so even if I try to create a Row in the first row
    //exception(*)Will occur and the application will be terminated instead of being unable to write.
    sheet.createRow(0);
} catch (IOException e) {
    e.printStackTrace();
}

The exceptions that occur in the above (*) are as follows.

Exception in thread "main" java.lang.IllegalArgumentException: Attempting to write a row[0] in the range [0,999] that is already written to disk.
	at org.apache.poi.xssf.streaming.SXSSFSheet.createRow(SXSSFSheet.java:138)
	at org.apache.poi.xssf.streaming.SXSSFSheet.createRow(SXSSFSheet.java:65)
	at poi.Main.main(Main.java:21)

There is often a use case where you have something like a template file in your system, write data to the template file in batch processing or online processing, and the user uses the result file, but in such a use case, it is designed properly. Otherwise, the SXSSF Workbook will not be available.

Environmental information (excerpt from pom.xml)

<dependencies>
  <!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
  </dependency>
</dependencies>

<properties>
  <maven.compiler.source>11</maven.compiler.source>
  <maven.compiler.target>11</maven.compiler.target>
</properties>

reference

Recommended Posts

Personal notes when using SXSSF Workbook (It is safe to avoid line-by-line access units / You cannot access lines that have already been written in an existing xlsx file)