When working with Excel files programmatically, you often want to get a list of sheet names as a preliminary step to parsing the contents of the sheet. I also encountered such a situation when creating the Excel file comparison tool "Graph paper Diff" introduced in Previous post.
In this article, I will introduce three ways to get a list of sheet names contained in an Excel workbook in Java.
--How to use Apache POI usermodel API (.xls / .xlsx / .xlsm) --How to use the eventmodel API of Apache POI (.xls) --How to use SAX (Simple API for XML) (.xlsx / .xlsm)
Apache POI is a Java library (and its project) that can read and write files in Microsoft Office format and is published by the Apache Software Foundation under the Apache License 2.0. You can use it by adding the library to the build path. (The method is introduced at the end of this article.)
POI provides the usermodel API and the eventmodel API, which will be introduced later. The usermodel API will be your first choice when working with Excel in Java.
[^ 1]: In the code example introduced in this article, processing that is not related to the main line such as input check is omitted due to the paper width.
Implementation example 1: SheetListerWithPOIUsermodelAPI.java
package mypackage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import org.apache.poi.EncryptedDocumentException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
public class SheetListerWithPOIUsermodelAPI {
/**
*Returns a list of sheet names contained in the specified Excel file.
*
* @param file excel file
* @return List of sheet names
*/
public static List<String> getSheetNames(File file) throws Exception {
try (Workbook book = WorkbookFactory.create(file)) {
return IntStream.range(0, book.getNumberOfSheets())
.mapToObj(book::getSheetAt)
.map(Sheet::getSheetName)
.collect(Collectors.toList());
}
}
public static void main(String[] args) throws Exception {
List<String> sheetNames = getSheetNames(
new File("C:\\Users\\user01\\Desktop\\Game console sales list.xls"));
sheetNames.forEach(System.out::println);
}
}
Execution result
Sales list_2016
Graph_2016
Sales list_2017
Graph_2017
Sales list_2018
Graph_2018
The POI usermodel API must first read and parse the entire Excel file. That is the part of Workbook book = WorkbookFactory.create (file)
. After that, the list of sheet names is obtained by using Workbook # getNumberOfSheets ()
, Workbook # getSheetAt (int)
, Sheet # getSheetName ()
and so on. [^ 2]
The above code example processes .xls format files, but .xlsx / .xlsm format files can also be processed.
[^ 2]: Apache POI API documentation is available here: http://poi.apache.org/apidocs/index.html
Various APIs are provided for transparent processing without being aware of the file format (.xls format or .xlsx / .xlsm format).
Books-sheets-rows-cells, etc. are modeled in the object-oriented way that Java programmers are accustomed to, and can be intuitively understood and handled.
Since the usermodel API first needs to load and parse the entire Excel workbook, even if you just want to get the sheet name, you need the amount of memory and processing time depending on the contents of the entire Excel file. This can be a practical bottleneck.
It's hard to say that the current usermodel API is full of features.
For example, in the above code example, the name of the graph sheet is output in addition to the worksheet. This is difficult if you just want to narrow down to the worksheet name. It would be nice to have an API like Sheet # isChartSheet
, but it doesn't.
The interface / class for handling Excel sheets has the following hierarchical structure.
Sheet: Interface for transparent handling of all types of sheets
├─ XSSFSheet : .xlsx/.xlsm format sheet
│ └─ XSSFChartSheet : .xlsx/.Graph sheet in xlsm format
│
└─ HSSFSheet : .xls format sheet
Therefore, if the Excel file to be processed is in .xlsx / .xlsm format, you can exclude the graph sheet by modifying the source code as follows.
Implementation example 1-Kai
(Omitted)
return IntStream.range(0, book.getNumberOfSheets())
.mapToObj(book::getSheetAt)
.filter(s -> !(s instanceof XSSFChartSheet)) //Add this
.map(Sheet::getSheetName)
.collect(Collectors.toList());
(Omitted)
However, there is no way to exclude .xls format graph sheets [^ 3]. If you try to do something a little elaborate with POI, you will be addicted to these problems one after another.
If these shortcomings are not an issue, the POI usermodel API may be your first choice. In other cases, you will have to resort to other methods.
[^ 3]: Maybe I just don't know. If you know this, please leave a comment.
The eventmodel API is a feature provided for reading Excel files quickly and with a small amount of memory, and writing to files is not supported.
The eventmodel API reads the Excel file in one direction from beginning to end and notifies the application of the contents of the file in the form of an event. The application performs its own processing according to the content of the event.
"Grid Diff" uses this method to read the sheet name list from an Excel workbook in .xls format. Here, we will introduce an implementation example when working with an Excel workbook in .xls format.
Implementation example 2-a:HSSFSheetListerWithPOIEventAPI.java
package mypackage;
import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.record.BoundSheetRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class HSSFSheetListerWithPOIEventAPI {
private static class HSSFSheetListingListener implements HSSFListener {
private final List<String> sheetNames = new ArrayList<>();
@Override
public void processRecord(Record record) {
if (record.getSid() == BoundSheetRecord.sid) {
BoundSheetRecord bSheetRecord = (BoundSheetRecord) record;
sheetNames.add(bSheetRecord.getSheetname());
}
}
}
/**
*Returns a list of sheet names contained in the specified Excel file.
*
* @param file excel file
* @return List of sheet names
*/
public static List<String> getSheetNames(File file) throws Exception {
try (FileInputStream fis = new FileInputStream(file);
POIFSFileSystem poifs = new POIFSFileSystem(fis)) {
HSSFRequest req = new HSSFRequest();
HSSFSheetListingListener listener = new HSSFSheetListingListener();
req.addListenerForAllRecords(listener);
HSSFEventFactory factory = new HSSFEventFactory();
factory.abortableProcessWorkbookEvents(req, poifs);
return listener.sheetNames;
}
}
public static void main(String[] args) throws Exception {
List<String> sheetNames = getSheetNames(
new File("C:\\Users\\user01\\Desktop\\Game console sales list.xls"));
sheetNames.forEach(System.out::println);
}
}
Execution result
Sales list_2016
Graph_2016
Sales list_2017
Graph_2017
Sales list_2018
Graph_2018
Implementation example 2-b:HSSFSheetListerWithPOIEventAPI.java
package mypackage;
import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.record.BOFRecord;
import org.apache.poi.hssf.record.BoundSheetRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class HSSFSheetListerWithPOIEventAPI {
private static class HSSFSheetListingListener implements HSSFListener {
private final Queue<String> queue = new ArrayDeque<>();
private final List<String> sheetNames = new ArrayList<>();
@Override
public void processRecord(Record record) {
switch (record.getSid()) {
case BoundSheetRecord.sid:
BoundSheetRecord bSheetRecord = (BoundSheetRecord) record;
queue.add(bSheetRecord.getSheetname());
break;
case BOFRecord.sid:
BOFRecord bofRecord = (BOFRecord) record;
switch (bofRecord.getType()) {
case BOFRecord.TYPE_WORKSHEET:
sheetNames.add(queue.remove());
break;
case BOFRecord.TYPE_CHART:
queue.remove();
break;
}
break;
}
}
}
//After that, implementation example 2-Same as a
/**
*Returns a list of sheet names contained in the specified Excel file.
*
* @param file excel file
* @return List of sheet names
*/
public static List<String> getSheetNames(File file) throws Exception {
try (FileInputStream fis = new FileInputStream(file);
POIFSFileSystem poifs = new POIFSFileSystem(fis)) {
HSSFRequest req = new HSSFRequest();
HSSFSheetListingListener listener = new HSSFSheetListingListener();
req.addListenerForAllRecords(listener);
HSSFEventFactory factory = new HSSFEventFactory();
factory.abortableProcessWorkbookEvents(req, poifs);
return listener.sheetNames;
}
}
public static void main(String[] args) throws Exception {
List<String> sheetNames = getSheetNames(
new File("C:\\Users\\user01\\Desktop\\Game console sales list.xls"));
sheetNames.forEach(System.out::println);
}
}
Execution result
Sales list_2016
Sales list_2017
Sales list_2018
As mentioned earlier, POI's eventmodel API receives the contents of the file as an event. You must implement the HSSFListener
interface and override theprocessRecord (Record)
method to receive the event.
Each time you read the file, the processRecord (Record)
method is called, so your application will act according to the content of the event (specifically, the content of the Record
object passed as an argument).
In Implementation Example 2-a, the event is record.getSid () == BoundSheetRecord.sid
, and in Implementation Example 2-b, record.getSid () == BoundSheetRecord.sid
or record.getSid () == Processing an event that is BOFRecord.sid
.
Understanding what events occur and in what order can be a daunting task. I can see the API documentation for the Record Interface (http://poi.apache.org/apidocs/org/apache/poi/hssf/record/Record.html) and its subinterfaces and subclasses, as well as the following code: I gradually deepened my understanding while comparing the actual event contents.
Output and check the actual event content
(Omitted)
@Override
public void processRecord(Record record) {
System.out.println(record);
}
(Omitted)
Note that only .xls format Excel workbooks can be processed by Implementation Examples 2-a and 2-b. It cannot process Excel workbooks in .xlsx / .xlsm format.
Since the eventmodel API is a method of reading from the beginning to the end of a file with a stream, it operates faster with a small amount of memory compared to the usermodel API, which is a method of analyzing the entire file contents and holding it in memory. ..
Since the file contents can be read almost directly, it is possible to realize processing that cannot be realized by the usermodel API.
In order to process with the eventmodel API, it is essential to understand the internal structure of the Excel file, such as what kind of events occur and in what order.
There is a reference book. "Microsoft Excel 97 Developers Kit". Foreign books, out of print. At the time of writing this article, [Used books were listed](https://www.amazon.co.jp/Microsoft-Excel-Developers-Kit-Customize/dp/1572314982/ref=sr_1_1?ie=UTF8&qid = 1530997082 & sr = 8-1 & keywords = 1572314982).
The API documentation for the Record
implementation class has references to this book. For example, in API documentation for the BoundSheetRecord
class, see" REFERENCE: PG 291 Microsoft Excel 97 Developer's Kit. (ISBN: 1-57231-498-2) ". When creating a full-fledged application, you will need to work with a reference book in one hand.
I proceeded with the implementation while checking the API documentation and the actual record contents. Either way, you'll be forced into a trial-and-error implementation.
Since the eventmodel API only reads the file in a straight line from the beginning to the end, if it is necessary to evaluate the contents at the front of the file according to the contents at the back of the file, manage and record the contents of the file by yourself. is needed.
SAX stands for Simple API for XML and is provided as a standard API in Java8 in the ʻorg.xml.sax` package. Similar to POI's eventmodel API, it reads the XML file in one direction from start to finish and notifies the application of the contents of the XML file as an event.
As is well known, the .xlsx / .xlsm format Excel file is actually a ZIP file consisting of several XML files, so you can use SAX to read the contents of an Excel workbook.
"Grid Diff" uses this method to read the sheet name list from an Excel workbook in .xlsx / .xlsm format.
Implementation example 3-a:XSSFSheetListerWithSAX.java
package mypackage;
import java.io.File;
import java.io.InputStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
public class XSSFSheetListerWithSAX {
/**
*As a zip file.xlsx/.Read the following entry from the xlsm file and
*Extract the list of sheet names.
*
* *.xlsx
* +- xl
* +- workbook.xml
*/
private static class Handler1 extends DefaultHandler {
private final List<String> names = new ArrayList<>();
@Override
public void startElement(
String uri,
String localName,
String qName,
Attributes attributes) {
if ("sheet".equals(qName)) {
names.add(attributes.getValue("name"));
}
}
}
/**
*Returns a list of sheet names contained in the specified Excel file.
*
* @param file excel file
* @return List of sheet names
*/
public static List<String> getSheetNames(File file) throws Exception {
try (FileSystem fs = FileSystems.newFileSystem(file.toPath(), null)) {
Handler1 handler1 = new Handler1();
try (InputStream is = Files.newInputStream(
fs.getPath("xl/workbook.xml"))) {
InputSource source = new InputSource(is);
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(handler1);
parser.parse(source);
}
return handler1.names;
}
}
public static void main(String[] args) throws Exception {
List<String> sheetNames = getSheetNames(
new File("C:\\Users\\user01\\Desktop\\Game console sales list.xlsx"));
sheetNames.forEach(System.out::println);
}
}
Execution result
Sales list_2016
Graph_2016
Sales list_2017
Graph_2017
Sales list_2018
Graph_2018
Implementation example 3-b:XSSFSheetListerWithSAX.java
package mypackage;
import java.io.File;
import java.io.InputStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
public class XSSFSheetListerWithSAX {
/**
*As a zip file.xlsx/.Read the following entry from the xlsm file and
*Extract the list of sheet names and the map of sheet names and sheet Ids (relId).
*
* *.xlsx
* +- xl
* +- workbook.xml
*/
private static class Handler1 extends DefaultHandler {
private final List<String> names = new ArrayList<>();
private final Map<String, String> nameToId = new HashMap<>();
@Override
public void startElement(
String uri,
String localName,
String qName,
Attributes attributes) {
if ("sheet".equals(qName)) {
names.add(attributes.getValue("name"));
nameToId.put(
attributes.getValue("name"),
attributes.getValue("r:id"));
}
}
}
/**
*As a zip file.xlsx/.Read the following entry from the xlsm file and
*Extract the map of sheet Id (relId) and entry path.
*
* *.xlsx
* +- xl
* +- _rels
* +- workbook.xml.rels
*/
private static class Handler2 extends DefaultHandler {
private final Map<String, String> idToSource = new HashMap<>();
@Override
public void startElement(
String uri,
String localName,
String qName,
Attributes attributes) {
if ("Relationship".equals(qName)) {
idToSource.put(
attributes.getValue("Id"),
attributes.getValue("Target"));
}
}
}
/**
*Returns a list of sheet names contained in the specified Excel file.
*
* @param file excel file
* @return List of sheet names
*/
public static List<String> getSheetNames(File file) throws Exception {
try (FileSystem fs = FileSystems.newFileSystem(file.toPath(), null)) {
Handler1 handler1 = new Handler1();
try (InputStream is = Files.newInputStream(
fs.getPath("xl/workbook.xml"))) {
InputSource source = new InputSource(is);
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(handler1);
parser.parse(source);
}
Handler2 handler2 = new Handler2();
try (InputStream is = Files.newInputStream(
fs.getPath("xl/_rels/workbook.xml.rels"))) {
InputSource source = new InputSource(is);
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(handler2);
parser.parse(source);
}
return handler1.names.stream()
.filter(name -> {
String id = handler1.nameToId.get(name);
String source = handler2.idToSource.get(id);
return source.startsWith("worksheets/");
})
.collect(Collectors.toList());
}
}
public static void main(String[] args) throws Exception {
List<String> sheetNames = getSheetNames(
new File("C:\\Users\\user01\\Desktop\\Game console sales list.xlsx"));
sheetNames.forEach(System.out::println);
}
}
Execution result
Sales list_2016
Sales list_2017
Sales list_2018
In order to receive the contents of the XML file as an event, it is convenient to inherit from ʻorg.xml.sax.helpers.DefaultHandler. Of the various callback methods provided by the
DefaultHandler` class, override the callback method that corresponds to the event required by the application.
The above code example uses the startElement (String, String, String, Attributes)
method, which is called every time the start tag of the XML file appears.
In Implementation Example 3-a, the entry xl / workbook.xml
when the Excel file is unzipped as a zip file is added, and in Implementation Example 3-b, the entry
xl / _rels / workbook.xml.rels` is added. Is reading.
The explanation of how to analyze them will be lengthy, so I will omit it. By unzipping the .xlsx / .xlsm format Excel file with the ZIP decompression tool, you can easily check the contents of the XML files that make up the Excel file. Please read the above code example while checking the Excel file you have.
Similar to POI's eventmodel API, it runs fast and saves memory. In addition, it can read only the necessary XML files that make up an Excel file, making it even faster.
Since the file contents can be read directly, it is possible to realize processing that cannot be realized by the usermodel API of POI.
ʻOrg.xml.sax` and its subpackages are provided as standard APIs in Java8. No need to rely on external libraries.
As with POI's eventmodel API, an understanding of the internal structure of Excel files is essential. However, the structure of the .xlsx / .xlsm format Excel workbook is called Office Open XML and is standardized as ISO / IEC 29500. Therefore, it is relatively easy to obtain specification information. Also, since the contents can be easily checked with the ZIP decompression tool, it can be said that it is easier to handle than handling the .xls format Excel workbook with POI's eventmodel API.
Like POI's eventmodel API, SAX only reads an XML file in a straight line from beginning to end, so if you need to evaluate the content at the front of the file according to the content at the end of the file, It is necessary to manage and record the file contents with.
In this article, I introduced the following three methods to get the list of sheet names included in an Excel file in Java.
--How to use Apache POI usermodel API (.xls / .xlsx / .xlsm) --How to use the eventmodel API of Apache POI (.xls) --How to use SAX (Simple API for XML) (.xlsx / .xlsm)
Each method has advantages and disadvantages, and we programmers need to use them properly according to the purpose.
I myself challenged all of the above three methods for the first time when I was faced with the need to create "square Diff". In particular, I felt that the method of using the eventmodel API and SAX was a high hurdle, but surprisingly, it is not so difficult once you start it, and it feels convenient once you get used to it. If you try to overcome the initial resistance and take on the challenge, the range of mounting power may expand.
"Graph paper Diff" also uses multiple methods to update Excel files. I would like to introduce this again in the manuscript.
APPENDIX
You can take advantage of that functionality by adding some of the jar files provided by Apache POI to your build path. Here's how to add a manually downloaded jar file to your build path in Eclipse. (Please hit other ways to use Maven etc.)
From the Download Page of the Apache POI Site to the zip file (poi-bin-3.17 at the time of this writing) -20170915.zip) can be downloaded.
Unzip the zip file anywhere on your local PC. I think it will be decompressed as follows.
poi-3.17
├─ docs
│ └─ Many html files
│
├─ lib
│ ├─ commons-codec-1.10.jar
│ ├─ commons-collections4-4.1.jar
│ ├─ commons-logging-1.2.jar
│ ├─ junit-4.12.jar
│ └─ log4j-1.2.17.jar
│
├─ ooxml-lib
│ ├─ curvesapi-1.04.jar
│ └─ xmlbeans-2.6.0.jar
│
├─ LICENSE
├─ NOTICE
├─ poi-3.17.jar
├─ poi-examples-3.17.jar
├─ poi-excelant-3.17.jar
├─ poi-ooxml-3.17.jar
├─ poi-ooxml-schemas-3.17.jar
└─ poi-scratchpad-3.17.jar
A) Select [Java]> [Build Path]> [User Library]> [New ...] in the Eclipse settings.
B) Enter any name you like, for example "poi-3.17" and press OK.
C) Select the created user library and click [Add External JAR ...].
D) Specify the jar file included in the decompression of 2. above. Also specify what is contained in the lib folder and the ooxml-lib folder.
It includes jar files and sample files for operating Word and PowerPoint as well as Excel, and if you just operate Excel files, you do not need those jar files, but well, put them all in. Shouldn't it be good?
E) Click [Apply and Close].
A) Select the project and select [Build Path]> [Add Library ...] from the context menu.
B) Select User Library as the library type and specify the user library you just created as the target.
that's all.
Recommended Posts