build.gradle
dependencies {
compile group: 'org.apache.poi', name: 'poi', version: '3.15'
compile group: 'org.apache.poi', name: 'poi-ooxml', version: '3.15'
}
I want to read an excel file in Apache POI and get the result of the formula in the cell.
The code is below.
SampleApachePoiUtil.java
public class SampleApachePoiUtil {
/**
*Calculates the formula in the cell and returns the result as a String.
* @param formulaCell Cell containing the formula
* @return Formula result (String))
*/
public static String getStringFormulaValue(Cell formulaCell) {
Workbook book = formulaCell.getSheet().getWorkbook();
CreationHelper helper = book.getCreationHelper();
FormulaEvaluator evaluator = helper.createFormulaEvaluator();
CellValue value = evaluator.evaluate(formulaCell);
//[Caution] Assuming that it is a String
return value.getStringValue();
}
public static void sample(File file) throws IOException, InvalidFormatException {
Workbook workbook = WorkbookFactory.create(file);
Sheet sheet = workbook.getSheet("main");
//Cell containing the formula("main"A1 cell of the sheet)
Cell formulaCell = sheet.getRow(0).getCell(0);
//Calculate formula
String result = getStringFormulaValue(formulaCell);
System.out.println("result = " + result);
workbook.close();
}
}
When I tried to get the result of = Ai! A1
, a formula that refers to the cell of the sheet" Ai ", I got a FormulaParseException
.
Exception in thread "main" org.apache.poi.ss.formula.FormulaParseException: Specified named range 'Ah' does not exist in the current workbook.
at org.apache.poi.ss.formula.FormulaParser.parseNonRange(FormulaParser.java:898)
at org.apache.poi.ss.formula.FormulaParser.parseRangeable(FormulaParser.java:490)
at org.apache.poi.ss.formula.FormulaParser.parseRangeExpression(FormulaParser.java:311)
at org.apache.poi.ss.formula.FormulaParser.parseSimpleFactor(FormulaParser.java:1509)
at org.apache.poi.ss.formula.FormulaParser.percentFactor(FormulaParser.java:1467)
at org.apache.poi.ss.formula.FormulaParser.powerFactor(FormulaParser.java:1454)
at org.apache.poi.ss.formula.FormulaParser.Term(FormulaParser.java:1827)
at org.apache.poi.ss.formula.FormulaParser.additiveExpression(FormulaParser.java:1955)
at org.apache.poi.ss.formula.FormulaParser.concatExpression(FormulaParser.java:1939)
at org.apache.poi.ss.formula.FormulaParser.comparisonExpression(FormulaParser.java:1896)
at org.apache.poi.ss.formula.FormulaParser.intersectionExpression(FormulaParser.java:1869)
at org.apache.poi.ss.formula.FormulaParser.unionExpression(FormulaParser.java:1849)
at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:1997)
at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:170)
at org.apache.poi.xssf.usermodel.XSSFEvaluationWorkbook.getFormulaTokens(XSSFEvaluationWorkbook.java:85)
at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:315)
at org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:259)
at org.apache.poi.xssf.usermodel.BaseXSSFFormulaEvaluator.evaluateFormulaCellValue(BaseXSSFFormulaEvaluator.java:65)
at org.apache.poi.ss.formula.BaseFormulaEvaluator.evaluate(BaseFormulaEvaluator.java:101)
at SampleApachePoiUtil.getStringFormulaValue(SampleApachePoiUtil.java:21)
at SampleApachePoiUtil.sample(SampleApachePoiUtil.java:33)
at Main.main(Main.java:7)
I was told that the cell range named "A" does not exist. Apparently, the sheet name could not be parsed correctly and it was judged as "cell name" for some reason.
I changed the name of the referenced sheet and the extension of the file and checked if an error occurred.
extension | Sheet name referenced by the formula | result |
---|---|---|
xlsx | Ah | NG |
xlsx | Ai | OK |
xlsx | ・ | NG |
xlsx | Oh 〒 | NG |
xls | Ah | OK |
Special characters such as "・" and "〒", not caused by non-ASCII characters? Seems to be the cause.
The following is a stack trace when the sheet name is "・". The message was different from when the sheet name was "Ai".
Exception in thread "main" org.apache.poi.ss.formula.FormulaParseException: Parse error near char 0 '・' in specified formula '・!A1'. Expected cell ref or constant literal
at org.apache.poi.ss.formula.FormulaParser.expected(FormulaParser.java:262)
at org.apache.poi.ss.formula.FormulaParser.parseSimpleFactor(FormulaParser.java:1514)
at org.apache.poi.ss.formula.FormulaParser.percentFactor(FormulaParser.java:1467)
at org.apache.poi.ss.formula.FormulaParser.powerFactor(FormulaParser.java:1454)
at org.apache.poi.ss.formula.FormulaParser.Term(FormulaParser.java:1827)
at org.apache.poi.ss.formula.FormulaParser.additiveExpression(FormulaParser.java:1955)
at org.apache.poi.ss.formula.FormulaParser.concatExpression(FormulaParser.java:1939)
at org.apache.poi.ss.formula.FormulaParser.comparisonExpression(FormulaParser.java:1896)
at org.apache.poi.ss.formula.FormulaParser.intersectionExpression(FormulaParser.java:1869)
at org.apache.poi.ss.formula.FormulaParser.unionExpression(FormulaParser.java:1849)
at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:1997)
at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:170)
at org.apache.poi.xssf.usermodel.XSSFEvaluationWorkbook.getFormulaTokens(XSSFEvaluationWorkbook.java:85)
at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:315)
at org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:259)
at org.apache.poi.xssf.usermodel.BaseXSSFFormulaEvaluator.evaluateFormulaCellValue(BaseXSSFFormulaEvaluator.java:65)
at org.apache.poi.ss.formula.BaseFormulaEvaluator.evaluate(BaseFormulaEvaluator.java:101)
at SampleApachePoiUtil.getStringFormulaValue(SampleApachePoiUtil.java:21)
at SampleApachePoiUtil.sample(SampleApachePoiUtil.java:33)
at Main.main(Main.java:7)
In debugging, I checked the contents of formulaCell._cell
.
<xml-fragment r="A1" s="1" t="str" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xmlns:main="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<main:f>Ah!A1</main:f>
<main:v>hoge</main:v>
</xml-fragment>
The formula was stored in <main: f>
, and the result of the formula was stored in <main: v>
.
The stored values are as expected, so reading the Excel file seems to be fine.
<main: v>
is not the value calculated by Apache POI, but the value stored in the Excel file. The calculation result seems to be cached.The Excel file format (both .xls and .xlsx) stores a "cached" result for every formula along with the formula itself.
https://poi.apache.org/spreadsheet/eval.html Quote
12539
in decimalhttps://www.fileformat.info/info/unicode/char/30fb/index.htm https://ja.wikipedia.org/wiki/%E4%B8%AD%E9%BB%92
When I raised the Apache POI version from 3.15
to 3.16
(strictly 3.16-beta2
), the error disappeared and it worked as expected.
Apache POI 3.16-beta2 change history
Below are related bug reports. Here, it seems that an error occurred when executing the shiftRows
method for the sheet name including" ・ ".
Partial support for unicode sheet names
This is the modified source difference. https://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/ss/formula/FormulaParser.java?r1=1778418&r2=1778417&pathrev=1778418
The ʻisUnquotedSheetNameChar method of
FormulaParser.java` was the direct cause.
The sheet name included in the formula is obtained by the parseSheetName
method of ʻorg.apache.poi.ss.formula.FormulaParser.java`.
java:poi-3.15-source.jar\org.apache.poi.ss.formula.FormulaParser.java
private SheetIdentifier parseSheetName() {
//...
// unquoted sheet names must start with underscore or a letter
if (look =='_' || Character.isLetter(look)) {
StringBuilder sb = new StringBuilder();
// can concatenate idens with dots
while (isUnquotedSheetNameChar(look)) {
sb.append(look);
GetChar();
}
NameIdentifier iden = new NameIdentifier(sb.toString(), false);
SkipWhite();
if (look == '!') {
GetChar();
return new SheetIdentifier(bookName, iden);
}
// See if it's a multi-sheet range, eg Sheet1:Sheet3!A1
if (look == ':') {
return parseSheetRange(bookName, iden);
}
return null;
}
//...
This is the definition of the ʻisUnquotedSheetNameChar method called by the
parseSheetName` method.
java:poi-3.15-source.jar\org.apache.poi.ss.formula.FormulaParser.java
/**
* very similar to {@link SheetNameFormatter#isSpecialChar(char)}
*/
private static boolean isUnquotedSheetNameChar(char ch) {
if(Character.isLetterOrDigit(ch)) {
return true;
}
switch(ch) {
case '.': // dot is OK
case '_': // underscore is OK
return true;
}
return false;
}
The return value of the ʻisUnquotedSheetNameChar` method is as follows.
letter | Unicode Category | return value of isUnquotedSheetNameChar |
---|---|---|
Ah | Letter, Other [Lo] | true |
・ | Punctuation, Other [Po] | false |
I | Letter, Other [Lo] | true |
When passing ・
to the ʻisUnquotedSheetNameChar` method, false is returned, so the sheet name could not be obtained correctly and an error occurred.
java:poi-3.16-beta2-source.jar\org.apache.poi.ss.formula.FormulaParser.java
/**
* very similar to {@link SheetNameFormatter#isSpecialChar(char)}
* @param ch unicode codepoint
*/
private static boolean isUnquotedSheetNameChar(int ch) {
if(Character.isLetterOrDigit(ch)) {
return true;
}
// the sheet naming rules are vague on whether unicode characters are allowed
// assume they're allowed.
if (ch > 128) {
return true;
}
switch(ch) {
case '.': // dot is OK
case '_': // underscore is OK
return true;
}
return false;
}
There are two changes:
"・" Now returns true.
FormulaParseException
--XLSX format
--Refers to a cell in a sheet that contains characters such as "・" in the sheet name.Character.isLetterOrDigit
method returns falsehttps://github.com/apache/poi/blob/trunk/src/java/org/apache/poi/ss/formula/FormulaParser.java Quote
This class parses a formula string into a List of tokens in RPN order.
Inspired by
Lets Build a Compiler, by Jack Crenshaw
BNF for the formula expression is :
<expression> ::= <term> [<addop> <term>]*
<term> ::= <factor> [ <mulop> <factor> ]*
<factor> ::= <number> | (<expression>) | <cellRef> | <function>
<function> ::= <functionName> ([expression [, expression]*])
For POI internal use only
Let's Build a Compiler, by Jack Crenshaw
It has nothing to do with Apache POI. The category of "・" is "Punctuation, Other [Po]", but before Unicode 4.1, it was "Punctuation, Connector [Pc]". http://www.unicode.org/reports/tr44/tr44-4.html#Change_History
In this connection, it seems that there was a problem that "In Java 6," ・ "was used in the method name, but it can no longer be used in Java 7." That's horrible. .. ..
https://www.hos.co.jp/blog/20111004/
Using the debugger, I found the following.
parseSheetName
method of FormulaParser
does not passThe JavaDoc of WorkbookEvaluator
says" Keep a cache of calculation results ".
For performance reasons, this class keeps a cache of all previously calculated intermediate cell values.
https://poi.apache.org/apidocs/org/apache/poi/ss/formula/WorkbookEvaluator.html Quote
Didn't you get an error because you are referring to the calculation result cache?
https://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/ss/formula/FormulaParser.java?r1=1778418&r2=1778417&pathrev=1778418
Recommended Posts