[JAVA] In Apache POI 3.15, when I get the result of the formula, FormulaParseException occurs (the formula refers to "cell of sheet name including" ・ ")

environment

build.gradle


dependencies {
    compile group: 'org.apache.poi', name: 'poi', version: '3.15'
    compile group: 'org.apache.poi', name: 'poi-ooxml', version: '3.15'
}

Thing you want to do

I want to read an excel file in Apache POI and get the result of the formula in the cell.

The code is below.

SampleApachePoiUtil.java


public class SampleApachePoiUtil {

    /**
     *Calculates the formula in the cell and returns the result as a String.
     * @param formulaCell Cell containing the formula
     * @return Formula result (String))
     */
    public static String getStringFormulaValue(Cell formulaCell) {
        Workbook book = formulaCell.getSheet().getWorkbook();
        CreationHelper helper = book.getCreationHelper();
        FormulaEvaluator evaluator = helper.createFormulaEvaluator();
        CellValue value = evaluator.evaluate(formulaCell);
        //[Caution] Assuming that it is a String
        return value.getStringValue();
    }


    public static void sample(File file) throws IOException, InvalidFormatException {
        Workbook workbook = WorkbookFactory.create(file);
        Sheet sheet = workbook.getSheet("main");
        //Cell containing the formula("main"A1 cell of the sheet)
        Cell formulaCell = sheet.getRow(0).getCell(0);
        //Calculate formula
        String result = getStringFormulaValue(formulaCell);
        System.out.println("result = " + result);
        workbook.close();
    }
}

Reference site

problem

When I tried to get the result of = Ai! A1, a formula that refers to the cell of the sheet" Ai ", I got a FormulaParseException.

image

Exception in thread "main" org.apache.poi.ss.formula.FormulaParseException: Specified named range 'Ah' does not exist in the current workbook.
	at org.apache.poi.ss.formula.FormulaParser.parseNonRange(FormulaParser.java:898)
	at org.apache.poi.ss.formula.FormulaParser.parseRangeable(FormulaParser.java:490)
	at org.apache.poi.ss.formula.FormulaParser.parseRangeExpression(FormulaParser.java:311)
	at org.apache.poi.ss.formula.FormulaParser.parseSimpleFactor(FormulaParser.java:1509)
	at org.apache.poi.ss.formula.FormulaParser.percentFactor(FormulaParser.java:1467)
	at org.apache.poi.ss.formula.FormulaParser.powerFactor(FormulaParser.java:1454)
	at org.apache.poi.ss.formula.FormulaParser.Term(FormulaParser.java:1827)
	at org.apache.poi.ss.formula.FormulaParser.additiveExpression(FormulaParser.java:1955)
	at org.apache.poi.ss.formula.FormulaParser.concatExpression(FormulaParser.java:1939)
	at org.apache.poi.ss.formula.FormulaParser.comparisonExpression(FormulaParser.java:1896)
	at org.apache.poi.ss.formula.FormulaParser.intersectionExpression(FormulaParser.java:1869)
	at org.apache.poi.ss.formula.FormulaParser.unionExpression(FormulaParser.java:1849)
	at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:1997)
	at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:170)
	at org.apache.poi.xssf.usermodel.XSSFEvaluationWorkbook.getFormulaTokens(XSSFEvaluationWorkbook.java:85)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:315)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:259)
	at org.apache.poi.xssf.usermodel.BaseXSSFFormulaEvaluator.evaluateFormulaCellValue(BaseXSSFFormulaEvaluator.java:65)
	at org.apache.poi.ss.formula.BaseFormulaEvaluator.evaluate(BaseFormulaEvaluator.java:101)
	at SampleApachePoiUtil.getStringFormulaValue(SampleApachePoiUtil.java:21)
	at SampleApachePoiUtil.sample(SampleApachePoiUtil.java:33)
	at Main.main(Main.java:7)

I was told that the cell range named "A" does not exist. Apparently, the sheet name could not be parsed correctly and it was judged as "cell name" for some reason.

Research of cause

Patterns that cause problems

I changed the name of the referenced sheet and the extension of the file and checked if an error occurred.

extension Sheet name referenced by the formula result
xlsx Ah NG
xlsx Ai OK
xlsx NG
xlsx Oh 〒 NG
xls Ah OK

Special characters such as "・" and "〒", not caused by non-ASCII characters? Seems to be the cause.

The following is a stack trace when the sheet name is "・". The message was different from when the sheet name was "Ai".

Exception in thread "main" org.apache.poi.ss.formula.FormulaParseException: Parse error near char 0 '・' in specified formula '・!A1'. Expected cell ref or constant literal
	at org.apache.poi.ss.formula.FormulaParser.expected(FormulaParser.java:262)
	at org.apache.poi.ss.formula.FormulaParser.parseSimpleFactor(FormulaParser.java:1514)
	at org.apache.poi.ss.formula.FormulaParser.percentFactor(FormulaParser.java:1467)
	at org.apache.poi.ss.formula.FormulaParser.powerFactor(FormulaParser.java:1454)
	at org.apache.poi.ss.formula.FormulaParser.Term(FormulaParser.java:1827)
	at org.apache.poi.ss.formula.FormulaParser.additiveExpression(FormulaParser.java:1955)
	at org.apache.poi.ss.formula.FormulaParser.concatExpression(FormulaParser.java:1939)
	at org.apache.poi.ss.formula.FormulaParser.comparisonExpression(FormulaParser.java:1896)
	at org.apache.poi.ss.formula.FormulaParser.intersectionExpression(FormulaParser.java:1869)
	at org.apache.poi.ss.formula.FormulaParser.unionExpression(FormulaParser.java:1849)
	at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:1997)
	at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:170)
	at org.apache.poi.xssf.usermodel.XSSFEvaluationWorkbook.getFormulaTokens(XSSFEvaluationWorkbook.java:85)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:315)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:259)
	at org.apache.poi.xssf.usermodel.BaseXSSFFormulaEvaluator.evaluateFormulaCellValue(BaseXSSFFormulaEvaluator.java:65)
	at org.apache.poi.ss.formula.BaseFormulaEvaluator.evaluate(BaseFormulaEvaluator.java:101)
	at SampleApachePoiUtil.getStringFormulaValue(SampleApachePoiUtil.java:21)
	at SampleApachePoiUtil.sample(SampleApachePoiUtil.java:33)
	at Main.main(Main.java:7)

Check the formula cell with the debugger

In debugging, I checked the contents of formulaCell._cell.

<xml-fragment r="A1" s="1" t="str" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xmlns:main="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <main:f>Ah!A1</main:f>
  <main:v>hoge</main:v>
</xml-fragment>

The formula was stored in <main: f>, and the result of the formula was stored in <main: v>. The stored values are as expected, so reading the Excel file seems to be fine.

image

The Excel file format (both .xls and .xlsx) stores a "cached" result for every formula along with the formula itself.

https://poi.apache.org/spreadsheet/eval.html Quote

"·"about

https://www.fileformat.info/info/unicode/char/30fb/index.htm https://ja.wikipedia.org/wiki/%E4%B8%AD%E9%BB%92

Solution

When I raised the Apache POI version from 3.15 to 3.16 (strictly 3.16-beta2), the error disappeared and it worked as expected.

Apache POI 3.16-beta2 change history

Below are related bug reports. Here, it seems that an error occurred when executing the shiftRows method for the sheet name including" ・ ". Partial support for unicode sheet names

This is the modified source difference. https://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/ss/formula/FormulaParser.java?r1=1778418&r2=1778417&pathrev=1778418

Cause of error

The ʻisUnquotedSheetNameChar method of FormulaParser.java` was the direct cause.

The sheet name included in the formula is obtained by the parseSheetName method of ʻorg.apache.poi.ss.formula.FormulaParser.java`.

java:poi-3.15-source.jar\org.apache.poi.ss.formula.FormulaParser.java


private SheetIdentifier parseSheetName() {
//...
        // unquoted sheet names must start with underscore or a letter
        if (look =='_' || Character.isLetter(look)) {
            StringBuilder sb = new StringBuilder();
            // can concatenate idens with dots
            while (isUnquotedSheetNameChar(look)) {
                sb.append(look);
                GetChar();
            }
            NameIdentifier iden = new NameIdentifier(sb.toString(), false);
            SkipWhite();
            if (look == '!') {
                GetChar();
                return new SheetIdentifier(bookName, iden);
            }
            // See if it's a multi-sheet range, eg Sheet1:Sheet3!A1
            if (look == ':') {
                return parseSheetRange(bookName, iden);
            }
            return null;
        }
//...

This is the definition of the ʻisUnquotedSheetNameChar method called by the parseSheetName` method.

java:poi-3.15-source.jar\org.apache.poi.ss.formula.FormulaParser.java


    /**
     * very similar to {@link SheetNameFormatter#isSpecialChar(char)}
     */
    private static boolean isUnquotedSheetNameChar(char ch) {
        if(Character.isLetterOrDigit(ch)) {
            return true;
        }
        switch(ch) {
            case '.': // dot is OK
            case '_': // underscore is OK
                return true;
        }
        return false;
    }

The return value of the ʻisUnquotedSheetNameChar` method is as follows.

letter Unicode Category return value of isUnquotedSheetNameChar
Ah Letter, Other [Lo] true
Punctuation, Other [Po] false
I Letter, Other [Lo] true

When passing to the ʻisUnquotedSheetNameChar` method, false is returned, so the sheet name could not be obtained correctly and an error occurred.

ʻIsUnquotedSheetNameChar` method modified version (Apache POI 3.16 beta2)

java:poi-3.16-beta2-source.jar\org.apache.poi.ss.formula.FormulaParser.java


    /**
     * very similar to {@link SheetNameFormatter#isSpecialChar(char)}
     * @param ch unicode codepoint
     */
    private static boolean isUnquotedSheetNameChar(int ch) {
        if(Character.isLetterOrDigit(ch)) {
            return true;
        }
        // the sheet naming rules are vague on whether unicode characters are allowed
        // assume they're allowed.
        if (ch > 128) {
            return true;
        }
        switch(ch) {
            case '.': // dot is OK
            case '_': // underscore is OK
                return true;
        }
        return false;
    }

There are two changes:

"・" Now returns true.

Summary

Supplement

Excel formula BNE

https://github.com/apache/poi/blob/trunk/src/java/org/apache/poi/ss/formula/FormulaParser.java Quote

 This class parses a formula string into a List of tokens in RPN order.
 Inspired by
           Lets Build a Compiler, by Jack Crenshaw
 BNF for the formula expression is :
 <expression> ::= <term> [<addop> <term>]*
 <term> ::= <factor>  [ <mulop> <factor> ]*
 <factor> ::= <number> | (<expression>) | <cellRef> | <function>
 <function> ::= <functionName> ([expression [, expression]*])
 
 For POI internal use only

Unicode category of "・"

It has nothing to do with Apache POI. The category of "・" is "Punctuation, Other [Po]", but before Unicode 4.1, it was "Punctuation, Connector [Pc]". http://www.unicode.org/reports/tr44/tr44-4.html#Change_History

In this connection, it seems that there was a problem that "In Java 6," ・ "was used in the method name, but it can no longer be used in Java 7." That's horrible. .. ..

https://www.hos.co.jp/blog/20111004/

What I didn't understand

The reason why the error did not occur in the XLS format (Excel 2003 format)

Using the debugger, I found the following.

The JavaDoc of WorkbookEvaluator says" Keep a cache of calculation results ".

For performance reasons, this class keeps a cache of all previously calculated intermediate cell values.

https://poi.apache.org/apidocs/org/apache/poi/ss/formula/WorkbookEvaluator.html Quote

Didn't you get an error because you are referring to the calculation result cache?

https://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/ss/formula/FormulaParser.java?r1=1778418&r2=1778417&pathrev=1778418

Recommended Posts

In Apache POI 3.15, when I get the result of the formula, FormulaParseException occurs (the formula refers to "cell of sheet name including" ・ ")
I want to get the value of Cell transparently regardless of CellType (Apache POI)
How to get the class name of the argument of LoggerFactory.getLogger when using SLF4J in Java
Get the result of POST in Java
I want to get the IP address when connecting to Wi-Fi in Java
I want to get the field name of the [Java] field. (Old tale tone)
How to get Excel sheet name list in Java (POI vs SAX)
What I tried when I wanted to get all the fields of a bean
I managed to get a blank when I brought the contents of Beans to the textarea
When you get lost in the class name
I want to get the value in Ruby
I want to find the MD5 checksum of a file in Java and get the result as a string in hexadecimal notation.
Get the name of the test case in the JUnit test class
I want to display the name of the poster of the comment
[Ruby on Rails] I want to get the URL of the image saved in Active Storage
When I switched to IntelliJ, I got a lot of differences in the encoding of the properties file.
How to get the class name / method name running in Java
When I was worried about static methods in java interface, I arrived in the order of name interpretation
Fix the file name of war to the one set in Maven
How to get the id of PRIMAY KEY auto_incremented in MyBatis
I want to change the value of Attribute in Selenium of Ruby
[Android] I want to get the listener from the button in ListView
How to get the length of an audio file in java