[Elasticsearch × Java] The results of a query run from Java differed from what I expected, so I investigated: summary of the cause and the fix

When I ran a search against Elasticsearch from Java, the results were not what I expected, so I investigated the cause and fixed it. This article summarizes what I found.

Environment

Investigation

Source

The following are shown below:

  • Index definition
  • Java source code
  • Execution query in Kibana (equivalent to the Java source code)

In the index definition, an analyzer is configured in addition to the mappings. The role of each field is as follows.

Field name     Data type   Role
itemId         integer     Product ID (corresponds to the primary key)
itemName       text        Product name
itemNameKana   text        Product name (katakana)
itemNameHira   text        Product name (hiragana)

Index definition


{
  "settings": {
    "analysis": {
      "filter": {
        "my_ngram": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 2
        }
      },
      "analyzer": {
        "my_kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji_tokenizer",
          "char_filter": [
            "icu_normalizer",
            "kuromoji_iteration_mark"
          ],
          "filter": [
            "kuromoji_stemmer",
            "my_ngram"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "itemId": {
        "type": "integer"
      },
      "itemName": {
        "type": "text",
        "analyzer": "my_kuromoji_analyzer"
      },
      "itemNameKana": {
        "type": "text",
        "analyzer": "my_kuromoji_analyzer"
      },
      "itemNameHira": {
        "type": "text",
        "analyzer": "my_kuromoji_analyzer"
      }
    }
  }
}
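
To make the effect of the my_ngram filter (min_gram 1, max_gram 2) more concrete, here is a minimal sketch of how one token could be expanded into grams. It is only an illustration in plain Java, not the actual Lucene implementation; the class and method names are hypothetical, and an ASCII token is used for simplicity where the real input would be a token produced by kuromoji_tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative n-gram expansion; not the actual Lucene token filter. */
final class NGramSketch {

    /** Returns every substring of length minGram..maxGram, ordered by start position, then by length. */
    static List<String> ngrams(String token, int minGram, int maxGram) {
        // Work on code points so that surrogate pairs are never split
        int[] codePoints = token.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < codePoints.length; start++) {
            for (int len = minGram; len <= maxGram && start + len <= codePoints.length; len++) {
                grams.add(new String(codePoints, start, len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // min_gram = 1 and max_gram = 2, as in the my_ngram filter
        System.out.println(ngrams("abc", 1, 2)); // prints [a, ab, b, bc, c]
    }
}
```

Because single-character grams are produced, a match query with the default OR operator can match any document that shares even one character with the search word, so the ordering of results depends heavily on the score.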

On the Java side, the search word entered is matched against the product name, product name (katakana), and product name (hiragana) fields. Documents matching any of these fields are returned, sorted in descending order of score.

Java source code


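// Imports required by this method (Elasticsearch high-level REST client)
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;

	/**
	 * Product search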
	 *
	 * @param keyword search word
	 * @param index index name
	 * @param limit number
	 * @param client Elasticsearch connection client
	 * @return search results
	 * @throws IOException
	 */
	public SearchResponse search(String keyword, String index, int limit, RestHighLevelClient client) throws IOException{

		//Initialize search conditions
		SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

		BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();

		boolQueryBuilder.should(QueryBuilders.matchQuery("itemName", keyword))
						.should(QueryBuilders.matchQuery("itemNameKana", keyword))
						.should(QueryBuilders.matchQuery("itemNameHira", keyword));

		searchSourceBuilder.query(boolQueryBuilder);
		//Sort order setting (sort by score)
		searchSourceBuilder.sort(new FieldSortBuilder("_score").order(SortOrder.DESC));
		//Set the number of returns
		searchSourceBuilder.size(limit);

		SearchRequest request = new SearchRequest(index).source(searchSourceBuilder);

		return client.search(request, RequestOptions.DEFAULT);


	}
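
As a rough illustration of how the three should clauses interact: in a bool query that contains only should clauses, a document matches when at least one clause matches, and the scores of the matching clauses are added together. The sketch below models just that combination step in plain Java. The class name, method name, and per-field scores are hypothetical and are not produced by Elasticsearch.

```java
import java.util.List;
import java.util.OptionalDouble;

/** Illustrative model of how should-clause scores are combined. */
final class ShouldScoreSketch {

    /**
     * Combines per-clause scores. An empty OptionalDouble means the clause did not match.
     * Returns empty when no clause matched, otherwise the sum of the matching scores.
     */
    static OptionalDouble combine(List<OptionalDouble> clauseScores) {
        boolean matched = false;
        double sum = 0.0;
        for (OptionalDouble score : clauseScores) {
            if (score.isPresent()) {
                matched = true;
                sum += score.getAsDouble();
            }
        }
        return matched ? OptionalDouble.of(sum) : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // Hypothetical scores: itemName matched, itemNameKana did not, itemNameHira matched
        List<OptionalDouble> scores = List.of(
                OptionalDouble.of(1.5), OptionalDouble.empty(), OptionalDouble.of(0.5));
        System.out.println(combine(scores));
    }
}
```

Since the final score is a sum over fields, any difference in how each field's score is computed carries directly into the ranking.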

Finally, here is the query executed in Kibana. Replace "Enter a search word" with the word you want to search for. The size is set to 5 (it corresponds to limit in the Java source code).

Execution query in Kibana


POST item_list/_search
{
  "from": 0,
  "size": 5,
  "sort": {
    "_score": {
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "itemName": "Enter a search word"
           }
        },
        {
          "match": {
            "itemNameKana": "Enter a search word"
          }
        },
        {
          "match": {
            "itemNameHira": "Enter a search word"
          }
        }
      ]
    }
  }
}

Candidate causes

The following were considered as candidate causes.

  1. The content of the query issued in Java is incorrect.
  2. The analyzer settings are incorrect.
  3. Some implicitly set value needs to be corrected.

I decided to investigate these in order.

Carrying out the investigation

1. The content of the query issued in Java is incorrect

First, I checked whether the content of the query issued in Java was incorrect. To confirm this, I set a breakpoint on the line SearchRequest request = new SearchRequest(index).source(searchSourceBuilder); in the Java source code and inspected the contents of searchSourceBuilder.

The contents were as follows.

Contents of searchSourceBuilder


{"size":5,"query":{"bool":{"should":[{"match":{"itemName":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameKana":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameHira":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"_score":{"order":"desc"}}]}

At first glance there seems to be no problem. Running this content in Kibana gave exactly the same results as the "Execution query in Kibana" shown earlier. The executed query is as follows.

Query executed with the contents of searchSourceBuilder


POST item_list/_search
{"size":5,"query":{"bool":{"should":[{"match":{"itemName":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameKana":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameHira":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"_score":{"order":"desc"}}]}

From this result, "the content of the query issued in Java is incorrect" can be ruled out as the cause. At this stage I also checked the score (_score) of the results obtained in Java (the response returned from Elasticsearch to Java, that is, the return value of client.search(request, RequestOptions.DEFAULT)) and found that it differed from the score in the Kibana results. This was an important finding of the investigation.
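
As a rough illustration of this kind of comparison, the sketch below reports which document IDs have different scores in two result sets. It assumes the IDs and scores from the Java response and from Kibana have already been copied into plain maps. The class name, method name, and sample values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative helper for comparing scores from two result sets. */
final class ScoreDiffSketch {

    /** Returns the IDs whose scores differ by more than epsilon, or that appear on one side only. */
    static List<String> differingIds(Map<String, Double> left, Map<String, Double> right, double epsilon) {
        // Union of IDs, sorted so that the report order is deterministic
        TreeSet<String> ids = new TreeSet<>(left.keySet());
        ids.addAll(right.keySet());

        List<String> differing = new ArrayList<>();
        for (String id : ids) {
            Double l = left.get(id);
            Double r = right.get(id);
            if (l == null || r == null || Math.abs(l - r) > epsilon) {
                differing.add(id);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        // Hypothetical (itemId -> _score) pairs
        Map<String, Double> fromJava = Map.of("1", 2.5, "2", 1.0);
        Map<String, Double> fromKibana = Map.of("1", 2.5, "2", 0.8, "3", 0.1);
        System.out.println(differingIds(fromJava, fromKibana, 1e-6));
    }
}
```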

2. The analyzer settings are incorrect

Next, I checked whether the analyzer settings were incorrect. If they were, the query executed in Kibana should also have returned strange results. It did not, so "the analyzer settings are incorrect" was not the cause either.

3. Some implicitly set value needs to be corrected

This is the remaining candidate. Since there was no problem with the contents of searchSourceBuilder, the cause is most likely in one of the following:

  • request (SearchRequest)
  • client.search(request, RequestOptions.DEFAULT) (SearchResponse)

When I checked the contents of request (SearchRequest) in debug mode, they were as follows.

Contents of request (SearchRequest)


SearchRequest{searchType=QUERY_THEN_FETCH, indices=[item_list], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"size":10,"query":{"bool":{"should":[{"match":{"itemName":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameKana":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameHira":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"_score":{"order":"desc"}}]}}

In this output, everything from source onward had already been verified when I checked the contents of searchSourceBuilder, so I excluded it.

That leaves the other settings. Looking at the major items, the likely candidates are:

  • searchType
  • indicesOptions

For searchType, there is a description in Search Type (Elasticsearch Reference [6.8]) | elastic.

For indicesOptions, I found a page that seems related, although it may not correspond exactly.

Of these two, the documentation for searchType explicitly mentions the score (_score). (For indicesOptions, I could not find anything about the score as far as I looked.)

The "Dfs, Query Then Fetch" section of the page mentioned above, Search Type (Elasticsearch Reference [6.8]) | elastic, contains the following phrase:

... more accurate scoring.

Based on this, I decided to set this search type, which seems to give more accurate scores. The change is described in the next section.
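
As background on why this setting can change scores: with the default Query Then Fetch, each shard scores documents using its own local term statistics, whereas Dfs, Query Then Fetch first collects the term statistics from all shards. The sketch below illustrates the effect with the BM25 IDF formula used by Lucene. The shard statistics are made-up numbers, and the class and method names are hypothetical.

```java
/** Illustrative comparison of shard-local and global IDF values. */
final class ShardIdfSketch {

    /** BM25 inverse document frequency: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** IDF computed from statistics summed over all shards, as a DFS pre-phase would provide. */
    static double globalIdf(long[] docFreqs, long[] docCounts) {
        long totalFreq = 0;
        long totalCount = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            totalFreq += docFreqs[i];
            totalCount += docCounts[i];
        }
        return idf(totalFreq, totalCount);
    }

    public static void main(String[] args) {
        // Hypothetical statistics for one term on two shards of the same index
        long[] docFreqs = {1, 40};
        long[] docCounts = {100, 100};

        // Local statistics: the same term gets a different weight on each shard
        System.out.println("shard 0 idf = " + idf(docFreqs[0], docCounts[0]));
        System.out.println("shard 1 idf = " + idf(docFreqs[1], docCounts[1]));
        // Global statistics: every shard uses the same weight
        System.out.println("global idf  = " + globalIdf(docFreqs, docCounts));
    }
}
```

With these numbers the term is rare on shard 0 and common on shard 1, so a local IDF rates the same term very differently depending on where a document happens to be stored. Using global statistics removes that dependency.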

Fix

The investigation showed that the searchType needed to be changed, so I made that change: I set the searchType of SearchRequest to "Dfs, Query Then Fetch" (the two lines added just before the return statement).

Java source code (after modification)


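// Imports required by this method (Elasticsearch high-level REST client)
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;

	/**
	 * Product search
	 *
	 * @param keyword search word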
	 * @param index index name
	 * @param limit number
	 * @param client Elasticsearch connection client
	 * @return search results
	 * @throws IOException
	 */
	public SearchResponse search(String keyword, String index, int limit, RestHighLevelClient client) throws IOException{

		//Initialize search conditions
		SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

		BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();

		boolQueryBuilder.should(QueryBuilders.matchQuery("itemName", keyword))
						.should(QueryBuilders.matchQuery("itemNameKana", keyword))
						.should(QueryBuilders.matchQuery("itemNameHira", keyword));

		searchSourceBuilder.query(boolQueryBuilder);
		//Sort order setting (sort by score)
		searchSourceBuilder.sort(new FieldSortBuilder("_score").order(SortOrder.DESC));
		//Set the number of returns
		searchSourceBuilder.size(limit);

		SearchRequest request = new SearchRequest(index).source(searchSourceBuilder);
		// Added: set the search type to DFS Query Then Fetch
		request.searchType(SearchType.DFS_QUERY_THEN_FETCH); // Added
		return client.search(request, RequestOptions.DEFAULT);


	}

When I ran it again, the results were as expected (the same as the query executed in Kibana). Problem solved!
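
To compare both sides under the same conditions, the search type can also be specified on the REST side through the search_type request parameter, for example POST item_list/_search?search_type=dfs_query_then_fetch in Kibana. The sketch below only builds such a URI in plain Java and does not send anything. The class name, method name, and the base URL http://localhost:9200 are assumptions for illustration.

```java
import java.net.URI;

/** Illustrative builder for a _search URI with an explicit search type. */
final class SearchTypeUriSketch {

    /** Builds the _search URI for the given index with the search_type parameter set. */
    static URI searchUri(String baseUrl, String index, String searchType) {
        return URI.create(baseUrl + "/" + index + "/_search?search_type=" + searchType);
    }

    public static void main(String[] args) {
        // Hypothetical local node; item_list is the index used in this article
        System.out.println(searchUri("http://localhost:9200", "item_list", "dfs_query_then_fetch"));
    }
}
```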

Conclusion

The fix this time was to set the searchType of SearchRequest to "Dfs, Query Then Fetch". I was anxious at first when I could not get the expected results, but I am glad I was able to solve it.

