When I ran a search against Elasticsearch from Java, the results were not what I expected, so I investigated the cause and fixed it. This post summarizes what I did.
The setup is shown below in this order:
- Index definition
- Java source code
- Query executed in Kibana (equivalent to the Java source code)
The index definition configures a custom analyzer in addition to the mappings. The role of each field is as follows.
| Field name | Data type | Description |
|---|---|---|
| itemId | integer | Product ID (corresponds to the primary key) |
| itemName | text | Product name |
| itemNameKana | text | Product name (Katakana) |
| itemNameHira | text | Product name (Hiragana) |
Index definition
{
  "settings": {
    "analysis": {
      "filter": {
        "my_ngram": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 2
        }
      },
      "analyzer": {
        "my_kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji_tokenizer",
          "char_filter": [
            "icu_normalizer",
            "kuromoji_iteration_mark"
          ],
          "filter": [
            "kuromoji_stemmer",
            "my_ngram"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "itemId": {
        "type": "integer"
      },
      "itemName": {
        "type": "text",
        "analyzer": "my_kuromoji_analyzer"
      },
      "itemNameKana": {
        "type": "text",
        "analyzer": "my_kuromoji_analyzer"
      },
      "itemNameHira": {
        "type": "text",
        "analyzer": "my_kuromoji_analyzer"
      }
    }
  }
}
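To get a feel for what the my_ngram filter (min_gram 1, max_gram 2) adds to the index, here is a minimal sketch in plain Java. It is not Elasticsearch code: NgramSketch and ngrams are hypothetical names, and the sketch assumes the tokenizer has already produced a single token and that grams are emitted by start position and then by length.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Elasticsearch: lists the n-grams of one token. */
final class NgramSketch {

    /** Returns every substring of token whose length is between minGram and maxGram. */
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        // Work on code points so that characters outside the BMP are not cut in half.
        int[] codePoints = token.codePoints().toArray();
        for (int start = 0; start < codePoints.length; start++) {
            for (int len = minGram; len <= maxGram && start + len <= codePoints.length; len++) {
                grams.add(new String(codePoints, start, len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Assumed example token; the real tokens depend on kuromoji_tokenizer.
        System.out.println(ngrams("りんご", 1, 2));
    }
}
```

Because single characters are indexed as terms, and a match query analyzes the search word with the same analyzer unless a separate search analyzer is configured, many documents are likely to match at least one term. That makes the ordering depend heavily on _score.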
On the Java side, the code searches for documents in which the entered search word matches any of the product name, product name (Katakana), or product name (Hiragana) fields, and sorts the hits in descending order of score.
Java source code
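import java.io.IOException;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;

/**
 * Product search.
 *
 * @param keyword search word
 * @param index index name
 * @param limit maximum number of hits to return
 * @param client Elasticsearch connection client
 * @return search results
 * @throws IOException if the request to Elasticsearch fails
 */
public SearchResponse search(String keyword, String index, int limit, RestHighLevelClient client) throws IOException {
    // Initialize the search conditions: match the keyword against any of the three name fields
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    boolQueryBuilder.should(QueryBuilders.matchQuery("itemName", keyword))
            .should(QueryBuilders.matchQuery("itemNameKana", keyword))
            .should(QueryBuilders.matchQuery("itemNameHira", keyword));
    searchSourceBuilder.query(boolQueryBuilder);
    // Sort order (descending by score)
    searchSourceBuilder.sort(new FieldSortBuilder("_score").order(SortOrder.DESC));
    // Number of hits to return
    searchSourceBuilder.size(limit);
    SearchRequest request = new SearchRequest(index).source(searchSourceBuilder);
    return client.search(request, RequestOptions.DEFAULT);
}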
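Since the ordering depends on _score, it helps to know how the three should clauses contribute to it. As far as I understand, in recent Elasticsearch versions a bool query adds up the scores of the clauses that matched, and with only should clauses at least one of them must match. The plain Java sketch below illustrates that rule under those assumptions; ShouldScoreSketch and combine are hypothetical names, and the numbers in main are made up.

```java
import java.util.OptionalDouble;

/** Hypothetical illustration of how a should-only bool query combines clause scores. */
final class ShouldScoreSketch {

    /**
     * Adds up the scores of the matching clauses.
     * An empty OptionalDouble stands for a clause that did not match.
     * Returns empty when no clause matched, meaning the document is not a hit.
     */
    static OptionalDouble combine(OptionalDouble... clauseScores) {
        boolean anyMatch = false;
        double sum = 0.0;
        for (OptionalDouble score : clauseScores) {
            if (score.isPresent()) {
                anyMatch = true;
                sum += score.getAsDouble();
            }
        }
        return anyMatch ? OptionalDouble.of(sum) : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // Made-up scores: itemName matched, itemNameKana did not, itemNameHira matched.
        System.out.println(combine(OptionalDouble.of(1.5), OptionalDouble.empty(), OptionalDouble.of(2.25)));
    }
}
```

A document that matches in several of the name fields therefore tends to rank above one that matches in only one of them.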
Finally, here is the query executed in Kibana. Replace "Enter a search word" with the word you want to search for. size is set to 5 (the value passed as limit in the Java source code).
Query executed in Kibana
POST item_list/_search
{
  "from": 0,
  "size": 5,
  "sort": {
    "_score": {
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "itemName": "Enter a search word"
          }
        },
        {
          "match": {
            "itemNameKana": "Enter a search word"
          }
        },
        {
          "match": {
            "itemNameHira": "Enter a search word"
          }
        }
      ]
    }
  }
}
I considered the following possible causes:
1. The query issued from Java is wrong.
2. The analyzer settings are wrong.
3. Some implicitly set value needs to be changed.
I decided to investigate these in order.
First, I checked whether the query issued from Java was wrong. To do so, I set a breakpoint on the line SearchRequest request = new SearchRequest(index).source(searchSourceBuilder); in the Java source code and inspected the contents of searchSourceBuilder.
The contents were as follows.
Contents of searchSourceBuilder
{"size":5,"query":{"bool":{"should":[{"match":{"itemName":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameKana":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameHira":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"_score":{"order":"desc"}}]}
At first glance, there seems to be no problem. Running this content in Kibana gave exactly the same result as the Kibana query shown earlier. The executed query is as follows.
Query executed with the contents of searchSourceBuilder
POST item_list/_search
{"size":5,"query":{"bool":{"should":[{"match":{"itemName":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameKana":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameHira":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"_score":{"order":"desc"}}]}
From this result, "the query issued from Java is wrong" can be ruled out as the cause. At this stage I also checked the score (_score) of the hits returned to Java (the return value of client.search(request, RequestOptions.DEFAULT)) and found that it differed from the score in the Kibana results. (This was the key finding of the investigation.)
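A comparison like this can be made less error-prone by putting the (_id, _score) pairs from both sides into maps and listing the ids that disagree. The following is only a sketch in plain Java: ScoreDiffSketch and idsWithDifferentScores are hypothetical names, the maps are assumed to have been filled by hand or from the two responses, and the values in main are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for comparing the scores of two result sets by document id. */
final class ScoreDiffSketch {

    /** Returns, in sorted order, the ids that are missing on one side or whose scores differ by more than tolerance. */
    static List<String> idsWithDifferentScores(Map<String, Double> javaScores,
                                               Map<String, Double> kibanaScores,
                                               double tolerance) {
        Set<String> ids = new TreeSet<>(javaScores.keySet());
        ids.addAll(kibanaScores.keySet());
        List<String> differing = new ArrayList<>();
        for (String id : ids) {
            Double fromJava = javaScores.get(id);
            Double fromKibana = kibanaScores.get(id);
            if (fromJava == null || fromKibana == null || Math.abs(fromJava - fromKibana) > tolerance) {
                differing.add(id);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        // Made-up ids and scores, for illustration only.
        Map<String, Double> javaScores = Map.of("1", 2.0, "2", 1.5);
        Map<String, Double> kibanaScores = Map.of("1", 2.0, "2", 1.2);
        System.out.println(idsWithDifferentScores(javaScores, kibanaScores, 1e-6));
    }
}
```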
Next, I considered whether the analyzer settings were wrong. If they were, running the query in Kibana should also give strange results. That was not the case, so "the analyzer settings are wrong" could be ruled out as well.
That leaves the third candidate. Since there was no problem with the contents of searchSourceBuilder, the cause most likely lay in one of the settings outside searchSourceBuilder.
Inspecting the contents of request (SearchRequest) in debug mode showed the following.
Contents of request (SearchRequest)
SearchRequest{searchType=QUERY_THEN_FETCH, indices=[item_list], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"size":10,"query":{"bool":{"should":[{"match":{"itemName":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameKana":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"match":{"itemNameHira":{"query":"Enter a search word","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"_score":{"order":"desc"}}]}}
The source part had already been verified when I inspected searchSourceBuilder, so I excluded it.
That being the case, the major items left to look at are searchType and indicesOptions.
searchType is described in Search Type (Elasticsearch Reference [6.8]) | elastic.
For indicesOptions, the closest description I could find covers something slightly different.
Of these, the documentation for searchType explicitly mentions the score (_score). (For indicesOptions, I could not find anything about the score as far as I looked.)
The "Dfs, Query Then Fetch" section of the page mentioned above, Search Type (Elasticsearch Reference [6.8]) | elastic, says:
... more accurate scoring.
So I decided to use this search type, which should give a more accurate score. (The fix follows.)
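The reason the search type can affect _score is that, with the default query_then_fetch, each shard scores documents using its own local term statistics, while dfs_query_then_fetch first collects the term and document frequencies from all shards and scores with the global values. The plain Java sketch below illustrates the effect on the inverse document frequency part of BM25 (the default similarity). DfsIdfSketch is a hypothetical name and the shard statistics are made up.

```java
/** Hypothetical illustration of local versus global term statistics; not Elasticsearch code. */
final class DfsIdfSketch {

    /**
     * BM25 inverse document frequency.
     * docFreq is the number of documents containing the term,
     * docCount is the number of documents that have the field.
     */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // Made-up statistics for one term: {docFreq, docCount} per shard.
        long[][] shards = {{1, 100}, {40, 100}};
        long totalDocFreq = 0;
        long totalDocCount = 0;
        for (int i = 0; i < shards.length; i++) {
            // query_then_fetch: each shard uses only its own statistics.
            System.out.printf("shard %d local idf = %.3f%n", i, idf(shards[i][0], shards[i][1]));
            totalDocFreq += shards[i][0];
            totalDocCount += shards[i][1];
        }
        // dfs_query_then_fetch: statistics are summed over all shards first.
        System.out.printf("global idf = %.3f%n", idf(totalDocFreq, totalDocCount));
    }
}
```

With local statistics, the same term weighs much more on the shard where it is rare than on the shard where it is common, so identical documents can receive different scores depending on where they are stored. The extra round trip to gather global statistics is the price of the more consistent score.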
The investigation showed that the search type needs to be changed, so here is the fix: set the searchType of the SearchRequest to "Dfs, Query Then Fetch" (the two lines near the end of the method that carry a marker comment).
Java source code (after modification)
// Additional import needed for the search type setting
import org.elasticsearch.action.search.SearchType;

/**
 * Product search.
 *
 * @param keyword search word
 * @param index index name
 * @param limit maximum number of hits to return
 * @param client Elasticsearch connection client
 * @return search results
 * @throws IOException if the request to Elasticsearch fails
 */
public SearchResponse search(String keyword, String index, int limit, RestHighLevelClient client) throws IOException {
    // Initialize the search conditions: match the keyword against any of the three name fields
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    boolQueryBuilder.should(QueryBuilders.matchQuery("itemName", keyword))
            .should(QueryBuilders.matchQuery("itemNameKana", keyword))
            .should(QueryBuilders.matchQuery("itemNameHira", keyword));
    searchSourceBuilder.query(boolQueryBuilder);
    // Sort order (descending by score)
    searchSourceBuilder.sort(new FieldSortBuilder("_score").order(SortOrder.DESC));
    // Number of hits to return
    searchSourceBuilder.size(limit);
    SearchRequest request = new SearchRequest(index).source(searchSourceBuilder);
    // Set the search type to DFS, Query Then Fetch // Added
    request.searchType(SearchType.DFS_QUERY_THEN_FETCH); // Added
    return client.search(request, RequestOptions.DEFAULT);
}
When I ran it, the results were as expected (the same as the query executed in Kibana). Problem solved!
The answer this time was to set the searchType of the SearchRequest to "Dfs, Query Then Fetch". At first I was anxious because I could not get the expected results, but I am glad I was able to solve it.
The following articles are not cited in the text, but I referred to them:
- Summary of settings for using Elasticsearch in Japanese | Qiita
- [Elasticsearch] Try to improve search accuracy with Kuromoji and ngram | Qiita