Need full-text search? I tried Groonga.
"Full-text search" is, put very simply, something like a Google search: you enter keywords and find the documents you want among a large collection.
So, first of all, searching Google for "full-text search system" turns up a variety of options.
Feeling a little nostalgic about Namazu, I looked around, found that Groonga looked good, and decided to use it.
I only needed a small local environment, so I decided to install it using docker-compose. (This approach is also described in the official documentation.)
Sure enough, it came up easily.
To confirm, open a browser and go to http://localhost:10041
and a page will be displayed.
However, if you start it with
% docker-compose run groonga
it goes down unless you leave the terminal open.
Then, when I tried to start it again with
% docker-compose restart
I got an error this time saying that the data file already exists.
The -n in
command: ["-n", "/mnt/db/data.db"]
is the option that creates a new data file, so Groonga tries to create a new one on every start and fails with an error. Hmm.
Please refer to the source on GitHub.
I will explain briefly. First, the files directly under /groonga.
The configuration is described in docker-compose.yml.
In it, build: ./groonga makes the image build from ./groonga/Dockerfile.
volumes is used to make the data files visible from the host machine as well.
The remaining xxxx.sh files are convenience scripts.
To install, run $ ./install.sh, which also runs docker-compose build.
To enter the Docker container, run $ ./login.sh. To leave it, use $ exit.
To stop, run $ ./stop.sh, and to restart, run $ ./restart.sh.
When you no longer need the container, run $ ./remove.sh, which does a docker rm of the container.
Next, the files under /groonga/groonga. The Dockerfile contains the specific steps for building the image.
If you install Groonga normally and access it via HTTP, you may get an error from time to time. When I looked it up on the net, I found the following article. → https://okamuuu.hatenablog.com/entry/2017/11/13/185903
stack-fix.c is the patch for that problem. I wrote the Dockerfile so that it is included.
Then, to avoid the earlier "error when creating a new file on restart", I wrote a startup script, groonga.sh, that creates the data file if it does not exist and uses the existing one if it does.
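The decision that startup script has to make can be sketched in a few lines. This is only an illustration written in Ruby to match the rest of this post, not the contents of the actual groonga.sh; the helper name groonga_args is hypothetical, and the only assumption taken from above is that -n means "create a new data file".

```ruby
require "tmpdir"

# Hypothetical helper: choose the arguments to pass to groonga
# depending on whether the data file already exists.
def groonga_args(db_path)
  if File.exist?(db_path)
    [db_path]        # open the existing data file
  else
    ["-n", db_path]  # -n creates a new data file
  end
end

Dir.mktmpdir do |dir|
  db_path = File.join(dir, "data.db")
  p groonga_args(db_path)  # the file is absent, so -n is included
  File.write(db_path, "")
  p groonga_args(db_path)  # the file is present, so -n is dropped
end
```

With a check like this in front of the server start, a restart no longer trips over the existing data file.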
So, run $ ./install.sh
and go to http://localhost:10041
Yes, it works.
There seem to be several ways to use Groonga from Ruby. Rroonga is well known, but it appears to be a library for the case where Groonga runs on the same server. When Groonga runs in Docker or on another server (including a virtual one), groonga-client seems to be the one to use.
So, install it.
groonga-client can be installed with $ gem install groonga-client
I ran the following sample.
test.rb
# -*- coding: utf-8 -*-
require "json"
require "groonga/client"

host = "127.0.0.1"
port = 10041

Groonga::Client.open(host: host, port: port, protocol: :http) do |client|
  # Create the table and load the data only on the first run
  tables = client.table_list
  unless tables.map(&:name).include?("docs")
    # ---- create normal table ----
    client.table_create(name: "docs",
                        flags: "TABLE_HASH_KEY",
                        key_type: "ShortText")
    client.column_create(table: "docs",
                         name: "body",
                         flags: "COLUMN_SCALAR",
                         type: "Text")
    # ---- data insert to table ----
    values = [
      { "_key" => "/path/to/document/1",
        "body" => "Meros was furious." },
      { "_key" => "/path/to/document/2",
        "body" => "Meros doesn't understand politics." },
      { "_key" => "/path/to/document/3",
        "body" => "Meros had a friend of stilts." },
    ]
    client.load(table: "docs",
                values: values.to_json)
  end

  # ---- data search ----
  # (1) keyword in query, target column in match_columns
  query = "furious"
  response = client.select(table: "docs",
                           query: query,
                           match_columns: "body")
  puts "hits: #{response.n_hits} (query: #{query} -> body)"
  response.records.each do |record|
    p record
  end

  # (2) target column written inside the query as column:@keyword
  query = "politics"
  response = client.select(table: "docs",
                           query: "body:@#{query}")
  puts "hits: #{response.n_hits} (query: #{query} -> body)"
  response.records.each do |record|
    p record
  end

  # (3) exact match on _key with filter
  filter = "/path/to/document/3"
  response = client.select(table: "docs",
                           filter: "_key == '#{filter}'")
  puts "hits: #{response.n_hits} (filter: #{filter} -> _key)"
  response.records.each do |record|
    p record
  end

  # (4) partial match on _key
  query = "/document"
  response = client.select(table: "docs",
                           query: "_key:@#{query}")
  puts "hits: #{response.n_hits} (query: #{query} -> _key)"
  response.records.each do |record|
    p record
  end
end
Then execute.
$ ruby ./test.rb
hits: 1 (query: furious -> body)
{"_id"=>1, "_key"=>"/path/to/document/1", "body"=>"Meros was furious."}
hits: 1 (query: politics -> body)
{"_id"=>2, "_key"=>"/path/to/document/2", "body"=>"Meros doesn't understand politics."}
hits: 1 (filter: /path/to/document/3 -> _key)
{"_id"=>3, "_key"=>"/path/to/document/3", "body"=>"Meros had a friend of stilts."}
hits: 3 (query: /document -> _key)
{"_id"=>1, "_key"=>"/path/to/document/1", "body"=>"Meros was furious."}
{"_id"=>2, "_key"=>"/path/to/document/2", "body"=>"Meros doesn't understand politics."}
{"_id"=>3, "_key"=>"/path/to/document/3", "body"=>"Meros had a friend of stilts."}
It looks good.
I wrote the searches for 'furious' and 'politics' in two different ways:
one uses match_columns,
and the other writes the query as column:@query.
For now, either one seems to be fine.
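To make the difference between the two forms easier to see, here is a small sketch that only builds the parameter hashes and does not talk to a server. The helper names are hypothetical; the hash keys are the same ones passed to client.select in test.rb.

```ruby
# Hypothetical helpers that build the parameters for the two query styles.

# Style 1: the keyword goes in query, the target column in match_columns.
def select_with_match_columns(table, column, keyword)
  { table: table, match_columns: column, query: keyword }
end

# Style 2: the target column is written inside the query as column:@keyword.
def select_with_column_prefix(table, column, keyword)
  { table: table, query: "#{column}:@#{keyword}" }
end

p select_with_match_columns("docs", "body", "furious")
p select_with_column_prefix("docs", "body", "politics")
```

Either hash could then be handed to client.select(params) inside the Groonga::Client.open block.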
Come to think of it, when I created the table I did not set a tokenizer or create an index table, and yet I can search the body column as if with "%furious%". Is that just how query works?
Apparently so. Without an index, Groonga seems to fall back to scanning the records one by one.
So, I created an index table. Apparently searches become much faster once you have one.
test2.rb
# -*- coding: utf-8 -*-
require "groonga/client"

host = "127.0.0.1"
port = 10041

Groonga::Client.open(host: host, port: port, protocol: :http) do |client|
  # Create the index table only on the first run
  tables = client.table_list
  unless tables.map(&:name).include?("doc_indexes")
    # ---- create indexes ----
    client.table_create(name: "doc_indexes",
                        flags: "TABLE_PAT_KEY",
                        key_type: "ShortText",
                        default_tokenizer: "TokenBigram",
                        normalizer: "NormalizerAuto")
    # The index column points back at docs.body
    client.column_create(table: "doc_indexes",
                         name: "body_index",
                         flags: "COLUMN_INDEX|WITH_POSITION",
                         type: "docs",
                         source: "body")
  end

  # ---- search through the index column ----
  query = "understand"
  response = client.select(table: "docs",
                           query: query,
                           match_columns: "doc_indexes.body_index")
  puts "hits: #{response.n_hits} (query: #{query} -> doc_indexes.body_index)"
  response.records.each do |record|
    p record
  end
end
Running it gives:
$ ruby ./test2.rb
hits: 1 (query: understand -> doc_indexes.body_index)
{"_id"=>2, "_key"=>"/path/to/document/2", "body"=>"Meros doesn't understand politics."}
That works. Perhaps this is the right way to do it.
Umm.
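To get a feel for why an index table speeds things up, here is a deliberately simplified pure-Ruby model of an inverted index built from two-character tokens. It is not how Groonga is implemented, and the real TokenBigram tokenizer treats alphabetic text differently; the function names are hypothetical and only the three sample documents come from this post.

```ruby
# Split text into lowercase two-character tokens (a toy bigram tokenizer).
def bigrams(text)
  chars = text.downcase.chars
  return [text.downcase] if chars.size < 2
  chars.each_cons(2).map(&:join)
end

# Build token => [document keys] so lookups do not have to scan every body.
def build_index(docs)
  index = Hash.new { |hash, token| hash[token] = [] }
  docs.each do |key, body|
    bigrams(body).uniq.each { |token| index[token] << key }
  end
  index
end

def search(index, docs, keyword)
  # Candidates are the documents that contain every bigram of the keyword.
  candidates = bigrams(keyword).map { |t| index.fetch(t, []) }.reduce { |a, b| a & b } || []
  # Bigram overlap alone can produce false positives, so confirm on the text.
  candidates.select { |key| docs[key].downcase.include?(keyword.downcase) }
end

docs = {
  "/path/to/document/1" => "Meros was furious.",
  "/path/to/document/2" => "Meros doesn't understand politics.",
  "/path/to/document/3" => "Meros had a friend of stilts.",
}
index = build_index(docs)
p search(index, docs, "understand")
p search(index, docs, "Meros")
```

The point of the model is that a search only touches the token lists for the keyword instead of reading every document body.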
Oh yes, I forgot the important part. Groonga is a database system optimized for full-text search, so it differs a little from the RDB and KVS I normally use, and it takes some getting used to.
In an RDB, you define the key column and the data columns together when you create a table. In Groonga, you first specify the key structure and the full-text search settings when creating the table, and then add columns to it afterwards. The way indexes are created and searches are written also feels unique. I think it will be convenient once you master it.
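That order (table first, columns afterwards, index table last) can be written out as Groonga command lines. The sketch below just formats the same parameters used in test.rb and test2.rb into command text; groonga_command is a hypothetical helper, and it does no quoting, so it is only suitable for simple values like these.

```ruby
# Hypothetical helper: format a command name and its parameters
# as a Groonga command line ("name --key value ...").
def groonga_command(name, params)
  ([name] + params.map { |key, value| "--#{key} #{value}" }).join(" ")
end

steps = [
  # 1. The table is created with its key structure only.
  ["table_create", { name: "docs", flags: "TABLE_HASH_KEY", key_type: "ShortText" }],
  # 2. Data columns are added afterwards.
  ["column_create", { table: "docs", name: "body", flags: "COLUMN_SCALAR", type: "Text" }],
  # 3. Full-text search settings live on a separate index table.
  ["table_create", { name: "doc_indexes", flags: "TABLE_PAT_KEY", key_type: "ShortText",
                     default_tokenizer: "TokenBigram", normalizer: "NormalizerAuto" }],
  # 4. The index column points back at the source column.
  ["column_create", { table: "doc_indexes", name: "body_index",
                      flags: "COLUMN_INDEX|WITH_POSITION", type: "docs", source: "body" }],
]

steps.each { |name, params| puts groonga_command(name, params) }
```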
That's all.