[RUBY] It's new, but I tried using Groonga

I wanted to look into full-text search, so I tried using Groonga.  → Groonga

Full-text search

Put very simply, "full-text search" is something like "Google search": you enter keywords and find the documents you want among a large number of documents.

There seem to be various options

So, first of all, if you ask Google about "full-text search system", you will find various options.

I felt nostalgic about Namazu, but after looking around, Groonga seemed good, so I decided to use it.

Installation

This was just for a local environment, so I decided to install it with docker-compose, which is also described in the official documentation.

I tried it the way the official documentation describes

Groonga#with-docker-compose

Oh, it certainly started up easily. To confirm, open a browser and go to http://localhost:10041, and a page will be displayed.

I wanted to paste a screenshot here, but no matter how many times I tried to upload it, I got the following error, so please picture it in your head.

Something went wrong

But...

If you start it with % docker-compose run groonga, it stops unless the terminal is left open. Then, when I try to start it again with % docker-compose restart, this time I get an error saying that the data file already exists.

The reason is that -n in command: ["-n", "/mnt/db/data.db"] is the option to create a new data file, so every start tries to create a new one and fails because the file is already there.

Hmm.

So, I made a Dockerfile and tried to launch it

Please refer to the source on GitHub.

I will explain briefly. First, the /groonga directory.

The container configuration is described in docker-compose.yml. With build: ./groonga in it, the image is built from ./groonga/Dockerfile. volumes is used so that the data files are visible from the host machine as well.

The remaining xxxx.sh files are convenience scripts. To install, $ ./install.sh runs docker-compose build. To enter the running container, use $ ./login.sh, and leave it with $ exit. $ ./stop.sh stops it and $ ./restart.sh restarts it. When you no longer need the container, $ ./remove.sh removes it with docker rm.

Next, the /groonga/groonga directory. The Dockerfile there contains the concrete build instructions.

There seems to be a bug

If you install Groonga normally and access it over HTTP, you may get an error from time to time. When I searched the web, I found the following article.  → https://okamuuu.hatenablog.com/entry/2017/11/13/185903

Thank you, okamuuu.

But that article is from 2017, and the same error still occurs in 2020...

stack-fix.c is the patch for it, and I wrote the Dockerfile so that it is included. Then, to avoid the earlier "error from creating a new file on restart", I wrote a startup script, groonga.sh, that creates the data file if it does not exist and uses it as is if it does.
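The create-or-reuse decision described above can be sketched in Ruby as follows. This only illustrates the logic and is not the actual groonga.sh from the repository. The method name groonga_db_args is hypothetical, and the only Groonga option it relies on is the -n flag mentioned earlier.

```ruby
# Build the database arguments for the groonga command.
# -n asks groonga to create a new database, so it is passed only
# when the data file does not exist yet (hypothetical helper).
def groonga_db_args(db_path)
  if File.exist?(db_path)
    [db_path]        # reuse the existing database
  else
    ["-n", db_path]  # first start: create the database
  end
end
```

On the first start this returns ["-n", "/mnt/db/data.db"] for that path, and on later starts only ["/mnt/db/data.db"], so a restart no longer tries to create a file that is already there.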

Install again

So, run $ ./install.sh.

Go to http://localhost:10041 and, yes, it works.
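Instead of a browser, the same check can be done from Ruby with the standard library. This is a minimal sketch, assuming the server listens on 127.0.0.1:10041 as above and that the status command is reachable at /d/status over HTTP. The method names are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# Build the URL of a Groonga command exposed over HTTP (/d/<command>).
def groonga_command_uri(host, port, command)
  URI("http://#{host}:#{port}/d/#{command}")
end

# A Groonga HTTP response is a JSON array [header, body], where
# header[0] is the return code (0 means success).
def parse_groonga_response(json_text)
  header, body = JSON.parse(json_text)
  raise "groonga returned #{header[0]}" unless header[0].zero?
  body
end

# Fetch the server status. This needs a running Groonga server.
def groonga_status(host = "127.0.0.1", port = 10041)
  uri = groonga_command_uri(host, port, "status")
  parse_groonga_response(Net::HTTP.get(uri))
end
```

With the container running, puts groonga_status["version"] should print the server version. If the server is down, Net::HTTP raises a connection error instead.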

Let's start with Ruby.

There seem to be several ways to use Groonga from Ruby. Rroonga is well known, but it appears to be the library for when Groonga runs on the same server. When Groonga runs in Docker or on another server (including a virtual one), groonga-client appears to be the one to use.

So, let's install it.

Installation and sample

You can install groonga-client with $ gem install groonga-client.

I ran the following sample.

test.rb


# -*- coding: utf-8 -*-
require "json"            # for values.to_json below
require "groonga/client"

host = "127.0.0.1"
port = 10041
Groonga::Client.open(host: host, port: port, protocol: :http) do |client|
  tables = client.table_list
  unless tables.map{|m| m.name}.include?("docs")
    # ---- create normal table ----
    client.table_create(name: "docs",
                        flags: "TABLE_HASH_KEY",
                        key_type: "ShortText")
    client.column_create(table: "docs",
                         name: "body",
                         flags: "COLUMN_SCALAR",
                         type: "Text")

    # ---- data insert to table ----
    values = [
      { "_key" => "/path/to/document/1",
        "body" => "Meros was furious." },
      { "_key" => "/path/to/document/2",
        "body" => "Meros doesn't understand politics." },
      { "_key" => "/path/to/document/3",
        "body" => "Meros had a friend of stilts." },      
    ]   
    client.load(table: "docs",
                values: values.to_json)
  end

  # ---- data search ----
  query = "furious"
  response = client.select(table: "docs",
                           query: query,
                           match_columns: "body")
  puts "hits: #{response.n_hits} (query: #{query} -> body)"
  response.records.each do |record|
    p record
  end

  query = "politics"
  response = client.select(table: "docs",
                           query: "body:@#{query}")
  puts "hits: #{response.n_hits} (query: #{query} -> body)"
  response.records.each do |record|
    p record
  end

  filter = "/path/to/document/3"
  response = client.select(table: "docs",
                           filter: "_key == '#{filter}'")
  puts "hits: #{response.n_hits} (filter: #{filter} -> _key)"
  response.records.each do |record|
    p record
  end

  query = "/document"
  response = client.select(table: "docs",
                           query: "_key:@#{query}")
  puts "hits: #{response.n_hits} (query: #{query} -> _key)"
  response.records.each do |record|
    p record
  end
end

Then execute.

$ ruby ./test.rb
hits: 1 (query: furious -> body)
{"_id"=>1, "_key"=>"/path/to/document/1", "body"=>"Meros was furious."}
hits: 1 (query: politics -> body)
{"_id"=>2, "_key"=>"/path/to/document/2", "body"=>"Meros doesn't understand politics."}
hits: 1 (filter: /path/to/document/3 -> _key)
{"_id"=>3, "_key"=>"/path/to/document/3", "body"=>"Meros had a friend of stilts."}
hits: 3 (query: /document -> _key)
{"_id"=>1, "_key"=>"/path/to/document/1", "body"=>"Meros was furious."}
{"_id"=>2, "_key"=>"/path/to/document/2", "body"=>"Meros doesn't understand politics."}
{"_id"=>3, "_key"=>"/path/to/document/3", "body"=>"Meros had a friend of stilts."}

It looks good.

I wrote the keyword searches on body in two ways: using match_columns, and writing column:@keyword in the query itself. For now, either one seems fine.

...Come to think of it, when I created the table I didn't set a tokenizer or anything, and I didn't create an index table, yet I can search the body column with a partial match, like "%keyword%" in SQL's LIKE... Is that just how query works?...

Well, that's how it seems. Yes.
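To put the two styles side by side, here is a small sketch that builds the select parameters for each. The helper names are hypothetical. Only the table, query and match_columns parameters from test.rb are used.

```ruby
# Two ways to express "column contains keyword" as parameters
# for the select command (helper names are hypothetical).

# Style 1: name the target column separately in match_columns.
def select_with_match_columns(table, column, keyword)
  { table: table, match_columns: column, query: keyword }
end

# Style 2: embed the column in the query itself as column:@keyword.
def select_with_inline_column(table, column, keyword)
  { table: table, query: "#{column}:@#{keyword}" }
end
```

With the client from test.rb, either hash should be usable as is, for example client.select(select_with_inline_column("docs", "body", "politics")). The first style keeps the column name out of the query string, which is handy when the keyword comes from user input and the column is fixed.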

I tried creating an index table

So, I created an index table. Apparently searches become much faster once you have one.

test2.rb


# -*- coding: utf-8 -*-
require "groonga/client"

host = "127.0.0.1"
port = 10041
Groonga::Client.open(host: host, port: port, protocol: :http) do |client|
  tables = client.table_list
  unless tables.map{|m| m.name}.include?("doc_indexes")
    # ---- create indexes ----
    client.table_create(name: "doc_indexes",
                        flags: "TABLE_PAT_KEY",
                        key_type: "ShortText",
                        default_tokenizer: "TokenBigram",
                        normalizer: "NormalizerAuto")
    client.column_create(table: "doc_indexes",
                         name: "body_index",
                         flags: "COLUMN_INDEX|WITH_POSITION",
                         type: "docs",
                         source: "body")
  end

  query = "understand"
  response = client.select(table: "docs",
                           query: query,
                           match_columns: "doc_indexes.body_index")
  puts "hits: #{response.n_hits} (query: #{query} -> doc_indexes.body_index)"
  response.records.each do |record|
    p record
  end
end

When executed:

$ ruby ./test2.rb
hits: 1 (query: understand -> doc_indexes.body_index)
{"_id"=>2, "_key"=>"/path/to/document/2", "body"=>"Meros doesn't understand politics."}

I got a hit. Perhaps this is the right way to do it...

Hmm.
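Why an index table speeds up searching can be illustrated with a toy inverted index in plain Ruby. This is a simplified sketch, not how Groonga implements it: splitting into lowercased words stands in for TokenBigram and NormalizerAuto, and build_inverted_index is a hypothetical name.

```ruby
# Toy inverted index: maps each lowercased word to the keys of the
# documents that contain it. Groonga's real index is far more elaborate.
def build_inverted_index(docs)
  index = Hash.new { |hash, token| hash[token] = [] }
  docs.each do |key, body|
    body.downcase.scan(/[a-z0-9]+/).uniq.each do |token|
      index[token] << key
    end
  end
  index
end

docs = {
  "/path/to/document/1" => "Meros was furious.",
  "/path/to/document/2" => "Meros doesn't understand politics.",
  "/path/to/document/3" => "Meros had a friend of stilts.",
}
index = build_inverted_index(docs)

p index["understand"]  # prints ["/path/to/document/2"]
p index["meros"].size  # prints 3
```

A search becomes a single hash lookup by token instead of a scan over every body, which is the idea behind doc_indexes.body_index above.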

About Groonga

Oh yes, I forgot the important part. Groonga is a database system optimized for full-text search. It is a little different from the RDBs and KVSs I use all the time, so it takes some getting used to.

In an RDB, you define the key columns, the data columns, and so on when you create a table. In Groonga, you specify the key structure and the full-text search settings when you first create the table, and add columns to it afterwards. The way you create indexes and the way you search also feel distinctive. I think it will be convenient once you master it.

That's all.
