Do you know the data structure called Trie tree? Tri-tree is a data structure that can be used to implement high-speed dictionaries. Implementing using a data structure called LOUDS has the advantage that the tree can be represented with very little memory. Tri-tree is used to keep the Japanese dictionary in memory, for example, in kana-kanji conversion (Google Japanese input, etc.).
The trie tree has this shape. (The figure is taken from Wikipedia)
What makes it different from other trees is that the "branches" of the tree are labeled.
How can I use this as a dictionary? Let's search for "in". it's simple. All you have to do is follow i-> n. You can retrieve "5" as a search result. Let's search for "inn". All you have to do is move down from the current "5" position. "9" can be retrieved as a result.
What's great about this trie tree is that you can search very fast. In a tri-tree, the search speed is proportional to the size of the key to search. And it's not proportional to the size of the __ tree! __
For example, suppose you implement a Japanese dictionary with a tri-tree. Searching for the 12-letter "Waraukado ni Fukukitaru" in the Japanese dictionary takes about 4 (= 12/3) times longer than searching for the 3-letter "Warau". However, if you look up the same word, whether the dictionary contains 100 words or 1 million words, the search speed will theoretically be the same. It's amazing.
Also, when searching for a character string that has a common part, you can reduce unnecessary searches by restarting the search from the middle. For example, when I searched for "inn" earlier, I was able to quickly find the search results for "inn" by restarting the search from the node "5" in the result of searching for "in".
Implementing a tri-tree without thinking about it using structures and pointers consumes a lot of memory. The machine freezes if the memory allocation is too busy. Therefore, we use a data representation method called LOUDS. With LOUDS, one node can be represented by only 2 bits. A tree with 11 nodes like the one shown at the beginning can be represented by only 23 bits, and it fits in one long type variable.
I would like to explain LOUDS here, but since Explanation of succinct data structure LOUDS was very easy to understand, I will give it to you. ..
The code is below. Implemented in D language and Python respectively. github.com/IshitaTakeshi/Louds-Trie I've written a lot of comments, so I think you can understand it by reading the LOUDS explanation at the link above and then reading the code. If you have any questions or concerns, please comment or tell us on twitter.
It can be handled by the D language package management system DUB. DUB page Source code
Recommended Posts