TL;DR
--Linear search --Binary search --Hash search
Consider an example of searching for a customer by specifying mail_address.
--Refer to the target data in order to check if the search key mail_address matches.
If there are 1024 data, it will be referenced 1024 times. Stupid.
O(n)
--Sort the whole by mail_address --Refer to the data in the middle to see if they match. --If they don't match, discard half and refer to the middle of the other half --A kind of divide-and-conquer law
1024 cases = 2 ^ 10 cases = 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 cases are discarded in half, so You can find it by referring to at most 10 data. smart.
O(\log n)
2 ^ (log n) = n
--Pass the mail_address of all the target data to the hash function and convert it according to a certain rule to get the hash value. --The search key mail_address is also passed to the same hash function and converted according to the same rule to get the hash value. --Refer to the data with matching hash value --No additional processing is required to find the specified hash value xxx from the hash value block of the target data (Implementation example: Hash table % 8F% E3% 83% 83% E3% 82% B7% E3% 83% A5% E3% 83% 86% E3% 83% BC% E3% 83% 96% E3% 83% AB))
Even with 1024 cases, you can directly refer to the target data by going through the hash function once.
O(1)
In general, the algorithm is a trade-off between reduced execution time and reduced memory consumption (https://en.wikipedia.org/wiki/%E6%99%82%E9%96%93%E3%81%A8% E7% A9% BA% E9% 96% 93% E3% 81% AE% E3% 83% 88% E3% 83% AC% E3% 83% BC% E3% 83% 89% E3% 82% AA% E3% 83% 95) There are implementations that are both bad, and you can make one, There will be no algorithm that gives the best performance at the same time for each The hash method probably does not leak and consumes memory.
PostgreSQL execution plan (explain, explain analyze)
--Seq Scan: Linear search --Index Scan: Something like a binary search --Hash ~: Hash method
[PostgreSQL Index Only Scan Struggle Part 3 | TECHSCORE BLOG](https://www.techscore.com/blog/2013/06/07/postgresql-index-only-scan-%E5%A5%AE%E9%97 % 98% E8% A8% 98-% E3% 81% 9D% E3% 81% AE3 /) [SQL] Read the execution plan from zero knowledge and improve performance --Qiita EXPLAIN --PostgreSQL 10.5 document
Even a simple processing branch takes 0.61 seconds if executed 2 billion times.
Prepare 100,000 customers and search your email address 1000 times. https://ideone.com/w0hDzg
It is 0.9 seconds when the place (TODO) to put the search process is not implemented.
import java.util.*;
import java.lang.*;
import java.io.*;
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
//Create data for n people
int n = 100000;
Random rand = new Random();
String[][] customer = new String[n][2];
for (int i = 0; i < n; i++)
customer[i][0] = String.valueOf((char) (rand.nextInt(26)+'a')) + i + "@ab.com";
for (int i = 0; i < n; i++)
customer[i][1] = i + "Mr.";
//Sort by email address
Arrays.sort(customer, (c1, c2) -> c1[0].compareTo(c2[0]));
//We will also create a hash map with the key as the email address and the value as the name.
Map<String, String> hashmap = new HashMap<>(n);
for (String[] c : customer)
hashmap.put(c[0], c[1]);
//Same customer(101st)Can take the name of
System.out.println(customer[101][0] + "Is the name" + customer[101][1]);
//You can also get it by email address from the hash map
System.out.println(customer[101][0] + "Is the name" + hashmap.get(customer[101][0]));
System.out.println("Confirmation finished");
//Then, select the email addresses for m people appropriately
int m = 1000;
String[] keys = new String[m];
for (int j = 0; j < m; j++)
keys[j] = customer[rand.nextInt(n)][0];
//One by one
for (int j = 0; j < m; j++){
String key = keys[j];
//Let's search
// TODO
String found = null;
if (j % 100 == 0) {
System.out.println(key + "Is the name" + Objects.toString(found, "unknown"));
}
}
}
}
It's 3.9 seconds. slow! https://ideone.com/GIle7V
//Where it was TODO
String found = null;
for (int i = 0; i < n; i++) {
if (Objects.equals(key, customer[i][0])) {
found = customer[i][1];
}
}
It's 1 second. In the first place, the data preparation took 0.9 seconds, so the search itself took almost no time. https://ideone.com/eF7Txv
//Where it was TODO
String found = null;
keyCustomer[0] = key;
int index = Arrays.binarySearch(customer, keyCustomer, (c1, c2) -> c1[0].compareTo(c2[0]));
found = customer[index][1];
0.93 seconds. Almost no time https://ideone.com/67PzQP
//Where it was TODO
String found = null;
found = hashmap.get(key);
Since the difference between the binary search and the hash method was the error level, Let's compare by multiplying the number of searches (m) by 1024. The binary search is 1.69 seconds. https://ideone.com/ppG31d
Let's also increase the number of searches (m) by the hash method by 1024 times. It's 1.16 seconds. https://ideone.com/ppG31d Compared to binary search, the increase in execution time is slower as the number of search keys increases. It is the difference in the amount of calculation
Although there are few opportunities for beginners to be aware of it, they are using it. Consider the situation where you search the order history list in the database. In SQL, combine the customer customer table (more than 10,000 items) with the order order table (more than 10,000 items) and the order_detail order details table (more than 10,000 items) to get the details, and then the product product table (more than 10,000 items). Combine super) to get the product name. If you execute this SQL and finish in a realistic time, the DB administrator sets the appropriate primary key constraint / unique constraint / index, etc., and the DB performs a binary search based on it (like something). The tree structure data used for) and the hash table used for the hash method are prepared in advance and are used when executing SQL. In the field, processing is slow because the DB definition is missing / only linear search is possible because the SQL table join condition is complicated, and it is slow / you do not notice those problems until you run it in the production environment / there is room for improvement There are many unfortunate situations such as being left unattended. Please be aware that there are other suspicious points outside of your responsibility.
Hope this helps.
[What is Linear Search-IT Glossary e-Words] (http://e-words.jp/w/%E7%B7%9A%E5%BD%A2%E6%8E%A2%E7%B4%A2.html) [What is binary search (2-minute search)? --IT Glossary e-Words](http://e-words.jp/w/%E4%BA%8C%E5%88%86%E6%8E%A2% E7% B4% A2.html) [What is hash method (hash search)? --IT Glossary e-Words](http://e-words.jp/w/%E3%83%8F%E3%83%83%E3%82%B7%E3 % 83% A5% E6% B3% 95.html) Linear Search-Wikipedia Binary Search-Wikipedia
Recommended Posts