This time, I will explain the hash method. It's just for my own understanding, so I'm simply satisfied with the part that I'm satisfied with this level of understanding, and on the other hand, I explain the part that took a long time to understand. Please note that.
The hash method is an algorithm that can efficiently add and delete elements as well as search. When considering an array, adding an element requires inserting after performing a search and shifting all the elements behind it, so the processing cost is unexpectedly high. So let's use this hash method.
An array that stores keys so that the hash value is a subscript is a ** hash table **. For example, consider a hash table with the remainder divided by 13 as the hash value.
Key | 5 | 6 | 14 | 20 | 29 | 34 | 37 | 51 | 69 | 75 |
---|---|---|---|---|---|---|---|---|---|---|
value | Mr. Yamamoto | Nozaki | Mr. Takahashi | Suzuki | Mr. Hashimoto | Mr. Hasegawa | Ogawa | Mr. Sato | April 1st | Brian |
Hash value(Residual divided by 13) | 5 | 6 | 1 | 7 | 3 | 8 | 11 | 12 | 4 | 10 |
In this case, put Yamamoto in a [5], Nozaki in a [6], Takahashi in a [1], and so on. Of course, it is not necessary to fill all the elements of the prepared array. For example, a [0] is None here. Also, each element of the array a that appears here is called a ** bucket **.
The open addressing method is a method of rehashing until there is no collision when hash value collision occurs. It will be rehashed if necessary when exploring, adding, or deleting elements.
Key | 5 | 6 | 14 | 20 | 29 | 34 | 37 | 51 | 69 | 75 |
---|---|---|---|---|---|---|---|---|---|---|
value | Mr. Yamamoto | Nozaki | Mr. Takahashi | Suzuki | Mr. Hashimoto | Mr. Hasegawa | Ogawa | Mr. Sato | April 1st | Brian |
Hash value(Residual divided by 13) | 5 | 6 | 1 | 7 | 3 | 8 | 11 | 12 | 4 | 10 |
Array | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | a[9] | a[10] | a[11] | a[12] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
value | None | Mr. Takahashi | None | Mr. Hashimoto | April 1st | Mr. Yamamoto | Nozaki | Suzuki | Mr. Hasegawa | None | Brian | Ogawa | Mr. Sato |
If you add (key: 18, Value: Johnny)
to this array, the hash value is 18% 13 = 5
, so it's already filled with Mr. Yamamoto. So rehash it as (18 + 1)% 13 = 6
. This is also filled with Nozaki-kun, so again, 7 is also filled with Suzuki-kun, so again ... There is no element with a key with a hash value of 9, so I finally succeeded in adding Johnny to a [9]. Did.
Array | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | a[9] | a[10] | a[11] | a[12] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
value | None | Mr. Takahashi | None | Mr. Hashimoto | April 1st | Mr. Yamamoto | Nozaki | Suzuki | Mr. Hasegawa | Johnny | Brian | Ogawa | Mr. Sato |
Now, how do you delete an element? For example, suppose you delete Yamamoto from this array. If I simply deleted it, searching for Johnny would find it here in ʻa [9]as a result of the rehash, but the original hash value was
5. Therefore, the value with hash value 5 will not be found and the search will fail. If ʻa [5]
is empty from the beginning, it's okay to fail the search, but after it's been deleted, maybe the rehash fills the value you want to search later. right. Therefore, in order to make a judgment about this area, each bucket of the array can have three states of ʻEMPTY, OCCUPIED, and DELETED. This way, when searching for the value of key 18, ʻa [5]
is in the DELETED
state, so you can rehash and continue the search.
class Status(Enum):
OCCUPIED = 0
EMPTY = 1
DELETED = 2
Defines the state of the bucket. In C language
#define OCCUPIED 0
#define EMPTY 1
#define DELETED 2
It's the same as
class Bucket:
#Buckets that make up the hash
def __init__(self, key: Any = None, value: Any = None, stat: Status = Status.EMPTY) -> None:
self.key = key
self.value = value
self.stat = stat
def set(self, key: Any, value: Any, stat: Status) -> None:
self.key = key
self.value = value
self.stat = stat
def set_status(self, stat: Status) -> None:
self.stat = stat
This class is creating an instance of a bucket. A bucket instance is basically an individual element of the array we saw earlier. Each of these elements has three attributes: key, value, and state.
class OpenHash:
def __init__(self, capacity: int) -> None:
"""Initialization"""
self.capacity = capacity
self.table = [Bucket()] * self.capacity
def hash_value(self, key: Any) -> int:
"""Find the hash value from key"""
if isinstance(key, int):
return key % self.capacity
return (int(hashlib.md5(str(key).encode()).hexdigest(), 16) % self.capacity)
def rehash_value(self, key: Any) -> int:
"""Rehash and return the hash value"""
return (self.hash_value(key) + 1) % self.capacity
def search_node(self, key: Any) -> Any:
"""Search for buckets whose key is key"""
hash = self.hash_value(key)
p = self.table[hash] #The hashth bucket goes into p.
for i in range(self.capacity): #Loop for the number of buckets in the array
if p.stat == Status.EMPTY: #If the bucket is empty, it means that there was nothing in it from the beginning, so break and return None
break
elif p.stat == Status.OCCUPIED and p.key == key: #If you find a bucket with the key you want to find, return that bucket
return p
hash = self.rehash_value(hash)
p = self.table[hash]
return None
def search(self, key: Any) -> Any:
"""Search for the element with the key key and return the value"""
p = self.search_node(key)
if p is not None:
return p.value
else:
return None
def add(self, key: Any, value: Any) -> bool:
"""Add an element whose key is key and whose value is value"""
if self.search(key) is not None:
return False
hash = self.hash_value(key)
p = self.table[hash]
for i in range(self.capacity):
if p.stat == Status.EMPTY or p.stat == Status.DELETED:
self.table[hash] = Bucket(key, value, Status.OCCUPIED)
return True
hash = self.rehash_value(hash)
p = self.table[hash]
return False
def remove(self, key: Any) -> int:
"""Delete element with key key"""
p = self.search_node(key)
if p is None:
return False
p.set_status(Status.DELETED)
return True
def dump(self) -> None:
"""Dump hash table"""
for i in range(self.capacity):
print(f'{i:2} ', end = '')
if self.table[i].stat == Status.OCCUPIED:
print(f'{self.table[i].key} ({self.table[i].value})')
elif self.table[i].stat == Status.EMPTY:
print('---Unregistered---')
elif self.table[i].stat == Status.DELETED:
print('---deleted---')
Earlier I said that trying to find the value of key: 18 after a [5] was deleted would fail, but let's see how this is avoided.
def search_node(self, key: Any) -> Any:
"""Search for buckets whose key is key"""
hash = self.hash_value(key)
p = self.table[hash] #The hashth bucket goes into p.
for i in range(self.capacity): #Loop for the number of buckets in the array
if p.stat == Status.EMPTY: #If the bucket is empty, it means that there was nothing in it from the beginning, so break and return None
break
elif p.stat == Status.OCCUPIED and p.key == key: #If you find a bucket with the key you want to find, return that bucket
return p
hash = self.rehash_value(hash)
p = self.table[hash]
return None
When searching, search_node (key)
is called from search (key)
, which is a method to return the bucket with the key. If you give 18 to search ()
as the key, the search will start for ʻa [5]. Since this is a state of DELETE, we will rehash through ʻif
and ʻelif and target ʻa [(18 + 1)% 13]
. By repeating this, you can eventually find Johnny in ʻa [9]`.
Let's actually see an example of using the above class.
from enum import Enum
Menu = Enum('Menu', ['add to', 'Delete', 'search', 'dump', 'End'])
def select_menu() -> Menu:
"""Menu selection"""
s = [f'({m.value}){m.name}'for m in Menu]
while True:
print(*s, sep=' ', end='')
n = int(input(' : '))
if 1 <= n <= len(Menu):
return Menu(n)
hash = OpenHash(13)
while True:
menu = select_menu()
if menu == Menu.add to:
key = int(input('Key: '))
val = input('value: ')
if not hash.add(key, val):
print('Addition failure')
elif menu == Menu.Delete:
key = int(input('Key: '))
if not hash.remove(key):
print('Delete failure')
elif menu == Menu.search:
key = int(input('Key: '))
t = hash.search(key)
if t is not None:
print(f'The value with that key is{t}is.')
else:
print('There is no applicable data.')
elif menu == Menu.dump:
hash.dump()
else:
break
Input and output below
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 5
value:Mr. Yamamoto
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 6
value:Nozaki
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 14
value:Mr. Takahashi
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 20
value:Suzuki
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 29
value:Mr. Hashimoto
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 34
value:Mr. Hasegawa
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 37
value:Ogawa
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 51
value:Mr. Sato
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 69
value:April 1st
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 75
value:Brian
(1)add to(2)Delete(3)search(4)dump(5)End: 4
0 ---Unregistered---
1 14 (Mr. Takahashi)
2 ---Unregistered---
3 29 (Mr. Hashimoto)
4 69 (April 1st)
5 5 (Mr. Yamamoto)
6 6 (Nozaki)
7 20 (Suzuki)
8 34 (Mr. Hasegawa)
9 ---Unregistered---
10 75 (Brian)
11 37 (Ogawa)
12 51 (Mr. Sato)
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 5
value:Johnny
Addition failure!
(1)add to(2)Delete(3)search(4)dump(5)End: 4
0 ---Unregistered---
1 14 (Mr. Takahashi)
2 ---Unregistered---
3 29 (Mr. Hashimoto)
4 69 (April 1st)
5 5 (Mr. Yamamoto)
6 6 (Nozaki)
7 20 (Suzuki)
8 34 (Mr. Hasegawa)
9 ---Unregistered---
10 75 (Brian)
11 37 (Ogawa)
12 51 (Mr. Sato)
(1)add to(2)Delete(3)search(4)dump(5)End: 5
That ... When I tried to add Johnny to Mr. Yamamoto, the addition failed. In open addressing, even if ʻa [5]` is filled, it should be rehashed and somehow filled unless all the elements are full ...
The problem seems to be in ʻadd ()`
if self.search(key) is not None:
That's the part.
if self.search(key) == value:
I will rewrite it and try again.
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 5
value:Mr. Yamamoto
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 6
value:Nozaki
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 14
value:Mr. Takahashi
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 20
value:Suzuki
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 29
value:Mr. Hashimoto
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 34
value:Mr. Hasegawa
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 37
value:Ogawa
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 51
value:Mr. Sato
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 69
value:April 1st
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 75
value:Brian
(1)add to(2)Delete(3)search(4)dump(5)End: 4
0 ---Unregistered---
1 14 (Mr. Takahashi)
2 ---Unregistered---
3 29 (Mr. Hashimoto)
4 69 (April 1st)
5 5 (Mr. Yamamoto)
6 6 (Nozaki)
7 20 (Suzuki)
8 34 (Mr. Hasegawa)
9 ---Unregistered---
10 75 (Brian)
11 37 (Ogawa)
12 51 (Mr. Sato)
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 18
value:Johnny
(1)add to(2)Delete(3)search(4)dump(5)End: 4
0 ---Unregistered---
1 14 (Mr. Takahashi)
2 ---Unregistered---
3 29 (Mr. Hashimoto)
4 69 (April 1st)
5 5 (Mr. Yamamoto)
6 6 (Nozaki)
7 20 (Suzuki)
8 34 (Mr. Hasegawa)
9 18 (Johnny)
10 75 (Brian)
11 37 (Ogawa)
12 51 (Mr. Sato)
(1)add to(2)Delete(3)search(4)dump(5)End: 2
Key: 5
(1)add to(2)Delete(3)search(4)dump(5)End: 4
0 ---Unregistered---
1 14 (Mr. Takahashi)
2 ---Unregistered---
3 29 (Mr. Hashimoto)
4 69 (April 1st)
5 ---deleted---
6 6 (Nozaki)
7 20 (Suzuki)
8 34 (Mr. Hasegawa)
9 18 (Johnny)
10 75 (Brian)
11 37 (Ogawa)
12 51 (Mr. Sato)
(1)add to(2)Delete(3)search(4)dump(5)End: 3
Key: 18
The value with that key is Johnny.
(1)add to(2)Delete(3)search(4)dump(5)End: 5
Johnny was added safely, and he succeeded in searching for Johnny after Mr. Yamamoto was deleted.
[Reference books](https://www.amazon.co.jp/ Algorithms and data structures learned with new and clear Python-New and clear series-Nobuhiro Shibata / dp / 4815603197 / ref = pd_sbs_14_t_1/355-2283754-9134449 ? _encoding = UTF8 & pd_rd_i = 4815603197 & pd_rd_r = fdf0f7d2-76c5-4384-94b0-860a83839a41 & pd_rd_w = ELFk0 & pd_rd_wg = 8xj9G & pf_rd_p = ca22fd73-0f1e-4b39-9917-c84a20b3f3a8 & pf_rd_r = 3F77WDCW3H4HF22R94GE & psc = 1 & refRID = 3F77WDCW3H4HF22R94GE)
Recommended Posts