Hash method (open address method) in Python

This time, I will explain the hash method. It's just for my own understanding, so I'm simply satisfied with the part that I'm satisfied with this level of understanding, and on the other hand, I explain the part that took a long time to understand. Please note that.

What is the hash method?

The hash method is an algorithm that can efficiently add and delete elements as well as search. When considering an array, adding an element requires inserting after performing a search and shifting all the elements behind it, so the processing cost is unexpectedly high. So let's use this hash method.

An array that stores keys so that the hash value is a subscript is a ** hash table **. For example, consider a hash table with the remainder divided by 13 as the hash value.

Key 5 6 14 20 29 34 37 51 69 75
value Mr. Yamamoto Nozaki Mr. Takahashi Suzuki Mr. Hashimoto Mr. Hasegawa Ogawa Mr. Sato April 1st Brian
Hash value(Residual divided by 13) 5 6 1 7 3 8 11 12 4 10

In this case, put Yamamoto in a [5], Nozaki in a [6], Takahashi in a [1], and so on. Of course, it is not necessary to fill all the elements of the prepared array. For example, a [0] is None here. Also, each element of the array a that appears here is called a ** bucket **.

Open addressing method

The open addressing method is a method of rehashing until there is no collision when hash value collision occurs. It will be rehashed if necessary when exploring, adding, or deleting elements.

Add element

Key 5 6 14 20 29 34 37 51 69 75
value Mr. Yamamoto Nozaki Mr. Takahashi Suzuki Mr. Hashimoto Mr. Hasegawa Ogawa Mr. Sato April 1st Brian
Hash value(Residual divided by 13) 5 6 1 7 3 8 11 12 4 10
Array a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12]
value None Mr. Takahashi None Mr. Hashimoto April 1st Mr. Yamamoto Nozaki Suzuki Mr. Hasegawa None Brian Ogawa Mr. Sato

If you add (key: 18, Value: Johnny) to this array, the hash value is 18% 13 = 5, so it's already filled with Mr. Yamamoto. So rehash it as (18 + 1)% 13 = 6. This is also filled with Nozaki-kun, so again, 7 is also filled with Suzuki-kun, so again ... There is no element with a key with a hash value of 9, so I finally succeeded in adding Johnny to a [9]. Did.

Array a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12]
value None Mr. Takahashi None Mr. Hashimoto April 1st Mr. Yamamoto Nozaki Suzuki Mr. Hasegawa Johnny Brian Ogawa Mr. Sato

Delete element

Now, how do you delete an element? For example, suppose you delete Yamamoto from this array. If I simply deleted it, searching for Johnny would find it here in ʻa [9]as a result of the rehash, but the original hash value was5. Therefore, the value with hash value 5 will not be found and the search will fail. If ʻa [5] is empty from the beginning, it's okay to fail the search, but after it's been deleted, maybe the rehash fills the value you want to search later. right. Therefore, in order to make a judgment about this area, each bucket of the array can have three states of ʻEMPTY, OCCUPIED, and DELETED. This way, when searching for the value of key 18, ʻa [5] is in the DELETED state, so you can rehash and continue the search.

Class 1

class Status(Enum):
    OCCUPIED = 0
    EMPTY = 1
    DELETED = 2

Defines the state of the bucket. In C language

#define OCCUPIED 0
#define EMPTY 1
#define DELETED 2

It's the same as

Class 2

class Bucket:
    #Buckets that make up the hash

    def __init__(self, key: Any = None, value: Any = None, stat: Status = Status.EMPTY) -> None:
        self.key = key
        self.value = value
        self.stat = stat
    def set(self, key: Any, value: Any, stat: Status) -> None:
        self.key = key
        self.value = value
        self.stat = stat
    def set_status(self, stat: Status) -> None:
        self.stat = stat

This class is creating an instance of a bucket. A bucket instance is basically an individual element of the array we saw earlier. Each of these elements has three attributes: key, value, and state.

Class 3

class OpenHash:
    
    def __init__(self, capacity: int) -> None:
        """Initialization"""
        self.capacity = capacity
        self.table = [Bucket()] * self.capacity

    def hash_value(self, key: Any) -> int:
        """Find the hash value from key"""
        if isinstance(key, int):
            return key % self.capacity
        return (int(hashlib.md5(str(key).encode()).hexdigest(), 16) % self.capacity)

    def rehash_value(self, key: Any) -> int:
        """Rehash and return the hash value"""
        return (self.hash_value(key) + 1) % self.capacity

    def search_node(self, key: Any) -> Any:
        """Search for buckets whose key is key"""
        hash = self.hash_value(key)
        p = self.table[hash] #The hashth bucket goes into p.
        for i in range(self.capacity): #Loop for the number of buckets in the array
            if p.stat == Status.EMPTY: #If the bucket is empty, it means that there was nothing in it from the beginning, so break and return None
                break
            elif p.stat == Status.OCCUPIED and p.key == key: #If you find a bucket with the key you want to find, return that bucket
                return p
            hash = self.rehash_value(hash)
            p = self.table[hash]
        return None 

    def search(self, key: Any) -> Any:
        """Search for the element with the key key and return the value"""
        p = self.search_node(key)
        if p is not None:
            return p.value
        else:
            return None
    
    def add(self, key: Any, value: Any) -> bool:
        """Add an element whose key is key and whose value is value"""
        if self.search(key) is not None:
            return False
        hash = self.hash_value(key)
        p = self.table[hash]
        for i in range(self.capacity):
            if p.stat == Status.EMPTY or p.stat == Status.DELETED:
                self.table[hash] = Bucket(key, value, Status.OCCUPIED)
                return True
            hash = self.rehash_value(hash)
            p = self.table[hash]
        return False

    def remove(self, key: Any) -> int:
        """Delete element with key key"""
        p = self.search_node(key)
        if p is None:
            return False
        p.set_status(Status.DELETED)
        return True
    
    def dump(self) -> None:
        """Dump hash table"""
        for i in range(self.capacity):
            print(f'{i:2} ', end = '')
            if self.table[i].stat == Status.OCCUPIED:
                print(f'{self.table[i].key} ({self.table[i].value})')
            elif self.table[i].stat == Status.EMPTY:
                print('---Unregistered---')
            elif self.table[i].stat == Status.DELETED:
                print('---deleted---')
                

Earlier I said that trying to find the value of key: 18 after a [5] was deleted would fail, but let's see how this is avoided.

    def search_node(self, key: Any) -> Any:
        """Search for buckets whose key is key"""
        hash = self.hash_value(key)
        p = self.table[hash] #The hashth bucket goes into p.
        for i in range(self.capacity): #Loop for the number of buckets in the array
            if p.stat == Status.EMPTY: #If the bucket is empty, it means that there was nothing in it from the beginning, so break and return None
                break
            elif p.stat == Status.OCCUPIED and p.key == key: #If you find a bucket with the key you want to find, return that bucket
                return p
            hash = self.rehash_value(hash)
            p = self.table[hash]
        return None 

When searching, search_node (key) is called from search (key), which is a method to return the bucket with the key. If you give 18 to search () as the key, the search will start for ʻa [5]. Since this is a state of DELETE, we will rehash through ʻif and ʻelif and target ʻa [(18 + 1)% 13]. By repeating this, you can eventually find Johnny in ʻa [9]`.

Finally

Let's actually see an example of using the above class.


from enum import Enum
Menu = Enum('Menu', ['add to', 'Delete', 'search', 'dump', 'End'])

def select_menu() -> Menu:
    """Menu selection"""
    s = [f'({m.value}){m.name}'for m in Menu]
    while True:
        print(*s, sep=' ', end='')
        n = int(input(' : '))
        if 1 <= n <= len(Menu):
            return Menu(n)
        
hash = OpenHash(13)

while True:
    menu = select_menu()
    
    if menu == Menu.add to:
        key = int(input('Key: '))
        val = input('value: ')
        if not hash.add(key, val):
            print('Addition failure')
            
    elif menu == Menu.Delete:
        key = int(input('Key: '))
        if not hash.remove(key):
            print('Delete failure')
            
    elif menu == Menu.search:
        key = int(input('Key: '))
        t = hash.search(key)
        if t is not None:
            print(f'The value with that key is{t}is.')
        else:
            print('There is no applicable data.')
    elif menu == Menu.dump:
        hash.dump()
    else:
        break

Input and output below

(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 5
value:Mr. Yamamoto
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 6
value:Nozaki
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 14
value:Mr. Takahashi
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 20
value:Suzuki
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 29
value:Mr. Hashimoto
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 34
value:Mr. Hasegawa
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 37
value:Ogawa
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 51
value:Mr. Sato
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 69
value:April 1st
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 75
value:Brian
(1)add to(2)Delete(3)search(4)dump(5)End: 4
 0 ---Unregistered---
 1 14 (Mr. Takahashi)
 2 ---Unregistered---
 3 29 (Mr. Hashimoto)
 4 69 (April 1st)
 5 5 (Mr. Yamamoto)
 6 6 (Nozaki)
 7 20 (Suzuki)
 8 34 (Mr. Hasegawa)
 9 ---Unregistered---
10 75 (Brian)
11 37 (Ogawa)
12 51 (Mr. Sato)
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 5
value:Johnny
Addition failure!
(1)add to(2)Delete(3)search(4)dump(5)End: 4
 0 ---Unregistered---
 1 14 (Mr. Takahashi)
 2 ---Unregistered---
 3 29 (Mr. Hashimoto)
 4 69 (April 1st)
 5 5 (Mr. Yamamoto)
 6 6 (Nozaki)
 7 20 (Suzuki)
 8 34 (Mr. Hasegawa)
 9 ---Unregistered---
10 75 (Brian)
11 37 (Ogawa)
12 51 (Mr. Sato)
(1)add to(2)Delete(3)search(4)dump(5)End: 5

That ... When I tried to add Johnny to Mr. Yamamoto, the addition failed. In open addressing, even if ʻa [5]` is filled, it should be rehashed and somehow filled unless all the elements are full ...

The problem seems to be in ʻadd ()`

if self.search(key) is not None:

That's the part.

if self.search(key) == value:

I will rewrite it and try again.

(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 5
value:Mr. Yamamoto
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 6
value:Nozaki
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 14
value:Mr. Takahashi
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 20
value:Suzuki
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 29
value:Mr. Hashimoto
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 34
value:Mr. Hasegawa
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 37
value:Ogawa
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 51
value:Mr. Sato
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 69
value:April 1st
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 75
value:Brian
(1)add to(2)Delete(3)search(4)dump(5)End: 4
 0 ---Unregistered---
 1 14 (Mr. Takahashi)
 2 ---Unregistered---
 3 29 (Mr. Hashimoto)
 4 69 (April 1st)
 5 5 (Mr. Yamamoto)
 6 6 (Nozaki)
 7 20 (Suzuki)
 8 34 (Mr. Hasegawa)
 9 ---Unregistered---
10 75 (Brian)
11 37 (Ogawa)
12 51 (Mr. Sato)
(1)add to(2)Delete(3)search(4)dump(5)End: 1
Key: 18
value:Johnny
(1)add to(2)Delete(3)search(4)dump(5)End: 4
 0 ---Unregistered---
 1 14 (Mr. Takahashi)
 2 ---Unregistered---
 3 29 (Mr. Hashimoto)
 4 69 (April 1st)
 5 5 (Mr. Yamamoto)
 6 6 (Nozaki)
 7 20 (Suzuki)
 8 34 (Mr. Hasegawa)
 9 18 (Johnny)
10 75 (Brian)
11 37 (Ogawa)
12 51 (Mr. Sato)
(1)add to(2)Delete(3)search(4)dump(5)End: 2
Key: 5
(1)add to(2)Delete(3)search(4)dump(5)End: 4
 0 ---Unregistered---
 1 14 (Mr. Takahashi)
 2 ---Unregistered---
 3 29 (Mr. Hashimoto)
 4 69 (April 1st)
 5 ---deleted---
 6 6 (Nozaki)
 7 20 (Suzuki)
 8 34 (Mr. Hasegawa)
 9 18 (Johnny)
10 75 (Brian)
11 37 (Ogawa)
12 51 (Mr. Sato)
(1)add to(2)Delete(3)search(4)dump(5)End: 3
Key: 18
The value with that key is Johnny.
(1)add to(2)Delete(3)search(4)dump(5)End: 5

Johnny was added safely, and he succeeded in searching for Johnny after Mr. Yamamoto was deleted.

[Reference books](https://www.amazon.co.jp/ Algorithms and data structures learned with new and clear Python-New and clear series-Nobuhiro Shibata / dp / 4815603197 / ref = pd_sbs_14_t_1/355-2283754-9134449 ? _encoding = UTF8 & pd_rd_i = 4815603197 & pd_rd_r = fdf0f7d2-76c5-4384-94b0-860a83839a41 & pd_rd_w = ELFk0 & pd_rd_wg = 8xj9G & pf_rd_p = ca22fd73-0f1e-4b39-9917-c84a20b3f3a8 & pf_rd_r = 3F77WDCW3H4HF22R94GE & psc = 1 & refRID = 3F77WDCW3H4HF22R94GE)

Recommended Posts

Hash method (open address method) in Python
Simplex method (simplex method) in Python
Private method in python
[Python] Insert ":" in MAC address
Implement method chain in Python
Suppressing method overrides in Python
Open UTF-8 with BOM in Python
Implemented label propagation method in Python
I got an AttributeError when mocking the open method in python
Get exchange rates from open exchange rates in Python
Method to build Python environment in Xcode 6
Electron Microscopy Simulation in Python: Multislice Method (1)
Electron Microscopy Simulation in Python: Multislice Method (2)
Hash in Perl is a dictionary in Python
Alignment algorithm by insertion method in Python
Get your own IP address in Python
Quadtree in Python --2
Python in optimization
CURL in python
Geocoding in python
SendKeys in Python
Meta-analysis in Python
Unittest in python
[Python] Dictionary (hash)
Epoch in Python
Discord in Python
Sudoku in Python
DCI in Python
quicksort in python
nCr in python
N-Gram in Python
Programming in python
Plink in Python
Constant in python
[Python] Each hash
FizzBuzz in Python
Sqlite in python
StepAIC in Python
N-gram in python
LINE-Bot [0] in Python
Csv in python
Disassemble in Python
Reflection in Python
Constant in python
nCr in Python.
format in python
Scons in Python3
Puyo Puyo in python
python in virtualenv
PPAP in Python
Quad-tree in Python
Reflection in Python
Chemistry in Python
Johnson method (python)
Hashable in python
DirectLiNGAM in Python
LiNGAM in Python
Flatten in python
[Python] Semi-Lagrange method
flatten in python
Learn the design pattern "Template Method" in Python