Try to implement a binary dict-like internal emulation of Python's standard map type

This article

--A summary of the elements needed to comply with Python's standard map types (currently dict only) --As an example, implemented with a binary search tree

Introduction

In Python, there is only dict as a standard map type. This is implemented in hash as a data structure. A data structure that can express a map from a key to an object is known to use a search tree other than hash, but it does not exist in the Python standard library (in c ++, a binary tree std :: map and a hash implementation There is std :: unordered_map). I implemented a simple binary search tree while summarizing the necessary elements as a standard map type.

Elements required as a mapping type

--Access with d [x]: ʻobj .__ getitem __ (key) --Add element withd [x] = y: ʻobj.__setitem__ (key, item) --for k in d: Iterate the key in a loop: ʻobj.iter () --Check if the key exists withx in d: ʻobj.__contains__ (key) --Delete key x with del d [x]: ʻobj.delitem (key) --ʻObj.keys (), ʻobj.item (), ʻobj.values return key, element, key and tuple view object of element respectively (in this implementation, iterator is returned directly) --View object must support len (view_object), ʻiter (view_object) , x in view_object --Return the number of elements withlen (d): ʻobj .__ len __ () --Whether the key is immutable: Not implemented here --Implementation of methods such as clear, get, pop, popitem, setdefault, update

Implementation

python


import queue
    
class BinaryTree(object):
    class Node(object):
        def __init__(self, _key=None, _item=None):
            self.key = _key
            self.item = _item
            self.left = None
            self.right = None
            
        def __repr__(self):
            return str(self.key)+': '+str(self.item)
        
        def __str__(self):
            return self.__repr__()
            
    def __init__(self, *args, **kargs):
        self.size = 0
        self.head = None
        if len(args):
            for key, item in args:
                self.insert(key, item)
        elif len(kargs):
            for key in kargs:
                self.insert(key, kargs[key])

    def search(self, key):
        return self._search(key, self.head, self.head)[0]

    def _search(self, key, node, parent):
        if not node:
            return None, parent
        
        if key == node.key :
            return node, parent
        elif key < node.key:
            return self._search(key, node.left, node)
        else:
            return self._search(key, node.right, node)

    def insert(self, key, item):
        if not self.head:
            self.head = self.Node(key, item)
            self.size += 1
            return
        n, p = self._search(key, self.head, self.head)
        if n:
            n.item = item
        else:
            self.size += 1
            if key < p.key:
                p.left = self.Node(key, item)
            else:
                p.right = self.Node(key, item)                    

    def delete(self, key):
        n, p = self._search(key, self.head, self.head)
        if n == p:
            self.head = None
            self.size -= 1
        else:
            self._delete(key, n, p)

    def _delete(self, key, node, parent):
        if node:
            if not node.left and not node.right:
                if parent.left == node:
                    parent.left = None
                else:
                    parent.right = None
                del node
                self.size -= 1                
            elif node.left and node.right:
                if parent.left == node:
                    right_min, right_min_parent = self.min_key_node(node.right, node)
                    node.key = right_min.key
                    node.item = right_min.item
                    self._delete(node.key, right_min, right_min_parent)
                else:
                    left_max, left_max_parent = self.max_key_node(node.left, node)
                    node.key = left_max.key
                    node.item = left_max.item
                    self._delete(node.key, left_max, left_max_parent)

            elif node.left:
                if parent.left == node:
                    parent.left = node.left
                else:
                    parent.right = node.left
                del node
                self.size -= 1                
            else:
                if parent.left == node:
                    parent.left = node.right
                else:
                    parent.right = node.right
                del node
                self.size -= 1
        
    def min_key_node(self, node, parent):
        if not node.left:
            return node, parent
        else:
            return self.min_key_node(node.left, node)
        
    def max_key_node(self, node, parent):
        if not node.right:
            return node, parent
        else:
            return self.max_key_node(node.right, node)

    def in_order_traverse(self, use_keys=True, use_items=True):
        if self.head:
            yield from self._in_order_traverse(self.head, use_keys, use_items)

    def _in_order_traverse(self, node, use_keys, use_items):
        if node.left:
            yield from self._in_order_traverse(node.left, use_keys, use_items)
        if use_keys and use_items:
            yield node.key, node.item
        elif use_keys:
            yield node.key
        else:
            yield node.item            
        if node.right:
            yield from self._in_order_traverse(node.right, use_keys, use_items)

    def clear(self):
        while self.head:
            self._delete(self.head.key)

    def copy(self):
        return BinaryTree(self.items())

    def get(self, key, default=None):
        n = self.search(key)
        if n:
            return n.item
        else:
            return default

    def pop(self, key):
        n = self.search(key)
        if n:
            i = n.item
            self.delete(key)
            return i
        else:
            self.__missing__(key)

        
    def __getitem__(self, key):
        n = self.search(key)
        if n:
            return n.item
        else:
            self.__missing__(key)

    def __missing__(self, key):
        raise KeyError(key)

    def __setitem__(self, key, item):
        return self.insert(key, item)

    def __delitem__(self, key):
        return self.delete(key)

    def __iter__(self):
        return self.in_order_traverse(use_keys=True, use_items=False)                        

    def __reversed__(self):
        return BinaryTreeIter(self.head, reverse=True)        
    
    def __contains__(self, key):
        if self.search(self.head):
            return True
        else:
            return False

    def __len__(self):
        return self.size

    def items(self):
        return self.in_order_traverse()        
    
    def keys(self):
        return self.in_order_traverse(use_keys=True, use_items=False)                
    
    def values(self):
        return self.in_order_traverse(use_keys=False, use_items=True)                        

    def __repr__(self):
        out = []
        for key, item in self.items():
            out.append(str(key)+': '+str(item))
        return '{'+', '.join(out)+'}'

    def __str__(self):
        return self.__repr__()

Example of use

>>> import random
>>> bt = BinaryTree()
>>> random_data = list(range(20))
>>> random.shuffle(random_data)
>>> for k in random_data:
        bt[k] += k**2
>>> bt[10] = -1
>>> for k in bt:
        print('key:', k, 'value:', bt[k], end=', ')
key: 0 value: 0, key: 1 value: 1, key: 2 value: 4, key: 3 value: 9, key: 4 value: 16, key: 5 value: 25, key: 6 value: 36, key: 7 value: 49, key: 8 value: 64, key: 9 value: 81, key: 10 value: -1, key: 11 value: 121, key: 12 value: 144, key: 13 value: 169, key: 14 value: 196, key: 15 value: 225, key: 16 value: 256, key: 17 value: 289, key: 18 value: 324, key: 19 value: 361, 
>>> bt[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Volumes/Macintosh_HD/Users/user/std_tree.py", line 178, in __getitem__
    self.__missing__(key)
  File "/Volumes/Macintosh_HD/Users/user/std_tree.py", line 181, in __missing__
    raise KeyError(key)
KeyError: 100

Since it is a binary tree, the keys can be returned in the sorted order.

Summary

Current problems, etc.

-- d.pop (k, [d]) Returns KeyErrorwhen there is nod and k does not exist, and returns d when there is d and k does not exist. How should I implement this behavior? (Not a keyword argument)-> You can do it by receiving the argument with * args and processing it yourself. --In the case of dict (hash table), it can be processed by whether the key is the same in the hash value. However, if you make a tree as it is now, it is necessary to define the magnitude relationship between keys. This is difficult to use because there are strong restrictions on dicts that can be used as keys for anything that is possible. --There is a method to use hash value as key as a means to solve the above and to make it possible to store multiple items in each node in order to deal with hash value collision. It is unknown whether there is any difference when using the hash value as the key --If you need to build a search tree on your own, it is an individual case such as manipulating a special tree, returning a node at the time of searching, holding a node in the middle, processing for a subtree, etc. It is possible that the standard interface cannot handle it, and it may be inappropriate to support it.

at the end

I feel like I tried it for the time being. I'm sorry I haven't checked most of the messy methods. Also, I don't know what to do about implementation, so I would appreciate it if you could point out.

Recommended Posts

Try to implement a binary dict-like internal emulation of Python's standard map type
Try to make a kernel of Jupyter
Try drawing a map with Python's folium package
How to write a list / dictionary type of Python3
Try to implement yolact