I want to create a priority queue that can be updated in Python (2.7)

Overview

Play by creating a priority queue that can update the element. To be precise, pushing the same element with different priorities creates a priority queue that overrides the priorities.

Synopsis

Due to the algorithm quick reference, I needed to use the priority queue, but I searched for "priority queue Python" and it appears at the top. http://docs.python.jp/2/library/heapq.html When I looked at it, it said something terrible.

Priority queues are a common use of heaps and have some difficulties in implementation:

--Sort stability: How can two tasks of equal priority be returned in the order they were originally added? --In future Python 3, tuple comparisons will be split into (priority, task) pairs if the priorities are equal and the task does not have a default comparison order. --If a task's priority changes, how do you move it to a new location on the heap? --When an unresolved task needs to be deleted, how do you find it in the queue and delete it?

The solution to the first two difficulties is to save the item as a three-element list that contains the priority, item number, and task. This item number acts as a tie-breaker to ensure that two tasks of the same priority are returned in the order in which they were added. And since the two item numbers are never equal, the tuple comparison cannot try to compare the two tasks directly. The remaining difficulty is primarily to find open tasks, change their priority, or remove them altogether. Finding a task is done by a dictionary that points to items in the queue. Deleting items or changing their priorities is more difficult as it breaks the immutable relationships of the heap structure. So a possible solution is to mark the item as invalid and add the changed priority item if necessary:

Well, I know what you're saying. I wondered if Python doesn't provide priority queues by default because heapq says something like this. However, that is not the case, and there is a proper Queue.PriorityQueue (). However, although there is, it seems that it is not possible to immediately perform the update process using this PriorityQueue (). Also, considering what is written on this page, I was curious about how much the calculation speed would be better or worse ~~ I don't care ~~.

design

A hand-made priority queue (hereinafter referred to as "hand queue") consists of a list of tuples (sortkey, (value, generation)) and a dictionary that holds one generation for each value. Deletion is an image that is not actually performed as described above, but is flagged, but it is controlled by saving the latest generation number in the dictionary and not using the latest data. I will. In addition, since the search performance deteriorates when elements are accumulated, data deletion + heap reconstruction shall be performed in a garbage collection manner. Regarding the timing of "garbage collection" here, I decided to keep it as an appropriate constant m, and reconstruct it when the number of elements (the number of tuples with the outermost parentheses counted as 1) exceeds m. And so on.

In this "hand cue" design,

--push → heapq.heappush () to update the dictionary, and if the size exceeds m, heap again with only the latest generation

I will do it like that. Random access to the elements of the queue is something we don't care about for now. If you give the dictionary a value, it will use more memory, but in theory you can access it randomly.

Implementation

TeQueue.py


import heapq
def tepush(l, d, m, sortkey, value):
    if not value in d:
        generation = 1
    else:
        generation = d[value] + 1
    d[value] = generation
    heapq.heappush(l, (sortkey, value, generation))
    length = len(l)
    if length > m: #Heap reconstruction
        lt = []
        for n in range(0,length):
            if l[n][2] == d[l[n][1]]:
                heapq.heappush(lt, (l[n][0], l[n][1], 1))
                d[l[n][1]] = 1
        return lt
    else:
        return l

def tepop(l, d):
    while len(l) > 0:
        (sortkey, value, generation) = heapq.heappop(l)
        if d[value] == generation:
            return (sortkey, value, generation) 

This makes it possible to "overwrite" the value corresponding to the same value as described in the design.

Comparison with other queues

Compare with Queue.PriorityQueue ().

Initialization

For the time being, perform the following initialization.

initialize.py


#Generate a random number sequence of N
import random
N = 100000
M = 100000
target = range(0,N)
temp = range(0,N)
for n in range(N-1,-1,-1):
  target[n] = temp.pop(random.randint(0,n))

push

pushtest.py


import Queue
import heapq
import datetime

d = datetime.datetime(1,1,1)
time1 = d.now()
t = Queue.PriorityQueue(N)
for n in range(0,N):
    t.put((target[n], n))
time2 = d.now()
u = []
for n in range(0,N):
    heapq.heappush(u, (target[n], n))
time3 = d.now()
v = []
dd = {}
for n in range(0,N):
    v = tepush(v, dd, 2*M, target[n], n)
time4 = d.now()
print('push:Queue',(time2 - time1).__str__())
print('push:Heap',(time3 - time2).__str__())
print('push:TeQ',(time4 - time3).__str__())

push result


('push:Queue', '0:00:00.350000')
('push:Heap', '0:00:00.086000')
('push:TeQ', '0:00:00.137000')

It looks like this. TeQ> Heap because there are many comparisons, but it is much faster than Queue.

pop

poptest.py


import Queue
import heapq
import datetime

time1 = d.now()
for n in range(0,M):
    t.get()
time2 = d.now()
for n in range(0,M):
    heapq.heappop(u)
time3 = d.now()
for n in range(0,M):
    tepop(v, dd)
time4 = d.now()
print('pop:Queue',(time2 - time1).__str__())
print('pop:Heap',(time3 - time2).__str__())
print('pop:TeQ',(time4 - time3).__str__())

pop result


('pop:Queue', '0:00:00.484000')
('pop:Heap', '0:00:00.214000')
('pop:TeQ', '0:00:00.299000')

It looks like this. It's less different than push, but it's still the same trend. Actually, I wanted to play with M by changing the degree of duplication, but I realized that it doesn't make much sense for Queue and heapq, so I decided to make a comparison in the hand cue. When comparing with other than hand queue, http://stackoverflow.com/questions/9969236/how-to-implement-priority-queues-in-python It seemed good to compare it with something, but it was subordinated.

Comparison of M coefficient and speed when the multiplicity is changed

Experiment A

Initialization

initializeA.py


#Generate a random number sequence of N
import random
N = 1000000
M = 40000
target = range(0,N)
for n in range(N-1,-1,-1):
  target[n] = random.randint(0,M-1)

Implementation results

testA.py


d = datetime.datetime(1,1,1)
for m in [2,5,10,20]:
    time1 = d.now()
    v = []
    dd = {}
    for n in range(0,N):
        v = tepush(v, dd, m*M, n, target[n])
    time2 = d.now()
    for n in range(0,M):
        tepop(v, dd)
    time3 = d.now()
    print('push:TeQ_' + str(m),(time2 - time1).__str__())
    print('pop:TeQ_' + str(m),(time3 - time2).__str__())

Result A


('push:TeQ_2', '0:00:02.731000')
('pop:TeQ_2', '0:00:00.193000')
('push:TeQ_5', '0:00:01.794000')
('pop:TeQ_5', '0:00:00.527000')
('push:TeQ_10', '0:00:01.667000')
('pop:TeQ_10', '0:00:00.711000')
('push:TeQ_20', '0:00:01.605000')
('pop:TeQ_20', '0:00:00.581000')

The reason why 20 pops faster is that 20 * 40000 and 10 * 40000 overlap, so it ends up in the same state. .. .. HM.

Experiment B

Initialization

initializeB.py


#Generate a random number sequence of N
import random
N = 1000000
M = 4000
target = range(0,N)
for n in range(N-1,-1,-1):
  target[n] = random.randint(0,M-1)

Implementation results

testB.py


d = datetime.datetime(1,1,1)
for m in [2,5,10,20]:
    time1 = d.now()
    v = []
    dd = {}
    for n in range(0,N):
        v = tepush(v, dd, m*M, n, target[n])
    time2 = d.now()
    for n in range(0,M):
        tepop(v, dd)
    time3 = d.now()
    print('push:TeQ_' + str(m),(time2 - time1).__str__())
    print('pop:TeQ_' + str(m),(time3 - time2).__str__())

Result B


('push:TeQ_2', '0:00:02.927000')
('pop:TeQ_2', '0:00:00.013000')
('push:TeQ_5', '0:00:02.035000')
('pop:TeQ_5', '0:00:00.018000')
('push:TeQ_10', '0:00:01.929000')
('pop:TeQ_10', '0:00:00.093000')
('push:TeQ_20', '0:00:01.846000')
('pop:TeQ_20', '0:00:00.026000')

Even after trying several times, this tendency does not change much. Why. Even if the order is changed, 20 remains overwhelmingly faster than 10. HM. .. ..

Summary

I feel like I don't have to do the equivalent of garbage collection or reindexing.

Recommended Posts

I want to create a priority queue that can be updated in Python (2.7)
I want to create a window in Python
I tried to create a class that can easily serialize Json in Python
I want to embed a variable in a Python string
I want to easily implement a timeout in python
I want to write in Python! (2) Let's write a test
I want to randomly sample a file in Python
I want to work with a robot in python.
I made a familiar function that can be used in statistics with Python
I want to be able to run Python in VS Code
I want to make input () a nice complement in python
A mechanism to call a Ruby method from Python that can be done in 200 lines
[Python3] Code that can be used when you want to cut out an image in a specific size
I wanted to create a program in Reverse Polish Notation in Python (determining if a string can be converted to a number)
I want to use a wildcard that I want to shell with Python remove
I want to print in a comprehension
How to set up a simple SMTP server that can be tested locally in Python
Qiskit: I want to create a circuit that creates arbitrary states! !!
I want to build a Python environment
I made a shuffle that can be reset (reverted) with Python
[Python3] Code that can be used when you want to resize images in folder units
I want to create a pipfile and reflect it in docker
I made a web application in Python that converts Markdown to HTML
I want to convert a table converted to PDF in Python back to CSV
I tried to develop a Formatter that outputs Python logs in JSON
I want to color a part of an Excel string in Python
I want to do a monkey patch only partially safely in Python
I want to do Dunnett's test in Python
I want to easily create a Noise Model
A memo that I wrote a quicksort in Python
How to create a JSON file in Python
I want to make a game with Python
I want to merge nested dicts in Python
I want to create a plug-in type implementation
I want to write to a file with Python
I want to display the progress in Python!
I wrote a tri-tree that can be used for high-speed dictionary implementation in D language and Python.
[Python] When the priority is the same in Priority Queue, it can be acquired in the order in which it was added to the queue.
A solution to the problem that the Python version in Conda cannot be changed
Create a plugin that allows you to search Sublime Text 3 tabs in Python
I registered PyQCheck, a library that can perform QuickCheck with Python, in PyPI.
How to install a Python library that can be used by pharmaceutical companies
I want to exe and distribute a program that resizes images Python3 + pyinstaller
I want to create a web application that uses League of Legends data ①
Python program is slow! I want to speed up! In such a case ...
I tried to implement what seems to be a Windows snipping tool in Python
I want to write in Python! (1) Code format check
I want to generate a UUID quickly (memorandum) ~ Python ~
I want to transition with a button in flask
Even in JavaScript, I want to see Python `range ()`!
Create a plugin to run Python Doctest in Vim (2)
I tried to implement a pseudo pachislot in Python
Create a plugin to run Python Doctest in Vim (1)
In Python, create a decorator that dynamically accepts arguments Create a decorator
[Python] I want to make a nested list a tuple
I want to write in Python! (3) Utilize the mock
I want to manually create a legend with matplotlib
I want to use the R dataset in python
I want to run a quantum computer with Python
I want to do something in Python when I finish
I want to manipulate strings in Kotlin like Python!