You will be an engineer in 100 days ――Day 60 ――Programming ――About data structure and sorting algorithm

Click here until yesterday

You will become an engineer in 100 days-Day 59-Programming-Algorithms

You will become an engineer in 100 days --- Day 53 --Git --About Git

You will become an engineer in 100 days --Day 42 --Cloud --About cloud services

You will become an engineer in 100 days --Day 36 --Database --About the database

You will be an engineer in 100 days --Day 24 --Python --Basics of Python language 1

You will become an engineer in 100 days --Day 18 --Javascript --JavaScript basics 1

You will become an engineer in 100 days --Day 14 --CSS --CSS Basics 1

You will become an engineer in 100 days --Day 6 --HTML --HTML basics 1

data structure

Data structures are important for learning algorithms. Since the algorithm is a procedure, I said what kind of data and how to manipulate it. The structure of the data is also an important item in the algorithm.

list

Python list type is an array type data type using an index You can access them in order.

List type can store various data as elements You can add, change, replace, and delete elements.

#List definition
l = [1,2,3,4,5]
print(l)

#Element reference
print(l[0])

#Change elements
l[1] = 8
print(l)

#Add element
l.append(6)
print(l)

#Delete element
l.pop(3)
print(l)

#Insert element
l.insert(4,9)
print(l)

#Swap elements
l[0], l[3] = l[3], l[0]
print(l)

[1, 2, 3, 4, 5] 1 [1, 8, 3, 4, 5] [1, 8, 3, 4, 5, 6] [1, 8, 3, 5, 6] [1, 8, 3, 5, 9, 6] [5, 8, 3, 1, 9, 6]

Stacks and queues

Stack and queue are also one of the ways to handle data.

stack -Data can be added from the beginning of the column, and data can be retrieved from the beginning of the column. ・ Last-in-first-out LIFO (Last-In-First-Out) -Putting data on the stack is called push, and taking it out is called pop.

queue -Add data from the end of the column, retrieve data from the beginning of the column ・ First-in first-out (Tokoroten method) FIFO (First-In-First-Out) -Putting data in a queue is called enqueue, and retrieving it is called dequeue.

Reference: wikipedia

In Python, it can be realized by list type or ç.

stack

stack = []

#push
stack.append(1)
stack.append(2)
stack.append(3)
print(stack)

#pop
stack.pop()
print(stack)
stack.pop()
print(stack)

[1, 2, 3] [1, 2] [1]

queue

from collections import deque

#Queue definition
queue = deque(["A","B","C"])
print(queue)

#Enqueue
queue.append("D")
print(queue)

#Decue
queue.popleft()
print(queue)

deque(['A', 'B', 'C']) deque(['A', 'B', 'C', 'D']) deque(['A', 'B', 'C'])

Heap

What is a heap? A child element is always larger or equal (or always smaller or equal) than a parent element. It is a tree-structured data with the constraint.

When we simply say heap, we often refer to the binary heap using a binary tree.

Reference: wikipedia

The heap can be implemented in Python with the heapq library. You can quickly retrieve the minimum (or maximum) value in the heap.

import heapq

#Create a list
heap = [1, 6, 8, 0, -1]
print(heap)

#Heap an existing list(minimum value)
heapq.heapify(heap)
print(heap)

#Add element
heapq.heappush(heap, 10)
print(heap)

#Remove minimum value
print(heapq.heappop(heap))
print(heap)

max_heap = [3,7,5,9,-2]
print(max_heap)
#Create heap with maximum value
heapq._heapify_max(max_heap) 
print(max_heap)

#Remove maximum value
print(heapq._heappop_max(max_heap))
print(max_heap)

[1, 6, 8, 0, -1] [-1, 0, 8, 1, 6] [-1, 0, 8, 1, 6, 10] -1 [0, 1, 8, 10, 6] [3, 7, 5, 9, -2] [9, 7, 5, 3, -2] 9 [7, 3, 5, -2]

About sorting algorithm

The sort algorithm is an algorithm that sorts. There are many variations to simply sort.

Stable stable sorting is when there are multiple records with the same key in the data After sorting, the order of the records before sorting is maintained.

Sorting so that the positional relationship before sorting is broken by sorting It is called an unstable ʻunstable` sort.

It is desirable to adopt a sorting algorithm that finishes sorting quickly and is stable.

In the sorting algorithm, the amount of calculation is also a guide. The amount of calculation of the sort algorithm that is generally said is as follows.

algorithm Stability Average time Worst time Best time Worst space
Bubble sort Stable \mathcal{O}(n^2) \mathcal{O}(n^2) \mathcal{O}(n) \mathcal{O}(1)
Selection sort unstable \mathcal{O}(n^2) \mathcal{O}(n^2) \mathcal{O}(n) \mathcal{O}(n)
Insertion sort Stable \mathcal{O}(n^2) \mathcal{O}(n^2) \mathcal{O}(n) \mathcal{O}(n)
Heapsort unstable \mathcal{O}(n\log n) \mathcal{O}(n\log n) \mathcal{O}(n\log n) \mathcal{O}(1)
Merge sort Stable \mathcal{O}(n\log n) \mathcal{O}(n\log n) \mathcal{O}(n\log n) \mathcal{O}(n)
Quick sort unstable \mathcal{O}(n\log n) \mathcal{O}(n^2) \mathcal{O}(n) \mathcal{O}(n)
python sort(Tim sort) Stable \mathcal{O}(n\log n) \mathcal{O}(n\log n) \mathcal{O}(n) \mathcal{O}(n)

Now let's look at various sorting algorithms. I have posted an implementation example, but I think there is a better way to write it.

Bubble sort

Bubble sort compares the values of two adjacent elements in a list It is an alignment algorithm that performs exchange according to conditions.

Sort the list in descending order (descending order) or ascending order (ascending order).

#Bubble sort
def bubble_sort(data):
    for i in range(len(data)):
        for j in range(len(data) - i - 1):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data

Select sort

Selection sort looks for the element with the minimum (maximum) value of the array It is an algorithm that performs alignment by exchanging it with the first element of the array.

#Selection sort
def select_sort(data):
    for i in range(len(data)):
        min = i
        for j in range(i + 1, len(data)):
            if data[min] > data[j]:
                min = j
        data[i], data[min] = data[min], data[i]
    return data

Insert sort

Insert sort is by inserting a new element in the appropriate position for the aligned part of the list. An algorithm for alignment.

#Insertion sort
def insert_sort(data):
    for i in range(1, len(data)):
        temp = data[i]
        j = i - 1
        while (j >= 0) and (data[j] > temp):
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = temp
    return data

Heap sort

Heapsort is a sorting algorithm that sorts the list using the binary heap tree.

Take elements from the unsorted list and add them to the heap in order. Repeat the following until you add all the elements (take the route and add it to the sorted list)

#Heapsort
def heap_sort(data):
    for i in range(len(data)):
        j = i
        while (j > 0) and (data[(j - 1) // 2] < data[j]):
            data[(j - 1) // 2], data[j] = data[j], data[(j - 1) // 2]
            j = (j - 1) // 2

    for i in range(len(data), 0, -1):
        data[i - 1], data[0] = data[0], data[i - 1]
        j = 0
        while ((2 * j + 1 < i - 1) and (data[j] < data[2 * j + 1]))\
            or ((2 * j + 2 < i - 1) and (data[j] < data[2 * j + 2])):
            if (2 * j + 2 == i - 1) or (data[2 * j + 1] > data[2 * j + 2]):
                data[j], data[2 * j + 1] = data[2 * j + 1], data[j]
                j = 2 * j + 1
            else:
                data[j], data[2 * j + 2] = data[2 * j + 2], data[j]
                j = 2 * j + 2
    return data

In python, using the library heapq that realizes heap queue You can also easily write a heapsort.

import heapq

#Heapsort using heapq
def heap_sort2(data):
    h = data.copy()
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in range(len(h))]

Merge sort

Merge sort recursively divides the array you want to sort and merges again` It is a sorting algorithm that tries to realize sorting.

#Merge sort
def merge_sort(data):
    if len(data) <= 1:
        return data
    m = len(data) // 2
    l = merge_sort(data[:m])
    r = merge_sort(data[m:])
    return merge(l , r)

def merge(left, right):
    res , i , j = [], 0, 0
    while (i < len(left)) and (j < len(right)):
        if left[i] <= right[j]:
            res.append(left[i])
            i += 1
        else:
            res.append(right[j])
            j += 1
    if i < len(left):
        res.extend(left[i:])
    if j < len(right):
        res.extend(right[j:])
    return res

Quick sort

The feature of quicksort is that the number of data comparisons and exchanges is very small. You can efficiently sort data that is scattered randomly, It is an algorithm that is not stable.

def quick_sort(data):
    def quick(list, left, right):
        pl = left
        pr = right
        y = list[(pl + pr) // 2]
        while True:
            while list[pl] < y: pl += 1
            while list[pr] > y: pr -= 1
            if pl <= pr:
                list[pl], list[pr] = list[pr], list[pl]
                pl += 1
                pr -= 1
            if pl > pr:
                break
        if left < pr: quick(list, left, pr)
        if pl < right: quick(list, pl, right)
        return data
    quick(data, 0, len(data) - 1)
    return data

Python standard sort

Python's standard sort is (probably) Timsort.

Timsort is a stable hybrid derived from merge sort and insertion sort The sorting algorithm is designed to work well with different types of data.

def python_sort(data):
    data.sort()
    return data

Compare the execution time of the sort algorithm

Let's measure the execution time of the above sorting algorithm. You can easily measure with the following code.

Sorts 10000 numbers. With jupyter notebook, you can measure the time by using% time.

import numpy as np
import sys
sys.setrecursionlimit(20000)

data = np.arange(10000)
np.random.shuffle(data)
data = list(data)

print('bubble_sort')
%time l = bubble_sort(data)
print('select_sort')
%time l = select_sort(data)
print('insert_sort')
%time l = insert_sort(data)
print('heap_sort')
%time l = heap_sort(data)
print('heap_sort2')
%time l = heap_sort2(data)
print('merge_sort')
%time l = merge_sort(data)
print('quick_sort')
%time l = quick_sort(data)
print('python_sort')
%time l = python_sort(data)

bubble_sort CPU times: user 18 µs, sys: 0 ns, total: 18 µs Wall time: 20 µs select_sort CPU times: user 18 µs, sys: 1 µs, total: 19 µs Wall time: 21 µs insert_sort CPU times: user 7 µs, sys: 0 ns, total: 7 µs Wall time: 10 µs heap_sort CPU times: user 38 µs, sys: 1 µs, total: 39 µs Wall time: 40.1 µs heap_sort2 CPU times: user 11 µs, sys: 0 ns, total: 11 µs Wall time: 12.9 µs merge_sort CPU times: user 31 µs, sys: 1e+03 ns, total: 32 µs Wall time: 33.1 µs quick_sort CPU times: user 13 µs, sys: 1 µs, total: 14 µs Wall time: 14.8 µs python_sort CPU times: user 4 µs, sys: 0 ns, total: 4 µs Wall time: 6.2 µs

Well, the shortest time is Python's standard sort.

I think that this will change considerably depending on how you implement it. I think it's quite difficult to achieve speed that exceeds standard sorting. Normally, you don't need to write your own program for sorting.

Summary

Python sorting is so powerful that you may not usually care about it. There are few problems with sorting.

How is sorting done? If you compare how each method is different You will understand the depth of the algorithm.

40 days until you become an engineer

Author information

Otsu py's HP: http://www.otupy.net/

Youtube: https://www.youtube.com/channel/UCaT7xpeq8n1G_HcJKKSOXMw

Twitter: https://twitter.com/otupython

Recommended Posts

You will be an engineer in 100 days ――Day 60 ――Programming ――About data structure and sorting algorithm
You will be an engineer in 100 days ――Day 71 ――Programming ――About scraping 2
You will be an engineer in 100 days ――Day 61 ――Programming ――About exploration
You will be an engineer in 100 days ――Day 74 ――Programming ――About scraping 5
You will be an engineer in 100 days ――Day 73 ――Programming ――About scraping 4
You will be an engineer in 100 days ――Day 75 ――Programming ――About scraping 6
You will be an engineer in 100 days --Day 68 --Programming --About TF-IDF
You will be an engineer in 100 days ――Day 70 ――Programming ――About scraping
You will be an engineer in 100 days ――Day 81 ――Programming ――About machine learning 6
You will be an engineer in 100 days ――Day 82 ――Programming ――About machine learning 7
You will be an engineer in 100 days ――Day 79 ――Programming ――About machine learning 4
You will be an engineer in 100 days ――Day 76 ――Programming ――About machine learning
You will be an engineer in 100 days ――Day 80 ――Programming ――About machine learning 5
You will be an engineer in 100 days ――Day 78 ――Programming ――About machine learning 3
You will be an engineer in 100 days ――Day 84 ――Programming ――About machine learning 9
You will be an engineer in 100 days ――Day 83 ――Programming ――About machine learning 8
You will be an engineer in 100 days ――Day 77 ――Programming ――About machine learning 2
You will be an engineer in 100 days ――Day 85 ――Programming ――About machine learning 10
You will be an engineer in 100 days --Day 63 --Programming --Probability 1
You will be an engineer in 100 days --Day 65 --Programming --Probability 3
You will be an engineer in 100 days --Day 64 --Programming --Probability 2
You will be an engineer in 100 days --Day 86 --Database --About Hadoop
You will be an engineer in 100 days --Day 27 --Python --Python Exercise 1
You will be an engineer in 100 days --Day 34 --Python --Python Exercise 3
You will be an engineer in 100 days --Day 31 --Python --Python Exercise 2
You become an engineer in 100 days ――Day 67 ――Programming ――About morphological analysis
You become an engineer in 100 days ――Day 66 ――Programming ――About natural language processing
You will be an engineer in 100 days ――Day 24 ―― Python ―― Basics of Python language 1
You will be an engineer in 100 days ――Day 30 ―― Python ―― Basics of Python language 6
You will be an engineer in 100 days ――Day 25 ―― Python ―― Basics of Python language 2
You will be an engineer in 100 days --Day 29 --Python --Basics of the Python language 5
You will be an engineer in 100 days --Day 26 --Python --Basics of the Python language 3
You will be an engineer in 100 days --Day 32 --Python --Basics of the Python language 7
You will be an engineer in 100 days --Day 28 --Python --Basics of the Python language 4
Sorting algorithm and implementation in Python
You have to be careful about the commands you use every day in the production environment.