Attempt to gradually improve solid programs (trade-off between description amount and execution speed)

Introduction

I had to implement a certain process. The process is as follows. "For each object in the list, remove it from the list if it is inferior to the new object." What is inferior? However, assuming that the object is represented by a tuple, if the new object is n and the object originally in the list is x,

[n[i] > x[i] for i in range(len(n))]

Suppose that the result of is [True, True, ..., True] (all True).

However, it depends on the problem whether each subscript should have a larger value or a smaller value (that is, the program must deal with it). For example, if the 1st axis is maximized and the 2nd axis is minimized, the conditions for the objects to be removed are as follows.

n[0] > x[0] and n[1] < x[1]

In the following, I will write this process in Python for the time being, and then I would like to improve it step by step.

First program

That's why it is a program that implements the processing explained. Whether it is maximized or minimized on each axis is passed in the argument maximize (although the variable name is not good enough).

filters.py


def filter00(l, n, maximize):
    nl = []
    for x in l:
        remove = True
        for i in range(len(x)):
            if maximize[i]:
                if n[i] > x[i]:
                    pass
                else:
                    remove = False
            else:
                if n[i] < x[i]:
                    pass
                else:
                    remove = False
        if not remove:
            nl.append(x)
    return nl

Check on the premise of deleting, and if there is an axis for which x is better, cancel the deletion. I think it's redundant that if is pass and else and remove is False, but I think that flipping the condition or adding not will reduce readability, so I will leave it as it is (I will improve it anyway)

Improvement # 1: Remove conditional branches that are not related to loops

The following code for the filter00 function,

for i in range(len(x)):
    if maximize[i]:
        if n[i] > x[i]:
            pass
        else:
            remove = False
    else:
        if n[i] < x[i]:
            pass
        else:
            remove = False

maximize is passed as an argument and does not change in the loop. I feel that it is an extra process to make a conditional branch every time. So, let's think about whether we can get ʻif maximize [i]` out.

The method is to make n [i]> x [i] and n [i] <x [i] into a lambda expression. Python has an operator module that executes the function corresponding to the operator, so let's rewrite it using it.

filters.py


import operator

def filter01(l, n, maximize):
    cond = []
    for m in maximize:
        cond.append(operator.gt if m else operator.lt)
    nl = []
    for x in l:
        remove = True
        for i in range(len(x)):
            if cond[i](n[i],x[i]):
                pass
            else:
                remove = False
        if not remove:
            nl.append(x)
    return nl

Performance measurement

By the way, I deleted the conditional branch that does not change depending on the loop, but is it really faster? Let's measure properly, not the senses. This time, I made the following measurement script.

perf.py


import filters

import time
import random
import numpy as np

random.seed(123)

def perf(ll, n, m, f):
    elapsed = []
    for l in ll:
        start = time.time()
        f(l, n, m)
        end = time.time()
        elapsed.append(end - start)
    print(f.__name__, np.average(elapsed))
LOOP = 100
SIZE = 1000

n = (70, 70)
m = (True, True)

ll = [[(random.randint(0, 99), random.randint(0, 99)) for _ in range(SIZE)] for _ in range(LOOP)]

perf(ll, n, m, filters.filter00)
perf(ll, n, m, filters.filter01)

There is a danger that Masakari will fly if you compare only with the average value, but please do not throw Masakari because the result of the preliminary survey is that the average value is okay (laugh)

Well, execution result

filter00 0.00383123636246 filter01 0.00437139987946

Unexpectedly, it was 5 milliseconds slower. It is possible that the cost of changing the comparison from an operation to a function call outweighed the cost of removing the conditional branch that does not change with the loop.

Improvement # 2: Use the zip function

I wanted to talk about being happy because the code was shorter and the speed was faster, but I stumbled at the beginning. Actually, this improvement was found by speed examination after shortening the code, but since it is possible to keep the code slow, it is better to use the zip function when dealing with multiple lists in a loop. I asked, but I tried what it was like.

filters.py


def filter02(l, n, maximize):
    cond = []
    for m in maximize:
        cond.append(operator.gt if m else operator.lt)
    nl = []
    for x in l:
        remove = True
        for c, a, b in zip(cond, n, x):
            if c(a, b):
                pass
            else:
                remove = False
        if not remove:
            nl.append(x)
    return nl

Measurement result. The values of filter00 and filter01 are exactly the same as before, but the values measured when the final result of the improvement program was created are given in small quantities. Not bad.

filter00 0.00383123636246 filter01 0.00437139987946 filter02 0.0029260468483

fast. It's faster than the initial code. The difference between filter01 and filter02 is whether you refer to cond, n, x with a subscript or receive what the zip function returns. Subscript access is called the __getitem__ method, so it seems that the cost of calling the function is gone. While saying that, even the zip function has to call __getitem__, but I'm interested in it because it's processed at the C level, but I'm going to improve the code for the time being.

Improvement # 3: Use comprehension

Inclusive notation that everyone loves. It's too clumsy to write a sloppy for loop. Write neatly using the inclusion notation.

filters.py


def filter03(l, n, maximize):
    cond = [operator.gt if m else operator.lt for m in maximize]
    return [x for x in l if not all([c(a, b) for c, a, b in zip(cond, n, x)])]

Suddenly too much! I'm going to get a tsukkomi, so I will explain it properly. Oh, but before that. First of all, this code is very short in terms of lines, but it's definitely slow. So if you want to know the fast code, skip to improvement # 5.

Inner inclusion notation

Now, let's talk about filter03. This function uses double inclusion notation. The inside explanation first.

[c(a, b) for c, a, b in zip(cond, n, x)]

This part is almost the same as the inner loop written in filter02. Mostly, in filter02, conditional branching was done and the value was set in a single variable remove, but in the above comprehension notation,

[True, True]

The difference is that the list is returned like this. This is where the annoying problem arises. To decide whether to leave or not, you have to make it a scalar, not a list. Use the all function for that. The all function is a function that returns True if the list (or iterable) is all True. By using this, you can realize the processing that conditional branching with filter02 and setting remove to False in the case of else. (If you don't lose on any axis, at least one of the lists will be False and the whole will be False)

Outer comprehension

That's why I knew the inside, so this time it's the outside.

[x for x in l if not all([c(a, b) for c, a, b in zip(cond, n, x)])]

How this works is

  1. One element is taken from l and assigned to x
  2. The x is used to evaluate the ʻif ...` part
  3. If True is returned, include x in the list

So, the outer comprehension corresponds to the code that was handling the filter02 to not append depending on the value of remove (determined by the inner loop).

Execution speed

Well, the long-awaited execution speed.

filter00 0.00383123636246 filter01 0.00437139987946 filter02 0.0029260468483 filter03 0.00512305736542

slow. It's been much slower than ever. Actually, the comprehension is executed functionally & it seems that overhead is added due to the appearance of the all function.

Improvement # 4: Use internal functions

Well, there is no speed advantage already, but there is an advantage in the amount of code description, so I will improve the code of improvement 3 a little more. What was difficult to understand with filter03 is the double inclusion notation, so I will use an internal function to make that part a little cooler.

filters.py


def filter04(l, n, maximize):
    cond = [operator.gt if m else operator.lt for m in maximize]
    def dominated(x, n):
        return all([c(a, b) for c, a, b in zip(cond, n, x)])
    return [x for x in l if not dominated(x, n)]

By changing the comprehension (and all function) written after ʻif not to an internal function called dominated`, it became possible to explain what you are doing in one word, and readability has improved.

The execution speed is the slowest as you can imagine.

filter00 0.00383123636246 filter01 0.00437139987946 filter02 0.0029260468483 filter03 0.00512305736542 filter04 0.00578190803528

Improvement # 5: Use your knowledge to write fast-moving code

Now, let's look back on this.

So, the code is based on these. In addition, filter021 means that it has been modified based on filter02.

filters.py


def filter021(l, n, maximize):
    nl = []
    for x in l:
        remove = True
        for a, b, m in zip(n, x, maximize):
            remove = remove and (a > b if m else a < b)
        if not remove:
            nl.append(x)
    return nl

Since the inner loop and operator cannot be used, I made it compact by using the and operator and the conditional operator.

Well, execution time

filter00 0.00383123636246 filter01 0.00437139987946 filter02 0.0029260468483 filter03 0.00512305736542 filter04 0.00578190803528 filter021 0.00255712985992

Fastest: v:

Improvement # 6: Try to specialize

The story is not over yet. Can't we somehow make it faster? The idea is to change the code depending on whether you use it often or not (general). In other words, even if you say "N variables, maximum and minimum may be mixed", you can write as follows if you say "2 variables, maximize both axes" in most uses.

filters.py


def filter02X(l, n, maximize):
    if len(n) == 2 and maximize == (True, True):
        return [x for x in l if not (n[0] > x[0] and n[1] > x[1])]
    else:
        return filter021(l, n, maximize)

Measurement result

filter00 0.00383123636246 filter01 0.00437139987946 filter02 0.0029260468483 filter03 0.00512305736542 filter04 0.00578190803528 filter021 0.00255712985992 filter02X 0.000690214633942

It's an extraordinary speed: v :: v: In the case of 2 variables and 3 variables in the language processing system implementation, it means that it is more than that.

Summary

In this article, we've gradually improved the solid program. The initial idea was to make the code that is not Python-like (the code that people who have learned other languages tend to write when writing programs in Python) become Python-like, which makes it more readable and faster and makes them happy. I wanted to, but when I actually measured it, I found that it wasn't so, so I changed the story a little. In summary, there are the following three points.

2020/3/22 postscript

It's been 3 years since the article was posted. At that time, the power of numpy was still low, and I thought, "The speed improvement I wrote in that article, now I can use numpy to make it even faster", but I haven't touched it. On the other hand, when I maintained the program that was the source of this article for the first time in a while, I was disappointed with the slowness and made it faster using numpy.

I want to erase all for

There is something in filter021 that numpy ○ chi cannot forgive. Yes, it's a for statement. That's why I erased it.

filters.py


def filter05(l, n, maximize):
    cond = [np.greater if m else np.less for m in maximize]
    npa = np.array(l)
    check = np.empty(npa.shape, dtype=np.bool)
    for i, c in enumerate(cond):
        check[:, i] = c(n[i], npa[:, i])
    #Elements that lose all axes to n (conditional expressions are all True) are remove, otherwise not remove
    not_remove = (check.sum(axis=1) != len(n))
    return [x for x, nr in zip(l, not_remove) if nr]

I can't erase it! There is a reason for the above code to be an excuse.

――Since it is different for each axis whether it is maximized or minimized (with my numpy power), it cannot be written with one conditional expression. --The return value must be list. Also, in the original material program, the element of list is annoying as "an object of a class that inherits list", and not_remove cannot be applied directly to npa (using boolean index) [^ 1]

[^ 1]: It doesn't meet the specifications, but I tried using boolean index, but it didn't speed up.

Let's measure the time anyway.

filter00 0.0037975716590881348 filter01 0.004087417125701904 filter02 0.003038032054901123 filter03 0.004926862716674804 filter04 0.005476489067077637 filter021 0.002838168144226074 filter02X 0.0006895852088928222 filter05 0.002588319778442383

It's an error. The reason it doesn't get faster seems to be the overhead of converting list to numpy. If len (l) >> len (n), the for statement that compares each axis can be ignored.

Everything to numpy

As mentioned above, the original program cannot assume a numpy array as input, so it cannot be made any faster, but I tried to see how fast it would be if I / O could be a numpy array.

filters.py


def filter06(a, n, maximize):
    cond = [np.greater if m else np.less for m in maximize]
    check = np.empty(a.shape, dtype=np.bool)
    for i, c in enumerate(cond):
        check[:, i] = c(n[i], a[:, i])
    not_remove = (check.sum(axis=1) != len(n))
    return a[not_remove]

Make it a numpy array before applying it to the measurement.

perf.py


a = np.array(ll)
perf(a, n, m, filters.filter06)

result. After all it is fast if it can be completed only with numpy.

filter00 0.0037975716590881348 filter01 0.004087417125701904 filter02 0.003038032054901123 filter03 0.004926862716674804 filter04 0.005476489067077637 filter021 0.002838168144226074 filter02X 0.0006895852088928222 filter05 0.002588319778442383 filter06 0.00030982017517089844

Or rather, the specialized version (filter02X) is why it's so fast. By the way, I made a specialized version of filter05 and filter06 and measured it, but the result was almost the same (rather slower).

2020/3/28 postscript

I wondered if there was a way to do it all at once by comparing each axis of filter06, but I couldn't eliminate the comparison by axis, but I came up with a way to delete the sum after for.

filers.py


def filter061(a, n, maximize):
    cond = [np.greater if m else np.less for m in maximize]
    check = np.ones(a.shape[0], dtype=np.bool)
    for i, c in enumerate(cond):
        check = check & c(n[i], a[:, i])
    not_remove = ~check
    return a[not_remove]

It will be about twice as fast when measured. I also applied the same improvement to filter05, but the conversion overhead was still larger.

filter06 0.0003197813034057617 filter061 0.00017987966537475587

If you invert the comparison operation itself and eliminate the last ~, it will be 10 to 20% faster, but readability will decrease.

filters.py


def filter062(a, n, maximize):
    cond = [np.less if m else np.greater for m in maximize]
    not_remove = np.zeros(a.shape[0], dtype=np.bool)
    for i, c in enumerate(cond):
        not_remove = not_remove | c(n[i], a[:, i])
    return a[not_remove]

Recommended Posts

Attempt to gradually improve solid programs (trade-off between description amount and execution speed)
Now "average of sums from 1 to 10" and its execution speed