About Python comprehensions

Also posted on [nbviewer](http://nbviewer.jupyter.org/format/slides/github/y-sama/comprehension/blob/master/comprehension.ipynb#/).

When people think of Python, they think of comprehensions. (Or is that just my bias?) However, comprehensions are hard to read until you get used to them, so I have tried to summarize how to read them in a little more detail.

Normal list generation

extension_1 = []
for i in range(10):
    extension_1.append(i)
extension_1
#>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

List comprehension

The basic syntax is [counter for counter in iterator]. I often write just [i for i in ] first and then fill in the rest. A list equivalent to extension_1 can be generated with a comprehension as follows.

comprehension_1= [i for i in range(10)]
comprehension_1
#>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Speed

Let's compare the execution speed with and without a list comprehension. It was measured using Jupyter's cell magic %%timeit.

%%timeit
extension_1 = []
for i in range(10000):
    extension_1.append(i)
#>>> 100 loops, best of 3: 3.37 ms per loop

%%timeit
comprehension_1= [i for i in range(10000)]
#>>> 1000 loops, best of 3: 1.05 ms per loop

**List comprehensions are not only cleaner but also faster.** Reference: Why are Python comprehensions so fast? The for-loop version is slower for two main reasons.

- It looks up the append attribute of the list object on every iteration.
- It runs append as an ordinary Python function call.

The first effect can be eliminated by hoisting the attribute lookup out of the loop.

%%timeit
extension_1_ = []
append=extension_1_.append
for i in range(10000):
    append(i)
#>>> 1000 loops, best of 3: 1.57 ms per loop

It is commonly said that a list comprehension roughly doubles execution speed. In the measurements above, about 80% of the difference (1.80 ms of the 2.32 ms gap) comes from the overhead of looking up the append method on every iteration.
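For readers who are not using Jupyter, the same three-way comparison can be sketched with the standard timeit module. The function names below are made up for this illustration, and the absolute numbers depend heavily on the Python version and the machine, so treat the output as a rough indication only.

```python
import timeit

N = 10_000

def with_append():
    # Looks up result.append on every iteration.
    result = []
    for i in range(N):
        result.append(i)
    return result

def with_cached_append():
    # Looks up the bound method once, outside the loop.
    result = []
    append = result.append
    for i in range(N):
        append(i)
    return result

def with_comprehension():
    # The list is built by the comprehension itself.
    return [i for i in range(N)]

if __name__ == "__main__":
    for func in (with_append, with_cached_append, with_comprehension):
        seconds = timeit.timeit(func, number=200)
        print(f"{func.__name__}: {seconds:.3f} s for 200 calls")
```

All three functions return the same list; only the way the list is built differs.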

When including if (postfix if)

Python has no postfix if statement, but inside a comprehension you can (in effect) write one.

extension_2 =[]
for i in range(10):
    if i%2==0:
        extension_2.append(i)
extension_2
#>>> [0, 2, 4, 6, 8]

Rewriting extension_2 in list comprehension notation looks like the following.

comprehension_2 = [i for i in range(10) if i%2==0]
comprehension_2
#>>> [0, 2, 4, 6, 8]

In effect this is a postfix if, because in a comprehension an if clause or another for clause can be chained after a for clause. (Think of it as omitting the colon and the indentation.)

Written with indentation, it looks like this.

[
i
for i in range(10)
  if i%2==0
]

I explain this in a little more detail in the "Road to darkness" section.
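Because clauses are simply chained, more than one if clause can follow a for clause. The sketch below (the variable names are chosen only for this example) shows that two chained if clauses behave like nested if statements, and like a single condition joined with and.

```python
# Nested if statements written as an ordinary loop.
loop_result = []
for i in range(30):
    if i % 2 == 0:
        if i % 3 == 0:
            loop_result.append(i)

# The same thing with two if clauses chained after the for clause.
chained = [i for i in range(30) if i % 2 == 0 if i % 3 == 0]

# Equivalent single condition using "and".
combined = [i for i in range(30) if i % 2 == 0 and i % 3 == 0]

print(chained)  # [0, 6, 12, 18, 24]
```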

%%timeit
extension_2 =[]
for i in range(10000):
    if i%2==0:
        extension_2.append(i)
#>>> 100 loops, best of 3: 4.08 ms per loop

%%timeit
comprehension_2 = [i for i in range(10000) if i%2==0]
#>>> 100 loops, best of 3: 3.15 ms per loop

%%timeit
extension_2_ =[]
append=extension_2_.append
for i in range(10000):
    if i%2==0:
        append(i)
#>>> 100 loops, best of 3: 3.81 ms per loop

**In fact, evaluating the if condition dominates the running time, so even if you force the code into a list comprehension, the speed improves by only about 20-30%.**

If it contains if ~ else (conditional operator)

It's confusing, but **if an else clause is involved, the position of if changes, because this uses the conditional operator (also known as the ternary operator)**. (The conditional operator is only supported in Python 2.5 or later.)

extension_3 =[]
for i in range(10):
    if i%2==0:
        extension_3.append(i)
    else:
        extension_3.append(str(i))
extension_3
#>>> [0, '1', 2, '3', 4, '5', 6, '7', 8, '9']

comprehension_3 = [ i if i%2==0 else str(i) for i in range(10)]
comprehension_3
# >>> [0, '1', 2, '3', 4, '5', 6, '7', 8, '9']

It may be easier to understand if you think of it as equivalent to this:

extension_3_cond =[]
for i in range(10):
    extension_3_cond.append(i) if i%2==0 else extension_3_cond.append(str(i))
extension_3_cond
#>>> [0, '1', 2, '3', 4, '5', 6, '7', 8, '9']

If you follow the postfix if pattern and append if ~ else at the end, you get an error.

#Does not work
[ i for i in range(10) if i%2==0 else str(i)]
#>>> SyntaxError: invalid syntax

Since evaluating if ~ else dominates the running time, a comprehension brings little speedup here.
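The two forms can also be combined: the conditional operator goes in front of the for clause and decides what is stored, while the postfix if goes after it and decides which elements are kept at all. A minimal sketch, with names invented for this example:

```python
# Ordinary loop: keep only even numbers, and store multiples of 4 as int,
# the other even numbers as str.
loop_mixed = []
for i in range(10):
    if i % 2 == 0:                      # filter (postfix if)
        if i % 4 == 0:                  # conditional operator
            loop_mixed.append(i)
        else:
            loop_mixed.append(str(i))

# Same logic as a comprehension.
mixed = [i if i % 4 == 0 else str(i) for i in range(10) if i % 2 == 0]
print(mixed)  # [0, '2', 4, '6', 8]
```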

Dictionary comprehension and set comprehension

In Python 2.7 or later, dictionary comprehensions and set comprehensions are available in addition to list comprehensions.

comprehension_dict = {str(i):i for i in range(10)}
print(comprehension_dict)
#>>> {'7': 7, '8': 8, '2': 2, '9': 9, '0': 0, '1': 1, '6': 6, '5': 5, '4': 4, '3': 3}

It goes well with zip.

label = ["kinoko", "takenoko", "suginoko"]
feature = ["yama", "sato", "mura"]
{i:j for i,j in zip(label,feature)}
#>>> {'kinoko': 'yama', 'suginoko': 'mura', 'takenoko': 'sato'}

If you are simply passing keys and values through as in this case, you don't need a comprehension: dict(zip(label,feature)) is enough.

In Python 2.6 and earlier, pass (key, value) tuples to dict instead.

comprehension_dict2 = dict((str(i),i) for i in range(10))
print(comprehension_dict2)

You can also use a postfix if.

comprehension_dict2 = {str(i):i for i in range(10) if i%2==0}
print(comprehension_dict2)
#>>> {'8': 8, '6': 6, '2': 2, '4': 4, '0': 0}

You can also use the conditional operator. Because it operates on expressions, it must be written separately for the key and for the value, on each side of the ":".

comprehension_dict3 = {str(i) if i%2==0 else i : i if i%2==0 else str(i) for i in range(10)}
print(comprehension_dict3)
#>>> {'2': 2, 1: '1', 3: '3', 5: '5', '0': 0, 7: '7', 9: '9', '6': 6, '4': 4, '8': 8}

This does not work. (I have made this mistake myself, orz)

#Does not work
comprehension_dict4 = {str(i):i if i%2==0 else i:str(i) for i in range(10)}
#>>> SyntaxError: invalid syntax
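One possible workaround, offered here as a sketch rather than something from the original article, is to pick the whole (key, value) pair with a single conditional expression and pass the pairs to dict. The variable names are invented for this example.

```python
# One conditional expression chooses the whole (key, value) tuple.
pairs = dict((str(i), i) if i % 2 == 0 else (i, str(i)) for i in range(10))

# The dictionary comprehension with the condition written twice, for comparison.
expected = {str(i) if i % 2 == 0 else i: i if i % 2 == 0 else str(i)
            for i in range(10)}

print(pairs == expected)  # True
```

Whether this reads better than repeating the condition is a matter of taste; for anything more involved, an ordinary loop is probably clearer.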

Set comprehension

If you enclose it in {} without a colon, it will be a set comprehension.

comprehension_set={i%5 for i in range(10)}
comprehension_set
#>>> {0, 1, 2, 3, 4}

Note that {} with zero elements is a dictionary, not a set.

zero_set={}
type(zero_set)
# >>> dict
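To get an empty set, call set() instead. A short sketch (variable names are illustrative only):

```python
empty_set = set()
print(type(empty_set))   # <class 'set'>
print(type({}))          # <class 'dict'>

# A set comprehension over an empty iterable also yields an empty set.
also_empty = {i for i in range(0)}
print(also_empty)        # set()
```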

Generator expressions and tuple comprehension

The syntax makes this easy to misunderstand, but enclosing it in () gives a generator expression, not a tuple comprehension.

comprehension_gen=(i%5 for i in range(10))
comprehension_gen
#>>> <generator object <genexpr> at 0x7f3000219678>

for i in comprehension_gen:print(i)
#>>> 0
#>>> 1
#>>> 2
#>>> 3
#>>> 4
#>>> 0
#>>> 1
#>>> 2
#>>> 3
#>>> 4

It is used far more often than a tuple comprehension would be. Unlike a list, it does not store all the elements in memory; it generates the next element on demand. Written without a comprehension it looks like the following, which is cumbersome because you first have to define a generator function.

def gen_func():
    for i in range(10):
        yield i%5
extension_gen = gen_func()
extension_gen
#>>> <generator object gen_func at 0x7f3000166830>
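The difference from a list can be observed directly. The sketch below assumes CPython, where sys.getsizeof reports the size of the container object itself; it also shows that a generator can be consumed only once. The names are made up for this example.

```python
import sys

squares_list = [i * i for i in range(10_000)]
squares_gen = (i * i for i in range(10_000))

# The generator object stays small because it does not hold the elements.
print(sys.getsizeof(squares_gen) < sys.getsizeof(squares_list))  # True

first_pass = sum(squares_gen)    # consumes the generator
second_pass = sum(squares_gen)   # already exhausted, nothing left to add
print(second_pass)               # 0
```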

Tuple comprehensions are rarely needed, but if you really want one, you can create it by passing a list comprehension to the tuple function.

tuple([i for i in range(10)])
#>>> (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

[Addition] Adding a suggestion from the comments: you can also pass a generator expression to tuple. The generator's parentheses can be omitted when it is the only argument, so I think this is easy to read. (It ends up looking like a tuple comprehension.)

tuple(i for i in range(10))
#>>> (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

# Shorthand for the following, with the redundant parentheses omitted
tuple((i for i in range(10)))

Comparison with functional programming

map and filter produce output similar to comprehensions and are often compared with them. In Python 2, map and filter return a list, so they correspond to list comprehensions. In Python 3, they return iterators called map and filter objects, so they correspond to generator expressions.

Rewriting with map

map(lambda i:i**2,range(1,11))
# In Python 2.x, a list is returned.
#>>> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
# In Python 3.x, an iterator called a map object is returned.
#>>> <map at 0x4570f98>

list(map(lambda i:i**2,range(1,11)))
#>>> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

# List comprehension
[i**2 for i in range(1,11)]
#>>> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Rewriting with filter

filter(lambda i:i%2==1, range(1,11))
# In Python 2.x, a list is returned.
#>>> [1, 3, 5, 7, 9]
# In Python 3.x, an iterator called a filter object is returned.
#>>> <filter at 0x4578a20>

list(filter(lambda i:i%2==1, range(1,11)))
#>>> [1, 3, 5, 7, 9]

# List comprehension
[i for i in range(1,11) if i%2==1]
#>>> [1, 3, 5, 7, 9]

When nested

map(lambda j:j**2, filter(lambda i:i%2==1, range(1,11)))
# In Python 2.x, a list is returned.
#>>> [1, 9, 25, 49, 81]
# In Python 3.x, a map object is returned, because the outermost call is map.
#>>> <map at 0x...>

list(map(lambda j:j**2, filter(lambda i:i%2==1, range(1,11))))
#>>> [1, 9, 25, 49, 81]

# List comprehension
[i**2 for i in range(1,11) if i%2==1]
#>>> [1, 9, 25, 49, 81]

I find the list comprehension easier to read, but perhaps that is a matter of familiarity. List comprehensions are also generally faster.
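The speed claim is easy to check on your own machine. The following is a rough sketch using the standard timeit module; the function names are invented for this example, and the result will vary with the Python version, so no particular ratio is assumed here.

```python
import timeit

def squares_with_map_filter():
    # Nested map/filter with lambdas, materialized into a list.
    return list(map(lambda j: j ** 2,
                    filter(lambda i: i % 2 == 1, range(1, 10_001))))

def squares_with_comprehension():
    # The same computation as a list comprehension.
    return [i ** 2 for i in range(1, 10_001) if i % 2 == 1]

if __name__ == "__main__":
    for func in (squares_with_map_filter, squares_with_comprehension):
        seconds = timeit.timeit(func, number=200)
        print(f"{func.__name__}: {seconds:.3f} s for 200 calls")
```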

Road to darkness

**After reading this section, I recommend reading "Readable Code" to purify your mind.** I feel that the Pythonic approach is to avoid multi-level nesting and conditional branching beyond basic comprehensions.

Connect multiple conditional operators

Taking fizzbuzz as an example, chaining multiple conditional operators looks like this.

fizzbuzz=[]
for i in range(1,16):
    if i%15==0:
        fizzbuzz.append("fizzbuzz")
    elif i%3==0:
        fizzbuzz.append("fizz")
    elif i%5==0:
        fizzbuzz.append("buzz")
    else:
        fizzbuzz.append(i)
fizzbuzz
#>>> [1, 2, 'fizz', 4, 'buzz', 'fizz', 7, 8, 'fizz', 'buzz', 11, 'fizz', 13, 14, 'fizzbuzz']

["fizzbuzz" if i%15==0 else "fizz" if i%3==0 else "buzz"
 if i%5==0 else i for i in range(1,16)]
#>>> [1, 2, 'fizz', 4, 'buzz', 'fizz', 7, 8, 'fizz', 'buzz', 11, 'fizz', 13, 14, 'fizzbuzz']
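If the chained conditional operators feel too dense, one alternative (a sketch of my own, not from the original text) is to move the branching into a small helper function and keep the comprehension itself trivial. The name fizzbuzz_label is hypothetical.

```python
def fizzbuzz_label(i):
    """Return 'fizzbuzz', 'fizz', 'buzz', or the number itself."""
    if i % 15 == 0:
        return "fizzbuzz"
    if i % 3 == 0:
        return "fizz"
    if i % 5 == 0:
        return "buzz"
    return i

labels = [fizzbuzz_label(i) for i in range(1, 16)]
print(labels)
# [1, 2, 'fizz', 4, 'buzz', 'fizz', 7, 8, 'fizz', 'buzz', 11, 'fizz', 13, 14, 'fizzbuzz']
```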

Nested (multiple array)

I think there are quite a few cases where you want to nest arrays. You can do this by nesting the list comprehension in the list comprehension.

outer_list=[]
for i in range(3):
    inner_list=[]
    for j in range(10):
        inner_list.append(j)
    outer_list.append(inner_list)
outer_list
#>>> [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
#>>>  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
#>>>  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]

# Written as a list comprehension
[[j for j in range(10)] for i in range(3)]
#>>> [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
#>>>  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
#>>>  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]

Once you go three levels deep, it becomes hard to read even for Python.
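The inner comprehension can also use the outer loop variable. As a small illustration (the multiplication table and its variable names are my own example, not from the original), each row below depends on i:

```python
# Ordinary nested loop.
loop_table = []
for i in range(1, 4):
    row = []
    for j in range(1, 4):
        row.append(i * j)
    loop_table.append(row)

# Nested list comprehension: the inner one is evaluated once per outer i.
table = [[i * j for j in range(1, 4)] for i in range(1, 4)]
print(table)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```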

Double loop

Take flatten as an example. It is not that hard to read once you understand it, but at first glance readability drops significantly.

init=[[1,2,3],[4,5],[6,7]]
flatten=[]
for outer in init:
    for inner in outer:
        flatten.append(inner)
flatten
#>>> [1, 2, 3, 4, 5, 6, 7]

#Equivalent to this
[inner for outer in init for inner in outer]
#>>> [1, 2, 3, 4, 5, 6, 7]

Basically, read the for clauses in order from the left; the expression at the head is what finally goes into the list. Written with indentation, it looks like this.

[
inner
for outer in init
    for inner in outer
]
#>>> [1, 2, 3, 4, 5, 6, 7]

As shown in the [When including if (postfix if)](http://qiita.com/y__sama/items/a2c458de97c4aa5a98e7#if%E3%82%92%E5%90%AB%E3%82%80%E5%A0%B4%E5%90%88%E5%BE%8C%E7%BD%AEif) section, each for clause behaves as if its colon and indentation had been omitted.
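A common mistake is to write the for clauses in the opposite order. The sketch below shows that doing so fails with a NameError when the inner name is not yet defined, and, as an aside, that the standard library's itertools.chain.from_iterable gives the same flattened result. The names row, item, and wrong_order_failed are made up for this example.

```python
from itertools import chain

init = [[1, 2, 3], [4, 5], [6, 7]]

# Outer loop first, inner loop second: same order as the nested for statements.
flat = [inner for outer in init for inner in outer]
print(flat)                                      # [1, 2, 3, 4, 5, 6, 7]
print(list(chain.from_iterable(init)) == flat)   # True

# Reversed order: "row" is used before the clause that defines it.
wrong_order_failed = False
try:
    [item for item in row for row in init]
except NameError as exc:
    wrong_order_failed = True
    print("NameError:", exc)
```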

Patatokukashii (パタトクカシーー)

By combining zip with a double loop, you can generate パタトクカシーー (the characters of パトカー, "police car", and タクシー, "taxi", interleaved) in one line.

patato=[]
for i in zip("パトカー","タクシー"):
    for j in i:
        patato.append("".join(j))
"".join(patato)
#>>> 'パタトクカシーー'

# Written as a comprehension
"".join(["".join(j) for i in zip("パトカー","タクシー") for j in i])
#>>> 'パタトクカシーー'

# When indented
"".join(
  [
  "".join(j)
  for i in zip("パトカー","タクシー")
    for j in i
  ]
)
#>>> 'パタトクカシーー'

You can also put print inside.

[print(k) for i in zip("パトカー","タクシー") for j in i for k in j]
#>パ
#>タ
#>ト
#>ク
#>カ
#>シ
#>ー
#>ー
#>>> [None, None, None, None, None, None, None, None]
# print returns None, so the result is a list of None values.
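The interleaving itself can be written more directly by unpacking each zipped pair. This is a sketch of my own that assumes the two words are the Japanese strings パトカー (police car) and タクシー (taxi); the variable names are illustrative.

```python
# Unpack each pair from zip and concatenate the two characters.
interleaved = "".join(a + b for a, b in zip("パトカー", "タクシー"))
print(interleaved)  # パタトクカシーー

# The double-loop form, for comparison.
double_loop = "".join([ch for pair in zip("パトカー", "タクシー") for ch in pair])
print(double_loop == interleaved)  # True
```

When the goal is output rather than a list, an ordinary for loop with print is usually clearer than a comprehension that collects None values.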

Combining multiple loops with an if clause

This one is getting pretty far out. Add nested lists or lambda on top of it and reading it becomes a real treat.

import re
# "無駄" ("useless") and "貧弱" ("weak") do not start with U, R, or Y, so re.match rejects them.
DIO=["U","無駄","RR","貧弱","Y"]

rslt=[]
for i in DIO:
    if re.match("[URY]+",i):
        for j in i:
            rslt.append(j*2)
"".join(rslt)
#>>> 'UURRRRYY'

#Comprehension notation
"".join([j*3 for i in DIO if re.match("[URY]+",i) for j in i])
#>>> 'UUURRRRRRYYY'

#When indented, it looks like this
"".join(
  [
    j*4
    for i in DIO
      if re.match("[URY]+",i)
        for j in i
  ]
)
#>>> 'UUUURRRRRRRRYYYY'
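For comparison, the same result can be reached in two simpler steps: filter first, then expand. This is a sketch under the assumption that the list contains the Japanese strings 無駄 and 貧弱, which do not match the pattern; the names words, matching, and doubled are made up for this example.

```python
import re

words = ["U", "無駄", "RR", "貧弱", "Y"]

# Step 1: keep only the words that start with U, R, or Y.
matching = [w for w in words if re.match("[URY]+", w)]
print(matching)  # ['U', 'RR', 'Y']

# Step 2: double every character of the remaining words.
doubled = "".join(ch * 2 for w in matching for ch in w)
print(doubled)   # UURRRRYY

# The single-comprehension form, for comparison.
one_liner = "".join([ch * 2 for w in words if re.match("[URY]+", w) for ch in w])
print(one_liner == doubled)  # True
```

Splitting the work costs one intermediate list but lets each line be read on its own.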
