This is a memo on O'Reilly Japan's book Effective Python. https://www.oreilly.co.jp/books/9784873117560/ (pp. 38-42)
A function that iterates over its input multiple times may behave unexpectedly when it is passed an iterator as an argument.

**Consider a function that calculates what percentage of the total number of visitors each city accounts for**
```python
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

visits = [15, 35, 80]
percentages = normalize(visits)
print(percentages)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```
This works fine, but suppose the amount of data becomes large (large enough to exhaust memory); let's try using a generator instead.
```python
def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)

it = read_visits("visits.txt")  # assume a file containing many numbers
percentages = normalize(it)
print(percentages[:3])
```

```
>>>
[]
```
We expect to get results similar to the code above, but in this case an empty list is returned. The reason is that an iterator produces its results only once: the sum() call inside normalize exhausts the iterator, so by the time the for loop runs there is nothing left to yield. What is particularly confusing here is that no exception is raised when the exhausted iterator is iterated again. Python's iteration machinery cannot distinguish an iterator that produces no output from one that has already been exhausted, because the StopIteration that signals exhaustion is swallowed silently.
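As a quick illustration (a minimal example, not from the book), an iterator is consumed by its first traversal, and a second traversal fails silently:

```python
# Minimal illustration: an iterator is exhausted after one pass,
# and iterating it again yields nothing rather than raising an error.
it = iter([15, 35, 80])
print(list(it))  # first pass consumes all values: [15, 35, 80]
print(list(it))  # second pass: empty list, no exception
```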
To solve this, copy the iterator's contents into a list so that it can be traversed again and again.
```python
def normalize_copy(numbers):
    numbers = list(numbers)  # copy the iterator's contents into a list
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

it = read_visits("visits.txt")
percentages = normalize_copy(it)
print(percentages)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```
The result is as expected, but by building a list of the numbers inside normalize_copy, new memory is allocated for the entire data set. This eliminates the benefit of using an iterator. Instead of creating a list, consider accepting a function that returns a new iterator each time it is called.

Now define a function that takes such an iterator factory:
```python
def normalize_func(get_iter):
    total = sum(get_iter())   # new iterator
    result = []
    for value in get_iter():  # new iterator
        percent = 100 * value / total
        result.append(percent)
    return result

percentages = normalize_func(lambda: read_visits("visits.txt"))
print(percentages)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```
It works as expected, but having to pass a lambda is clumsy. We can get the same result by implementing the **iterator protocol** ourselves.

The iterator protocol is how loops such as for statements traverse a container: iter() is called on the container to obtain an iterator, and next() is called on it until StopIteration is raised. Let's create our own container class that supports this protocol.
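The loop mechanics just described can be sketched by hand (a simplified illustration using an in-memory list rather than a file):

```python
# What a for loop does under the hood: obtain an iterator with iter(),
# then call next() until StopIteration is raised and swallowed.
values = [15, 35, 80]
it = iter(values)            # invokes values.__iter__()
result = []
while True:
    try:
        value = next(it)     # invokes it.__next__()
    except StopIteration:    # exhaustion is signaled by this exception
        break
    result.append(value)
print(result)  # [15, 35, 80]
```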
```python
class ReadVisits(object):
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

visits = ReadVisits("visits.txt")
percentages = normalize(visits)
print(percentages)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```
The difference from the original read_visits is that, thanks to the newly implemented ReadVisits container class, the data can be traversed twice inside the normalize function. This works because each call to iter() creates a brand-new iterator object (a fresh generator from __iter__). As a result, visits can be iterated any number of times. (The drawback is that the input file is read once per traversal.)
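To see why this works without needing a visits.txt file, here is a hypothetical in-memory analogue of ReadVisits (the Numbers class is an illustration, not from the book): each call to iter() runs __iter__ and returns a fresh generator, so repeated full passes all succeed.

```python
# Hypothetical in-memory container: like ReadVisits, every iter() call
# produces a brand-new generator, so the data can be traversed repeatedly.
class Numbers(object):
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        for value in self.values:
            yield value

nums = Numbers([15, 35, 80])
print(iter(nums) is iter(nums))  # False: two distinct iterators
print(sum(nums))                 # 130 -- first full pass
print(list(nums))                # [15, 35, 80] -- second full pass
```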
To ensure correct processing, provide a function that checks whether its argument is a container rather than a plain iterator.
```python
def normalize_defensive(numbers):
    if iter(numbers) is iter(numbers):  # an iterator: reject it
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

visits = [15, 35, 80]
normalize_defensive(visits)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```

No error.
```python
visits = ReadVisits("visits.txt")
normalize_defensive(visits)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```

No error.
An error occurs if the input is an iterator rather than a container:
```python
it = iter(visits)
normalize_defensive(it)
```

```
>>>
TypeError: Must supply a container
```
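Why does `iter(numbers) is iter(numbers)` detect an iterator? Calling iter() on a container creates a new iterator object each time, while calling iter() on an iterator returns the very same object (a note on the mechanism, easy to verify in a REPL):

```python
# iter() on a container creates a fresh iterator per call;
# iter() on an iterator returns the iterator itself.
values = [15, 35, 80]
print(iter(values) is iter(values))  # False: list is a container
it = iter(values)
print(iter(it) is iter(it))          # True: an iterator returns itself
```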
- In iterative processing such as loops, a function may show unexpected behavior when an iterator is passed as an argument.
- A container class can be made iterable by implementing the iterator protocol (`__iter__`).
- You can test whether a value is an iterator by calling iter() on it twice and checking whether the same object comes back.