This is a memo on O'Reilly Japan's book Effective Python. https://www.oreilly.co.jp/books/9784873117560/ (pp. 38-42)
A function that iterates over its input multiple times may behave unexpectedly when it is passed an iterator as an argument.

**Consider a function that calculates what percentage of the total number of visitors each city accounts for**
```python
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

visits = [15, 35, 80]
percentages = normalize(visits)
print(percentages)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```
This works fine, but suppose the amount of data becomes large (large enough to exhaust memory); let's try using a generator instead.
```python
def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)

it = read_visits("visits.txt")  # assume a file containing many numbers
percentages = normalize(it)
print(percentages[:3])
```

```
>>>
[]
```
We expect to get results similar to the code above, but in this case an empty list is returned. The reason is that an iterator produces its results only once: the sum() call inside normalize exhausts the iterator, so by the time the for loop runs there is nothing left to yield. What is particularly confusing here is that no exception is raised when the exhausted iterator is iterated again. Python's iteration machinery cannot distinguish an iterator that produces no output from one that has already been exhausted, because the StopIteration that signals exhaustion is swallowed silently.
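As a quick illustration (a minimal example, not from the book), an iterator is consumed by its first traversal, and a second traversal fails silently:

```python
# Minimal illustration: an iterator is exhausted after one pass,
# and iterating it again yields nothing rather than raising an error.
it = iter([15, 35, 80])
print(list(it))  # first pass consumes all values: [15, 35, 80]
print(list(it))  # second pass: empty list, no exception
```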
To solve this, copy the iterator's contents into a list so that it can be traversed again and again.
```python
def normalize_copy(numbers):
    numbers = list(numbers)  # copy the iterator's contents into a list
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

it = read_visits("visits.txt")
percentages = normalize_copy(it)
print(percentages)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```
The result is as expected, but by building a list of the numbers inside normalize_copy, new memory is allocated for the entire data set. This eliminates the benefit of using an iterator. Instead of creating a list, consider accepting a function that returns a new iterator each time it is called.

Now define a function that takes such an iterator factory:
```python
def normalize_func(get_iter):
    total = sum(get_iter())   # new iterator
    result = []
    for value in get_iter():  # new iterator
        percent = 100 * value / total
        result.append(percent)
    return result

percentages = normalize_func(lambda: read_visits("visits.txt"))
print(percentages)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```
It works as expected, but having to pass a lambda is clumsy. We can get the same result by implementing the **iterator protocol** ourselves.

The iterator protocol is how loops such as for statements traverse a container: iter() is called on the container to obtain an iterator, and next() is called on it until StopIteration is raised. Let's create our own container class that supports this protocol.
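The loop mechanics just described can be sketched by hand (a simplified illustration using an in-memory list rather than a file):

```python
# What a for loop does under the hood: obtain an iterator with iter(),
# then call next() until StopIteration is raised and swallowed.
values = [15, 35, 80]
it = iter(values)            # invokes values.__iter__()
result = []
while True:
    try:
        value = next(it)     # invokes it.__next__()
    except StopIteration:    # exhaustion is signaled by this exception
        break
    result.append(value)
print(result)  # [15, 35, 80]
```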
```python
class ReadVisits(object):
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

visits = ReadVisits("visits.txt")
percentages = normalize(visits)
print(percentages)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```
The difference from the original read_visits is that, thanks to the newly implemented ReadVisits container class, the data can be traversed twice inside the normalize function. This works because each call to iter() creates a brand-new iterator object (a fresh generator from __iter__). As a result, visits can be iterated any number of times. (The drawback is that the input file is read once per traversal.)
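To see why this works without needing a visits.txt file, here is a hypothetical in-memory analogue of ReadVisits (the Numbers class is an illustration, not from the book): each call to iter() runs __iter__ and returns a fresh generator, so repeated full passes all succeed.

```python
# Hypothetical in-memory container: like ReadVisits, every iter() call
# produces a brand-new generator, so the data can be traversed repeatedly.
class Numbers(object):
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        for value in self.values:
            yield value

nums = Numbers([15, 35, 80])
print(iter(nums) is iter(nums))  # False: two distinct iterators
print(sum(nums))                 # 130 -- first full pass
print(list(nums))                # [15, 35, 80] -- second full pass
```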
To ensure correct processing, provide a function that checks whether its argument is a container rather than a plain iterator.
```python
def normalize_defensive(numbers):
    if iter(numbers) is iter(numbers):  # an iterator: reject it
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

visits = [15, 35, 80]
normalize_defensive(visits)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```

No error.
```python
visits = ReadVisits("visits.txt")
normalize_defensive(visits)
```

```
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```

No error.
An error occurs if the input is an iterator rather than a container:
```python
it = iter(visits)
normalize_defensive(it)
```

```
>>>
TypeError: Must supply a container
```
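Why does `iter(numbers) is iter(numbers)` detect an iterator? Calling iter() on a container creates a new iterator object each time, while calling iter() on an iterator returns the very same object (a note on the mechanism, easy to verify in a REPL):

```python
# iter() on a container creates a fresh iterator per call;
# iter() on an iterator returns the iterator itself.
values = [15, 35, 80]
print(iter(values) is iter(values))  # False: list is a container
it = iter(values)
print(iter(it) is iter(it))          # True: an iterator returns itself
```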
- In iterative processing such as loops, a function may show unexpected behavior when an iterator is passed as an argument.
- A container class can be made iterable by implementing the iterator protocol (`__iter__`).
- You can test whether a value is an iterator by calling iter() on it twice and checking whether the same object comes back.