This is a memo on O'Reilly Japan's book Effective Python (https://www.oreilly.co.jp/books/9784873117560/), pp. 35-37.
**A list is the simplest choice when you want to return a sequence of results**
Consider the problem of finding the index at which every word in a string starts:
```python
def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result

address = 'Four score and seven years ago...'
result = index_words(address)
print(result[:3])
```

```
>>>
[0, 5, 11]
```
The code works as intended, but it has two problems.
The first is readability: calling append over and over inside the function makes the code noisy and hard to read as a whole. A generator is a convenient alternative in such cases.
```python
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

result = list(index_words_iter(address))
print(result[:3])
```

```
>>>
[0, 5, 11]
```
Because the function uses yield, calling it returns an iterator that produces one value at a time. You can easily build a list from that iterator by passing it to list().
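As a quick illustration (a minimal sketch reusing index_words_iter and address from the snippets above), you can pull values from the iterator one at a time with next():

```python
# Values are produced lazily, one per call to next()
# (reuses index_words_iter and address defined above).
it = index_words_iter(address)
print(next(it))  # 0 -- the first word starts at index 0
print(next(it))  # 5 -- 'score' starts right after the first space
```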
The second problem is that index_words builds the entire result list in memory before returning it. For large inputs this consumes a correspondingly large amount of memory and risks crashing the program.
A generator, by contrast, produces one value at a time, so it can handle input of any length.

**Minimize memory consumption**

Here is a generator that reads data from a file and processes it one line at a time:
```python
def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset

from itertools import islice

with open('address.txt', 'r') as f:
    it = index_file(f)
    results = islice(it, 0, 3)
    print(list(results))
```

```
>>>
[0, 5, 11]
```
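Note that the snippet above assumes an address.txt file already exists in the working directory. A minimal setup sketch (not from the book) that writes the sample sentence to such a file:

```python
# Hypothetical setup: create address.txt containing the sample sentence
# so the index_file example above has something to read.
with open('address.txt', 'w') as f:
    f.write('Four score and seven years ago...')
```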
Now the function can handle input of any length without the risk of running out of memory. Be aware, however, that iterators and generators are stateful: the values you get depend on how much of the iterator has already been consumed.
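For example, once an iterator has been fully consumed, iterating over it again produces nothing (a minimal sketch reusing index_words_iter and address from above):

```python
# An exhausted iterator yields no further values
# (reuses index_words_iter and address defined above).
it = index_words_iter(address)
print(list(it))  # [0, 5, 11, ...] -- all word-start indices
print(list(it))  # []              -- already exhausted, nothing left to yield
```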