Effective Python 2nd Edition: 90 Items to Improve Python Programs is a wonderful book, and I'm reading it with tears in my eyes.
One of its items covers how to handle missing keys in a dict. Please read the book for the details; here I simply measured the processing time of each approach because I was curious about it.
The task is simple: count how many times each character appears in a string. The execution environment is a default Google Colab runtime.
First, import the required libraries.
from collections import defaultdict
Pick an arbitrary string to count.
target = 'super_string_of_my_passages. but this does not make sense at all. because this is nothing'
Each approach ends by outputting the (character, count) pairs sorted by number of occurrences. The expected result is as follows.
[('s', 12),
(' ', 12),
('e', 8),
('t', 7),
('a', 6),
('i', 5),
('n', 5),
('_', 4),
('o', 4),
('u', 3),
('g', 3),
('h', 3),
('p', 2),
('r', 2),
('m', 2),
('.', 2),
('b', 2),
('l', 2),
('f', 1),
('y', 1),
('d', 1),
('k', 1),
('c', 1)]
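The order of characters with equal counts in the list above comes from two facts: since Python 3.7 a dict keeps insertion order, so characters are stored in the order they first appear, and sorted() is stable (also with reverse=True), so ties keep that order. A minimal sketch to check this; the variable names simply mirror the post.

```python
target = 'super_string_of_my_passages. but this does not make sense at all. because this is nothing'

# Counts are stored in first-appearance order (dicts keep insertion order since 3.7)
counts = {}
for ch in target:
    counts[ch] = counts.get(ch, 0) + 1

# sorted() is stable even with reverse=True, so ties stay in first-appearance order
ranking = sorted(counts.items(), key=lambda x: x[1], reverse=True)

print([ch for ch, n in ranking if n == 2])  # ['p', 'r', 'm', '.', 'b', 'l']
```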
Use an if statement with the in operator to check whether the key exists, and give a missing key an initial value. This is probably the first simple approach most people come up with.
%%time
ranking = {}
for key in target:
    if key in ranking.keys():
        count = ranking[key]
    else:
        count = 0
    ranking[key] = count + 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)
CPU times: user 45 µs, sys: 9 µs, total: 54 µs
Wall time: 56.3 µs
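A side note on `ranking.keys()`: it returns a view object, not a list, and a membership test on that view is a hash lookup just like `key in ranking`, so it does not scan the keys one by one. The only extra cost is creating the view on each call, and writing `key in ranking` directly is the more common idiom. A quick check with an example dict:

```python
ranking = {'s': 1, 'u': 1}

# keys() returns a dynamic view, not a copy of the keys
keys_view = ranking.keys()
print(type(keys_view).__name__)  # dict_keys

# Both spellings give the same answer
print('s' in keys_view, 's' in ranking)  # True True
print('z' in keys_view, 'z' in ranking)  # False False

# The view reflects later changes to the dict
ranking['z'] = 1
print('z' in keys_view)  # True
```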
Use a try/except statement to catch the KeyError raised for a missing key, treating it as an expected exception.
%%time
ranking = {}
for key in target:
    try:
        count = ranking[key]
    except KeyError:
        count = 0
    ranking[key] = count + 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)
CPU times: user 59 µs, sys: 11 µs, total: 70 µs
Wall time: 78.2 µs
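In the try/except version, the KeyError is raised only the first time each character is seen; every later lookup of that character succeeds directly. Counting how often the except branch runs makes this visible (the `misses` counter is added purely for illustration):

```python
target = 'super_string_of_my_passages. but this does not make sense at all. because this is nothing'

ranking = {}
misses = 0  # illustration only: how many times the except branch runs
for key in target:
    try:
        count = ranking[key]
    except KeyError:
        misses += 1
        count = 0
    ranking[key] = count + 1

# One miss per distinct character
print(misses, len(ranking))  # 23 23
```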
Use the get method of the built-in dict, which returns the given default (here 0) when the key is missing.
%%time
ranking = {}
for key in target:
    count = ranking.get(key, 0)
    ranking[key] = count + 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)
CPU times: user 43 µs, sys: 8 µs, total: 51 µs
Wall time: 53.6 µs
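Use defaultdict from the collections module. Its default factory int() supplies 0 for any missing key, so the counter can be incremented directly.
%%time
ranking = defaultdict(int)
for s in target:
    ranking[s] += 1
sorted(ranking.items(), key=lambda x: x[1], reverse=True)
CPU times: user 36 µs, sys: 8 µs, total: 44 µs
Wall time: 47.2 µs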
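defaultdict(int) works because int() called with no arguments returns 0. The factory runs only when a missing key is read with `[]`, and the new entry is then stored. A side effect worth knowing: merely reading a missing key inserts it, while `get()` and `in` never call the factory. A small sketch:

```python
from collections import defaultdict

print(int())  # 0, which is why int works as the default factory

ranking = defaultdict(int)
ranking['s'] += 1       # missing key: int() supplies 0, then 0 + 1 is stored
print(dict(ranking))    # {'s': 1}

# Reading a missing key with [] also inserts it
print(ranking['z'])     # 0
print('z' in ranking)   # True

# get() and `in` do not call the factory, so nothing is inserted
print(ranking.get('q', 0))  # 0
print('q' in ranking)       # False
```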
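defaultdict comes out fastest of the four! (* ^^) That said, %%time measures a single run, and at the microsecond scale the numbers can shift from run to run, so treat them as rough figures.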
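For steadier numbers than a single %%time run, each approach can be repeated with the standard timeit module. The sketch below wraps the post's loops in functions (the names are hypothetical), checks that they all agree, and reports the best of several repeats. It leaves out the final sort, and the results depend on the machine and Python version, so no figures are shown here.

```python
import timeit
from collections import Counter, defaultdict

target = 'super_string_of_my_passages. but this does not make sense at all. because this is nothing'

def count_with_in(text):
    ranking = {}
    for key in text:
        if key in ranking.keys():  # same check as in the post
            count = ranking[key]
        else:
            count = 0
        ranking[key] = count + 1
    return ranking

def count_with_try(text):
    ranking = {}
    for key in text:
        try:
            count = ranking[key]
        except KeyError:
            count = 0
        ranking[key] = count + 1
    return ranking

def count_with_get(text):
    ranking = {}
    for key in text:
        ranking[key] = ranking.get(key, 0) + 1
    return ranking

def count_with_defaultdict(text):
    ranking = defaultdict(int)
    for key in text:
        ranking[key] += 1
    return ranking

def count_with_counter(text):
    return Counter(text)

approaches = [count_with_in, count_with_try, count_with_get,
              count_with_defaultdict, count_with_counter]

# Every approach must produce the same counts before comparing speed
reference = count_with_in(target)
assert all(dict(f(target)) == reference for f in approaches)

NUMBER = 1_000  # calls per repeat
for f in approaches:
    # Best of 5 repeats is less affected by background noise than a single run
    best = min(timeit.repeat(lambda: f(target), number=NUMBER, repeat=5))
    print(f'{f.__name__:<24} {best / NUMBER * 1e6:.2f} µs per call')
```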
That's all for the methods covered in the book. But I can already hear someone retorting, "Hey, for this kind of processing you've got that guy! Did you forget him?", so here is a bonus section. For a simple case like this, you can use the Counter class from the collections module. It counts the occurrences of each element, and its most_common method returns the elements sorted by count, so let's use that.
from collections import Counter
%%time
ranking = Counter(target)
ranking.most_common()
CPU times: user 53 µs, sys: 0 ns, total: 53 µs
Wall time: 56.5 µs
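A few more Counter details that fit this use case: most_common(n) returns only the top n pairs (elements with equal counts come in the order first encountered), looking up a missing element returns 0 without inserting it, and because Counter is a dict subclass its counts compare equal to the defaultdict result. A small sketch:

```python
from collections import Counter, defaultdict

target = 'super_string_of_my_passages. but this does not make sense at all. because this is nothing'

ranking = Counter(target)

# Only the top 3 (character, count) pairs
print(ranking.most_common(3))

# A missing element counts as 0, and unlike defaultdict it is not inserted
print(ranking['z'])    # 0
print('z' in ranking)  # False

# Same counts as the defaultdict approach
dd = defaultdict(int)
for s in target:
    dd[s] += 1
print(dict(ranking) == dict(dd))  # True
```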
Thank you very much!