Python's standard library is very powerful, but it contains so many modules that it is hard to keep track of them all. Plenty of people know a feature exists, forget about it, and end up reinventing the wheel. I am one of those people, so as a memo to myself, I'll introduce some data types from the Python standard library that are useful but easy to overlook unless you know about them.
defaultdict
Official document: https://docs.python.jp/3/library/collections.html#collections.defaultdict
As the name suggests, defaultdict is a dictionary that supplies a default value for keys that are not present yet. The advantage is that you don't have to check whether each key is already in the dictionary before updating it. For example, you can count how many times each word occurs like this:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> string = "python is way way way way better than java"
>>> for w in string.split(" "):
...     d[w] += 1
...
>>> d.items()
dict_items([('python', 1), ('is', 1), ('way', 4), ('better', 1), ('than', 1), ('java', 1)])
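For comparison, here is a minimal sketch of the same word count written both ways (the function names `count_with_dict` and `count_with_defaultdict` are just for illustration). The plain-dict version has to check for each key before updating it, which is exactly the bookkeeping defaultdict removes.

```python
from collections import defaultdict


def count_with_dict(text):
    """Count words with a plain dict: every key must be checked first."""
    counts = {}
    for word in text.split(" "):
        if word not in counts:  # explicit membership check
            counts[word] = 0
        counts[word] += 1
    return counts


def count_with_defaultdict(text):
    """Count words with defaultdict: a missing key starts at int(), i.e. 0."""
    counts = defaultdict(int)
    for word in text.split(" "):
        counts[word] += 1
    return counts


text = "python is way way way way better than java"
print(count_with_dict(text)["way"])          # 4
print(count_with_defaultdict(text)["way"])   # 4
```

Both produce the same counts; the defaultdict version simply has one branch fewer.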
One thing that is easy to miss at first glance: the defaultdict constructor does not take the default value itself, but a function that produces it (strictly speaking, any callable object). Passing a value directly raises an error:
>>> from collections import defaultdict
>>> d = defaultdict(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: first argument must be callable or None
The correct way is
>>> d = defaultdict(lambda: 0)
or
>>> d = defaultdict(int)
(Any callable that returns 0 when called with no arguments will do; int() returns 0.)
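Because the argument is a factory rather than a value, any zero-argument callable works, and it is called whenever a missing key is looked up. The sketch below (variable names are illustrative) groups words by their first letter with `defaultdict(list)`, so each new key gets its own fresh empty list. It also shows a side effect worth knowing: merely reading a missing key with `d[key]` inserts it.

```python
from collections import defaultdict

words = "python is way way better than java".split(" ")

# list() is called with no arguments for each new key, giving a fresh []
by_initial = defaultdict(list)
for w in words:
    by_initial[w[0]].append(w)

print(dict(by_initial))
# {'p': ['python'], 'i': ['is'], 'w': ['way', 'way'], 'b': ['better'], 't': ['than'], 'j': ['java']}

# Reading a missing key also calls the factory and stores the result
counts = defaultdict(int)
print(counts["missing"])    # 0
print("missing" in counts)  # True
```

If you only want to look a value up without inserting it, `d.get(key, default)` does not call the factory.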
Counter
Official document: https://docs.python.jp/3/library/collections.html#collections.Counter
If all you want is to count words, the Counter class is far more convenient. Pass it a list and it counts how many times each element occurs; pass it a string and it counts each character.
>>> from collections import Counter
>>> c = Counter("python is way way way way better than C".split(" "))
>>> c
Counter({'way': 4, 'python': 1, 'is': 1, 'better': 1, 'than': 1, 'C': 1})
>>> c.most_common(1)
[('way', 4)]
Counting like this is something you might end up implementing yourself, but since such a convenient data type exists, you may as well use it. As you can see, most_common() makes it easy to get the most frequent element or the n most frequent elements. Counters also support addition and subtraction with each other, which makes them handy when you want to compare sentences.
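Counter arithmetic works element by element. A minimal sketch comparing two made-up sentences: `+` adds counts, `-` subtracts and drops anything that falls to zero or below, and `&` keeps the minimum of each count, which is a quick way to see which words two sentences share. The last lines pass a string instead of a list, so characters are counted.

```python
from collections import Counter

a = Counter("python is way better than java".split(" "))
b = Counter("java is way older than python".split(" "))

print(a + b)  # combined counts from both sentences
print(a - b)  # words in a that b does not account for
print(a & b)  # words the two sentences share (minimum of each count)

chars = Counter("banana")
print(chars.most_common(2))  # [('a', 3), ('n', 2)]
```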
deque
Official document: https://docs.python.jp/3/library/collections.html#collections.deque
Python's built-in list already has a pop method, so deque is easy to overlook, but it is a data type that can add and remove elements at both the beginning and the end in O(1). With a list, by contrast, removing an element from the beginning is O(n), because all the remaining elements have to shift. A deque also accepts a maxlen parameter at initialization; once a bounded deque is full, adding another element automatically discards one from the opposite end (so append drops the first element). Deques are useful whenever you need a sequence whose length changes dynamically at both ends, for example when keeping a history of fixed length:
>>> from collections import deque
>>> history = deque(maxlen=100)  # keeps only the 100 most recent lines
>>> with open("python.txt") as lines:
...     for line in lines:
...         if 'python' in line:
...             print(line, end="")
...         history.append(line)
...
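To see the O(1) operations at both ends and the maxlen behavior in isolation, here is a small sketch: `appendleft`/`popleft` work at the front, `append`/`pop` at the back, and once a bounded deque holds `maxlen` items, each `append` discards the oldest item on the left.

```python
from collections import deque

d = deque([2, 3])
d.appendleft(1)      # insert at the front -> deque([1, 2, 3])
d.append(4)          # insert at the back  -> deque([1, 2, 3, 4])
first = d.popleft()  # remove from the front: 1
last = d.pop()       # remove from the back: 4
print(first, last, list(d))  # 1 4 [2, 3]

recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)  # once full, the leftmost item is dropped
print(list(recent))   # [2, 3, 4]
```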
Even better, appends and pops on a deque are thread-safe, so a deque can also be used to share data between producer and consumer threads.
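The documented guarantee is specifically that appends and pops from either end are thread-safe. Here is a rough sketch of two producer threads and one consumer thread sharing a deque; `ITEMS_PER_PRODUCER` and `NUM_PRODUCERS` are arbitrary values chosen for the demo. A deque has no blocking get, so the consumer here simply polls with a short sleep; when consumers need to wait for data, queue.Queue is usually the better fit.

```python
import threading
import time
from collections import deque

ITEMS_PER_PRODUCER = 1000  # arbitrary amount of work for the demo
NUM_PRODUCERS = 2

shared = deque()  # producers append on the right, the consumer pops on the left
consumed = []     # only the single consumer thread appends here


def producer(tag):
    for i in range(ITEMS_PER_PRODUCER):
        shared.append((tag, i))  # deque.append is thread-safe


def consumer(expected):
    while len(consumed) < expected:
        try:
            consumed.append(shared.popleft())  # deque.popleft is thread-safe
        except IndexError:
            time.sleep(0.001)  # nothing to take yet; deque cannot block


total = ITEMS_PER_PRODUCER * NUM_PRODUCERS
threads = [threading.Thread(target=producer, args=(t,)) for t in range(NUM_PRODUCERS)]
threads.append(threading.Thread(target=consumer, args=(total,)))
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(consumed))  # 2000
```

Because every item goes in on the right and comes out on the left, the items from each producer reach the consumer in the order they were produced.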
PriorityQueue
Official document: https://docs.python.jp/3/library/queue.html#queue.PriorityQueue
I didn't know until recently, but Python's standard library includes a priority queue, queue.PriorityQueue. I had implemented one myself on top of heapq, but there was no need to. PriorityQueue is useful when implementing search algorithms: breadth-first search, depth-first search, and A* search can be viewed as the same algorithm, differing only in how priorities are assigned in the queue. PriorityQueue is also thread-safe. (Internally it uses heapq and adds locking, so if you don't need thread safety, using heapq directly avoids that overhead.)
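The idea that these searches differ only in their priority can be sketched as one generic function. In this sketch, the function name `best_first_search` and the sample graph are hypothetical. The frontier is a `queue.PriorityQueue` of `(priority, tie, node, path)` tuples: ranking by depth gives breadth-first order, and ranking by negative depth gives depth-first order. A running counter breaks ties so nodes and paths are never compared directly. A* would additionally need edge costs and a heuristic, which this sketch leaves out.

```python
from itertools import count
from queue import PriorityQueue


def best_first_search(graph, start, goal, priority):
    """Generic search; only priority(depth, node) changes between strategies."""
    frontier = PriorityQueue()
    tie = count()  # unique tie-breaker, so tuples never compare nodes
    frontier.put((priority(0, start), next(tie), start, [start]))
    visited = set()
    while not frontier.empty():
        _, _, node, path = frontier.get()  # smallest priority first
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            return path
        depth = len(path)  # depth of this node's children
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                frontier.put((priority(depth, neighbor), next(tie),
                              neighbor, path + [neighbor]))
    return None  # goal unreachable


graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["E"], "E": []}
print(best_first_search(graph, "A", "E", lambda depth, node: depth))   # ['A', 'C', 'E']
print(best_first_search(graph, "A", "E", lambda depth, node: -depth))  # ['A', 'B', 'D', 'E']
```

With depth as the priority the shallower path through C is found first; with negative depth the search dives down through B and D before ever expanding C.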
OrderedDict
Official document: https://docs.python.jp/3/library/collections.html#collections.OrderedDict
Before Python 3.7, ordinary dictionaries did not guarantee any order, so when you iterated over a dictionary, the order in which its items came back was undefined. Sorting is also a hassle, because a dictionary itself cannot be sorted in place. OrderedDict remembers the order in which keys were inserted, so if you insert items in sorted order, that order is preserved. (Since Python 3.7 the built-in dict also preserves insertion order.) For example, consider managing test scores in a dictionary and displaying them in different orders:
>>> from collections import OrderedDict
>>> scores = {"Suzuki": 100, "Tanaka": 30, "Sato": 50}
>>> by_score = OrderedDict(sorted(scores.items(), key=lambda x: x[1]))
>>> list(by_score.items())
[('Tanaka', 30), ('Sato', 50), ('Suzuki', 100)]
>>> by_name = OrderedDict(sorted(scores.items(), key=lambda x: x[0]))
>>> list(by_name.items())
[('Sato', 50), ('Suzuki', 100), ('Tanaka', 30)]
As in the example above, it's easy to sort by score or by name.
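Beyond remembering insertion order, OrderedDict can reorder entries: `move_to_end(key)` moves a key to the end, and `popitem(last=False)` removes the entry at the front. A common use is a small least-recently-used cache. The sketch below uses a hypothetical `LRUCache` class with a fixed capacity and is only meant to show those two methods; for caching function results, the standard library already provides `functools.lru_cache`.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-size cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # front = least recently used

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the entry at the front


cache = LRUCache(2)
cache.put("Suzuki", 100)
cache.put("Tanaka", 30)
cache.get("Suzuki")      # Suzuki becomes the most recently used
cache.put("Sato", 50)    # evicts Tanaka, the least recently used
print(list(cache.data))  # ['Suzuki', 'Sato']
```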
This time I focused on five data types that I find particularly useful, but Python has many other useful standard libraries, and I recommend reading through their documentation. You may make some new discoveries (I still have a lot to learn myself).