I use Qiita a lot, but this is my first post! Nice to meet you!
There are many useful articles about Python, but I have the impression that there aren't many articles that touch on the internal implementation of Python, so I'm motivated to be able to explain various data structures in conjunction with the internal implementation. This time I will write about Python list.
This is an article about how Python lists work. But it's impossible to write how all the methods in the list work, so mainly
――What kind of data structure is a list? Isn't it an array? --What is the internal implementation of the list type? --What is a variable length array? What are the rules for changing the size?
I wrote an article that can solve such questions.
--People who have the above questions --People who want to know a little more by reading Python introductory books and tutorials --People who want to know more but are tired of reading official documentation or cpython code
Let's see what kind of data structure the Python list type is.
What is a list?
>>> x = [1,2,3]
>>> x.append(4)
>>> x
[1, 2, 3, 4]
This is the familiar one.
x.sort()
x.append(element)
x.clear()
x.extend(element)
x.index(element)
x.insert(element)
x.pop()
x.remove(element)
x.reverse()
There are many methods like this (miscellaneous).
It's a basic story, so if you know it, please skip it.
The array is
--Secure a continuous area in memory and put elements in it --It takes O (1) to access the element and O (N) to insert or delete the element.
Linked list (unidirectional)
--One node has an element and a pointer to the next element --It takes O (N) to access the element and O (1) to insert or delete the element.
There is a feature.
list is a standard data structure and is one of the sequence types. Since it is named a list, some people may think that it is implemented using a linked list, but Python lists are implemented as variable length arrays (dynamic arrays). So ** Python list is an array **. The name is confusing. ..
A list is a contiguous array with references to other objects. The structure at the top of the list (PyListObject
) has a pointer and length to this array.
Looking at the actual cpython code (ʻInclude / cpython / listobject.h`) (comment omitted)
listobject.h
typedef struct {
PyObject_VAR_HEAD
PyObject **ob_item;
Py_ssize_t allocated;
} PyListObject;
It is defined like this.
The list is of type PyListObject
and the elements in the list are of type PyObject
and are represented internally.
(PyObject
is the base type for all Python object types.)
ʻOb_item is an array of pointers to the elements in the list, and ʻallocated
is the allocated size.
Elements can have different data types in the same list, as long as they are PyObject
.
x = [1,"a",[1,2,3]]
I talked about Python lists being variable length arrays. Variable length arrays resize the referenced array as elements are added or removed. However, it does not change the size of the array every time. It feels good to decide when to increase the size and its size.
The process when changing the size of the array to add a new element is as follows.
If there is no free space, reserve a new space, copy all the current elements, and add a new element.
growth factor How much to grow when the array is full depends on the ** growth factor ** (about how many times the size of the existing array is multiplied). ** growth factor ** depends on the language. (For example Python is 1.125, C is 2)
For example, when the growth factor is 2, the size (capacity) will be as follows when elements are added to the array in order.
When deleting an element, the reduction is similar to the enlargement.
Since the operation to expand the array copies all the elements, the amount of calculation is O (k) when the current number of elements is k.
Now consider that the complexity of list.append (element)
is O (1).
Conclusion Considering the amount of calculation, this is O (1).
When adding n elements to an empty array in order, if the growth factor is 2, the amount of calculation will be
\begin{align}
O(n+2+2^2+2^3+\cdots+2^{logn}) \\
= O(n+2\times2^{logn})\\
= O(n)
\end{align}
Therefore, the amount of calculation when adding n elements is O (n).
Therefore, the complexity of list.append (element)
is O (1).
Python's growth factor is 1.125, but let's see how to grow it concretely. Operations related to list are described in ʻObjects / listobject.c`. The important part of the function to be resized is as follows.
listobject.c
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* Add padding to make the allocated size multiple of 4.
* The growth pattern is: 0, 4, 8, 16, 24, 32, 40, 52, 64, 76, ...
* Note: new_allocated won't overflow because the largest possible value
* is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
*/
new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
}
new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
It's hard to understand the mask calculation in the latter half, isn't it?
In short, multiply the current size by $ \ frac {9} {8} $ and add a little. Certainly the growth factor is 1.125.
--Python list is an array --list is internally of type PyListObject and has pointers to PyObjects. --Python growth factor is 1.125
--cpython git repository -python Japanese documentation -Expert Python Programming
Recommended Posts