Python is popular, isn't it?
One of the strengths of Python is list
. It's a generous list
that doesn't choose the type of object you put inside, but as the title suggests, it's not exactly a ** list **.
This time, I will explain about that with the CPython source code.
[List](https://ja.wikipedia.org/wiki/%E3%83%AA%E3%82%B9%E3%83%88_(%E6%8A%BD%E8%B1%A1%E3%] 83% 87% E3% 83% BC% E3% 82% BF% E5% 9E% 8B))) is an "ordered data container". In short, it's a collection of data in order.
I think that many people who read this article are Pythonista, so I would like to implement it in Python, but since Python can not implement a strict list, I will write a minimal list program in C ++. I don't want to talk about C ++, so I don't use templates and make a list of ints.
class List {
private:
int base; //Hold value
int *next; //Address of the following value
public:
/*
*append and various methods
*/
}
Only this. It's easy?
This is a ** unidirectional list **, and as the name implies, it is a list whose references are ** unidirectional **.
On the other hand, ** bidirectional list ** is a list that has the address of the previous value and can be referred to backwards.
If you want to refer to the next value, just refer to the data at the next
position. In other words, it may be a little different from the list you imagine, but ** the data is not in a row in memory **.
I hope you found the list in the previous chapter. So what is an array (https://ja.wikipedia.org/wiki/%E9%85%8D%E5%88%97)? The simple answer is ** just a data string **. I will explain it because it is a little misleading and not accurate.
As explained in the previous chapter, the list was ** a column of data holding the address of the next data **. Arrays, on the other hand, are ** data arranged in memory **.
Let me give you a simple example.
Suppose you have an array of alphabets starting with data ʻAat address 0 in memory. (Data size is 1 for simplicity) Then there should be
B next to address 1 in memory, that is, ʻA
. Next to it is C
, next to it is D
, and so on, the array is ** all data in a straight line in memory without exception **.
This is the difference between a list and an array.
And from here let's look at Python's list
implementation.
As you know, CPython is the official processing system of Python implemented in C language. The source code is currently (2020/04/10) latest. (But the built-in source code doesn't change that often, so it doesn't have to be tightly matched.)
Looking at the CPython source code, I think that many people are overwhelmed by the various directories. The basic object definition is in ʻInclude / cpython / xxxobject.h`, so let's take a look there. Then, I think there is such a definition.
Include/cpython/listobject.h
typedef struct {
PyObject_VAR_HEAD
/* Vector of pointers to list elements. list[0] is ob_item[0], etc. */
PyObject **ob_item;
/* ob_item contains space for 'allocated' elements. The number
* currently in use is ob_size.
* Invariants:
* 0 <= ob_size <= allocated
* len(list) == ob_size
* ob_item == NULL implies ob_size == allocated == 0
* list.sort() temporarily sets allocated to -1 to detect mutations.
*
* Items must normally not be NULL, except during construction when
* the list is not yet visible outside the function that builds it.
*/
Py_ssize_t allocated;
} PyListObject;
You need to know PyObject to read the Python source code. A PyObject is the underlying object for all objects, and all Python objects are provided as extensions to this object. So this code
PyListObject
with the array of addresses to PyObject ʻob_item and the acquired memory size ʻallocated
*
It will be."Hmm? I'm saying a list, but I don't have the address for the next element!" That's right.
** Python lists are arrays. ** ** Say again ** Python lists are arrays. ** **
"But what makes arrays and lists so happy and sad after all?" So, from here, I will explain not the technical difference but how it will change concretely.
Random access, as the name implies, means accessing random locations in a data structure. Here, we will introduce the difference in random access between arrays and lists.
If you want to do random access on an array, just add the position from the beginning of the element you want to access to the beginning address.
In the previous alphabet example, there is ʻA at address 0, so to access the 6th
G, just add
6 to the position of ʻA
, that is, 0
.
If the name of the alphabet array is alphabet, such access can be written in C language as follows.
char *alphabet = {'A', 'B', 'C', ... , 'X', 'Y', 'Z'};
char a = *(alphabet + 0); //A position is 0
char g = *(alphabet + 6); //G position is 6
And this can also be written in C language.
char a = alphabet[0];
char g = alphabet[6];
Isn't it a familiar shape? Yes, it's the same subscript access as Python.
The form name [i]
means ** get the element ahead of ** name
ʻi` **.
What is inside Python doing during random access? Let's take a look at the CPython source code again. Let's take a look at Include / cpython / listobject.h, which is the same as before.
Include/cpython/listobject.h
/* Cast argument to PyTupleObject* type. */
#define _PyList_CAST(op) (assert(PyList_Check(op)), (PyListObject *)(op))
#define PyList_GET_ITEM(op, i) (_PyList_CAST(op)->ob_item[i])
#define PyList_SET_ITEM(op, i, v) (_PyList_CAST(op)->ob_item[i] = (v))
#define PyList_GET_SIZE(op) Py_SIZE(_PyList_CAST(op))
#define _PyList_ITEMS(op) (_PyList_CAST(op)->ob_item)
On the 4th line, a macro called PyList_GET_ITEM
is defined.
Looking at the contents, (_PyList_CAST (op)-> ob_item [i])
was! Subscript access! It turns out that it is also accessed by subscripts inside Python.
By the way, one line below that is a macro that assigns a value to the specified position, but it is also accessed by subscripts here as well.
So what happens when you randomly access a list?
The list does not have elements in a row, so you need to follow the addresses of the next elements one by one. Repeat 6 times for B
after ʻA,
Cafter
B, and so on, and you will reach
G`. It's less efficient than an array by any means.
As you can see, ** lists are not good at random access, arrays are good ** data structures, and arrays are adopted in Python. (I don't know why I named it list)
I hope you've seen that the internal implementation of Python's list
is actually an array, and that's what makes it grateful.
If you have any questions, questions, or suggestions for mistakes, please feel free to comment. If you have any questions, you can come to Twitter.
Thank you very much
Recommended Posts