This article is the 14th-day article of NCC Advent Calendar 2019.
Arrays are important when writing programs.
However, Python has a **too-many-array-like-types problem**.
Even among Python's standard built-in types alone, there are list, dict, set, and tuple.
On top of that, once you start using libraries such as numpy, you will find even more similar ones.
So this time, I will explain them so that you can use each one properly.
Since this article is about **choosing the right one**, I will not go into detailed usage. I will also assume that you have a rough idea of what an array is. I will explain in words that are as simple as possible.
This article is intended for:
- People who started using Python but are confused because there are too many array types
- People who came from another language and don't know where each array type stands in Python
- People who can't decide which one to use during implementation
- People who started using libraries such as numpy
Contents:
- list
- dict
- set
- tuple
- numpy
  - numpy.array / numpy.ndarray
  - numpy.matrix
list
list is simply **the most basic one**.
It is the same as an ordinary array in other languages, and in Python it is called a "list".
It is written with [], so if you see values enclosed in square brackets and separated by commas, think of it as a list.
In Python, a list can hold values of different types, and the same value can appear as many times as you like. It also supports the operations you would generally expect from an array.
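Here is a small sketch of those properties. The variable name items is only for illustration. It shows mixed types, repeated values, appending, and 0-based indexing:

```python
# A list can mix types and hold the same value more than once
items = [1, "two", 3.0, 1, 1]
items.append("end")      # add an element at the end
print(items[0])          # indexes start at 0, so this is the first element
print(items.count(1))    # how many times the value 1 appears
print(len(items))        # number of elements
```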
In the following sections, especially for dict, I will often compare things with list, so it helps to keep the nature of list in mind as you read.
(Since it is the ordinary one, there is not much more to explain.)
Merits:
- Easy to handle
- You can implement things with just list, without thinking about anything difficult

Demerits:
- It becomes harder to understand as the number of dimensions increases
- You can't see the length of each dimension at a glance (there is nothing like numpy's shape)
- Elements can only be referenced by numbers starting from 0
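To make those demerits concrete, here is a minimal sketch using a hypothetical nested list grid. A nested list has no shape, so you have to measure each level yourself, and elements are reached only by integer positions:

```python
# A 2-D "table" built from nested lists (hypothetical data)
grid = [[1, 2, 3],
        [4, 5],          # rows are allowed to have different lengths
        [6, 7, 8, 9]]

# There is no .shape attribute, so lengths are measured level by level
num_rows = len(grid)
row_lengths = [len(row) for row in grid]
print(num_rows, row_lengths)   # 3 [3, 2, 4]

# Elements are referenced only by integer positions starting at 0
print(grid[1][0])              # 4
```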
For better or worse, it is the starting point.
Data from numpy or pandas can also be converted to a list and used that way.
list is also the choice for one-dimensional data of varying length, or when the order matters but the index numbers themselves have no meaning.
However, it is not well suited to multidimensional data.
(Avoid designing data with too many dimensions in the first place.)
For numerical data, combining it with numpy keeps things clean, and for complex data, you can combine it with dict.
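As a rough sketch of combining list with dict for complex data, here each record is a dict and the records are kept in a list. The member data below is made up for illustration:

```python
# Complex data: a list whose elements are dicts (hypothetical member records)
members = [
    {"name": "Alice", "year": 2},
    {"name": "Bob", "year": 1},
    {"name": "Carol", "year": 3},
]

# The list keeps the records in order; each dict gives names to the values
by_year = sorted(members, key=lambda m: m["year"])
names = [m["name"] for m in by_year]
print(names)   # ['Bob', 'Alice', 'Carol']
```

The list handles "order without meaningful index numbers", while the dict handles "what each value means".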
dict
It is a so-called **associative array**.
In Python it is called a dictionary (dict for short).
It is written with {}, and each element is a key and a value joined by a colon, like {key: value}.
I will not go through detailed examples here; please refer to the official documentation.
The key of a dict can be any hashable (immutable) value, such as a string, an integer, or a tuple, but strings or integers are the easiest to understand.
When there are many entries, it is easier to read if you start a new line for each key and align the indentation (like JSON).
Here is an example.
ncc = {
    'name': 'ncc',
    'full name': 'nakano computer club',
    'established': 2015,
    'web site': 'https://meiji-ncc.tech/'
}
Merits:
- Easy to understand
- Works well with JSON
- Values can be referenced by strings

Demerits:
- Errors are easy to cause until you get used to it
- Values are hard to take out when the nesting gets deep
  - It ends up looking like dic['first']['second']['third']
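Here is a small sketch of both demerits, using a hypothetical nested dict named data. Deep access chains keys one after another. A missing key raises KeyError, and dict.get lets you supply a default instead:

```python
data = {"first": {"second": {"third": 42}}}

# Deeply nested access chains one key after another
print(data["first"]["second"]["third"])   # 42

# Accessing a key that does not exist raises KeyError
try:
    data["missing"]
except KeyError as e:
    print("KeyError:", e)

# dict.get returns a default value instead of raising
print(data.get("missing", "default"))     # default
```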
When a value and its name both matter, this is the one to use!
Even with numeric keys, dict is easier to handle when the numbers are irregularly spaced.
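To illustrate irregularly spaced numeric keys, here is a sketch with hypothetical readings taken at irregular times. The dict stores only the keys that exist. A list indexed by time would need a filler slot for every gap:

```python
# Hypothetical readings taken at irregular times (in seconds)
readings = {0: 12.5, 7: 13.1, 250: 15.0}

print(readings[250])      # look up directly by the numeric key
print(sorted(readings))   # [0, 7, 250]

# With a list, the index must equal the time, so every gap needs a filler
as_list = [None] * 251
for t, value in readings.items():
    as_list[t] = value
print(len(as_list))       # 251
```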
Also, before converting data into a pandas DataFrame, I often gather it into a dict first, because it is easier to handle.
Think of it as something that ties a key to a value.
It is also convenient when working with JSON (easy to read and write using the json library).
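Here is a minimal sketch of that JSON round trip, using the standard json module and a dict modeled on the ncc example. json.dumps turns the dict into a JSON string, and json.loads turns it back:

```python
import json

ncc = {
    'name': 'ncc',
    'full name': 'nakano computer club',
    'web site': 'https://meiji-ncc.tech/'
}

# dict -> JSON string (indent=4 gives a readable, aligned layout)
text = json.dumps(ncc, indent=4)
print(text)

# JSON string -> dict
restored = json.loads(text)
print(restored['name'])   # ncc
```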
set
Put simply, set is a **list without duplicates**.
It is written with {}, the same as dict, but without the colon.
As with list, the values are separated by commas.
So if you see values enclosed in {} and separated by commas, it is a set.
Strictly speaking, it represents a mathematical set, so you can also perform set operations. (I won't go into them here.)
A list can contain the same value more than once, as in [0, 1, 2, 1, 0], but a set cannot.
Converting that list to a set gives {0, 1, 2}.
The same applies to values other than numbers.
Put the other way around, if you want to remove duplicates, you can convert to set.
If you then want to treat the result as a list again, you need to convert the set back to a list.
Here is an example.
list_duplicate = [0, 1, 2, 2, 1, 0, 3]
list_non_duplicate = list(set(list_duplicate))
print(list_non_duplicate)  # out: [0, 1, 2, 3] (in general, a set does not guarantee this order)
Merits:
- Values are stored without duplicates
- Set operations are possible

Demerits:
- The order of the elements is not preserved
  - (When you convert from a list, duplicates are removed and the remaining elements are packed together)
- The symbol is (slightly) easy to confuse with dict
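The confusion with dict is most visible with empty braces. In Python, {} on its own is an empty dict, so an empty set has to be written as set(). A quick sketch:

```python
empty_braces = {}
print(type(empty_braces))   # <class 'dict'>  ({} alone is an empty dict)

empty_set = set()           # an empty set must be created with set()
print(type(empty_set))      # <class 'set'>

# With elements, the presence of colons decides which one you get
print(type({1, 2}))         # <class 'set'>
print(type({1: 2}))         # <class 'dict'>
```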
It is rarely used as a set from the start.
More often, a list is converted to a set when you want to remove duplicates or extract the elements that several lists have in common.
So think of it as something to use when you take a set-based approach during implementation.
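Here is a sketch of the "common elements of several lists" case, using two hypothetical lists a and b. The set operators &, | and - give the intersection, union and difference. The results are sorted because a set has no fixed order:

```python
a = [1, 2, 3, 4, 4]
b = [3, 4, 5, 3]

common = set(a) & set(b)   # elements that appear in both lists
either = set(a) | set(b)   # elements that appear in at least one list
only_a = set(a) - set(b)   # elements that appear only in a

# Sets are unordered, so sort them when a predictable list is needed
print(sorted(common))   # [3, 4]
print(sorted(either))   # [1, 2, 3, 4, 5]
print(sorted(only_a))   # [1, 2]
```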
tuple
tuple could be called **a slightly rigid list**.
It has a few quirks.
It is written with ().
It is basically like list, but different: roughly speaking, **you can't modify it once it has been made**.
You can append another tuple to it (1); strictly speaking, this creates a new tuple and reassigns the variable.
You can also replace the tuple itself with something else entirely (2).
However, its elements cannot be rewritten.
That is a little confusing, so here is an example.
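t = (0, 1, 2)
# Append a tuple to the end (this builds a new tuple and rebinds t)
t += (3, 4)  # OK (1)
# Replace the tuple itself
t = (0, 1, 2)  # OK (2)
# Rewrite an element
try:
    t[0] = 1  # Error
except TypeError as e:
    print(e)  # 'tuple' object does not support item assignment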
tuple has no methods for adding, assigning, or deleting elements (there is no append or remove, for example).
Also, since the elements cannot be rewritten, their order cannot be changed in place either.
To do that, you need to convert it to a list.
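Here is a minimal sketch of that detour through list. Convert the tuple, modify the list, then convert back. Note that sorted() accepts a tuple but always returns a list:

```python
t = (3, 1, 2)

# tuple has no sort() or append(), so go through a list
tmp = list(t)
tmp.sort()
tmp.append(4)
t = tuple(tmp)
print(t)                   # (1, 2, 3, 4)

# sorted() also accepts a tuple, but the result is a list
print(sorted((3, 1, 2)))   # [1, 2, 3]
```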
Merits:
- Once made, it cannot be rewritten
- The order at creation time is always preserved
- Its behavior can be fixed
- It can be used as a constant

Demerits:
- No flexibility
- Difficult to handle
- Can be a source of errors
Python has no type for declaring constants (like const in JavaScript), but tuple can serve as a pseudo-constant.
However, I don't use it much because nothing dynamic can be done with it.
Some library functions return a tuple, so that is mainly where you will use it.
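The built-in divmod() is a familiar example of a function that returns a tuple. Such results are usually unpacked straight into separate variables. A short sketch:

```python
# divmod() returns the quotient and remainder together as a tuple
result = divmod(17, 5)
print(result)                # (3, 2)
print(type(result))          # <class 'tuple'>

# A tuple return value is usually unpacked into separate variables
quotient, remainder = divmod(17, 5)
print(quotient, remainder)   # 3 2
```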
Next, I will move on to the array types in the Python library numpy.
Before that, let's take a quick look at numpy itself.
What is numpy?
numpy is a library that can perform the matrix operations used in linear algebra.
For example, it can add and subtract array elements, or multiply an entire array by a number.
It can do much more advanced things, but for now it is fine to think of it as **something that makes numerical calculations on arrays convenient**.
numpy.array / numpy.ndarray
In numpy, numpy.array() is the function that creates an array, and the object it returns is of type numpy.ndarray, whether it is one-dimensional or multidimensional.
So the treatment does not change much between one and several dimensions.
Since it comes from a library, there is no dedicated literal symbol for it.
Instead, you pass a list or similar to numpy.array() and it is converted.
Even for multiple dimensions, you create it with numpy.array(), not numpy.ndarray().
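To see which type numpy.array() actually returns, you can ask Python directly. This sketch assumes numpy is installed and imported as np. It creates a one-dimensional and a two-dimensional array and inspects their type and shape:

```python
import numpy as np

one_d = np.array([0, 1, 2])
two_d = np.array([[0, 1, 2],
                  [3, 4, 5]])

# Both are created with np.array(), and both are numpy.ndarray objects
print(type(one_d))                # <class 'numpy.ndarray'>
print(type(two_d))                # <class 'numpy.ndarray'>

# Unlike a nested list, the dimensions can be read off directly
print(one_d.shape, two_d.shape)   # (3,) (2, 3)
print(two_d.ndim)                 # 2
```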
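The main difference from list is that **you can perform arithmetic on whole arrays**.
However, unlike list, numpy.array() cannot be called with no arguments to get an empty array; you always pass it some data, such as numpy.array([]).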
Let's convert from list.
import numpy as np
# Define a regular list
list_num0 = [0, 1, 2, 3, 4]
# Convert to a numpy array
np_num0 = np.array(list_num0)
print(np_num0)  # out: [0 1 2 3 4]
# Generate a numpy array directly
np_num1 = np.array([5, 6, 7, 8, 9])
print(np_num1)  # out: [5 6 7 8 9]
# Convert the numpy array to a list (tolist() also turns the elements into plain Python ints)
list_num1 = np_num1.tolist()
print(list_num1)  # out: [5, 6, 7, 8, 9]
# Try doubling the list and the numpy array
list_num0_twice = 2*list_num0
print(list_num0_twice)  # out: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4] (the list is repeated)
np_num0_twice = 2*np_num0
print(np_num0_twice)  # out: [0 2 4 6 8] (each element is doubled)
# Try adding two lists, and two numpy arrays
list_num_add = list_num0 + list_num1
print(list_num_add)  # out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] (the lists are concatenated)
np_num_add = np_num0 + np_num1
print(np_num_add)  # out: [ 5  7  9 11 13] (element-wise sum)
In this way, numpy.array makes element-wise operations on arrays easy.

Merits:
- Operations between array elements are easy
- Easy to handle even in multiple dimensions
- Calculations are (often) fast

Demerits:
- Hard to handle until you get used to it
- Not suited to multidimensional arrays whose rows have different lengths
- Hard to use for anything other than numbers
If you want to do anything mathematical, numpy is definitely the way to go.
It pairs well with scipy, a library for applied computations such as integration.
If you do a lot of calculations, it is worth getting used to.
Once you understand how it differs from list, it is easy to use.
numpy.matrix
- I have never used it
- From what I looked up, it seems you should use ndarray instead
- It seems convenient when working with m × n matrices
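The NumPy documentation itself says the numpy.matrix class is no longer recommended and suggests regular arrays instead. The main thing to know is that, on an ndarray, * multiplies element by element, while the @ operator (or np.matmul) performs matrix multiplication. On numpy.matrix, * is the matrix product, which is where much of the confusion comes from. A small sketch with 2 × 2 arrays, assuming numpy is installed:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])   # a 2 x 2 ndarray
b = np.array([[5, 6],
              [7, 8]])

# With ndarray, * multiplies element by element
print(a * b)
# [[ 5 12]
#  [21 32]]

# The @ operator (same as np.matmul) performs matrix multiplication
print(a @ b)
# [[19 22]
#  [43 50]]
```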
Here is a one-line summary of each.
- list: Basic
- dict: Strong name-value ties
- set: List without duplicates
- tuple: Non-rewritable list
- numpy.array / numpy.ndarray: List specialized for numerical calculation
- numpy.matrix: Confusing alongside numpy.ndarray, so don't use it
Here is a diagram of the rough flow I personally follow when deciding which one to use.
In reality it is a little more complicated, but until you get used to it, think of it this way.
tuple is not included because I rarely use it.
(Actually, I don't use set much either.)
This time I compared Python's array-like types. There are more if you include finer details and other libraries. However, these are the basics, so once you understand them, I think understanding the others will become easier.