This article is the day-14 entry of NCC Advent Calendar 2019.

Introduction

Arrays are important when writing programs. However, Python has a **too many array-like types** problem. Even the standard Python features alone include list, dict, set, and tuple. Once you start using libraries such as numpy, you will find even more similar ones.

So in this article I explain each of them from the following four angles, so that you can choose the right one for the job.

Overview
Good points
Bad points
Impressions

Since this article is about **choosing the right one**, I will not go into detailed usage. I also assume that you already have a rough idea of what an array is. I will use simple words as much as possible.

To keep things simple, some explanations may not be strictly accurate.

Target audience

- People who have started using Python but are struggling because there are too many array types
- People who have worked in another language but do not know where each array type stands in Python
- People who are unsure which one to use during implementation
- People who have started using libraries such as numpy

What this article covers

Python standard features

list
dict
set
tuple

Library

numpy

numpy.array/numpy.ndarray
numpy.matrix

Main story

list

Overview

The list is simply **the most basic one**. It is the same as an ordinary array in other languages; in Python it is generally called a "list". The symbol is [], so if something is enclosed in [] and contains a lot of commas, think of it as a list.

A Python list can hold values of several different types, and the same value can appear as many times as you like. It also supports the operations you would generally expect from an array.
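As a small illustration of the points above, here is a minimal sketch. The variable names and values are made up for this example.

```python
# A list may mix types and may contain the same value more than once.
items = [1, "two", 3.0, 1, 1]

# A few typical array-style operations.
items.append("last")            # add an element to the end
first = items[0]                # indices start at 0
count_of_ones = items.count(1)  # how many times the value 1 appears
length = len(items)

print(first, count_of_ones, length)  # 1 3 6
```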

When I explain dict and the others below, I will often compare them with list, so it helps to keep the nature of list in mind as you read. (Because it is the ordinary one, there is not much to explain.)

Good points

- Easy to handle
- You can get things working with just a list, without thinking about anything difficult

Bad points

- It becomes hard to follow as the number of dimensions increases
- You cannot see the length of each nested array at a glance (there is nothing like shape in numpy)
- Elements can only be referenced by numbers starting from 0
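To illustrate the point above about nested lists having nothing like numpy's shape, here is a minimal sketch with a made-up nested list. The inner lengths have to be collected by hand.

```python
# A hypothetical nested list whose rows have different lengths.
table = [[1, 2, 3], [4, 5], [6]]

outer_length = len(table)                  # len() only reports the outermost length
row_lengths = [len(row) for row in table]  # inner lengths must be gathered manually

print(outer_length)  # 3
print(row_lengths)   # [3, 2, 1]
```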

Impressions

For better or worse, list is the starting point. Even data held in numpy or pandas can be converted to a list and used from there. A list is also the right choice for one-dimensional data of indefinite length, or when the order of the elements matters but the index numbers themselves carry no meaning. However, it is not well suited to multidimensional data. (It is better not to design with too many dimensions in the first place.) For numerical values, combining it with numpy keeps things tidy; for complex data, combine it with dict.

dict

Overview

It is a so-called **associative array**. In Python it is called a dictionary (dict for short). The symbol is {}, and each element joins a key and a value with a colon, like {key: value}. I will not explain it in detail with examples, so please refer to the official reference for that.

Any hashable value can be used as a dict key, but it is easier to follow if you use a string or an integer. When there are many entries, it is easier to read if each key starts on a new line with the indentation aligned (like json). Here is an example.


ncc = { 'name': 'ncc',
        'full name': 'nakano computer club',
        'estimate': 2015,
        'web site': 'https://meiji-ncc.tech/'
        }

There are several schools of thought on indentation, so use the one you like. If the dict gets too long, it may be easier to read with three entries per line. Be flexible about this depending on the content.

Good points

- Easy to understand
- Easy to use with json
- A string can be used as the key

Bad points

- If you are not used to it, it tends to cause errors
- Values are hard to get out once the nesting gets deep (it ends up like dic['first']['second']['third'])
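The two bad points above can be sketched as follows. The nested data is hypothetical. `dict.get` is the standard method that returns a default value instead of raising an error when a key is missing.

```python
# Hypothetical nested data for illustration.
dic = {"first": {"second": {"third": "value"}}}

# Deep access works, but every level has to exist.
deep_value = dic["first"]["second"]["third"]
print(deep_value)  # value

# Looking up a missing key with [] raises KeyError.
try:
    dic["missing"]
except KeyError as err:
    missing_key = err.args[0]

# dict.get returns a default instead of raising.
fallback = dic.get("missing", "not found")
print(missing_key, fallback)  # missing not found
```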

Impressions

This is the one to use when both a value and its name matter! Even with numeric keys, dict is easier to handle when the keys are irregularly spaced. Data is also often gathered in a dict before being converted to a pandas DataFrame, because it is easier to handle that way. Think of it as something that ties a key to a value. It is also convenient for dealing with json, which is easy to read and write using the json library.
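As a minimal sketch of the json and irregular-key points above, using only the standard json module; the data itself is made up for this example.

```python
import json

club = {"name": "ncc", "members": 3}  # hypothetical data

text = json.dumps(club)      # dict -> JSON string
restored = json.loads(text)  # JSON string -> dict
print(text)                  # {"name": "ncc", "members": 3}
print(restored == club)      # True

# Irregularly spaced integer keys are no problem for a dict.
scores = {1: "a", 10: "b", 250: "c"}
print(scores[250])  # c
```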

set

Overview

The simplest way to describe set is **a list without duplicates**. The symbol is {}, the same as dict, but it does not use colons. As with list, the values are separated by commas. So if something is enclosed in {} and contains a lot of commas but no colons, it is a set. Strictly speaking, it represents a mathematical set, so you can also perform set operations. (I will not go into them here.)

A list can contain the same value more than once, as in [0, 1, 2, 1, 0], but a set cannot. Converting that list to a set gives {0, 1, 2}. The same applies to values other than numbers.

Put the other way around, if you want to eliminate duplicates, you can convert to a set. If you then want to treat the result as a list again, convert the set back to a list. Here is an example.

list_duplicate = [0, 1, 2, 2, 1, 0, 3]
list_non_duplicate = list(set(list_duplicate))
print(list_non_duplicate) # out: [0, 1, 2, 3] (the order is not guaranteed in general)

Good points

- Holds values without duplicates
- Set operations are possible

Bad points

- The order of elements is not preserved (when you convert from a list, duplicates are removed and the remaining elements are packed together)
- The symbol is (slightly) easy to confuse with dict
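Both bad points above can be seen in a short sketch. The `dict.fromkeys` approach shown here is not from the original article; it is one common way to remove duplicates while keeping first-seen order, relying on dicts preserving insertion order (Python 3.7 and later).

```python
values = [3, 1, 3, 2, 1]

# A set removes duplicates but makes no promise about order.
unique_set = set(values)

# dict keys keep insertion order, so this keeps the first-seen order.
unique_in_order = list(dict.fromkeys(values))
print(unique_in_order)  # [3, 1, 2]

# {} on its own is an empty dict, not an empty set; use set() for that.
print(type({}).__name__, type(set()).__name__)  # dict set
```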

Impressions

You rarely start out with a set. More often you convert from a list when you want to eliminate duplicates or extract the elements common to several lists. So think of it as something you reach for when taking a set-based approach in the implementation.
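Extracting the elements common to several lists might look like the sketch below. The lists are made up; `&`, `|`, and `-` are the standard set operators for intersection, union, and difference.

```python
list_a = [1, 2, 3, 4]
list_b = [3, 4, 5]
list_c = [4, 3, 9]

# Intersection: elements that appear in all three lists.
common = set(list_a) & set(list_b) & set(list_c)
print(sorted(common))  # [3, 4]

# Union and difference work the same way.
union_ab = set(list_a) | set(list_b)
only_in_a = set(list_a) - set(list_b)
print(sorted(union_ab))   # [1, 2, 3, 4, 5]
print(sorted(only_in_a))  # [1, 2]
```

`sorted` is used only to print the results in a predictable order, since a set itself has none.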

tuple

Overview

You could call tuple **a slightly rigid list**. It has a few quirks. The symbol is ().

It is basically like a list, but with one difference. Roughly speaking, **once created, it cannot be modified**. You can append another tuple to it (1), and you can replace the tuple itself with something else entirely (2). In both cases a new tuple is created and the original object is left unchanged. Individual elements, however, cannot be rewritten. This is a little tricky, so here is an example.

t = (0, 1, 2)
# Append another tuple (creates a new tuple and rebinds t)
t += (3, 4) # OK (1)
# Replace the tuple itself
t = (0, 1, 2) # OK (2)
# Rewrite an element
t[0] = 1 # Error (TypeError)

A tuple has no methods for adding or deleting elements. Because elements cannot be rewritten, you also cannot reorder them in place. To do any of this, you need to convert it to a list.
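The conversion described above can be sketched like this; the values are arbitrary.

```python
t = (0, 1, 2)

# Convert to a list, edit it, then convert back to a tuple.
as_list = list(t)
as_list[0] = 9
as_list.reverse()
t2 = tuple(as_list)

print(t)   # (0, 1, 2)  the original tuple is untouched
print(t2)  # (2, 1, 9)
```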

Good points

- Once made, it cannot be rewritten
- The order fixed at creation is always guaranteed
- Behavior can be pinned down
- Can be used as a constant

Bad points

- No flexibility
- Difficult to handle
- A source of errors

Impressions

Python has no way to declare a constant (like const in JavaScript), so a tuple can serve as a pseudo-constant. However, I don't use it much because nothing dynamic can be done with it. The return value of a library function or method is sometimes a tuple, so that is where you will use it.
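Here is a minimal sketch of both uses. `divmod` is a built-in function that returns a tuple; the name `WEEKDAYS` is hypothetical and chosen only for this example.

```python
# divmod returns a tuple of (quotient, remainder).
result = divmod(17, 5)
print(result)  # (3, 2)

# A returned tuple is usually unpacked into separate names.
quotient, remainder = result

# A tuple used as a pseudo-constant: its elements cannot be rewritten,
# although the name itself could still be rebound to another object.
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri")
try:
    WEEKDAYS[0] = "Sun"
except TypeError:
    rewrite_blocked = True

print(quotient, remainder, rewrite_blocked)  # 3 2 True
```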

Next, I move on to the array types in the Python library numpy. Before that, let's take a quick look at numpy itself.

(Supplement) What is `numpy`?

numpy is a library for the matrix operations used in linear algebra. It can be used for things like adding and subtracting arrays element by element, or multiplying an entire array by a number. It can do more advanced things too, but it is fine to think of it as **something that makes numerical calculation on arrays convenient**.

numpy.array/numpy.ndarray

Overview

numpy's array type is numpy.ndarray, and numpy.array() is the function that creates one. The type is numpy.ndarray whether the array is one-dimensional or multidimensional, and the handling does not change much either way. Because it comes from a library, there is no dedicated symbol for it. If you pass a list or similar to numpy.array(), it will be converted.

Use numpy.array() for multidimensional data as well; you do not normally call numpy.ndarray directly. The difference from list is that **you can do arithmetic on whole arrays**. Note that numpy.array() cannot be called without an argument, so the simplest approach is to convert from a list.

import numpy as np

# Define a regular list
list_num0 = [0, 1, 2, 3, 4]

# Convert to a numpy array
np_num0 = np.array(list_num0)
print(np_num0) # out: [0 1 2 3 4]

# Generate a numpy array directly
np_num1 = np.array([5, 6, 7, 8, 9])
print(np_num1) # out: [5 6 7 8 9]

# Convert a numpy array to a list (tolist() returns plain Python ints)
list_num1 = np_num1.tolist()
print(list_num1) # out: [5, 6, 7, 8, 9]

# Try doubling the list and the numpy array
list_num0_twice = 2*list_num0
print(list_num0_twice) # out: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
np_num0_twice = 2*np_num0
print(np_num0_twice) # out: [0 2 4 6 8]

# Try adding two lists and two numpy arrays
list_num_add = list_num0 + list_num1
print(list_num_add) # out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
np_num_add = np_num0 + np_num1
print(np_num_add) # out: [ 5  7  9 11 13]

In this way, a numpy array makes element-by-element operations easy.
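For the multidimensional case, a minimal sketch with made-up data: np.array() returns a numpy.ndarray regardless of the number of dimensions, and shape reports the length of every dimension at once.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])  # hypothetical 2 x 3 data

print(type(grid).__name__)  # ndarray
print(grid.ndim)            # 2
print(grid.shape)           # (2, 3)

# Arithmetic applies to every element, whatever the number of dimensions.
scaled = (grid * 10).tolist()
print(scaled)  # [[10, 20, 30], [40, 50, 60]]

# A one-dimensional array has the same type.
one_d = np.array([1, 2, 3])
print(type(one_d).__name__)  # ndarray
```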

Good points

- Operations between array elements are easy
- Easy to handle even in multiple dimensions
- Calculation is fast (often)

Bad points

- Hard to handle until you get used to it
- Not suitable for multidimensional arrays whose rows have different lengths
- Hard to use for anything other than numerical values

Impressions

If you want to do mathematical work, numpy is definitely the one. It goes well with scipy, a library for applied calculations (integration and so on). If you find yourself doing a lot of calculation, give it a try. It is easy to use once you understand how it differs from list.

numpy.matrix

- I have never used it
- I looked into it, and it seems you should use ndarray instead
- It seems convenient when working with an m × n matrix

That's all.

Summary

Here is a summary of each in one word.

- list: the basic one
- dict: strong ties between name and value
- set: a list without duplicates
- tuple: a list that cannot be rewritten
- numpy.array / numpy.ndarray: a list specialized for numerical calculation
- numpy.matrix: not used, because numpy.ndarray is enough

Flowchart

Here is a diagram that roughly shows how I personally decide which one to use.

In practice it is a little more complicated, but until you get used to it, this way of thinking is good enough. tuple is not included because I do not use it. (To be honest, I do not use set much either.)

At the end

This time I compared Python's array-like types. There are more if you count the finer details and other libraries. Still, the ones covered here are the foundation, so once you understand them, I think the rest will be easier to understand as well.
