Python Application: Data Cleansing Part 1: Python Notation

Basics of lambda expression

Creating an anonymous function

When creating a function in Python, define it as follows.

# Example: function pow1 that outputs x^2
def pow1(x):
    return x ** 2

Anonymous functions (lambda expressions) can be used here to simplify the code.

# Anonymous function pow2 that does the same thing as pow1(x)
pow2 = lambda x: x ** 2

By using a lambda expression, you can store the function in a variable called pow2.

The structure of a lambda expression is as follows. For pow2 above, it means that the argument x is received and x ** 2 is returned.

lambda argument: return value

To pass an argument to a lambda expression and actually calculate the result, call it in the same way as a function created with def, as follows.

#Pass the argument a to pow2 and store the calculation result in b
b = pow2(a)
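
As a quick check, the following minimal sketch defines both versions and confirms that they return the same value. The input value 4 is an arbitrary choice for illustration.

```python
# Ordinary function defined with def
def pow1(x):
    return x ** 2

# Equivalent anonymous function stored in a variable
pow2 = lambda x: x ** 2

a = 4  # arbitrary sample input
b = pow2(a)
print(pow1(a), b)  # 16 16
```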

Calculation by lambda

If you want to create a multivariable function with a lambda expression, write as follows.

#Example:Function add1 that adds two arguments
add1 = lambda x, y: x + y

Lambda expressions can be stored in variables, but they can also be used without storing them in a variable. For example, to get the result of passing the two arguments 3 and 5 to the lambda expression of add1 above, write as follows.

(lambda x, y: x + y)(3, 5)
#Output result
8

In this example the lambda only adds typing, but not having to "store in a variable", that is, to name and define the function, makes functions very easy to use.
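
One place where an unnamed lambda tends to pay off is when a function is needed only once, as an argument to another function. The sketch below passes a lambda directly to the built-in max(); the scores list and its contents are made up for illustration.

```python
# Hypothetical sample data: (name, score) pairs
scores = [("ann", 3), ("bob", 7), ("cy", 5)]

# The lambda is used once as the key and is never given a name
best = max(scores, key=lambda pair: pair[1])
print(best)  # ('bob', 7)
```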

lambda with if

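Unlike a function defined with def, a lambda can contain only a single expression as its return value; it cannot contain statements. For example, a function defined with def can run statements in its body as follows. In Python 2, where print is a statement, this function cannot be expressed with lambda. (In Python 3, print() is a function call and therefore an expression, so lambda: print("hello.") works, but assignments, if blocks, and loops still cannot appear in a lambda.)

# Function that outputs "hello."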
def say_hello():
    print("hello.")

However, conditional branching with if can be written in a lambda by using the ternary operator (conditional expression).

# Function that multiplies the argument x by 2 if it is less than 3; otherwise divides it by 3 and adds 5
def lower_three1(x):
    if x < 3:
        return x * 2
    else:
        return x/3 + 5

Expressed as a lambda, the function above looks like this.

# Same function as lower_three1
lower_three2 = lambda x: x * 2 if x < 3 else x/3 + 5

The notation of the ternary operator is as follows. It is a little confusing, because the condition is written in the middle.

value when the condition is met if condition else value when the condition is not met

The ternary operator can be used in many situations other than lambda, and it reduces the number of lines of code.
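
As a small illustration of the ternary operator outside lambda, the sketch below replaces a four-line if/else assignment with a single line. The variable names and the threshold are arbitrary.

```python
x = 7  # arbitrary sample value

# Ordinary if/else assignment
if x < 3:
    label_long = "small"
else:
    label_long = "large"

# The same assignment as one conditional expression
label = "small" if x < 3 else "large"
print(label)  # large
```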

Use of lambda expressions

Splitting a string into a list (split)

To split a string on spaces or slashes, use the split() method. The pieces produced by split() are returned as a list.

string_to_split.split("delimiter", number_of_splits)

For example, you can split an English sentence with spaces to make a list of words.

#The string you want to split
test_sentence = "this is a test sentence."
#List with split
test_sentence.split(" ")
#Output result
['this', 'is', 'a', 'test', 'sentence.']

If you specify the number of splits as the second argument, the string is split that many times, starting from the beginning. Once that number is reached, no further splitting is performed, and the rest of the string is kept as the last element.

#The string you want to split
test_sentence = "this/is/a/test/sentence."
#List with split
test_sentence.split("/", 3)
#Output result
['this', 'is', 'a', 'test/sentence.']
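
Limiting the number of splits can be handy when only the first few fields of a line are structured. The following sketch assumes a made-up log line in which the date and level come first and the rest is free text.

```python
# Hypothetical log line: date, level, then a free-text message
log_line = "2020-01-15 ERROR disk is full"

# Split only twice so that the message keeps its internal spaces
date, level, message = log_line.split(" ", 2)
print(date)     # 2020-01-15
print(level)    # ERROR
print(message)  # disk is full
```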

Split of list (re.split)

The standard split() method cannot split on multiple delimiters at once. To do that, use the re.split() function of the re module.

With re.split(), you can put multiple delimiter characters inside [] in the pattern, and the string is split on all of them at once.

re.split("[delimiters]", string_to_split)

# Import the re module
import re
#The string you want to split
test_sentence = "this,is a.test,sentence"
# Split on ",", " ", and "." to make a list
re.split("[, .]", test_sentence)
#Output result
['this', 'is', 'a', 'test', 'sentence']
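
When delimiters appear next to each other, such as a comma followed by a space, the pattern above produces empty strings between them. A minimal sketch of one way to avoid this, assuming you want runs of delimiters treated as a single separator, is to add + after the brackets. The sample text is made up.

```python
import re

# Hypothetical input in which delimiters appear in pairs (", " and ". ")
text = "this, is a. test"

# Each delimiter character splits on its own, leaving empty strings
print(re.split("[, .]", text))   # ['this', '', 'is', 'a', '', 'test']

# "+" treats one or more consecutive delimiters as a single separator
print(re.split("[, .]+", text))  # ['this', 'is', 'a', 'test']
```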

Higher-order function (map)

Functions that take other functions as arguments are called higher-order functions. To apply a function to each element of a list, use the map() function.

# Returns an iterator (it stores the calculation method) without calculating
map(function_to_apply, array)

# Return the calculation result as a list
list(map(function, array))

For example, to get the absolute value of each element of the array a = [1, -2, 3, -4, 5] with a for loop, write as follows.

a = [1, -2, 3, -4, 5]
#Apply function in for loop
new = []
for x in a:
    new.append(abs(x))
print(new)
#Output result
[1, 2, 3, 4, 5]

Using the map() function, this can be written concisely as follows.

a = [1, -2, 3, -4, 5]
#Apply function with map
list(map(abs, a))
#Output result
[1, 2, 3, 4, 5]

In addition to general-purpose functions such as abs, a function stored in a variable with lambda is also valid as the first argument.

Enclosing the call in the list() function stores the result of the map() function (the result of applying abs in the example above) in a list again.

Be careful not to use list as a variable name (list = ...). If you do, a later call that is meant to invoke the list() function refers to the value stored in the variable list instead, and an error occurs.
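
The following sketch reproduces that pitfall in a contained way. The shadowing is done inside a function (shadowing_demo is a name made up for this example) so that the built-in list() stays usable in the rest of the script.

```python
def shadowing_demo():
    # Inside this function, the name list refers to a value, not the built-in
    list = [1, -2, 3]
    try:
        return list(map(abs, list))
    except TypeError as error:
        # Calling a list object as if it were a function raises TypeError
        return str(error)

message = shadowing_demo()
print(message)  # 'list' object is not callable

# With a different variable name, the built-in list() works as usual
values = [1, -2, 3]
print(list(map(abs, values)))  # [1, 2, 3]
```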

Iterator

An iterator is an object that can retrieve multiple elements one at a time, in order. Retrieving elements this way can take less execution time than a for loop, so when you want to apply a function to an array with a huge number of elements, consider the map() function.
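
To make the "stores the calculation method" behavior concrete, the sketch below pulls elements out of a map() iterator one at a time with the built-in next(). The squaring lambda is only an illustrative choice.

```python
a = [1, -2, 3, -4, 5]

# Creating the iterator does not calculate anything yet
it = map(lambda x: x ** 2, a)

# Each next() call computes and returns one element
first = next(it)
second = next(it)
print(first, second)  # 1 4

# list() consumes only the elements that are still left
rest = list(it)
print(rest)  # [9, 16, 25]
```

Note that an iterator can be consumed only once; after the code above, calling list(it) again returns an empty list.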

filter

To extract only the elements that satisfy a condition from a list, use the filter() function.

# Returns an iterator
filter(conditional_function, array)

# Return the calculation result as a list
list(filter(function, array))

The conditional function is a function that returns True or False for its input, such as lambda x: x > 0.

For example, to get the positive elements from the array a = [1, -2, 3, -4, 5] with a for loop, write as follows.

a = [1, -2, 3, -4, 5]
#Filtering with for loop
new = []
for x in a:
    if x > 0:
        new.append(x)
print(new)
#Output result
[1, 3, 5]

With filter(), this can be written concisely as follows.

a = [1, -2, 3, -4, 5]
#Filtering with filter
list(filter(lambda x: x>0, a))
#Output result
[1, 3, 5]
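
In a data cleansing context, filter() is often combined with split() to drop empty fields. The following is a minimal sketch under the assumption that fields containing only whitespace should be discarded; the input string is made up.

```python
# Hypothetical comma-separated input with an empty and a blank field
raw = "apple,,banana, ,cherry"

fields = raw.split(",")
print(fields)  # ['apple', '', 'banana', ' ', 'cherry']

# Keep only the fields that still contain text after stripping whitespace
cleaned = list(filter(lambda s: s.strip() != "", fields))
print(cleaned)  # ['apple', 'banana', 'cherry']
```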

sorted

Lists have a sort() method, but to sort by more complicated conditions, use the sorted() function.

# Set a key and sort
sorted(array_to_sort, key=key_function, reverse=True or False)

For the key function, specify which element is used for sorting. Here, specifying lambda x: x[n] sorts on the element at index n of each item. Set reverse to True to sort in descending order.

For example, for an array whose elements are themselves two-element arrays (a nested array), to sort in ascending order of the second element of each item, write as follows.

#Nested array
nest_list = [
    [0, 9],
    [1, 8],
    [2, 7],
    [3, 6],
    [4, 5]
]
#Sort by the second element as a key
sorted(nest_list, key=lambda x: x[1])
#Output result
[[4, 5], [3, 6], [2, 7], [1, 8], [0, 9]]
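
The reverse option and the fact that sorted() returns a new list can be seen in the following sketch. The records list is hypothetical sample data.

```python
# Hypothetical sample data: [name, age] pairs
records = [["bob", 25], ["alice", 31], ["carol", 19]]

# Sort by the second element (age) in descending order
by_age_desc = sorted(records, key=lambda r: r[1], reverse=True)
print(by_age_desc)  # [['alice', 31], ['bob', 25], ['carol', 19]]

# sorted() builds a new list, so the original order is untouched
print(records)  # [['bob', 25], ['alice', 31], ['carol', 19]]
```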

List comprehension

List generation

Because the map() function is designed to create iterators, generating an array from it with the list() function takes extra time. Therefore, if you simply want to generate an array in the same way as with the map() function, use list comprehension, which is based on the for loop.

[function_to_apply(element) for element in source_array]

For example, to take the absolute value of each element of the array a = [1, -2, 3, -4, 5], write as follows.

a = [1, -2, 3, -4, 5]
#Take the absolute value of each element in list comprehension
[abs(x) for x in a]
#Output result
[1, 2, 3, 4, 5]

Compared with using the map() function as below, this is simpler to write, as the number of parentheses shows.

#Create list with map
list(map(abs, a))
#Output result
[1, 2, 3, 4, 5]

It is best to choose between the two: the map() function when you want an iterator, and list comprehension when you want an array directly.
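
A list comprehension also accepts any expression in place of the function call, so no lambda is needed for simple calculations. The sketch below compares the two styles for squaring each element; the squaring itself is just an illustrative choice.

```python
a = [1, -2, 3, -4, 5]

# map() needs a function, so a lambda is written for the calculation
squares_map = list(map(lambda x: x ** 2, a))

# The list comprehension takes the expression directly
squares_lc = [x ** 2 for x in a]

print(squares_lc)                 # [1, 4, 9, 16, 25]
print(squares_map == squares_lc)  # True
```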

Loop using if statement

Adding a conditional branch to a list comprehension gives the same operation as the filter() function. The postfix if is used as follows.

[function_to_apply(element) for element in array_to_filter if condition]

If you just want to retrieve the elements that meet the condition, write (element) in place of (function_to_apply(element)).

For example, to extract the positive elements from the array a = [1, -2, 3, -4, 5], write as follows.

a = [1, -2, 3, -4, 5]
#List comprehension filtering(Postfix if)
[x for x in a if x > 0]
#Output result
[1, 3, 5]

Please note that this is different from the ternary operator introduced with lambda. With the ternary operator, some processing must also be defined for elements that do not meet the condition, whereas with a postfix if, elements that do not meet the condition are simply ignored.
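
The difference can be seen by placing the two forms side by side. In this sketch, replacing negative values with 0 is an arbitrary choice for the else branch.

```python
a = [1, -2, 3, -4, 5]

# Postfix if: elements that fail the condition are dropped
kept = [x for x in a if x > 0]
print(kept)  # [1, 3, 5]

# Ternary operator: every element stays, so the else branch is required
replaced = [x if x > 0 else 0 for x in a]
print(replaced)  # [1, 0, 3, 0, 5]
```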

Simultaneous loop of multiple arrays

To loop over multiple arrays at the same time, use the zip() function.

For example, to loop over the arrays a = [1, -2, 3, -4, 5] and b = [9, 8, -7, -6, -5] at the same time, use a for statement and write as follows.

a = [1, -2, 3, -4, 5]
b = [9, 8, -7, -6, -5]
#Parallel loop using zip
for x, y in zip(a, b):
    print(x, y)
#Output result
1 9
-2 8
3 -7
-4 -6
5 -5

In a list comprehension, too, the zip() function can be used to process multiple arrays in parallel.

a = [1, -2, 3, -4, 5]
b = [9, 8, -7, -6, -5]
#Process in parallel with list comprehension
[x**2 + y**2 for x, y in zip(a, b)]
#Output result
[82, 68, 58, 52, 50]
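
One behavior worth keeping in mind when the arrays come from uncleaned data: if their lengths differ, zip() stops at the shortest one without raising an error. The shorter array b below is made up to show this.

```python
a = [1, -2, 3, -4, 5]
b = [9, 8]  # deliberately shorter than a

# Only two pairs are produced; the remaining elements of a are ignored
pairs = list(zip(a, b))
print(pairs)  # [(1, 9), (-2, 8)]
```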

Multiple loop

The zip() function was used above to loop over arrays at the same time. A multiple loop, by contrast, runs another loop inside a loop.

In a for statement, a multiple loop is written as follows.

a = [1, -2, 3]
b = [9, 8]
#Double loop
for x in a:
    for y in b:
        print(x, y)
#Output result
1 9
1 8
-2 9
-2 8
3 9
3 8

Similarly, in a list comprehension, simply writing two for clauses side by side creates a double loop.

a = [1, -2, 3]
b = [9, 8]
#Double loop in list comprehension
[[x, y] for x in a for y in b]
#Output result
[[1, 9], [1, 8], [-2, 9], [-2, 8], [3, 9], [3, 8]]
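
The order of the for clauses matters: the leftmost clause corresponds to the outer loop. The following sketch checks this against an explicit nested loop and shows what changes when the clauses are swapped.

```python
a = [1, -2, 3]
b = [9, 8]

# Explicit nested loop: a is the outer loop, b is the inner loop
nested = []
for x in a:
    for y in b:
        nested.append([x, y])

# Same order of for clauses gives the same result
comp = [[x, y] for x in a for y in b]
print(nested == comp)  # True

# Swapping the clauses makes b the outer loop
swapped = [[x, y] for y in b for x in a]
print(swapped)  # [[1, 9], [-2, 9], [3, 9], [1, 8], [-2, 8], [3, 8]]
```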

Dictionary object

defaultdict

With a Python dictionary object, you need to initialize a key every time you add a new one, which complicates the processing.

For example, a program that records the number of occurrences of each element of the list lst in the dictionary d looks as follows.

Accessing a key that does not exist raises a KeyError, so every time a new element is registered in d, its count must be initialized.

#Record the number of occurrences of each element in the list lst in dictionary d
d = {}
lst = ["foo", "bar", "pop", "pop", "foo", "popo"]
for key in lst:
    # Branch depending on whether key (element) is already registered in d
    if key in d:
        # key (element) is already registered in d
        # Increment its count
        d[key] += 1
    else:
        # key (element) is not yet registered in d
        # Its count must be initialized
        d[key] = 1
print(d)
#Output result
{'foo': 2, 'bar': 1, 'pop': 2, 'popo': 1}

The defaultdict class in the collections module resolves this issue.

The defaultdict class is defined as follows. For the value type, specify a data type such as int or list.

from collections import defaultdict

d = defaultdict(type_of_value)

defaultdict can be used like a dictionary. A program that performs the same processing as above with defaultdict looks as follows. You can see that the elements can be counted without initializing the values.

from collections import defaultdict
#Record the number of occurrences of each element in the list lst in dictionary d
d = defaultdict(int)
lst = ["foo", "bar", "pop", "pop", "foo", "popo"]
for key in lst:
    d[key] += 1
    # No need to write else: d[key] = 1 to initialize
print(d)
#Output result
defaultdict(<class 'int'>, {'foo': 2, 'bar': 1, 'pop': 2, 'popo': 1})

To sort the resulting dictionary object by key or by value, use the sorted() function. It is called in the form sorted(sort target, key used for sorting, sort option).

sorted(dictionary_name.items(), key=lambda selecting the sort key, reverse=True)

Specifying items() extracts the entries as (key, value) pairs. To sort by key, specify the first item of each pair, that is, x[0], in the lambda.

To sort by value, specify the second item of each pair, that is, x[1], in the lambda.

The sort order defaults to ascending and becomes descending if reverse=True is specified. To sort the output of the program example above by value in descending order and print it, write as follows.

print(sorted(d.items(), key=lambda x: x[1], reverse=True))
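
For reference, the following self-contained sketch rebuilds the same counts and sorts them both ways. It assumes Python 3.7 or later, where dictionaries keep insertion order, so that entries with equal counts ("foo" and "pop") stay in the order in which they were first added.

```python
from collections import defaultdict

# Rebuild the counts from the example above
d = defaultdict(int)
for key in ["foo", "bar", "pop", "pop", "foo", "popo"]:
    d[key] += 1

# Sort by value (x[1]) in descending order
by_value = sorted(d.items(), key=lambda x: x[1], reverse=True)
print(by_value)  # [('foo', 2), ('pop', 2), ('bar', 1), ('popo', 1)]

# Sort by key (x[0]) in ascending order
by_key = sorted(d.items(), key=lambda x: x[0])
print(by_key)  # [('bar', 1), ('foo', 2), ('pop', 2), ('popo', 1)]
```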

Add element in value

Use defaultdict to add elements to a dictionary whose values are lists.

from collections import defaultdict

defaultdict(list)

Because each value is a list, writing dictionary_name[key].append(element) adds an element to the value. With a standard dictionary object, this also takes extra work, as follows.

# Add elements to the values of a dictionary
d = {}
price = [
    ("apple", 50),
    ("banana", 120),
    ("grape", 500),
    ("apple", 70),
    ("lemon", 150),
    ("grape", 1000)
]
for key, value in price:
    #Conditional branch due to the existence of key
    if key in d:
        d[key].append(value)
    else:
        d[key] = [value]
print(d)
#Output result
{'apple': [50, 70], 'banana': [120], 'grape': [500, 1000], 'lemon': [150]}

Using defaultdict here eliminates the need for conditional branching. By using this, you can group values for each key.
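
A minimal sketch of the defaultdict version, reusing the price list from the example above, might look like this. The dict() call at the end is only there to print the result in plain dictionary form.

```python
from collections import defaultdict

price = [
    ("apple", 50),
    ("banana", 120),
    ("grape", 500),
    ("apple", 70),
    ("lemon", 150),
    ("grape", 1000)
]

# A missing key automatically starts out as an empty list
d = defaultdict(list)
for key, value in price:
    # No conditional branch on the existence of key is needed
    d[key].append(value)

print(dict(d))
# {'apple': [50, 70], 'banana': [120], 'grape': [500, 1000], 'lemon': [150]}
```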

Counter

In addition to defaultdict, the collections module has several other data storage classes.

Like defaultdict, the Counter class can be used like a dictionary object, but it is more specialized in counting elements.

The Counter class is defined as follows. For the data you want to count, specify, for example, an array of words, a character string, or a dictionary.

from collections import Counter

d = Counter(data_to_count)

With the Counter class, creating a dictionary with each word as key and its number of occurrences as value takes only the following. Because no for loop is written, the count is more concise than with defaultdict and can run in less time.

#Counter import
from collections import Counter

#Record the number of occurrences of elements in the dictionary
lst = ["foo", "bar", "pop", "pop", "foo", "popo"]
d = Counter(lst)

print(d)
#Output result
Counter({'foo': 2, 'pop': 2, 'bar': 1, 'popo': 1})

The Counter class has some methods to help with counting. The most_common() method returns an array of the elements sorted by frequency in descending order.

most_common() is used as follows. Specify an integer for the number of elements to get. For example, specifying 1 returns the most frequent element. If nothing is specified, all elements are sorted and returned.

# dictionary_name.most_common(number_of_elements_to_get)
# Store a character string in Counter and count how often each character appears
d = Counter("A Counter is a dict subclass for counting hashable objects.")

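# Get the 5 most frequent elements
print(d.most_common(5))
#Output result (Python 3.7 or later; "o", "t", "a", and "c" each appear 4 times, and ties are listed in order of first appearance)
[(' ', 9), ('s', 6), ('o', 4), ('t', 4), ('a', 4)]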
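
Combining split() with Counter gives a compact word-frequency count. The sketch below uses a made-up sentence and also shows that looking up a word that never appeared returns 0 instead of raising a KeyError.

```python
from collections import Counter

# Hypothetical sample text
sentence = "the cat and the dog and the bird"

# Split into words, then count them in one step
counts = Counter(sentence.split(" "))

print(counts.most_common(2))  # [('the', 3), ('and', 2)]
print(counts["the"])          # 3
print(counts["fish"])         # 0 (a missing key counts as zero)
```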
