Solving 100 Language Processing Knock 2020 in Python, explained in exhaustive detail

I will explain every function that appears in exhaustive detail, and I will also describe the thinking that led to each piece of code.

Chapter 1: Warm-up

00 Reversing a string

str = "stressed"
reverse_str = str[::-1]
print(reverse_str)

Commentary: This problem uses slices. A slice is a mechanism that cuts out part of a sequence (string, list, tuple, etc.) and returns a copy.

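str[n]                # Extract the character at index n of str (indices start at 0)
str[start:stop:step]

# start ... index to begin from (inclusive)
# stop  ... index to end at (exclusive, so specify the last index you want + 1)
# step  ... stride: take every step-th element (a negative step walks backwards)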
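The slice rules above can be tried out on the string from problem 00. This is only a small sketch; the variable names are chosen here for illustration.

```python
word = "stressed"

first_char = word[0]        # single index: the character at position 0
middle = word[2:5]          # indices 2, 3, 4 (stop is exclusive)
every_other = word[::2]     # start and stop omitted: whole string, step 2
reversed_word = word[::-1]  # negative step walks from the end to the start

print(first_char, middle, every_other, reversed_word)
```

Omitting start and stop together with a step of -1 is what reverses the string in problem 00.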

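01 "パタトクカシーー"

str = "パタトクカシーー"
print(str[::2])  # パトカー ("police car")

Commentary: The slice knowledge from 00 applies directly. The task asks for the 1st, 3rd, 5th, and 7th characters, which are indices 0, 2, 4, and 6, so the slice reads str[from the beginning : (blank, so through the end) : step 2]. When start is omitted the slice begins at the first character, and when stop is omitted it runs to the end. Starting at index 1 with str[1::2] would instead give the 2nd, 4th, 6th, and 8th characters, タクシー ("taxi").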

02 "パトカー" + "タクシー" = "パタトクカシーー"

str1 = "パトカー"  # "police car"
str2 = "タクシー"  # "taxi"
ans = ""
# Both strings have the same length, so one index works for both
for i in range(len(str1)):
    ans += str1[i]
    ans += str2[i]

print(ans)  # パタトクカシーー

Commentary: range() creates a range object whose elements are consecutive integers from start up to, but not including, stop. The range function can be called in the following forms:

range(stop)
range(start, stop[,step])

range(5)
--> 0 1 2 3 4

range(0, 5)
--> 0 1 2 3 4

range(4,7)
--> 4 5 6

range(0, 5, 1)
--> 0 1 2 3 4

range(0, 10, 2)
--> 0 2 4 6 8
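The range calls above can be checked by converting them to lists. As an aside, the same interleaving can also be sketched with the built-in zip, which pairs up the characters of both strings; the helper name interleave is made up for this example and is not part of the original solution.

```python
def interleave(a, b):
    """Alternate the characters of a and b, stopping at the shorter one."""
    return "".join(x + y for x, y in zip(a, b))

# A range object is lazy; list() shows the values it produces
evens = list(range(0, 10, 2))
print(evens)

merged = interleave("パトカー", "タクシー")
print(merged)
```

Unlike the index-based loop, zip stops at the shorter string, so strings of different lengths do not raise an IndexError.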

03 Pi

str = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
ans = str.replace(",","").replace(".", "").split(" ")
print(ans)

Commentary: All you need to understand here is replace() and split(). replace(old, new) returns a copy of the string in which every occurrence of old is replaced by new; passing "" as new removes it. split(sep) cuts the string at every occurrence of sep and returns the pieces as a list.

s = 'one two one two one'

print(s.replace(' ', '-'))
====> one-two-one-two-one

print(s.split(" "))
====> ['one', 'two', 'one', 'two', 'one']
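The title of this problem refers to pi: as I understand the task, it asks for the number of letters in each word, and those counts spell out the digits of pi. A minimal sketch of that last step, building on the replace() and split() calls above (the names words and lengths are chosen here for illustration):

```python
sentence = ("Now I need a drink, alcoholic of course, "
            "after the heavy lectures involving quantum mechanics.")

# Strip punctuation, then split on whitespace
words = sentence.replace(",", "").replace(".", "").split()

# The length of each word, in order of appearance
lengths = [len(w) for w in words]
print(lengths)
```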

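04 Element symbols

# Decide whether to take the first character or the first two characters.
# i is the position of the word in the sentence, counted from 1.
def extWord(i, word):
  if i in [1, 5, 6, 7, 8, 9, 15, 16, 19]:
    return (word[0], i)
  else:
    return (word[:2], i)

sentence = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
text = sentence.replace('.', '').replace(',', '')
# Number the words from 1 so the positions match the list in extWord
ans = [extWord(i, w) for i, w in enumerate(text.split(), 1)]
print(dict(ans))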

Commentary: First, the enumerate function. It lets you retrieve the index of an element and the element itself at the same time, and it is most often used with for statements. Its signature is enumerate(iterable, start=0), where start is the number the count begins from.

x = ['a','b','c','d','e','f','g','h']
for a,b in enumerate(x):
    print(a,b, end=" ")

==========> 0 a 1 b 2 c 3 d 4 e 5 f 6 g 7 h
#The index is assigned to the first variable and the element is assigned to the second variable.
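The second argument of enumerate is easy to overlook, so here is a small sketch of its effect. The list contents are arbitrary sample data.

```python
symbols = ["H", "He", "Li"]

# Default: counting starts at 0
zero_based = list(enumerate(symbols))

# With start=1 the count begins at 1; the elements themselves are unchanged
one_based = list(enumerate(symbols, start=1))

print(zero_based)
print(one_based)
```

Starting at 1 is convenient when positions are given in human terms such as "the 1st word" or "the 5th word".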

Explanation of extWord(): The words have to be classified by their position, which suggests the enumerate function. With that in mind, the arguments are ordered as extWord(index, element), matching what enumerate yields. The rest is just a conditional branch with an if statement. The positions in the list are counted from 1, so enumerate is given a start value of 1.

Explanation of ans = [extWord(i, w) for i, w in enumerate(text.split(), 1)]: The first time you see it you will probably think "What?", but this is a list comprehension. The following two notations give the same result:

ans = [extWord(i, w) for i, w in enumerate(text.split(), 1)]

ans = []
for i, w in enumerate(text.split(), 1):
  ans.append(extWord(i, w))
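Putting the pieces together, the whole of problem 04 can be sketched as follows. This is one possible arrangement, not necessarily the author's exact code: the name ext_word and the use of a set for the positions are choices made for this example, and the words are numbered from 1 so that they line up with the position list.

```python
def ext_word(i, word):
    """Return (symbol, position); i is the 1-based position of the word."""
    one_letter = {1, 5, 6, 7, 8, 9, 15, 16, 19}
    return (word[0], i) if i in one_letter else (word[:2], i)

sentence = ("Hi He Lied Because Boron Could Not Oxidize Fluorine. "
            "New Nations Might Also Sign Peace Security Clause. Arthur King Can.")
words = sentence.replace(".", "").split()

# A list of (key, value) tuples ...
pairs = [ext_word(i, w) for i, w in enumerate(words, 1)]
# ... which dict() turns into a dictionary
symbols = dict(pairs)
print(symbols)
```

dict() accepts any iterable of (key, value) pairs, which is why the list of tuples can be passed to it directly.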

Explanation of dict(): dict() is a built-in function that converts its argument into a dictionary. Given a list of (key, value) tuples such as ans, it builds a dictionary from those pairs.

05 n-gram

First of all, I will explain what an n-gram is.

What is an n-gram?

In a nutshell, "n-gram" represents a group of n consecutive words or letters. Let's look at a concrete example!

こちら葛飾区亀有公園前派出所 ("This is the police box in front of Kameari Park, Katsushika Ward")
The 1-grams of this string:
['こ', 'ち', 'ら', '葛', '飾', '区', '亀', '有', '公', '園', '前', '派', '出', '所']
2-gram
['こち', 'ちら', 'ら葛', '葛飾', '飾区', '区亀', '亀有', '有公', '公園', '園前', '前派', '派出', '出所']
3-gram
['こちら', 'ちら葛', 'ら葛飾', '葛飾区', '飾区亀', '区亀有', '亀有公', '有公園', '公園前', '園前派', '前派出', '派出所']

Based on this, the code is:

def n_gram(target, n):
  return [target[index: index + n] for index in range(len(target) - n + 1)]

sentence = 'I am an NLPer'
for i in range(1, 4):
  print(n_gram(sentence, i))             # character n-grams
  print(n_gram(sentence.split(' '), i))  # word n-grams

Commentary:

def n_gram(target, n):
  return [target[index: index + n] for index in range(len(target) - n + 1)]

Now for the main part. The function is called as n_gram(sequence to split, value of n). Its second line is a list comprehension; the following two notations give the same result:

result = [target[index: index + n] for index in range(len(target) - n + 1)]

result = []
# The last n-gram starts at index len(target) - n. Past that point fewer than n elements remain, so the loop stops there.
for index in range(len(target) - n + 1):
  result.append(target[index: index + n])

A list comprehension is easy to understand once you rewrite it as an ordinary for loop.
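To see the loop bound at work, here is a short sketch that applies n_gram to both a string (character n-grams) and a list of words (word n-grams). The variable names are chosen for this example.

```python
def n_gram(target, n):
    # Slicing works on strings and lists alike, so one function covers both
    return [target[i:i + n] for i in range(len(target) - n + 1)]

sentence = "I am an NLPer"

char_bigrams = n_gram(sentence, 2)          # slices of a string are strings
word_bigrams = n_gram(sentence.split(), 2)  # slices of a list are lists

print(char_bigrams)
print(word_bigrams)
```

When n is larger than the length of the input, range() receives a non-positive value and the result is simply an empty list.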

06 Set

def n_gram(target, n):
    return {target[idx:idx + n] for idx in range(len(target) - n + 1)}

str1 = "paraparaparadise"
str2 = "paragraph"

X = n_gram(str1, 2)
Y = n_gram(str2, 2)

# Union of X and Y
union_set = X | Y  # X.union(Y) also works
print(union_set)
# Intersection
intersection_set = X & Y  # X.intersection(Y) also works
print(intersection_set)
# Difference
difference_set = X - Y  # X.difference(Y) also works
print(difference_set)
# Is 'se' contained in both X and Y?
print('se' in (X & Y))

There is nothing special to explain about this. I hope you can understand it if you read the comments.
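For readers who want to check the results by hand, here is a sketch that repeats the set operations and also tests 'se' against X and Y separately, which shows which of the two sets actually contains it. The names in_x and in_y are introduced for this example.

```python
def n_gram(target, n):
    # A set comprehension removes duplicate bigrams automatically
    return {target[i:i + n] for i in range(len(target) - n + 1)}

X = n_gram("paraparaparadise", 2)
Y = n_gram("paragraph", 2)

common = X & Y   # bigrams found in both strings
only_x = X - Y   # bigrams found only in the first string

# Membership tested on each set on its own
in_x = "se" in X
in_y = "se" in Y
print(sorted(common), sorted(only_x), in_x, in_y)
```

Sets are unordered, so sorted() is used here only to make the printed output stable.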

07 Sentence generation in template

def make_sentence(x, y, z):
    sentence = str(x) + "of time" + y + "Is" + str(z)
    return sentence

print(make_sentence(12, "temperature", 22.4))

There is no particular explanation for this either.

08 Ciphertext

def cipher(sentence):
    sentence = [chr(219 - ord(x)) if x.islower() else x for x in sentence]
    return ''.join(sentence)

x = 'Hey! Are ready ?? 123456'
print('Plaintext:', x)
x = cipher(x)
print('Ciphertext:', x)
x = cipher(x)
print('Decrypted text:', x)

Commentary: First, rewrite the list comprehension as an ordinary for loop.

def cipher(sentence):
  sentence = [chr(219 - ord(x)) if x.islower() else x for x in sentence]
  return ''.join(sentence)

# The two definitions are equivalent
def cipher(sentence):
  result = []
  for x in sentence:
    if x.islower():
      result.append(chr(219 - ord(x)))
    else:
      result.append(x)
  return ''.join(result)

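Explanation of islower(): Returns True if every cased character in the string is lowercase and there is at least one cased character, and False otherwise. Here it is called on a single character, so it simply tells whether that character is a lowercase letter.

Explanation of join(): join is a method of str that concatenates the elements of an iterable into one string, placing the string it is called on between the elements.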

letters = ['a', 'b', 'c', 'd', 'e', 'f']
x = ''.join(letters)
====> abcdef
x = ','.join(letters)
====> a,b,c,d,e,f

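Explanation of encryption / decryption: Pay attention to the chr(219 - ord(x)) part. ord(x) gives the character code c of x, and the cipher maps it to 219 - c. Applying the same mapping twice gives 219 - (219 - c) = c, so the original character comes back. Since 219 = ord('a') + ord('z') = 97 + 122, the mapping swaps 'a' with 'z', 'b' with 'y', and so on. Try it with a few characters of your own!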
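The round trip described above can be checked with a short sketch. It assumes the input contains only ASCII lowercase letters among its lowercase characters, as in the sample string; the names encrypted and decrypted are chosen for this example.

```python
def cipher(sentence):
    # Lowercase letters are mirrored within a..z; everything else is kept
    return "".join(chr(219 - ord(x)) if x.islower() else x for x in sentence)

plain = "Hey! Are ready ?? 123456"
encrypted = cipher(plain)
decrypted = cipher(encrypted)

print(encrypted)
print(decrypted)
```

Because the same function both encrypts and decrypts, no separate decoding routine is needed.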
