Introduction

I worked through the 100 Language Processing Knocks 2020. Links to the other chapters can be found here, and the source code can be found here.

Chapter 1 Warm-up

No.00 Reverse order of character strings

Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).

Answer

`000.py`


text = 'stressed'  # avoid naming a variable `str`, which shadows the built-in type
print(text[::-1])

# -> desserts

Comments

A slice with a step of -1 returns the string in reverse order. It is nice that this kind of operation can be written so concisely.
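As a small sketch of how the slice works: the third slice field is the step, and a negative step walks the string from the end to the beginning. The built-in `reversed()` combined with `join` should give the same result. The variable names here are only illustrative.

```python
text = "stressed"

# A step of -1 walks the sequence from the last character to the first.
by_slice = text[::-1]

# reversed() yields the characters back to front; join() rebuilds a string.
by_reversed = "".join(reversed(text))

print(by_slice)     # desserts
print(by_reversed)  # desserts
```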

No.01 "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" (romanized: "Patatokukashi") and concatenate them into a new string.

Answer

`001.py`


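# The exercise uses the original Japanese string (romanized: "Patatokukashi").
text = "パタトクカシーー"
print(text[0:8:2])  # characters at indices 0, 2, 4, 6

# -> パトカー ("police car")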

Comments

Because only the odd-numbered characters (1st, 3rd, 5th, 7th) are needed, the slice step is set to 2.
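To illustrate the step argument a little further, here is a minimal sketch using the original Japanese string of the exercise, "パタトクカシーー". Starting the same slice at index 1 picks up the even-numbered characters instead. The variable names are illustrative only.

```python
text = "パタトクカシーー"

odd_chars = text[::2]    # 1st, 3rd, 5th, 7th characters (indices 0, 2, 4, 6)
even_chars = text[1::2]  # 2nd, 4th, 6th, 8th characters (indices 1, 3, 5, 7)

print(odd_chars)   # パトカー
print(even_chars)  # タクシー
```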

No.02 "パトカー" + "タクシー" = "パタトクカシーー"

Obtain the string "パタトクカシーー" by alternately concatenating the characters of "パトカー" (police car) and "タクシー" (taxi) from the beginning.

Answer

`002.py`


# The exercise uses the original Japanese strings: "police car" and "taxi".
str1 = "パトカー"
str2 = "タクシー"
print(''.join([s1 + s2 for s1, s2 in zip(str1, str2)]))

# -> パタトクカシーー

Comments

At first I thought about looping with an index, but it turns out that the `zip` function lets you iterate over multiple sequences at once.
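One detail worth keeping in mind with `zip` is that it stops at the shorter input. The sketch below uses two hypothetical helpers, `interleave` and `interleave_keep_tail`, to contrast `zip` with `itertools.zip_longest`, which pads the shorter input instead of dropping the tail. The two strings in the exercise have the same length, so the difference does not show up there.

```python
from itertools import zip_longest


def interleave(a, b):
    """Alternate the characters of a and b; zip stops at the shorter input."""
    return "".join(x + y for x, y in zip(a, b))


def interleave_keep_tail(a, b):
    """Same idea, but zip_longest pads the shorter input with '' so nothing is lost."""
    return "".join(x + y for x, y in zip_longest(a, b, fillvalue=""))


print(interleave("abc", "12345"))            # a1b2c3
print(interleave_keep_tail("abc", "12345"))  # a1b2c345
```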

No.03 Pi

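Break the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in order of appearance.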

Answer

`003.py`


sentence = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
# Strip commas and periods, split on spaces, and count the letters in each word.
print([len(word) for word in sentence.replace(',', "").replace('.', "").split(' ')])

# -> [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

Comments

I used a list comprehension. It is handy because a new list can be built in just a line or two.
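As an alternative sketch (not the approach used in the answer), a regular expression can pick out the runs of letters directly, so the punctuation does not have to be removed one character at a time. The pattern `[A-Za-z]+` is an assumption that the sentence contains only ASCII letters.

```python
import re

sentence = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

# [A-Za-z]+ matches each run of letters, so commas and periods never reach len().
lengths = [len(word) for word in re.findall(r"[A-Za-z]+", sentence)]
print(lengths)  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```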

No.04 Element symbol

Break the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of every other word, and create an associative array (dictionary or map type) from each extracted string to the position of its word (counted from the beginning of the sentence).

Answer

`004.py`


sentence = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
words = sentence.split()
one_char_positions = [1, 5, 6, 7, 8, 9, 15, 16, 19]
symbols = {}  # avoid the names `str` and `dict`, which shadow built-ins

for pos, word in enumerate(words, start=1):
    if pos == 12:
        symbols[word[:3:2]] = pos  # special case: output 'Mg' instead of 'Mi'
    elif pos in one_char_positions:
        symbols[word[:1]] = pos
    else:
        symbols[word[:2]] = pos
print(symbols)

# -> {'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Ne': 10, 'Na': 11, 'Mg': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17, 'Ar': 18, 'K': 19, 'Ca': 20}

Comments

The code feels a little long. Following the rules literally, the 12th word would produce `Mi` rather than `Mg`, which bothered me, so I handle that case separately with an `if` statement.
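For comparison, here is a shorter sketch that follows the rules of the exercise literally, without the special case, so the 12th word yields `Mi`. It uses a dictionary comprehension with `enumerate`. The names `one_char_positions` and `symbols` are illustrative.

```python
sentence = (
    "Hi He Lied Because Boron Could Not Oxidize Fluorine. "
    "New Nations Might Also Sign Peace Security Clause. Arthur King Can."
)
one_char_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}

# Take 1 character for the listed positions and 2 characters otherwise.
symbols = {
    word[:(1 if pos in one_char_positions else 2)]: pos
    for pos, word in enumerate(sentence.split(), start=1)
}

print(symbols["H"], symbols["He"], symbols["Mi"])  # 1 2 12
```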

No.05 n-gram

Create a function that builds n-grams from a given sequence (string, list, etc.). Use this function to get the word bi-grams and the character bi-grams of the sentence "I am an NLPer".

Answer

`005.py`


def n_gram(seq, n):
    # Join each window of n consecutive elements into a single string.
    return ["".join(seq[i: i + n]) for i in range(len(seq) - n + 1)]

sentence = "I am an NLPer"
print(f"Word bi-gram:  {n_gram(sentence.split(), 2)}")
print(f"Character bi-gram:  {n_gram(sentence, 2)}")

# -> Word bi-gram:  ['Iam', 'aman', 'anNLPer']
#    Character bi-gram:  ['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']

Comments

`join` concatenates the elements of each window. Because the word bi-gram and the character bi-gram involve the same processing, I put it in a single function, and I think it turned out well.
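One side effect of `join` is that the words of a word bi-gram are merged into one string (for example `'Iam'`). If the word boundaries matter, a variant that returns the slices themselves might be preferable. The following is a sketch under that assumption, with the hypothetical name `n_gram_windows`.

```python
def n_gram_windows(seq, n):
    """Return every window of n consecutive items, keeping the type of the input slice."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


sentence = "I am an NLPer"
word_bigrams = n_gram_windows(sentence.split(), 2)
char_bigrams = n_gram_windows(sentence, 2)

print(word_bigrams)      # [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
print(char_bigrams[:3])  # ['I ', ' a', 'am']
```

Slicing a list yields lists and slicing a string yields strings, so the same function covers both cases without `join`.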

No.06 set

Let X and Y be the sets of character bi-grams contained in "paraparaparadise" and "paragraph", respectively, and find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is included in X and in Y.

Answer

`006.py`


str1 = "paraparaparadise"
str2 = "paragraph"

def n_gram(seq, n):
    # A set comprehension removes duplicate n-grams.
    return {"".join(seq[i: i + n]) for i in range(len(seq) - n + 1)}

X = n_gram(str1, 2)
Y = n_gram(str2, 2)
print(f"Union: {X | Y}")
print(f"Intersection: {X & Y}")
print(f"Difference: {X - Y}")

print(f"Is se included in X?: {'se' in X}")
print(f"Is se included in Y?: {'se' in Y}")

# -> (the order of elements in a set may differ between runs)
# Union: {'ph', 'di', 'ar', 'gr', 'ad', 'is', 'se', 'ap', 'pa', 'ra', 'ag'}
# Intersection: {'ra', 'ap', 'ar', 'pa'}
# Difference: {'is', 'di', 'se', 'ad'}
# Is se included in X?: True
# Is se included in Y?: False

Comments

The `union()`, `intersection()`, and `difference()` methods can also be used to compute the union, intersection, and difference.
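A minimal sketch of the method forms, checking that each one agrees with the corresponding operator. The helper name `char_bigrams` is hypothetical and only keeps the example self-contained.

```python
def char_bigrams(text):
    """Return the set of character bi-grams in text (hypothetical helper)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


X = char_bigrams("paraparaparadise")
Y = char_bigrams("paragraph")

# Each method returns the same set as the corresponding operator.
assert X.union(Y) == X | Y
assert X.intersection(Y) == X & Y
assert X.difference(Y) == X - Y

# sorted() gives a stable order for display; sets themselves are unordered.
print(sorted(X.intersection(Y)))  # ['ap', 'ar', 'pa', 'ra']
print(sorted(X.difference(Y)))    # ['ad', 'di', 'is', 'se']

# Unlike the operators, the methods accept any iterable, not only sets.
print(sorted(X.intersection(["se", "zz"])))  # ['se']
```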

No.07 Sentence generation using template

Implement a function that takes arguments x, y, z and returns the string "y at x is z". Then set x = 12, y = "temperature", z = 22.4, and check the execution result.

Answer

`007.py`


def template(x, y, z):
    # Fill the three arguments into the sentence template "y at x is z".
    return f"{y} at {x} is {z}"

print(template(12, "temperature", 22.4))

# -> temperature at 12 is 22.4

Comments

Nothing special here; an f-string does all the work.

No.08 Ciphertext

Implement the function cipher that converts each character of the given string according to the following specification.

- Lowercase letters are replaced with the character whose code is (219 - character code)
- All other characters are output unchanged

Use this function to encrypt and decrypt an English message.

Answer

`008.py`


def cipher(sentence):
    return "".join([chr(219 - ord(ch)) if ch.islower() else ch for ch in sentence])

sen = "FireWork"
print(cipher(sen))
print(cipher(cipher(sen)))

# -> FrivWlip
#    FireWork

Comments

This appears to be the Atbash cipher. Applying the `cipher` function twice restores the original string.
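The constant 219 is `ord("a") + ord("z")`, so `219 - ord(ch)` mirrors a..z onto z..a, and applying it twice gives back the original code because 219 - (219 - c) = c. As a sketch of an alternative (not the answer above), the same mapping can be built once as a translation table. `ATBASH_TABLE` and `cipher_with_table` are hypothetical names.

```python
import string

# 219 is the sum of the codes of 'a' and 'z', which is why the mapping mirrors the alphabet.
assert ord("a") + ord("z") == 219

# Map a..z onto z..a; characters missing from the table are left unchanged by translate().
ATBASH_TABLE = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])


def cipher_with_table(text):
    """Mirror ASCII lowercase letters and leave every other character as it is."""
    return text.translate(ATBASH_TABLE)


encrypted = cipher_with_table("FireWork")
print(encrypted)                     # FrivWlip
print(cipher_with_table(encrypted))  # FireWork
```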

No.09 Typoglycemia

Create a program that, for a sequence of words separated by spaces, keeps the first and last letters of each word and randomly rearranges the order of the remaining letters. However, words with a length of 4 or less are not rearranged. Give it an appropriate English sentence (for example, "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind.") and check the execution result.

Answer

`009.py`


import random

sentence = "I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind."
new_words = []
for word in sentence.split():
    if len(word) > 4:
        # Keep the first and last characters and shuffle the ones in between.
        middle = random.sample(word[1:-1], len(word) - 2)
        word = word[0] + "".join(middle) + word[-1]
    new_words.append(word)

print(" ".join(new_words))

# -> (example; the result differs on every run)
# I could’nt blveeie that I cuold atlculay utnresnadd what I was renadig : the pamohneenl pewor of the human mdin.

Comments

Besides `random.sample`, `random.shuffle` can also randomly reorder the elements of a list. `shuffle` rearranges the original list in place rather than returning a new one, so I think the code could be made a little shorter with it.
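A minimal sketch of the `random.shuffle` variant mentioned above. The function name `typoglycemia` and the `rng` parameter are assumptions added for illustration; passing a seeded `random.Random` instance makes a run reproducible.

```python
import random


def typoglycemia(sentence, rng=random):
    """Shuffle the inner letters of every word longer than 4 characters."""
    words = []
    for word in sentence.split():
        if len(word) > 4:
            middle = list(word[1:-1])
            rng.shuffle(middle)  # shuffles the list in place and returns None
            word = word[0] + "".join(middle) + word[-1]
        words.append(word)
    return " ".join(words)


sample = "I couldn't believe that I could actually understand"
print(typoglycemia(sample, random.Random(0)))
```

Because `shuffle` works in place, the middle characters have to be copied into a list first; strings are immutable and cannot be shuffled directly.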

References

[upura/nlp100v2020: 100 Language Processing Knocks 2020 solved with Python](https://github.com/upura/nlp100v2020)

Amateur Language Processing 100 Knocks: Summary
