[A newcomer programmer's "100 Language Processing Knock 2020"] Solving Chapter 1

Introduction

While wandering around the net, I came across a site called "100 Language Processing Knock 2020". I had been wanting to try natural language processing, but I am a new programmer who has only done a little competitive programming. It caught my interest, so I decided to give it a go. At the time of writing I have finished only about half of the problems, but I am writing this up as a record. I will stop if my heart breaks. If no later articles appear, please draw your own conclusions.

Environment and stance

environment

stance

I will try to explain as much as I can, but if something catches your interest, I recommend looking it up yourself.

Solve "Chapter 1: Preparatory Movement"

00. Reverse order of strings

Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).

00.py


print("stressed"[::-1])

Terminal


desserts

This makes use of Python slices. I see slices all the time in competitive programming. A slice is written as [start:stop:step], and a step of -1 walks through the string from the end to the beginning.
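As a small illustration of the [start:stop:step] notation, here is a minimal sketch. The variable names are my own and are not part of the original answer.

```python
word = "stressed"

# Omitting start or stop means "from the beginning" or "to the end".
first_three = word[:3]   # indices 0, 1, 2
from_third = word[2:]    # index 2 through the end

# A step of 2 takes every other character.
every_other = word[::2]

# A negative step walks backwards, so [::-1] reverses the string.
reversed_word = word[::-1]

print(first_three, from_third, every_other, reversed_word)
```

Slicing never modifies the original string; each expression builds a new one.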

01. "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and concatenate them into a new string.

01.py


print("パタトクカシーー"[::2])

Terminal


パトカー

Extracting every other character starting from the first is easy with slices. The result, パトカー, means "police car".

02. "パトカー" + "タクシー" = "パタトクカシーー"

Obtain the string "パタトクカシーー" by alternately joining the characters of "パトカー" (police car) and "タクシー" (taxi) from the beginning.

02.py


print("".join([i + j for i, j in zip("パトカー", "タクシー")]))

Terminal


パタトクカシーー

I shortened the code (why?) by combining join(), which concatenates a list of strings into a single string, a list comprehension, and zip(), which walks through several sequences in parallel.
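To see what each piece contributes, here is a minimal sketch with made-up inputs ("abc" and "123" are illustrative and not from the problem).

```python
left = "abc"
right = "123"

# zip() pairs up the characters at the same position in both strings.
pairs = list(zip(left, right))

# Concatenate each pair, then join all the pieces into one string.
interleaved = "".join(a + b for a, b in pairs)

# zip() stops as soon as the shorter input runs out.
truncated = "".join(a + b for a, b in zip("abcd", "12"))

print(pairs)
print(interleaved)
print(truncated)
```

Because zip() stops at the shorter input, this approach only interleaves cleanly when both strings have the same length, as they do in this problem.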

03. Pi

Break down the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in order of appearance.

Version shortened for no good reason

03.py


print(*(map(lambda x: len(x),"Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.".translate(str.maketrans({",":"",".":""})).split())))

Terminal


3 1 4 1 5 9 2 6 5 3 5 8 9 7 9

I can still manage it in one line ... (effort pointed in entirely the wrong direction). map() applies a function to each element of an iterable such as a list and returns a map object. Here the function is defined with a lambda expression that returns the length of the string it is given. translate() replaces characters according to the translation table built by str.maketrans(). split() breaks the string on whitespace and returns the pieces as a list.

A probably more decent version
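The one-liner is easier to follow when each step is pulled apart. The following is a sketch on a shorter, made-up sentence; the variable names are illustrative.

```python
text = "Hello, world. Bye, now."

# Build a table that maps "," and "." to the empty string, which deletes them.
table = str.maketrans({",": "", ".": ""})
cleaned = text.translate(table)

# split() with no argument splits on runs of whitespace.
words = cleaned.split()

# map() is lazy, so wrap it in list() to see the results.
lengths = list(map(lambda w: len(w), words))

print(cleaned)
print(words)
print(lengths)
```

Since the lambda only forwards its argument to len(), map(len, words) would give the same result.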

03.py


s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
l = s.translate(str.maketrans({",": "", ".": ""})).split()
a = []
for i in l:
    a.append(len(i))
print(*a)

What it does is the same as the shorter version. The only change is that the work done by map() is now written as a for loop. append() adds an element to the end of the list.

The * in print(*a) unpacks the list, so each element is passed to print() as a separate argument and the values are shown separated by spaces.
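A quick sketch of the difference the * makes, using a short made-up list:

```python
a = [3, 1, 4]

print(a)             # one argument: the whole list, brackets included
print(*a)            # same as print(3, 1, 4)
print(*a, sep=", ")  # sep sets the text placed between the arguments

# With the default separator, print(*a) shows the same text as this join.
joined = " ".join(str(x) for x in a)
print(joined)
```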

04. Element symbol

Break down the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of the other words, and create an associative array (dictionary or map type) from each extracted string to the position of its word (counted from the beginning of the sentence).

04.py


s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.".split()
l = [1, 5, 6, 7, 8, 9, 15, 16, 19]
dic = {}
for i in range(len(s)):
    # l holds 1-based word positions, while i is a 0-based index
    if i + 1 in l:
        dic[s[i][0]] = i + 1
    else:
        dic[s[i][:2]] = i + 1
print(dic)

Terminal


{'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17, 'Ar': 18, 'K': 19, 'Ca': 20}

This one I actually wrote over multiple lines. If the word's position is in l, its first character becomes the key; otherwise its first two characters do.
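As an alternative sketch (not the code used above), enumerate() with start=1 numbers the words from 1, which matches how the problem counts positions. The names one_char_positions and symbols are my own.

```python
sentence = ("Hi He Lied Because Boron Could Not Oxidize Fluorine. "
            "New Nations Might Also Sign Peace Security Clause. Arthur King Can.")
one_char_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}

symbols = {}
# start=1 makes the counter match the 1-based positions in the problem.
for position, word in enumerate(sentence.split(), start=1):
    length = 1 if position in one_char_positions else 2
    symbols[word[:length]] = position

print(symbols)
```

Using a set for the positions is a small design choice: membership tests on a set do not depend on its size, although with nine entries a list works just as well.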

05. n-gram

Create a function that creates an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-gram and the letter bi-gram from the sentence "I am an NLPer".

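Please refer to here for what an n-gram is. In short, it is a run of n consecutive elements (characters, words, and so on) taken from a sequence.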

05.py


def N_gram(s, n=1):
    return [s[i:i+n] for i in range(len(s)-n+1)]


s = "I am an NLPer"
print(*(N_gram(s, 2)))
print(*(N_gram(s.split(), 2)))

Terminal


I   a am m   a an n   N NL LP Pe er
['I', 'am'] ['am', 'an'] ['an', 'NLPer']

The implementation of N_gram turned out quite compact. The first line of output looks odd because the character bi-grams contain spaces ... range() returns a range object that produces the integers from 0 up to (but not including) the given number, in order. You can also specify a start value other than 0. The n=1 in the function definition is a default argument: if n is not specified, it is 1.
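The following sketch shows range() with and without a start value, and how the window slides over different kinds of sequences. n_gram here is a lower-case copy of the function above, repeated only so the example runs on its own.

```python
def n_gram(seq, n=1):
    # range(len(seq) - n + 1) yields every index where a window of size n fits.
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


zero_to_four = list(range(5))     # start defaults to 0
two_to_four = list(range(2, 5))   # explicit start

unigrams = n_gram("abcd")         # n falls back to its default, 1
trigrams = n_gram("abcd", 3)
pairs = n_gram([1, 2, 3], 2)      # slicing works on lists too

print(zero_to_four, two_to_four)
print(unigrams, trigrams, pairs)
```

If n is larger than the sequence, range() receives a non-positive number and the result is simply an empty list.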

06. Set

Let X and Y be the sets of character bi-grams contained in "paraparaparadise" and "paragraph", respectively, and find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram "se" is included in X and in Y.

Incidentally, ParaParaParadise seems to be a dance game.

06.py


def N_gram(s, n=1):
    return {s[i:i + n] for i in range(len(s) - n + 1)}


s1 = "paraparaparadise"
s2 = "paragraph"

X = N_gram(s1, 2)
Y = N_gram(s2, 2)

s_union = X | Y
s_intersection = X & Y
s_difference = X - Y

print(*s_union)
print(*s_intersection)
print(*s_difference)

if "se" in X:
    print("\"se\" is in X")

if "se" not in Y:
    print("\"se\" is not in Y")

Terminal


pa ar ad ap is se di ag ph gr ra
ar pa ra ap
is ad se di
"se" is in X
"se" is not in Y

The membership checks are written as if I already knew the answer. You can use set.union(), set.intersection(), and set.difference(), but personally I find the operators |, &, and - easier, so I used those.
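A minimal sketch of how the operators line up with the methods, using two small made-up sets rather than the full bi-gram sets:

```python
X = {"pa", "ar", "ra", "se"}
Y = {"pa", "ar", "gr"}

# Each operator has an equivalent method.
same_union = (X | Y) == X.union(Y)
same_intersection = (X & Y) == X.intersection(Y)
same_difference = (X - Y) == X.difference(Y)
print(same_union, same_intersection, same_difference)

# Difference is not symmetric: X - Y and Y - X are different sets.
only_in_x = sorted(X - Y)
only_in_y = sorted(Y - X)
print(only_in_x, only_in_y)

# Printing the test itself avoids hard-coding the expected answer.
print("se" in X, "se" in Y)
```

Sets are unordered, so sorted() is used here only to make the printed order predictable.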

07. Sentence generation by template

Implement a function that takes arguments x, y, z and returns the string "y at x is z". Furthermore, set x = 12, y = "temperature", z = 22.4, and check the execution result.

07.py


def temp(x=12, y="temperature", z=22.4):
    return str(y) + " at " + str(x) + " is " + str(z)


print(temp())

Terminal


temperature at 12 is 22.4

I'm using the default arguments mentioned in 05. n-gram. If you give a parameter a default value (name=value) in the function definition, the function runs with that value when no argument is passed for it at call time.
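Here is a short sketch of how default arguments interact with positional and keyword arguments. describe is a hypothetical function written for this illustration, and it uses an f-string instead of string concatenation.

```python
def describe(x=12, y="temperature", z=22.4):
    # Each parameter falls back to its default when the caller omits it.
    return f"{y} at {x} is {z}"


all_defaults = describe()
override_x = describe(9)                   # positional: replaces x only
override_z = describe(z=18.5)              # keyword: replaces z only
all_given = describe(7, "humidity", 60)

print(all_defaults)
print(override_x)
print(override_z)
print(all_given)
```

Default values suit the problem's fixed inputs, but passing the values explicitly at the call site would make it clearer what is being checked.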

08. Ciphertext

Implement a function cipher that converts each character of the given string according to the following specification.
・ A lowercase letter is replaced with the character whose code is (219 - character code)
・ Any other character is output as it is
Use this function to encrypt and decrypt an English message.

08.py


def cipher(s):
    return "".join(c.islower()*chr(219-ord(c))+(not c.islower())*c for c in s)


print(cipher("The quick brown fox jumps over the lazy dog."))
print(cipher(cipher("The quick brown fox jumps over the lazy dog.")))

Terminal


Tsv jfrxp yildm ulc qfnkh levi gsv ozab wlt.
The quick brown fox jumps over the lazy dog.

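I tried hard to squeeze it into one line. (No, not really.) This time I take advantage of the fact that Python's bool type is a subclass of int: multiplying a string by True keeps it, and multiplying it by False gives an empty string. islower() is a method that tests whether a character is lowercase. The conversion 219 - character code works because ord("a") + ord("z") = 97 + 122 = 219, so it mirrors the alphabet and applying it twice returns the original text.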
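Both tricks can be checked in isolation. This is a small sketch; mirror and round_trip are illustrative names of my own.

```python
# bool is a subclass of int, so True behaves as 1 and False as 0.
bool_is_int = issubclass(bool, int)
kept = True * "abc"       # "abc" repeated once
dropped = False * "abc"   # "abc" repeated zero times

# ord("a") + ord("z") == 219, so 219 - code mirrors the alphabet.
total = ord("a") + ord("z")
mirror = {c: chr(219 - ord(c)) for c in "abcxyz"}

# Applying the mapping twice gives back the original letter.
alphabet = "abcdefghijklmnopqrstuvwxyz"
round_trip = all(chr(219 - ord(chr(219 - ord(c)))) == c for c in alphabet)

print(bool_is_int, repr(kept), repr(dropped))
print(total, mirror, round_trip)
```

Note that islower() is also true for non-ASCII lowercase letters, whose codes can exceed 219, so this approach is only safe for plain English text.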

09. Typoglycemia

Create a program that, for a string of words separated by spaces, randomly rearranges the letters of each word while leaving its first and last letters in place. However, words with a length of 4 or less are not rearranged. Give it a suitable English sentence (for example, "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the execution result.

Typoglycemia is a phenomenon in which the words in a sentence can still be read correctly even if the letters other than the first and last are reordered (though it is an urban legend / internet meme).

09.py


import random


def typoglycemia(s):
    return s if len(s) <= 4 else s[0] + "".join(random.sample(list(s[1:-1]), len(s) - 2)) + s[-1]


s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .".split()
print(" ".join(map(lambda x: typoglycemia(x), s)))

I put the function on one line as well (why?). A word of 4 characters or fewer is returned as it is; otherwise the characters between the first and the last are shuffled, joined, and returned between the original first and last characters. Unlike random.shuffle(), random.sample() accepts an immutable (non-modifiable) sequence as its first argument. Also, random.shuffle() shuffles in place and returns None, whereas random.sample() returns a new list.
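The difference between the two functions can be seen in a short sketch. The string "ypoglycemi" (the middle of "typoglycemia") and the variable names are chosen only for illustration; the printed order is random, so no particular output is shown.

```python
import random

middle = "ypoglycemi"  # a str, which is immutable

# random.sample() leaves its input untouched and returns a new list.
picked = random.sample(middle, len(middle))

# random.shuffle() needs a mutable sequence, reorders it in place,
# and returns None.
letters = list(middle)
result = random.shuffle(letters)

print("".join(picked))
print("".join(letters), result)
```

Sampling as many items as the input contains is what turns random.sample() into a shuffle that also works on strings.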

In conclusion

That's Chapter 1 solved. How was it? There were a lot of odd implementations, but that was just me being playful. Please forgive me for now, since in the later chapters I will have to implement things properly whether I like it or not. From the next chapter on, I hope to write more commentary.

Please leave a comment if you have suggestions such as "this would shorten the code" or "this way is better".

See you in the Chapter 2 article.
