Python beginner tried 100 language processing knock 2015 (05 ~ 09)

Preface

This continues from last time. Wherever it didn't hurt readability, I kept the code as short as possible. Please point out any mistakes or possible improvements.

(Addition) After receiving feedback on how the code could be written, I added revised versions of 05, 06 and 09.

05. n-gram

Create a function that creates an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-gram and the letter bi-gram from the sentence "I am an NLPer".

05.py


def word_ngram(seq, n):
    return ["-".join(seq.split()[i:i+n]) for i in range(len(seq.split())-n+1)]

def char_ngram(seq, n):
    return ["".join(seq[i:i+n]) for i in range(len(seq)-n+1)]

def main():
    seq = "I am an NLPer"
    word_2gram_list, char_2gram_list = word_ngram(seq, 2), char_ngram(seq, 2)
    print(word_2gram_list)
    print(char_2gram_list)

if __name__ == '__main__':
    main()

Output:
['I-am', 'am-an', 'an-NLPer']
['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']

The word bi-grams are joined with a hyphen, while the character bi-grams keep the spaces from the original sentence.

(After editing ↓) I rewrote word_ngram so that it splits the sentence only once.

05.py


def word_ngram(seq, n):
    words = seq.split()
    return ["-".join(words[i:i+n]) for i in range(len(words)-n+1)]

def char_ngram(seq, n):
    return ["".join(seq[i:i+n]) for i in range(len(seq)-n+1)]

def main():
    seq = "I am an NLPer"
    word_2gram_list, char_2gram_list = word_ngram(seq, 2), char_ngram(seq, 2)
    print(word_2gram_list)
    print(char_2gram_list)

if __name__ == '__main__':
    main()
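The task asks for a function that accepts any sequence (string, list, etc.). Below is a minimal sketch, assuming tuples are an acceptable return type. Zipping n staggered slices works the same way on strings and on lists, so one function covers both word and character bi-grams. The name `ngram` is my own.

```python
def ngram(seq, n):
    """Return the n-grams of any sliceable sequence as a list of tuples."""
    # seq[0:], seq[1:], ..., seq[n-1:] are zipped together; zip stops at the
    # shortest slice, so exactly len(seq) - n + 1 n-grams come out
    return list(zip(*(seq[i:] for i in range(n))))


sentence = "I am an NLPer"
print(ngram(sentence.split(), 2))  # word bi-grams
print(ngram(sentence, 2))          # character bi-grams
```

Applying `"-".join` or `"".join` to each tuple gives the same strings that word_ngram and char_ngram produce above.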

06. Sets

Let X and Y be the sets of character bi-grams contained in "paraparaparadise" and "paragraph", respectively, and find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is contained in X and in Y.

06.py


# coding:utf-8

def n_gram(seq, n):
    return ["".join(seq[i:i+n]) for i in range(len(seq)-n+1)]

def main():
    X, Y = set(n_gram("paraparaparadise", 2)), set(n_gram("paragraph", 2))
    print("X: " + str(X))
    print("Y: " + str(Y))
    print("Union: " + str(X.union(Y)))
    print("Intersection: " + str(X.intersection(Y)))
    print("Difference set: " + str(X.difference(Y)))
    print("Does X contain se?: " + str("se" in X))
    print("Does Y include se?: " + str("se" in Y))

if __name__ == '__main__':
    main()

Output:
X: {'pa', 'ar', 'di', 'se', 'ad', 'ap', 'is', 'ra'}
Y: {'pa', 'ar', 'ph', 'ap', 'ag', 'gr', 'ra'}
Union: {'di', 'se', 'ap', 'ag', 'pa', 'ar', 'ph', 'ad', 'is', 'gr', 'ra'}
Intersection: {'ap', 'ar', 'ra', 'pa'}
Difference set: {'is', 'ad', 'di', 'se'}
Does X contain se?: True
Does Y include se?: False

For n_gram, I reused char_ngram from 05.py as is.

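(After editing ↓) I rewrote the print part to use the set operators |, & and -, and to pass several arguments to print instead of concatenating strings.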

06.py


# coding:utf-8

def n_gram(seq, n):
    return ["".join(seq[i:i+n]) for i in range(len(seq)-n+1)]

def main():
    X, Y = set(n_gram("paraparaparadise", 2)), set(n_gram("paragraph", 2))
    print("X: ", X)
    print("Y: ", Y)
    print("Union: ", X | Y)
    print("Intersection: ", X & Y)
    print("Difference set: ", X - Y)
    print("Does X contain se?: ", "se" in X)
    print("Does Y include se?: ", "se" in Y)

if __name__ == '__main__':
    main()
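By default Python randomizes string hashing for each process, so the order in which the sets above are printed can change from run to run. Here is a small sketch, assuming alphabetical order is fine for display: wrap each result in sorted() before printing.

```python
def n_gram(seq, n):
    # Same character n-gram logic as in 06.py
    return [seq[i:i+n] for i in range(len(seq) - n + 1)]


X = set(n_gram("paraparaparadise", 2))
Y = set(n_gram("paragraph", 2))

# sorted() turns each set into a list in a fixed (alphabetical) order
print("Union:", sorted(X | Y))
print("Intersection:", sorted(X & Y))
print("Difference set:", sorted(X - Y))
```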

07. Sentence generation by template

Implement a function that takes arguments x, y, z and returns the string "y at x is z". Furthermore, set x = 12, y = "temperature", z = 22.4, and check the execution result.

07.py


# coding:utf-8

def ans(x, y, z):
    # "y at x is z", e.g. "temperature at 12 is 22.4"
    return '{} at {} is {}'.format(y, x, z)

def main():
    x, y, z = 12, 'temperature', 22.4
    print(ans(x, y, z))

if __name__ == '__main__':
    main()

This one wasn't difficult. It's straightforward once you know str.format.
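On Python 3.6 or later, an f-string does the same job as str.format, with each value written directly inside its placeholder. A minimal sketch; the function name `template` is my own:

```python
def template(x, y, z):
    # f-string: each value is interpolated directly into the literal
    return f"{y} at {x} is {z}"


print(template(12, "temperature", 22.4))  # temperature at 12 is 22.4
```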

08. Ciphertext

Implement a function cipher that converts each character of a given string according to the following specification:
・ Replace each lowercase letter with the character whose code is (219 - character code)
・ Output all other characters unchanged
Use this function to encrypt and decrypt English messages.

08.py


# coding:utf-8

def cipher(seq):
    return ''.join(chr(219-ord(i)) if i.islower() else i for i in seq)

def main():
    seq = 'Is the order a rabbit?'
    print("encryption: " + cipher(seq))
    # Applying cipher twice restores the original text
    print("Decryption: " + cipher(cipher(seq)))

if __name__ == '__main__':
    main()

Output:
encryption: Ih gsv liwvi z izyyrg?
Decryption: Is the order a rabbit?

I used islower() to check whether each character is lowercase, ord() to convert a character to its code point, and chr() to convert a code point back to a character. I couldn't think of a particularly good English sentence, so I just picked one. ~~I'm a fan of Cocoa's voice actress~~
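The constant 219 is ord('a') + ord('z') (97 + 122). That is why applying cipher twice restores the input: it maps a↔z, b↔y and so on, like the Atbash cipher. One caveat, as far as I can tell: str.islower() is also True for non-ASCII lowercase letters, so the version above would call chr(-14) on 'é' (code 233) and raise ValueError. Below is a sketch that limits the substitution to a–z, plus an equivalent version built on str.maketrans/str.translate. The names `cipher_translate` and `ATBASH` are my own.

```python
import string


def cipher(text):
    # 219 == ord('a') + ord('z'), so 'a' <-> 'z', 'b' <-> 'y', and so on.
    # Checking 'a' <= c <= 'z' (instead of islower) keeps chr() from
    # receiving a negative value for letters such as 'é'.
    return ''.join(chr(219 - ord(c)) if 'a' <= c <= 'z' else c for c in text)


# Table-based equivalent: map a..z onto z..a
ATBASH = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])


def cipher_translate(text):
    return text.translate(ATBASH)


msg = 'Is the order a rabbit?'
print(cipher(msg))                 # Ih gsv liwvi z izyyrg?
print(cipher(cipher(msg)) == msg)  # True: the mapping is its own inverse
```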

09. Typoglycemia

Write a program that takes a string of space-separated words and, for each word, keeps the first and last letters while randomly rearranging the letters in between. However, words with a length of 4 or less are not rearranged. Give an appropriate English sentence (for example, "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind.") and check the execution result.

09.py


import random

def Typoglycemia(seq):
    return " ".join((x[0] + "".join(random.sample(x[1:-1], len(x[1:-1]))) + x[-1]) if len(x) > 4 else x for x in seq.split())

def main():
    s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind."
    print(Typoglycemia(s))

if __name__ == '__main__':
    main()

Output:
I coludn't bevelie that I cuold aultalcy urnteasndd what I was riednag : the pmenneaohl peowr of the hamun mdni.

The letters between the first and last were shuffled with random.sample, which draws without replacement, so each letter is used exactly once.
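Because random.sample draws without replacement, taking len(inner) items returns a random permutation of the inner letters. Below is a small sketch, assuming you want reproducible output while testing: pass in a seeded random.Random instance and check that only the order changed. The helper name `shuffle_inner` is hypothetical.

```python
import random


def shuffle_inner(word, rng=random):
    # rng.sample(inner, len(inner)) returns every inner letter exactly once,
    # i.e. a random permutation, as a list of characters
    inner = word[1:-1]
    return word[0] + "".join(rng.sample(inner, len(inner))) + word[-1]


rng = random.Random(0)  # seeded generator: same result on every run
w = shuffle_inner("understand", rng)
print(w)
# The first/last letters are kept and the inner letters are only reordered
assert w[0] == "u" and w[-1] == "d"
assert sorted(w[1:-1]) == sorted("understand"[1:-1])
```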

(After editing ↓) I rewrote Typoglycemia using lambda expressions.

09.py


import random

def Typoglycemia(seq):
    shuffle = lambda x: "".join(random.sample(x, len(x)))
    typo = lambda x: x[0] + shuffle(x[1:-1]) + x[-1]
    return " ".join(typo(x) if len(x) > 4 else x for x in seq.split())

def main():
    s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind."
    print(Typoglycemia(s))

if __name__ == '__main__':
    main()

Lambda expressions are convenient. I'll use them in the future wherever they fit.
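One thing worth knowing: PEP 8 recommends a def statement instead of assigning a lambda to a name, because the function then has a real name in tracebacks and reprs instead of '<lambda>'. Here is a sketch of the same logic with nested defs. The lowercase name `typoglycemia` is my choice.

```python
import random


def typoglycemia(sentence):
    # Nested named functions instead of lambdas bound to names (PEP 8 style)
    def shuffle(s):
        return "".join(random.sample(s, len(s)))

    def typo(word):
        return word[0] + shuffle(word[1:-1]) + word[-1]

    return " ".join(typo(w) if len(w) > 4 else w for w in sentence.split())


print(typoglycemia("I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind."))
```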


I will write the continuation when I feel like it.
