The exercise set at http://www.cl.ecei.tohoku.ac.jp/nlp100/ has been renewed, and a 2015 edition appears to have been released.
I worked through it in my favorite language, Python (2.x). I know there are already many similar articles (for example http://qiita.com/tanaka0325/items/08831b96b684d7ecb2f7), but I am publishing this as a record of my own progress and to share it. Suggestions are welcome.
I hope to continue with Chapter 2 and beyond... "rehabilitation" only means something if you keep at it!
Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).
Straightforward version
text = "stressed"
text_reverse = ""
n = len(text)
for i in xrange(n):
    text_reverse += text[n-i-1]
print text_reverse
#>>> desserts
This walks from the end to the beginning in the most straightforward way, using indices. It also works when text = "".
Revised version
text = "stressed"
n = len(text)
text_reverse_list = [text[n-i-1] for i in xrange(n)]
text_reverse = ''.join(text_reverse_list)
print text_reverse
#>>> desserts
Postscript: building a string by concatenation inside a for loop is reportedly poor in both execution speed and memory use, so I followed the approach of building a list of strings and then connecting them with join.
slice
text = "stressed"
text_reverse = text[::-1]
print text_reverse
#>>> desserts
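Slices make this simple. The syntax is string[start:stop:step]; with the start and stop omitted and a step of -1, the string is traversed from the end to the beginning.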
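To make the slice syntax concrete, here is a small sketch that is not part of the original answer. It is written with print() so that it also runs on Python 3, and the variable names are only illustrative.

```python
text = "stressed"

# Slice syntax is sequence[start:stop:step]; omitted parts take defaults.
first_three = text[0:3]      # characters at indices 0, 1, 2
every_other = text[::2]      # indices 0, 2, 4, 6
last_three = text[-3:]       # negative indices count from the end
reversed_text = text[::-1]   # step -1 walks from the end to the beginning

print(first_three)    # str
print(every_other)    # srse
print(last_three)     # sed
print(reversed_text)  # desserts

# The same reversal without a slice, via the reversed() built-in.
print(''.join(reversed(text)) == reversed_text)  # True
```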
Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and get the string formed by concatenating them.
python
text = u"パタトクカシーー"
text_concat = text[0] + text[2] + text[4] + text[6]
print text_concat
#>>> パトカー
I passed the string as unicode so that indexing works per character rather than per byte.
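The difference between characters and bytes can be checked with a short sketch like the one below. It is only an illustration, not the original answer: it encodes the text to UTF-8 explicitly (an assumption about the encoding in use) and is written to run on Python 3 as well.

```python
# -*- coding: utf-8 -*-
text = u"パトカー"

# Encoding to UTF-8 gives the byte sequence a plain Python 2 str would hold.
raw = text.encode("utf-8")

print(len(text))  # 4 characters
print(len(raw))   # 12 bytes, since each of these characters takes 3 bytes in UTF-8

# Indexing the unicode string yields whole characters.
print(text[0] == u"パ")  # True
```

Indexing the byte string instead would return fragments of a character, which is why the per-character tasks here are easier with unicode strings.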
python
text = u"パタトクカシーー"
text_concat = text[::2]
print text_concat
#>>> パトカー
I see, the same thing can be done with a slice.
Obtain the string "パタトクカシーー" by alternately concatenating the characters of "パトカー" (police car) and "タクシー" (taxi) from the beginning.
python
text1 = u"パトカー"
text2 = u"タクシー"
text_concat = ""
m = len(text1)
n = len(text2)
for i in xrange(m):
    if i<n:
        text_concat += text1[i] + text2[i]
        # text1 is exhausted: append whatever is left of text2
        if i == m-1:
            text_concat += text2[i+1:]
    else:
        # text2 is exhausted: append the rest of text1 and stop
        text_concat += text1[i:]
        break
print text_concat
#>>> パタトクカシーー
The two strings are scanned from the beginning. It is not strictly required by the problem, but I also handle the case where text1 and text2 differ in length (m != n): when one string runs out, the remainder of the other is appended as is.
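The unequal-length handling described above can also be sketched with the standard library's zip_longest, which pads the shorter input. This is an alternative to the loop, not the original answer; the function name interleave and the ASCII sample strings are made up for illustration.

```python
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2


def interleave(text1, text2):
    """Alternate characters of two strings; append the leftover of the longer one."""
    # fillvalue="" pads the shorter string, so the tail of the longer
    # string is appended unchanged once the shorter one runs out.
    pairs = zip_longest(text1, text2, fillvalue="")
    return "".join(a + b for a, b in pairs)


print(interleave("abc", "12345"))  # a1b2c345
print(interleave("abcde", "12"))   # a1b2cde
print(interleave("", "xyz"))       # xyz
```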
Break the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of letters in each word, in order of appearance.
python
sentence = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
word_length = [len(x.strip(',.')) for x in sentence.split()]
print word_length
#>>> [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
- List comprehension
I like Python because it lets you write neat one-liners like this.
Break the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of the other words, and create an associative array (dictionary or map type) from each extracted string to the position of the word (its number counted from the beginning).
python
sentence = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
word_list = [x.strip(',.') for x in sentence.split()]
word_dict = dict()
specified = [1, 5, 6, 7, 8, 9, 15, 16, 19]
for i, word in enumerate(word_list):
    if i in [x-1 for x in specified]:
        word_dict[word[:1]] = i+1
    else:
        word_dict[word[:2]] = i+1
print word_dict
#>>> {'Be': 4, 'C': 6, 'B': 5, 'Ca': 20, 'F': 9, 'S': 16, 'H': 1, 'K': 19, 'Al': 13, 'Mi': 12, 'Ne': 10, 'O': 8, 'Li': 3, 'P': 15, 'Si': 14, 'Ar': 18, 'Na': 11, 'N': 7, 'Cl': 17, 'He': 2}
print word_dict['Be']
#>>> 4
I store i + 1 in word_dict so that it matches the 1-based "n-th word" positions used in the problem. I only noticed after solving it that the values are atomic numbers, that is, the number of protons. It brings back the "Suihei Liebe, boku no fune" periodic-table mnemonic.
Hmm... but shouldn't magnesium be 'Mg'?
python
'Mi': 12
Create a function that creates an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-gram and the letter bi-gram from the sentence "I am an NLPer".
python
def n_gram(sentence_str, n, type):
    result = set()
    if type == 'word':
        words = [x.strip(',.') for x in sentence_str.split()]
    elif type == 'letter':
        words = sentence_str
    m = len(words)
    for i in xrange(m-n+1):
        result.add(tuple(words[i:i+n]))
    return result
# If you also want frequency information
from collections import defaultdict
def n_gram_freq(sentence_str, n, type):
    result = defaultdict(int)
    if type == 'word':
        words = [x.strip(',.') for x in sentence_str.split()]
    elif type == 'letter':
        words = sentence_str
    m = len(words)
    for i in xrange(m-n+1):
        result[tuple(words[i:i+n])] += 1
    return result
sentence_str = "I am an NLPer"
#sentence_list = ['I', 'am', 'an', 'NLPer']
print n_gram(sentence_str, 2, 'word')
#>>> set([('am', 'an'), ('an', 'NLPer'), ('I', 'am')])
print n_gram(sentence_str, 2, 'letter')
#>>> set([('N', 'L'), ('m', ' '), ('e', 'r'), ('a', 'n'), ('I', ' '), ('n', ' '), ('L', 'P'), (' ', 'N'), (' ', 'a'), ('a', 'm'), ('P', 'e')])
print n_gram_freq(sentence_str, 2, 'word')
#>>> defaultdict(<type 'int'>, {('am', 'an'): 1, ('an', 'NLPer'): 1, ('I', 'am'): 1})
print n_gram_freq(sentence_str, 2, 'letter')
#>>>defaultdict(<type 'int'>, {('N', 'L'): 1, ('m', ' '): 1, ('e', 'r'): 1, ('a', 'n'): 1, ('I', ' '): 1, ('n', ' '): 1, ('L', 'P'): 1, (' ', 'N'): 1, (' ', 'a'): 2, ('a', 'm'): 1, ('P', 'e'): 1})
The function takes a string as its argument, and a further argument, type, selects between word n-grams and character n-grams.
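Since the problem asks for n-grams of any sequence (string, list, and so on), one possible variation is to leave the tokenizing to the caller and accept any sliceable sequence. The sketch below is my own illustration rather than the original answer; the name n_gram_seq is hypothetical, and it is written to run on Python 3 as well.

```python
from collections import Counter


def n_gram_seq(seq, n):
    """Return the n-grams of any sliceable sequence as a list of tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


sentence = "I am an NLPer"

# Word bi-grams: split into words first, then pass the list.
word_bigrams = n_gram_seq(sentence.split(), 2)
print(word_bigrams)  # [('I', 'am'), ('am', 'an'), ('an', 'NLPer')]

# Letter bi-grams: pass the string itself.
letter_bigrams = n_gram_seq(sentence, 2)
print(len(letter_bigrams))  # 12

# Frequencies come from Counter; a set() of the list drops duplicates.
freq = Counter(letter_bigrams)
print(freq[(' ', 'a')])  # 2
```

Returning a list keeps the order and the duplicates, so the caller can choose between set() and Counter depending on whether frequency information is needed.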
Let X and Y be the sets of character bi-grams contained in "paraparaparadise" and "paragraph", respectively, and find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is contained in X and in Y.
python
str_x = "paraparaparadise"
str_y = "paragraph"
X = n_gram(str_x, 2, 'letter')
Y = n_gram(str_y, 2, 'letter')
print X.union(Y) # Union; equivalent to X | Y
#>>>set([('g', 'r'), ('p', 'h'), ('p', 'a'), ('s', 'e'), ('a', 'p'), ('a', 'g'), ('a', 'd'), ('i', 's'), ('r', 'a'), ('a', 'r'), ('d', 'i')])
print X.intersection(Y) # Intersection; equivalent to X & Y
#>>>set([('a', 'p'), ('r', 'a'), ('p', 'a'), ('a', 'r')])
print X.difference(Y) # Difference; equivalent to X - Y
#>>>set([('a', 'd'), ('s', 'e'), ('d', 'i'), ('i', 's')])
print tuple('se') in X
#>>> True
print tuple('se') in Y
#>>> False
This reuses the function defined in the previous question. Because that function stores each n-gram as a tuple, the given 'se' is converted to a tuple before the membership test.
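The set operations can equally be written with the operators |, & and -. The following sketch is an illustration only: it uses a hypothetical helper char_bigrams that stores each bi-gram as a two-character string, so no tuple conversion is needed for the membership test. It runs on Python 3 as well.

```python
def char_bigrams(text):
    """Return the set of character bi-grams of text as 2-character strings."""
    return set(text[i:i + 2] for i in range(len(text) - 1))


X = char_bigrams("paraparaparadise")
Y = char_bigrams("paragraph")

print(sorted(X | Y))  # union
print(sorted(X & Y))  # intersection: ['ap', 'ar', 'pa', 'ra']
print(sorted(X - Y))  # difference:   ['ad', 'di', 'is', 'se']

print("se" in X)  # True
print("se" in Y)  # False
```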
Implement a function that takes arguments x, y, z and returns the string "y at x o'clock is z". Furthermore, set x = 12, y = "temperature", z = 22.4, and check the execution result.
python
from string import Template
def generate_sentence(x, y, z):
    t = Template("${y_tmpl} at ${x_tmpl} o'clock is ${z_tmpl}")
    return t.safe_substitute(x_tmpl=x, y_tmpl=y, z_tmpl=z)
def generate_sentence_incomplete(x, y, z):
    # z_tmpl is deliberately not passed, to show what safe_substitute does
    t = Template("${y_tmpl} at ${x_tmpl} o'clock is ${z_tmpl}")
    return t.safe_substitute(x_tmpl=x, y_tmpl=y)
x, y, z = 12, "temperature", 22.4
print generate_sentence(x, y, z)
#>>> temperature at 12 o'clock is 22.4
print generate_sentence_incomplete(x, y, z)
#>>> temperature at 12 o'clock is ${z_tmpl}
I used Template.safe_substitute(), so a missing value does not raise an error. From the documentation (http://docs.python.jp/2/library/string.html#string.Template.safe_substitute):
Like substitute(), except that if placeholders are missing from mapping and kws, instead of raising a KeyError exception, the original placeholder will appear in the resulting string intact.
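The contrast between the two methods can be seen in a short sketch. This is only an illustration: the English template wording is chosen for the example, and the code is written to run on Python 3 as well.

```python
from string import Template

# Illustrative template; the placeholder names follow the answer above.
t = Template("${y_tmpl} at ${x_tmpl} o'clock is ${z_tmpl}")

# substitute() raises KeyError when a placeholder has no value.
try:
    t.substitute(x_tmpl=12, y_tmpl="temperature")
    missing = None
except KeyError as err:
    missing = err.args[0]
print(missing)  # z_tmpl

# safe_substitute() leaves the unresolved placeholder in the result.
partial = t.safe_substitute(x_tmpl=12, y_tmpl="temperature")
print(partial)  # temperature at 12 o'clock is ${z_tmpl}
```

Which one is preferable depends on whether a missing value should stop the program or pass through silently.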
Implement a function cipher that converts each character of a given string according to the following specification: lowercase letters are replaced by the character whose code is (219 - character code), and all other characters are output unchanged. Use this function to encrypt and decrypt an English message.
python
def cipher(input_str):
    result = ""
    for letter in input_str:
        # If it is a lowercase letter
        if letter.isalpha() and letter.islower():
            result += chr(219-ord(letter))
        else:
            result += letter
    return result
english_message = "This is a pen."
#encryption
print cipher(english_message)
#>>> Tsrh rh z kvm.
#Decryption
print cipher(cipher(english_message))
#>>> This is a pen.
I did not know this cipher, but it appears to be called the Atbash cipher. http://www.mitsubishielectric.co.jp/security/learn/info/misty/stage1.html
Ciphers were used even in the Old Testament. One of them is the Hebrew substitution cipher Atbash. This cipher numbers the letters and swaps each letter's position counted from the beginning with the same position counted from the end. For the 26 letters of the alphabet, A becomes Z, B becomes Y, and so on.
That is why the same function performs both encryption and decryption (applying it twice returns the original string).
The built-in functions and string methods used are listed below.
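The reason 219 works is that it equals ord('a') + ord('z'), so subtracting a lowercase letter's code mirrors it within a to z. The sketch below checks this; it is an illustration rather than the original answer, the name atbash_lower is hypothetical, and it restricts itself to ASCII a to z (an assumption, since the problem only mentions English messages).

```python
import string


def atbash_lower(text):
    """Mirror ASCII lowercase letters (a<->z, b<->y, ...); leave the rest as is."""
    return ''.join(chr(219 - ord(c)) if 'a' <= c <= 'z' else c for c in text)


# 219 is the sum of the codes of 'a' (97) and 'z' (122).
print(ord('a') + ord('z'))  # 219

# The lowercase alphabet maps onto itself in reverse order.
print(atbash_lower(string.ascii_lowercase))  # zyxwvutsrqponmlkjihgfedcba

encrypted = atbash_lower("This is a pen.")
print(encrypted)                # Tsrh rh z kvm.
print(atbash_lower(encrypted))  # This is a pen.
```

Testing 'a' <= c <= 'z' instead of islower() keeps non-ASCII lowercase letters out of the arithmetic, where 219 - ord(c) would go negative.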
Built-in functions
str.isalpha()
str.islower()
ord()
chr()
unichr()
# http://docs.python.jp/2.6/library/functions.html#ord
# http://docs.python.jp/2.6/library/functions.html#chr
# http://docs.python.jp/2.6/library/functions.html#unichr
Create a program that, for a sequence of words separated by spaces, randomly rearranges the letters of each word while leaving the first and last letters in place. Words of length 4 or less are not rearranged. Give it a suitable English sentence (for example, "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the execution result.
python
import random
def random_shuffled_str(input_str):
    input_list = list(input_str)
    random.shuffle(input_list)
    result = ''.join(input_list)
    return result
def typoglycemia(sentence):
    str_list = sentence.split()
    result_list = [x[0]+ random_shuffled_str(x[1:-1]) +x[-1] if len(x) > 4 else x for x in str_list]
    return ' '.join(result_list)
message = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
print typoglycemia(message)
#>>> I cunl'dot beveile that I cloud aallucty unsrndtead what I was rdaineg : the phaenmneol pweor of the huamn mind .
print typoglycemia(message)
#>>> I cu'dolnt bivelee that I cloud aculltay udetnnasrd what I was ridneag : the pheonaenml peowr of the hamun mind .
When you want to use else in a list comprehension, the conditional expression
python
if len(x) > 4 else x
goes in this position, before the for clause. As for random.shuffle:
- It modifies the list (or other mutable sequence) passed to it in place.
- So I defined a helper function that returns the shuffled result, so that it can be used inside the list comprehension.
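An alternative to wrapping random.shuffle is random.sample, which returns a new list instead of modifying its argument. The sketch below swaps in that technique and is not the original answer; the helper names shuffled_copy and scramble_word are hypothetical, and it runs on Python 3 as well.

```python
import random


def shuffled_copy(text):
    """Return a shuffled copy of text without modifying anything in place."""
    letters = list(text)
    # random.sample draws len(letters) items without replacement,
    # so the result is a new list holding a random permutation.
    return ''.join(random.sample(letters, len(letters)))


def scramble_word(word):
    """Keep the first and last letters; shuffle the middle of long words."""
    if len(word) <= 4:
        return word
    return word[0] + shuffled_copy(word[1:-1]) + word[-1]


message = "the phenomenal power of the human mind"
print(' '.join(scramble_word(w) for w in message.split()))
```

The output differs from run to run, so none is shown here.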
http://docs.python.jp/2/library/random.html
random.shuffle(x[, random]) Shuffle the sequence x in place. The optional argument random is a 0-argument function returning a random float in [0.0, 1.0); by default, this is the function random().
Note that for even rather small len(x), the total number of permutations of x is larger than the period of most random number generators; this implies that most permutations of a long sequence can never be generated.
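To get a feel for the note above, the sketch below finds the shortest sequence length whose number of permutations exceeds the generator's period. It assumes the Mersenne Twister used by CPython's random module, whose period is 2**19937 - 1; the function name is hypothetical.

```python
# Period of the Mersenne Twister (MT19937) behind CPython's random module.
MT_PERIOD = 2 ** 19937 - 1


def first_length_exceeding(period):
    """Return the smallest n such that n! is larger than period."""
    n, permutations = 1, 1  # 1! == 1
    while permutations <= period:
        n += 1
        permutations *= n  # permutations now equals n!
    return n


print(first_length_exceeding(MT_PERIOD))  # 2081
```

So for lists of 2081 or more items, there are more orderings than generator states, and some orderings can never come out of shuffle. The words shuffled in this exercise are far below that length.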