nlp03.py
#!/usr/bin/env python
from collections import Counter

text = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics'
li = []
# Count how often each word length occurs, most frequent first
count = Counter(map(len, text.split())).most_common()
for i in range(len(count)):
    # Each entry is a (length, frequency) pair; keep only the length
    li.append(count[i][0])
print(li)
Execution result [9, 1, 3, 5, 7, 2, 4, 6, 8]
I didn't know how to implement it without using a for loop.
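One way to drop the explicit for loop, offered as a sketch rather than the intended solution, is to unpack each (length, frequency) pair from most_common() inside a list comprehension. The name lengths is introduced here only for illustration. The order among lengths that occur equally often depends on the Python version, so no output is shown.

```python
from collections import Counter

text = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics'

# most_common() yields (length, frequency) pairs; keep only the length
lengths = [length for length, _ in Counter(map(len, text.split())).most_common()]
print(lengths)
```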
nlp03re.py
#!/usr/bin/env python
seq = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
# Remove commas and periods so they are not counted as letters
seq = seq.replace(",", "").replace(".", "")
words = seq.split()
count = []
for i in words:
    count.append(len(i))
print(count)
Execution result [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
The result is the first 15 digits of pi. I think there is a better way to write the part that removes "," and ".", but ...
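One possible alternative to the chained replace calls, shown as a sketch and not necessarily the best approach, is to strip the punctuation from each word with str.strip while measuring its length. This assumes commas and periods only appear at the ends of words, which holds for this sentence. The name lengths is introduced here for illustration.

```python
text = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

# strip(",.") removes commas and periods from both ends of each word
lengths = [len(word.strip(",.")) for word in text.split()]
print(lengths)  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```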
nlp04.py
#!/usr/bin/env python
text = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
number = [1, 5, 6, 7, 8, 9, 15, 16, 19]
symbols = {}
words = text.split()
# i starts at 0 here, so it does not line up with the 1-based positions in number
for i in range(len(words)):
    word = words[i]
    if i in number:
        symbols[word[0:2]] = i
    else:
        symbols[word[0:1]] = i
print(symbols)
Execution result {'A': 17, 'B': 4, 'Co': 5, 'No': 6, 'H': 0, 'K': 18, 'Cl': 16, 'M': 11, 'L': 2, 'Ne': 9, 'P': 14, 'S': 13, 'Ox': 7, 'N': 10, 'Fl': 8, 'Ca': 19, 'Se': 15, 'He': 1}
nlp04.py
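#!/usr/bin/env python
text = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
number = [1, 5, 6, 7, 8, 9, 15, 16, 19]
symbols = {}
# Replace punctuation with spaces before splitting into words
words = text.replace(",", " ").replace(".", " ").split()
# enumerate(words, 1) numbers the words from 1, matching the positions in number
for i, word in enumerate(words, 1):
    if i in number:
        symbols[word[0:1]] = i  # one character for the listed positions
    else:
        symbols[word[0:2]] = i  # two characters otherwise
print(symbols)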
Execution result {'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17, 'Ar': 18, 'K': 19, 'Ca': 20}
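The loop can also be folded into a dict comprehension. The following is only a sketch of that idea: the names one_char_positions and symbols are introduced here for illustration, and a set is assumed in place of the list so that the membership test stays cheap.

```python
text = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
one_char_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}

# Key: one character for the listed 1-based positions, two otherwise.
# Value: the 1-based position of the word in the sentence.
symbols = {
    (word[:1] if i in one_char_positions else word[:2]): i
    for i, word in enumerate(text.replace(".", " ").split(), 1)
}
print(symbols)
```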
05. n-gram
Create a function that creates an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-gram and the letter bi-gram from the sentence "I am an NLPer".
nlp05.py
#!/usr/bin/env python
def word_ngram(n, seq):
    words = seq.split()
    li = []
    for i in range(len(words) + 1 - n):
        li.append(words[i:i+n])
    return li

def char_ngram(n, seq):
    li = []
    # Stop at len(seq) + 1 - n so that every n-gram has exactly n characters
    for i in range(len(seq) + 1 - n):
        li.append(seq[i:i+n])
    return li

text = "I am an NLPer"
print(word_ngram(2, text))
print(char_ngram(2, text))
Execution result [['I', 'am'], ['am', 'an'], ['an', 'NLPer']] ['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
The character bigram considers a space as one character.
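Since the task asks for one function that accepts any sequence, word_ngram and char_ngram could be merged. A minimal sketch, assuming plain slicing is enough for both strings and lists; the name ngram is hypothetical and not part of the script above:

```python
def ngram(n, seq):
    """Return all contiguous length-n slices of seq (a string or a list)."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

text = "I am an NLPer"
print(ngram(2, text.split()))  # word bi-grams
print(ngram(2, text))          # character bi-grams, spaces included
```

Passing text.split() gives word n-grams and passing the string itself gives character n-grams, so the caller decides what counts as a unit.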