This is a record of my attempt at the Language Processing 100 Knocks 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). Click here for a list of past knocks (http://qiita.com/segavvy/items/fb50ba8097d59475f760).
05. n-gram
Create a function that creates an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-gram and the letter bi-gram from the sentence "I am an NLPer".
The finished code:
main.py
# coding: utf-8


def n_gram(target, n):
    '''Create n-grams from the specified sequence.

    Arguments:
    target -- target sequence (list or string)
    n -- the n of n-gram (1 for uni-gram, 2 for bi-gram, ...)

    Return value:
    list of n-grams
    '''
    result = []
    for i in range(0, len(target) - n + 1):
        result.append(target[i:i + n])
    return result


target = 'I am an NLPer'
words_target = target.split(' ')

# Word bi-gram
result = n_gram(words_target, 2)
print(result)

# Character bi-gram
result = n_gram(target, 2)
print(result)
Execution result:
Terminal
[['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
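The function slides a window of width n across the sequence one position at a time, and because it relies only on len() and slicing, the same code handles both a list of words and a string. As a rough sketch of the same idea, the loop could also be written as a single list comprehension. The name n_gram_compact is hypothetical and used only for this illustration; it is not part of main.py.

```python
def n_gram_compact(target, n):
    """Hypothetical one-expression variant of n_gram (illustrative name)."""
    # Take the slice of width n starting at every valid position i.
    return [target[i:i + n] for i in range(len(target) - n + 1)]


sentence = 'I am an NLPer'
word_bigrams = n_gram_compact(sentence.split(' '), 2)
char_bigrams = n_gram_compact(sentence, 2)
print(word_bigrams)
print(char_bigrams)
```

Under these assumptions it should print the same two lists as the execution result above.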
I also checked uni-gram and tri-gram to test the function.
main.py (continued)
# Word uni-gram
result = n_gram(words_target, 1)
print(result)

# Character uni-gram
result = n_gram(target, 1)
print(result)

# Word tri-gram
result = n_gram(words_target, 3)
print(result)

# Character tri-gram
result = n_gram(target, 3)
print(result)
Execution result:
Terminal
[['I'], ['am'], ['an'], ['NLPer']]
['I', ' ', 'a', 'm', ' ', 'a', 'n', ' ', 'N', 'L', 'P', 'e', 'r']
[['I', 'am', 'an'], ['am', 'an', 'NLPer']]
['I a', ' am', 'am ', 'm a', ' an', 'an ', 'n N', ' NL', 'NLP', 'LPe', 'Per']
The results look correct.
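It may also be worth checking the boundaries. The following is a small sketch, not part of the original main.py, that repeats the function body and tries an n equal to the number of words, an n larger than the number of words, and a tuple as input. The variable names whole, too_long, and from_tuple are only for this illustration.

```python
def n_gram(target, n):
    # Same logic as the n_gram function in main.py.
    result = []
    for i in range(0, len(target) - n + 1):
        result.append(target[i:i + n])
    return result


words = 'I am an NLPer'.split(' ')

# n equal to the number of words: the range has one position, so one n-gram
whole = n_gram(words, 4)
print(whole)

# n larger than the number of words: the range is empty, so no n-grams
too_long = n_gram(words, 5)
print(too_long)

# Slicing a tuple returns tuples, so each n-gram is a tuple here
from_tuple = n_gram(('a', 'b', 'c'), 2)
print(from_tuple)
```

When n exceeds the length of the sequence, the function returns an empty list rather than raising an error, because range() with a non-positive stop value yields nothing.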
That's all for the sixth knock. If you find any mistakes, I would appreciate it if you could point them out.