An amateur's language processing 100 knocks: 05

This is a record of my attempt at the Language Processing 100 Knocks 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). A list of past knocks is available here (http://qiita.com/segavvy/items/fb50ba8097d59475f760).

Chapter 1: Warm-up

05. n-gram

Create a function that builds n-grams from a given sequence (string, list, etc.). Use this function to obtain the word bi-grams and the character bi-grams from the sentence "I am an NLPer".

The finished code:

`main.py`


# coding: utf-8


def n_gram(target, n):
    '''Create n-grams from the specified sequence.

    Arguments:
    target -- the target sequence (list, string, etc.)
    n -- the n of n-gram (1 for uni-gram, 2 for bi-gram, ...)
    Return value:
    list of n-grams
    '''
    result = []
    for i in range(0, len(target) - n + 1):
        result.append(target[i:i + n])

    return result


target = 'I am an NLPer'
words_target = target.split(' ')

# Word bi-gram
result = n_gram(words_target, 2)
print(result)

# Character bi-gram
result = n_gram(target, 2)
print(result)

Execution result:

`Terminal`


[['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
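
Note that the word bi-grams come back as lists while the character bi-grams come back as strings, because slicing keeps the type of the sequence being sliced. If a uniform element type is preferable, one possible variant is sketched below. This is only my own illustration, not part of the answer above, and the name `n_gram_tuples` is hypothetical. It builds every n-gram as a tuple with `zip`:

```python
def n_gram_tuples(target, n):
    '''Hypothetical variant of n_gram: every n-gram is returned as a tuple.'''
    # Zip together n views of the sequence, each shifted by one more element.
    # zip() stops at the shortest view, so only complete n-grams are produced.
    return list(zip(*(target[i:] for i in range(n))))


target = 'I am an NLPer'

# Word bi-gram as tuples
print(n_gram_tuples(target.split(' '), 2))

# First three character bi-grams as tuples
print(n_gram_tuples(target, 2)[:3])
```

With this sketch the word bi-grams become `[('I', 'am'), ('am', 'an'), ('an', 'NLPer')]`, and the character bi-grams start with `('I', ' ')`, `(' ', 'a')`, `('a', 'm')` instead of two-character strings. Tuples are hashable, so they can be used directly as dictionary keys when counting n-grams.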

Uni-gram and tri-gram as well

I also checked uni-gram and tri-gram to test the function.

`main.py (continued)`


# Word uni-gram
result = n_gram(words_target, 1)
print(result)

# Character uni-gram
result = n_gram(target, 1)
print(result)

# Word tri-gram
result = n_gram(words_target, 3)
print(result)

# Character tri-gram
result = n_gram(target, 3)
print(result)

Execution result:

`Terminal`


[['I'], ['am'], ['an'], ['NLPer']]
['I', ' ', 'a', 'm', ' ', 'a', 'n', ' ', 'N', 'L', 'P', 'e', 'r']
[['I', 'am', 'an'], ['am', 'an', 'NLPer']]
['I a', ' am', 'am ', 'm a', ' an', 'an ', 'n N', ' NL', 'NLP', 'LPe', 'Per']

Looks good.
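
A few boundary cases can be checked in the same spirit. The sketch below is an extra check of mine rather than part of the original answer. It repeats the logic of `n_gram` as a list comprehension (my rewrite, assumed equivalent to the loop version) so that the snippet runs on its own:

```python
def n_gram(target, n):
    # Same slicing logic as the function in main.py, written as a list comprehension
    return [target[i:i + n] for i in range(len(target) - n + 1)]


words = 'I am an NLPer'.split(' ')

# n equal to the length gives exactly one n-gram: the whole sequence
assert n_gram(words, 4) == [words]

# n larger than the length gives an empty list, because range() gets a non-positive stop
assert n_gram(words, 5) == []

# An empty sequence also gives an empty list
assert n_gram([], 2) == []

# For 1 <= n <= len(target), the number of n-grams is len(target) - n + 1
sentence = 'I am an NLPer'
assert len(n_gram(sentence, 2)) == len(sentence) - 2 + 1
```

All assertions pass silently, so the function degrades to an empty list rather than raising an error when `n` is too large for the input.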

That's all for the sixth knock. If you spot any mistakes, I would appreciate it if you could point them out.
