Premise

(Same as last time.) We start from an Excel file exported from a certain DB. Each row is one record, with the text stored in a single field, and each row also has a date field. This time the goal is to extract specified keywords from the text field and plot how the number of occurrences changes from month to month. The input and the output are Windows Excel files; the processing in between is done on a Mac.

Character encoding conversion and Excel conversion are the same as last time, so they are omitted.

Preparation

Read the CSV into df with pd.read_csv(). MeCab is required.
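As a minimal sketch of the loading step, assuming the exported CSV has the comment and datetime columns that the function below refers to. The in-memory text is only a stand-in for the real file, and export.csv is a hypothetical file name.

```python
import io
import pandas as pd

# Stand-in for the exported file; in practice pass the CSV path instead,
# e.g. pd.read_csv('export.csv') (hypothetical file name).
csv_text = io.StringIO(
    'datetime,comment\n'
    '2014-06-03 10:15:00,first comment\n'
    '2014-07-01 09:00:00,\n'
)
df = pd.read_csv(csv_text)

# Empty text cells are read as missing values (NaN), not as strings,
# which is why the counting loop skips non-text values.
missing = df['comment'].isna().tolist()
```

The datetime column is still plain text at this point; it is converted with pd.to_datetime when it is used as the index.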

import MeCab
import pandas as pd

def group_by_month(df):
	e = df['comment'].copy()	#Specify a field with text
	e.index = pd.to_datetime(df['datetime'])	#Use the date information as the index
	m = MeCab.Tagger('-Ochasen')	#Put the output in Chasen mode

	rows = []
	for k, v in e.items():
		if not isinstance(v, str):	#Skip empty cells (NaN) and other non-text values
			continue
		target_dic = {		#Specify the target keywords (reset for every record)
			'XXX'			: 0,
			'YYY'			: 0,
			'ZZZ'			: 0,
		}
		node = m.parseToNode(v)
		while node:
			key = node.surface	#Surface form of the morpheme
			if key in target_dic:
				target_dic[key] += 1	#Increase the count if found
			node = node.next
		rows.append(pd.DataFrame(target_dic, index=[k]))
	result_df = pd.concat(rows)
	#Monthly grouping
	#It doesn't seem to work with index, so put it in column
	result_df['index1'] = result_df.index
	result_df = result_df.groupby(pd.Grouper(key='index1', freq='M')).sum()
	return result_df

For each record, the dictionary is reset, the occurrences are counted, and the result is converted to a DataFrame and added to the running result. I suspect this could be made simpler, but I don't know how.
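One possibly simpler shape, offered as a sketch rather than as the method used above: count the keywords of each record with collections.Counter, build a single DataFrame from the list of per-record dictionaries, and group by calendar month. Whitespace splitting stands in for the MeCab tokenization so the example runs without MeCab, and count_keywords and monthly_counts are hypothetical helper names.

```python
from collections import Counter
import pandas as pd

KEYWORDS = ['XXX', 'YYY', 'ZZZ']

def count_keywords(tokens, keywords=KEYWORDS):
    # Count only the target keywords; absent keywords are reported as 0.
    c = Counter(t for t in tokens if t in keywords)
    return {k: c[k] for k in keywords}

def monthly_counts(dates, token_lists, keywords=KEYWORDS):
    # One dictionary per record, turned into a DataFrame in a single step.
    rows = [count_keywords(tokens, keywords) for tokens in token_lists]
    counts = pd.DataFrame(rows, index=pd.to_datetime(list(dates)), columns=keywords)
    # Group the records by calendar month and sum the counts.
    return counts.groupby(counts.index.to_period('M')).sum()

# Assumption: str.split() replaces MeCab here, so tokens are whitespace-separated.
dates = ['2014-06-03', '2014-06-20', '2014-07-01']
texts = ['YYY aaa', 'bbb YYY XXX', 'ZZZ ZZZ ccc']
monthly = monthly_counts(dates, [t.split() for t in texts])
print(monthly)
```

Grouping by index.to_period('M') labels each row by its month (2014-06) instead of by the month-end date, and it works directly on the index without copying it into a column first.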

At this point, the following data will be stored in result_df.

            XXX YYY ZZZ
index1                
2014-06-30   0   1   0
2014-07-31   0   6   0
2014-08-31   3  19   6
2014-09-30   1   8   0
2014-10-31   5  29   7
2014-11-30  10   8   0
2014-12-31  10  31   8
2015-01-31  12  41  15
2015-02-28  45  82  22
2015-03-31  21  58   9
2015-04-30  23  60  19
2015-05-31   4  36   3
2015-06-30  11  40   8
2015-07-31  13  49  11
2015-08-31   8  14   2
2015-09-30  13  13   9
2015-10-31   5  31   9
2015-11-30  11  21   3
2015-12-31  12  21   3
2016-01-31   2  19   0
2016-02-29  12  15   5
2016-03-31   9  32   7
2016-04-30   2  22   4
2016-05-31   6  24   2
2016-06-30   7  21   4
2016-07-31   9  22   4
2016-08-31   5  21   1
2016-09-30   7  31   6
2016-10-31   0  12   1

Plot

import matplotlib.pyplot as plt

def plot_init(title):
	'''Prepare the graph area'''
	fig = plt.figure()
	ax = fig.add_subplot(1,1,1)
	ax.set_title(title)
	return fig, ax

def plot_count_of_day(df):
	'''Plot one line per keyword column'''
	title = 'test_data'
	fig, ax = plot_init(title)
	for c in df.columns:
		df[c].plot(label=c, ax=ax)
	ax.legend()
	ax.set(xlabel='month', ylabel='count')
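Since each keyword is a column, pandas can also draw every column in one call with DataFrame.plot. A minimal sketch, using the first three rows of the table above as sample data; the Agg backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # Headless backend; drop this line for interactive use.
import matplotlib.pyplot as plt
import pandas as pd

# First three months of the monthly counts shown in the table above.
sample = pd.DataFrame(
    {'XXX': [0, 0, 3], 'YYY': [1, 6, 19], 'ZZZ': [0, 0, 6]},
    index=pd.to_datetime(['2014-06-30', '2014-07-31', '2014-08-31']),
)

fig, ax = plt.subplots()
sample.plot(ax=ax)  # One line per column, labelled with the column name.
ax.set_title('test_data')
ax.set(xlabel='month', ylabel='count')
```

Calling plt.show(), or fig.savefig() with a file name of your choice, then displays or saves the graph.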

Result

Like this.

That's all.

Graph the change in the number of keyword appearances per month using pandas
