Graph the change in the number of keyword appearances per month using pandas

Premise

(Same as the previous post.) The input is an Excel file exported from a certain DB. Each row is one record, with the text stored in a single field, and each row also has a date field. The theme this time is to extract specified keywords from the text in that field and plot how the number of appearances changes from month to month. The input and output are Excel files on Windows, and the processing in between is done on a Mac.

Character code conversion and Excel conversion are the same as last time, so they are omitted.

Preparation

Read the CSV into df with pd.read_csv(). MeCab is required.
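A minimal sketch of this loading step, assuming the converted file is a UTF-8 CSV with columns named comment and datetime (the names used in group_by_month). The sample rows are made up, and an in-memory buffer stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the converted CSV file.
# The column names match the ones group_by_month() reads.
csv_text = (
    "datetime,comment\n"
    "2014-06-03 10:15:00,first sample sentence\n"
    "2014-07-21 18:40:00,second sample sentence\n"
    "2014-07-22 09:00:00,\n"
)

# For a real file, pass its path instead of the buffer,
# e.g. pd.read_csv("data.csv", encoding="utf-8") (file name is an assumption).
df = pd.read_csv(io.StringIO(csv_text))
```

An empty text cell is read as NaN rather than as a string, which is why the loop in group_by_month checks the type of each value before parsing it.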

import MeCab
import pandas as pd

def group_by_month(df):
	e = df['comment'].copy()	#Specify a field with text
	e.index = pd.to_datetime(df['datetime'])	#Specify date information in index
	m = MeCab.Tagger('-Ochasen')	#Put the output in Chasen mode

	rows = []
	for k, v in e.items():	#iteritems() on Python 2 / old pandas
		if not isinstance(v, str):	#Skip non-text (empty) cells; `unicode` on Python 2
			continue
		target_dic = {		#Specify the target keyword (reset for every record)
			'XXX'			: 0,
			'YYY'			: 0,
			'ZZZ'			: 0,
		}
		node = m.parseToNode(v)	#On Python 2, pass v.encode('utf-8') instead
		while node:
			key = node.surface
			if key in target_dic:
				target_dic[key] += 1	#Increase the count if found
			node = node.next
		rows.append(pd.DataFrame(target_dic, index=[k]))
	result_df = pd.concat(rows)	#DataFrame.append() is gone in recent pandas
	#It doesn't seem to work with index, so put it in column
	result_df['index1'] = result_df.index
	#Monthly grouping (recent pandas releases spell the month-end alias 'ME')
	result_df = result_df.groupby(pd.Grouper(key='index1', freq='M')).sum()
	return result_df

For every record, the dictionary is reset, the occurrences are counted, and the result is converted to a DataFrame and added to the running result. I think this could be made simpler, but I don't know how.
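One possible simplification, offered as a sketch rather than a definitive answer: count the tokens of each record with collections.Counter, collect plain dictionaries in a list, and build the DataFrame once at the end. The names count_keywords, KEYWORDS and the tokenize parameter are hypothetical; a MeCab-based tokenizer would be passed in where this example uses whitespace splitting as a stand-in.

```python
from collections import Counter

import pandas as pd

KEYWORDS = ['XXX', 'YYY', 'ZZZ']  # target keywords, as in target_dic


def count_keywords(comments, tokenize):
    """Return one row of keyword counts per text record.

    comments: Series of text indexed by timestamp.
    tokenize: callable turning a text into a list of surface forms
              (hypothetical hook; a MeCab-based one would go here).
    """
    rows = []
    index = []
    for timestamp, text in comments.items():
        if not isinstance(text, str):  # skip empty (NaN) cells
            continue
        counts = Counter(tokenize(text))
        # A Counter returns 0 for missing keys, so no reset is needed.
        rows.append({kw: counts[kw] for kw in KEYWORDS})
        index.append(timestamp)
    # Build the DataFrame once instead of appending inside the loop.
    return pd.DataFrame(rows, index=pd.DatetimeIndex(index), columns=KEYWORDS)


# Usage with whitespace splitting as a stand-in tokenizer.
comments = pd.Series(
    ['XXX YYY YYY', float('nan'), 'ZZZ other YYY'],
    index=pd.to_datetime(['2014-06-03', '2014-06-10', '2014-07-21']),
)
per_record = count_keywords(comments, str.split)
```

The per-record frame can then be grouped by month in the same way as in group_by_month.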

After group_by_month() runs, result_df holds the following data.

            XXX YYY ZZZ
index1                
2014-06-30   0   1   0
2014-07-31   0   6   0
2014-08-31   3  19   6
2014-09-30   1   8   0
2014-10-31   5  29   7
2014-11-30  10   8   0
2014-12-31  10  31   8
2015-01-31  12  41  15
2015-02-28  45  82  22
2015-03-31  21  58   9
2015-04-30  23  60  19
2015-05-31   4  36   3
2015-06-30  11  40   8
2015-07-31  13  49  11
2015-08-31   8  14   2
2015-09-30  13  13   9
2015-10-31   5  31   9
2015-11-30  11  21   3
2015-12-31  12  21   3
2016-01-31   2  19   0
2016-02-29  12  15   5
2016-03-31   9  32   7
2016-04-30   2  22   4
2016-05-31   6  24   2
2016-06-30   7  21   4
2016-07-31   9  22   4
2016-08-31   5  21   1
2016-09-30   7  31   6
2016-10-31   0  12   1
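As a hedged aside on the "doesn't seem to work with index" comment in the code: when the index is a DatetimeIndex, it should be possible to group on the index itself, for example by converting it to monthly periods. The sketch below uses made-up counts, not the data above, and its result is labelled by month (2014-06) rather than by month-end date.

```python
import pandas as pd

# Made-up per-record counts indexed by timestamp (not the article's data).
per_record = pd.DataFrame(
    {'XXX': [0, 1, 2], 'YYY': [1, 0, 3]},
    index=pd.to_datetime(['2014-06-03', '2014-06-20', '2014-07-21']),
)

# Group on the index itself: to_period('M') maps each timestamp to its month,
# so no helper column is needed.
monthly = per_record.groupby(per_record.index.to_period('M')).sum()
```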

Plot

import matplotlib.pyplot as plt

def plot_init(title):
	'''Prepare the graph area'''
	fig = plt.figure()
	ax = fig.add_subplot(1,1,1)
	ax.set_title(title)
	return fig, ax

def plot_count_of_day(df):
	'''Plot one line per keyword column'''
	title = 'test_data'
	fig, ax = plot_init(title)
	for c in df.columns:
		df[c].plot(label=c, ax=ax)
	ax.legend()
	ax.set(xlabel='month', ylabel='count')
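The functions above only draw the figure; the article does not show how the image was written out. A minimal sketch of one way to do it, assuming an off-screen backend and an output file name chosen for illustration. It uses DataFrame.plot, which draws every column in one call, in place of the per-column loop, and the three rows are copied from the start of the monthly table.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no GUI needed
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of the monthly table shown earlier.
monthly = pd.DataFrame(
    {'XXX': [0, 0, 3], 'YYY': [1, 6, 19], 'ZZZ': [0, 0, 6]},
    index=pd.to_datetime(['2014-06-30', '2014-07-31', '2014-08-31']),
)

fig, ax = plt.subplots()
ax.set_title('test_data')
monthly.plot(ax=ax)  # one line per keyword column, with a legend
ax.set(xlabel='month', ylabel='count')
fig.savefig('test_data.png')  # output path is an assumption
```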

Result

The result looks like this.

(Image: test_data.png)

That's all.
