Aidemy 2020/10/30
Hello, this is Yope! I am a liberal arts student, but I became interested in the possibilities of AI, so I enrolled in the AI-specialized school "Aidemy" to study. I want to share what I learned there, so I am summarizing it on Qiita. I am very happy that so many people read the previous summary article. Thank you! This time, I am posting my notes on stock price forecasting, part 1. I hope you find them useful.
What to learn this time
・ Check the flow of stock price forecasting
・ ① Get tweets
・ ② Sentiment analysis (negative / positive analysis)
- Stock price forecasting can use __technical analysis__ or __fundamental analysis__; this time we perform technical analysis. The flow is as follows (this chapter covers ① and ②).
- ① Use the Twitter API to get the past tweets of a certain account.
- ② Perform __sentiment analysis (negative / positive analysis)__ of the tweets for each day with a polarity dictionary.
- ③ Get __time series data of the Nikkei Stock Average__.
- ④ Create a __model that predicts whether the stock price rises or falls on the next day__ from the daily sentiment.
・ First, register for the Twitter API (the registration procedure is omitted here).
・ After registering, obtain the __"Consumer Key"__, __"Consumer Secret"__, __"Access Token"__, and __"Access Token Secret"__, and use them to get tweets.
-Code ![Screenshot 2020-10-17 17.17.36.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/126526e1-9450-c2db-5e75-dfbda9b2f3cd.png)
-In the code above, 'python' in the line that defines __res__ is the search keyword for the tweets to be acquired.
-Code ![Screenshot 2020-10-17 17.24.56.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/eb60f385-4acc-774c-b11e-7d4bb2cbeb06.png)
-In the code above, '@nikkei_bizdaily' in the line that defines __tweets__ is the account whose tweets are acquired. (@nikkei_bizdaily is the account of the "Nikkei Sangyo Shimbun".)
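Once tweets have been fetched, they still have to be arranged into rows before sentiment analysis. The sketch below shows one way to do that without calling the API. It assumes each tweet object exposes `created_at` (a datetime) and `text` attributes; `FakeTweet`, `tweets_to_rows`, and the sample tweets are hypothetical names and data made up for illustration.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a tweet object returned by an API client.
FakeTweet = namedtuple("FakeTweet", ["created_at", "text"])


def tweets_to_rows(tweets):
    """Turn tweet objects into (date, text) rows, oldest first."""
    rows = []
    for tweet in tweets:
        # Line breaks inside a tweet get in the way of later processing.
        text = tweet.text.replace("\n", "")
        rows.append((tweet.created_at.date(), text))
    rows.sort(key=lambda row: row[0])
    return rows


# Made-up sample tweets (not real data from the account).
sample_tweets = [
    FakeTweet(datetime(2020, 10, 17, 9, 0), "株価が\n上がる"),
    FakeTweet(datetime(2020, 10, 16, 18, 30), "景気が悪い"),
]
rows = tweets_to_rows(sample_tweets)
```

Rows in this shape can then be handed to `pd.DataFrame(rows, columns=['date', 'text'])` if pandas is available.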
- Sentiment analysis (negative / positive analysis) uses natural language processing to judge __whether a text has a positive or a negative meaning__.
- The criteria for the judgment are stored in a __"polarity dictionary"__. Sentiment analysis looks up each word of the text in this dictionary.
- This time we use the __"Word Emotion Polarity Correspondence Table"__ as the polarity dictionary. It assigns each word a value from -1 to +1 (the PN value), with reference to the "Iwanami Japanese Dictionary (Iwanami Shoten)". A value closer to +1 indicates a more positive meaning, and a value closer to -1 a more negative meaning.
- i. Get tweets and convert them to a __DataFrame__ (see step ① of the stock price forecast flow).
- ii. Import the polarity dictionary and convert it to a DataFrame.
- iii. Perform __morphological analysis with MeCab__.
- iv. For each morphologically analyzed word, __get the PN value from the polarity dictionary__ and add it to that word's dict.
- v. Calculate the __average PN value__ for each tweet.
- vi. Perform __standardization__ and display the change in the PN value on a __graph__.
- Import the polarity dictionary with __pd.read_csv()__. For the "Word Emotion Polarity Correspondence Table", specify __names=('Word','Reading','POS','PN')__ so that the word and PN value columns can be referenced by name.
- Convert the word column of the polarity dictionary to a list, word_list, and the PN value column to a list, pn_list, then build a dictionary, pn_dict, that maps each word to its PN value.
·code
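A minimal sketch of loading the polarity dictionary, using the standard `csv` module in place of pd.read_csv() so that it runs without extra packages. It assumes each line has the colon-separated form `word:reading:POS:PN`. The function name `load_pn_dict` is hypothetical, and the three sample entries and their values are made up for illustration rather than taken from the real table.

```python
import csv
import io

# Made-up sample lines in the assumed "word:reading:POS:PN" layout.
SAMPLE_DIC = (
    "良い:よい:形容詞:0.9\n"
    "悪い:わるい:形容詞:-0.9\n"
    "株価:かぶか:名詞:0.1\n"
)


def load_pn_dict(fileobj):
    """Read the polarity dictionary and return word_list, pn_list, pn_dict."""
    reader = csv.reader(fileobj, delimiter=":")
    word_list, pn_list = [], []
    for word, _reading, _pos, pn in reader:
        word_list.append(word)
        pn_list.append(float(pn))
    # Map each word to its PN value for fast lookup later.
    pn_dict = dict(zip(word_list, pn_list))
    return word_list, pn_list, pn_dict


word_list, pn_list, pn_dict = load_pn_dict(io.StringIO(SAMPLE_DIC))
```

With pandas, the same columns would presumably come from `pd.read_csv()` called with `sep=':'` and the `names` tuple described above.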
- A tweet cannot be looked up in the polarity dictionary as it is, so it is first split into words by morphological analysis. Here we define and use the function __"get_diclist"__.
- Morphological analysis is performed with MeCab. The analysis result has one word per line, so __split it into a list of lines__.
- The last two lines are unnecessary, so delete them.
- Each line is delimited by a tab and commas, so split it with __re.split()__ and add the result for each word to a list (dic_list).
·code
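A minimal sketch of the parsing half of this step. Running MeCab itself needs a local install, so the sketch starts from a hand-written sample of MeCab's text output and assumes the IPAdic layout (surface form, a tab, then comma-separated features with the base form in the seventh feature). The function name `parse_mecab_output` and the keys 'Surface', 'POS1', and 'POS2' are assumptions for illustration; only 'BaseForm' comes from the text above.

```python
import re

# Hand-written sample of MeCab output in the assumed IPAdic layout.
SAMPLE_PARSED = (
    "株価\t名詞,一般,*,*,*,*,株価,カブカ,カブカ\n"
    "が\t助詞,格助詞,一般,*,*,*,が,ガ,ガ\n"
    "上がっ\t動詞,自立,*,*,五段・ラ行,連用タ接続,上がる,アガッ,アガッ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
)


def parse_mecab_output(parsed):
    """Convert MeCab's text output into a list of per-word dicts."""
    lines = parsed.split("\n")
    # The last two items are the "EOS" line and the empty string after it.
    lines = lines[:-2]
    dic_list = []
    for line in lines:
        # Split on the tab after the surface form and on the feature commas.
        fields = re.split(r"\t|,", line)
        dic_list.append({
            "Surface": fields[0],
            "POS1": fields[1],
            "POS2": fields[2],
            "BaseForm": fields[7],
        })
    return dic_list


dic_list = parse_mecab_output(SAMPLE_PARSED)
```

With MeCab installed, the `parsed` string would come from the tagger's `parse()` call on the tweet text instead of the sample constant.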
- Once the words can be looked up after morphological analysis, __add the PN value from the polarity dictionary to the dict of each word__. For this, define and use the function "add_pnvalue".
- Look up the base form of each word (__'BaseForm'__), obtained with __"get_diclist"__ from the previous section, in the polarity dictionary. If it is in the dictionary, store its PN value; if not, store _'notfound'_. Append each word to an initially empty list called diclist_new.
-Code ![Screenshot 2020-10-20 12.16.09.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/2930203a-1917-18c9-bee1-d40ef901380e.png)
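Since a screenshot cannot be copied, here is a text sketch written from the description above; it is not a transcription of the screenshot. The parameter names and the 'PN' key are assumptions, and the sample inputs are made up.

```python
def add_pnvalue(diclist_old, pn_dict):
    """Attach a PN value, or 'notfound', to every word dict."""
    diclist_new = []
    for word in diclist_old:
        base = word["BaseForm"]
        # Use the dictionary value when the base form is registered.
        pn = float(pn_dict[base]) if base in pn_dict else "notfound"
        # Copy the dict so the input list is left unchanged.
        diclist_new.append({**word, "PN": pn})
    return diclist_new


# Made-up inputs for illustration only.
sample_pn_dict = {"上がる": 0.5, "悪い": -0.9}
sample_words = [{"BaseForm": "株価"}, {"BaseForm": "上がる"}]
result = add_pnvalue(sample_words, sample_pn_dict)
```

Keeping 'notfound' as a marker, rather than dropping the word, lets the next step decide how to treat words that are missing from the dictionary.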
- From the list diclist_new returned above, calculate the __average PN value__.
- Words marked 'notfound' in the previous section are excluded from the average. If pn_list is empty at that point, the average is set to 0 (taking the mean of an empty list would cause an error).
・ Code ![Screenshot 2020-10-20 11.13.32.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/fa7c1dac-8bb0-7e65-d938-8e76b71e6e9d.png)
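The averaging rule described above can be sketched in text form as follows. The function name `get_mean` and the 'PN' key are assumptions, the sample data is made up, and `statistics.mean` is used here simply to stay within the standard library.

```python
from statistics import mean


def get_mean(diclist):
    """Average the PN values of one tweet, ignoring 'notfound' words."""
    pn_list = [word["PN"] for word in diclist if word["PN"] != "notfound"]
    # An empty list has no mean, so fall back to 0 in that case.
    return mean(pn_list) if pn_list else 0


# Made-up word dicts for illustration only.
sample_diclist = [
    {"BaseForm": "株価", "PN": "notfound"},
    {"BaseForm": "上がる", "PN": 0.5},
    {"BaseForm": "悪い", "PN": -0.9},
]
tweet_mean = get_mean(sample_diclist)
```

Returning 0 for a tweet with no known words treats it as neutral, which keeps the daily series free of gaps.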
- Finally, __graph the PN values to visualize them__. Left as they are, however, the values are biased by __whether the polarity dictionary as a whole contains more positive or more negative words__, so adjust them by standardization.
- Create the graph with __plt.plot()__. The vertical axis is the __"average of pn values"__, and the horizontal axis is the __"date"__. The 'text' column holds information other than the PN value and is not needed for the graph, so delete it.
・ (Review) Standardization method (Unsupervised learning 3): __(difference between the data and the mean) ÷ standard deviation__
`X = (X - X.mean(axis=0)) / X.std(axis=0)`
-Code (standardizing "means_list", the averages of the pn values of the tweets in df_tweets)
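A minimal sketch of the standardization step on a plain list. It assumes the population standard deviation, which is what NumPy's `std` computes by default; the function name `standardize` and the sample values in `means_list` are made up for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return (x - mean) / standard deviation for every x in values."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so every deviation from the mean is 0.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Made-up daily averages of PN values.
means_list = [0.1, -0.3, 0.2, -0.4]
standardized = standardize(means_list)
```

After this step the series has mean 0 and standard deviation 1, so days that are more positive than usual appear above 0 on the graph and more negative days below it.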
・ Graph ![Screenshot 2020-10-20 11.57.53.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/80c50519-c43f-7dbc-4860-5899f7e19aa3.png)
- Stock price forecasting follows this flow: "acquisition of tweets", "sentiment analysis (negative / positive analysis)", "acquisition of time series data of stock prices", and "creation of a stock price forecast model".
- Tweets can be obtained after registering for the Twitter API.
- For how to perform sentiment analysis, refer to the "sentiment analysis (negative / positive analysis)" part of this article.
That's all for this time. Thank you for reading to the end.