Predicting stock prices through big data analysis of historical data

Today the Nikkei average temporarily recovered to the [19,000 yen level](http://jp.reuters.com/article/topNews/idJPKBN0M80QN20150312) for the first time in 15 years, and some are already saying it will reach 20,000 yen by the end of June. In the meantime, this is a post about analyzing stocks with big data (lol) analysis.

Efficient market hypothesis

In finance there is the efficient market hypothesis, the theory that no information can be used to keep outperforming everyone else. I don't think any other theory is so often misunderstood or so conveniently interpreted.

On this point, it is worth reading about the paradox of the efficient market hypothesis.

If you think about it in an ordinary way, a few things start to make sense: for example, why dealers and fund managers in the securities industry can keep their jobs, and why not everyone gets rich even though everyone imitates Buffett.

If you read the pros and cons [around here](http://wag-study-abroad.com/wordpress/blog/2010/10/23/%E3%80%90%E3%83%95%E3%82%A1%E3%82%A4%E3%83%8A%E3%83%B3%E3%82%B9%E3%80%91%E5%8A%B9%E7%8E%87%E7%9A%84%E5%B8%82%E5%A0%B4%E4%BB%AE%E8%AA%AC%E3%81%A8%E3%81%9D%E3%81%AE%E5%8F%8D%E8%AB%96/) (by the way, I lean toward Professor Andrew Lo's view), you will see that the hypothesis does not deny the usefulness of technical analysis in any case. Rather, technical analysis is foundational and extremely important when analyzing trading markets.

While everyone is sweating blood over financial statements and technical analysis, can you find stocks and securities priced below their fair value and sell them above it? That is very difficult, because everyone is desperate. But it does not mean you can skip looking at technical indicators.

Taken to the extreme, this is the same story that plays out in business in general. A company tries to differentiate itself with a strength that rivals lack (opening up a new market), then every company gradually starts doing the same thing, competition appears and intensifies (commoditization), and at the end of that intensified competition wait falling prices and exhaustion (a red ocean). So I think the idea can be generalized.

In other words, the reason for analyzing data is to find and use material that makes a difference and lets you beat your rivals.

Acquisition of data to be analyzed

It has been a long time since "big data" became a buzzword, and now that there are sites like Yahoo! Finance, collecting past stock price data is easy. On top of that, computers have become more powerful, free statistical languages (R, Python, etc.) are available to anyone, and the Internet is overflowing with information on statistics and machine learning. As a result, the barrier to statistically analyzing past data and finding tendencies in it keeps getting lower, and anyone can do it.

In fact, more and more active fund managers and analysts compute statistics from, for example, the past 15 years of stock data to find out what tends to happen in which situation. In such an era, not investigating historical data puts you behind your rivals.

However, technical indicators and statistics are not absolute. A technical indicator only describes its target numerically; it is still up to humans to think about the causal relationships behind the numbers. I think this is the same point made in statistics: in the real world it is not always clear what is the cause and what is the result.

I wrote before about getting Japanese stock price data with Ruby, but since the analysis here is done in Python, let's standardize things so that both Python and Ruby acquire the data in the same format.

The sample code for getting stock prices is available here. If you run it with the stock code as the first argument and a start date (2015-01-01, etc.) as the second argument, the stock price data frame is saved as a .csv file in the following format.


,Open,High,Low,Close,Volume,Adj Close
2015-03-11,2300.0,2329.0,2289.0,2299.0,106100.0,2299.0

The most important of these is Adj Close ([adjusted closing price](http://www.yahoo-help.jp/app/answers/detail/p/546/a_id/45316/~/%E8%AA%BF%E6%95%B4%E5%BE%8C%E7%B5%82%E5%80%A4%E3%81%A8%E3%81%AF)). Time series analysis requires a closing price that is continuous over time, so the adjusted close is an essential item in stock analysis, to the point that a tool that does not output it is said to be unusable.
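As a minimal sketch of reading this format back into pandas, the snippet below parses the sample row shown above from an in-memory string and pulls out the Adj Close column. The helper name `load_prices` is my own and is not part of the download script; a real run would pass a file path instead of a `StringIO` object.

```python
import io

import pandas as pd

# The header and the single sample row shown in the article
SAMPLE_CSV = """,Open,High,Low,Close,Volume,Adj Close
2015-03-11,2300.0,2329.0,2289.0,2299.0,106100.0,2299.0
"""


def load_prices(source):
    """Read a price CSV whose unnamed first column holds the dates.

    `load_prices` is a hypothetical helper name used only in this sketch.
    """
    return pd.read_csv(source, index_col=0, parse_dates=True)


prices = load_prices(io.StringIO(SAMPLE_CSV))
adj_close = prices["Adj Close"]  # the series used for time series analysis
print(adj_close)
```

Using the first column as a parsed date index is what later makes date-based operations such as `asfreq('B')` possible.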

Key indicators of technical analysis and calculation by pandas

Before analyzing stocks, let's first get a grip on the basic technical indicators.

Many sites explain these. My subjective pick is the kabu.com Securities site, which is easy to understand.

Part 1 Systematic explanation of many technical indicators http://kabu.com/investment/guide/technical/01.html

Roughly speaking, there are "trend indicators" for grasping the trend and "oscillator indicators" for detecting movements that differ from the usual. Which ones to consult is a matter of personal preference, and off-the-shelf tools will display them for you, but it is a good idea to keep in mind on what basis (that is, by what formula) each number is derived. If you have any doubts, you need to be willing to calculate and verify it yourself.

Candlestick and moving average

I wrote earlier about the most typical trend indicators, candlesticks and moving averages, in the post on candlestick charts and moving average plots. Let's plot the stock data of NTT DATA, a company in the same line of business as ours.

This time I plotted exponential moving averages over 5, 25, and 75 days.
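For reference, here is one way the three exponential moving averages could be computed with current pandas. This is only a sketch: the helper name `add_emas`, the price values, and the choice of `adjust=False` (the plain recursive form with smoothing factor 2 / (span + 1)) are my assumptions, and the plot in the original post may have used different settings.

```python
import pandas as pd


def add_emas(close, spans=(5, 25, 75)):
    """Return a DataFrame with one exponential moving average per span.

    With adjust=False, pandas uses the recursion
        ema[0] = close[0]
        ema[t] = (1 - alpha) * ema[t - 1] + alpha * close[t]
    where alpha = 2 / (span + 1).
    """
    return pd.DataFrame(
        {f"ema_{n}": close.ewm(span=n, adjust=False).mean() for n in spans}
    )


# Hypothetical closing prices, for illustration only
close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0])
emas = add_emas(close)
print(emas)
```

Because the shorter spans react faster, crossings between the 5-day line and the longer lines are what people usually look for on the chart.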

RSI（Relative Strength Index）

Next, to detect changes in the trend, let's compute RSI, a typical oscillator indicator. The calculation method is explained in the link, so I will omit it.

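import pandas as pd


def calc_rsi(price, n=14):
    # Day-over-day price change; the first row has no previous day, so use 0
    gain = (price - price.shift(1)).fillna(0)

    def rsiCalc(p):
        # Simple averages of the gains and losses inside the n-day window
        avgGain = p[p > 0].sum() / n
        avgLoss = -p[p < 0].sum() / n
        if avgLoss == 0:
            # No down days: RSI is 100 (undefined if the window was flat)
            return 100.0 if avgGain > 0 else float("nan")
        rs = avgGain / avgLoss
        return 100 - 100 / (1 + rs)

    # pd.rolling_apply was removed from pandas; use the .rolling() accessor
    return gain.rolling(n).apply(rsiCalc, raw=False)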

That is all there is to it. RSI is generally read as overbought above 70 and oversold below 30. Indicators like this are easier to read if you combine them with the basic candlestick chart and use subplots to create a [two-tier layout](http://qiita.com/ynakayama/items/68eff3cb146181329b48).
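To check the formula by hand, a dependency-free sketch like the following may help. It applies the same simple-average RSI formula used above to a single window of price changes and then labels the result with the 70/30 thresholds. The function names and the four-day window of changes are made up for illustration.

```python
def rsi_from_changes(changes):
    """RSI for one window of day-over-day price changes (simple averages)."""
    n = len(changes)
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = -sum(c for c in changes if c < 0) / n
    if avg_loss == 0:
        # No down days: RSI is 100 (undefined if the window was flat)
        return 100.0 if avg_gain > 0 else float("nan")
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def label_rsi(rsi, overbought=70.0, oversold=30.0):
    """Map an RSI value to the conventional reading."""
    if rsi > overbought:
        return "overbought"
    if rsi < oversold:
        return "oversold"
    return "neutral"


# Hypothetical window: gains total 30, losses total 10, so RS = 3
changes = [10, -5, 20, -5]
rsi = rsi_from_changes(changes)
print(rsi, label_rsi(rsi))  # 75.0 overbought
```

With RS = 3 the formula gives 100 - 100 / 4 = 75, which sits just above the overbought line.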

Find the rolling correlation between a stock's price movements and the Nikkei Stock Average

So far we have computed some very typical technical indicators. If you only want to calculate and display these, plenty of handy off-the-shelf stock software will do it for you.

As mentioned above, what matters is analyzing the target from various angles, from perspectives that many rival investors do not have.

For example, the Nikkei average rose 267 yen today, but not every stock is highly correlated with it. You may therefore wonder how closely the stocks you are interested in track the Nikkei 225. To investigate, let's calculate the rolling correlation between the daily percentage change of each stock and that of the Nikkei average.

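import pandas as pd


def rolling_corr_with_N225(stock, window=5):
    # Load the stock and Nikkei 225 CSV files saved by the download script
    d1 = pd.read_csv("".join(["stock_", stock, ".csv"]), index_col=0, parse_dates=True)
    d2 = pd.read_csv("stock_N225.csv", index_col=0, parse_dates=True)
    # Align to business days, then take daily returns of the adjusted close
    s1 = d1.asfreq('B')['Adj Close'].pct_change().dropna()
    s2 = d2.asfreq('B')['Adj Close'].pct_change().dropna()
    # pd.rolling_corr was removed from pandas; use the .rolling() accessor
    rolling_corr = s1.rolling(window).corr(s2).dropna()

    return rolling_corr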
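To see what the rolling correlation measures without needing the CSV files, here is a small self-contained sketch on made-up prices. The function name `rolling_return_corr` and all numbers are hypothetical. The "stock" is constructed to move exactly twice as much as the "index" in percentage terms every day, so the rolling correlation should come out at 1 even though the size of the moves differs.

```python
import pandas as pd


def rolling_return_corr(prices_a, prices_b, window=5):
    """Rolling correlation of the daily percentage returns of two price series."""
    returns_a = prices_a.pct_change()
    returns_b = prices_b.pct_change()
    return returns_a.rolling(window).corr(returns_b).dropna()


# Hypothetical prices on consecutive business days, for illustration only
idx = pd.date_range("2015-03-02", periods=8, freq="B")
index_prices = pd.Series(
    [100.0, 101.0, 99.0, 102.0, 103.0, 101.0, 104.0, 105.0], index=idx
)
# A stock whose daily return is always exactly twice the index's return
stock_prices = (1 + 2 * index_prices.pct_change().fillna(0)).cumprod() * 50

corr = rolling_return_corr(stock_prices, index_prices, window=5)
print(corr)
```

Correlation only captures whether the two series move together, not by how much, which is why a stock that swings twice as hard as the index still scores 1 here.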

First, let's compute and visualize the rolling correlation between the Nikkei 225 and the stock price of GungHo, a company known for its breakout success in social games.

The closer the value is to 1, the stronger the correlation. As you can see, the correlation between this social game stock and the Nikkei 225 is not that high.

Now let's try the same for NTT DATA.

I see, this one does appear to be highly correlated.

We have now analyzed the characteristics of each stock a little.

Example of stock price fluctuation forecast by big data analysis

Here, as a case study, I would like to present one analyst's prediction and how it turned out. The example is Sumitomo Dainippon Pharma (4506) on Monday, March 9.

[Sumitomo Dainippon Pharma: 4,057 times in 15 years, a chart with a total price drop of 5149.70%](http://info.finance.yahoo.co.jp/kabuyoso/article/detail/20150309-00020455-minkabuy-stocks-5078?highlight=%E5%A4%A7%E6%97%A5%E6%9C%AC%E4%BD%8F%E5%8F%8B)

On Friday, March 6, just before the weekend, the stock rose 12.49% in a single day and jumped to the top of the gainers ranking. That day, Nomura Securities had raised its target price while keeping its investment rating on the stock at the highest level.

This analyst appears to specialize in statistically analyzing the past 10 years of stock data. As the page linked above shows, the price fell in 2,517 of the 4,057 past occurrences of a similar chart, so the probability of the stock falling during the day was 62.04%, and the analyst judged it risky.

(The article was posted at 8:44 on March 9, before the market opened.)

As it turned out, on March 9 the opening price of 1,600 yen was immediately the day's high, and the stock then fell to close at 1,488 yen, so the forecast was right on the mark.
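The figures quoted in this case can be checked with a few lines of arithmetic. The sketch below only recomputes numbers already given in the text; the variable names are mine.

```python
# Figures quoted in the text above
similar_charts = 4057    # past occurrences of a similar chart
fell_afterwards = 2517   # occurrences in which the price then fell
open_price = 1600.0      # opening price on March 9 (yen)
close_price = 1488.0     # closing price on March 9 (yen)

# Empirical probability of a fall, as a simple relative frequency
down_probability = fell_afterwards / similar_charts

# Open-to-close change on the day of the forecast
intraday_change = (close_price - open_price) / open_price

print(f"empirical probability of a fall: {down_probability:.2%}")  # 62.04%
print(f"open-to-close change: {intraday_change:.2%}")              # -7.00%
```

So the 62.04% is nothing more exotic than 2,517 divided by 4,057, and the stock lost 7% between the open and the close that day.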

In a sense, big data analysis here means analyzing tendencies in past data in order to predict and estimate near-term price movements.
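As a rough illustration of this kind of analysis, the sketch below counts how often a stock closed below its open on the day after a large one-day rise. To be clear, this is not the analyst's actual method, which matches similar chart shapes; the rule "the previous day rose at least 10%", the function name `fall_rate_after_jump`, and the price data are all assumptions made for the example.

```python
import pandas as pd


def fall_rate_after_jump(prices, jump=0.10):
    """Among days that followed a one-day rise of at least `jump` in the
    adjusted close, return (fraction that closed below the open, case count)."""
    daily_return = prices["Adj Close"].pct_change()
    after_jump = daily_return.shift(1) >= jump   # the previous day jumped
    fell_intraday = prices["Close"] < prices["Open"]
    n_cases = int(after_jump.sum())
    if n_cases == 0:
        return float("nan"), 0
    return float((after_jump & fell_intraday).sum()) / n_cases, n_cases


# Hypothetical six days of prices, for illustration only
prices = pd.DataFrame(
    {
        "Open": [100.0, 101.0, 115.0, 108.0, 121.0, 123.0],
        "Close": [100.0, 112.0, 108.0, 120.0, 123.0, 122.0],
        "Adj Close": [100.0, 112.0, 108.0, 120.0, 123.0, 122.0],
    },
    index=pd.date_range("2015-03-02", periods=6, freq="B"),
)

rate, n_cases = fall_rate_after_jump(prices)
print(rate, n_cases)
```

With years of real data the case count becomes large enough for the relative frequency to mean something, which is the point of the 4,057 occurrences in the example above.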

Digression

In the securities industry, the "Japanese Big Bang" advocated in 1996 by then Prime Minister Hashimoto, followed by deregulation in 1999, led to the complete liberalization of stock brokerage commissions. With the rise of online brokers that came with it, commissions fell to the lowest level among developed countries, and online trading is now the mainstream for many investors, including individuals.

Computer resources have also become cheaper, and statistical analysis software, which was once expensive, can now be replaced by programming languages such as R and Python that anyone can use for free.

It is often said that you should have your own investment style in asset management, but when you are unsure what that style is, you need a pillar to base your thinking on. Technical analysis is very useful here for making calm decisions without being swayed in the heat of the moment. By removing emotional judgment and looking at the numbers calmly, you can prevent unexpected mistakes.

Of course, data analysis is not a silver bullet that prevents every contingency, but in an era like this there is no reason not to analyze data. And I think this applies not only to stock analysis but equally to formulating management strategy and to all kinds of business.