This article is written for beginners in equity investment.
When you have little investment experience, you tend to:

- take too much risk
- demand too high a rate of return

Because of this, when you buy a stock and its price falls, you tend to sell it immediately, and as this mistake is repeated, the losses grow. With little experience, no matter how you choose stocks or time your buying and selling, you will face price drops that feel too large. Sometimes things go well from the start, but that is just luck and does not last long.
Therefore, analysis of stock price data is important. And the conclusions drawn from the data of various stock markets around the world are simple:

- Long-term investment is the basis
- Invest in a market that rises over the long term
- Take only as much risk as will not shake you in a down market
- Do not sell until you make a profit
No one is happy when the price of a stock they bought falls, and enduring it is hard for anyone. That is why people tend to sell even at a loss. Those who have had this kind of experience need to rethink how they take risk. Reduce the risk: invest in stocks with smaller price fluctuations. If you still repeat the same experience, buying stocks may not be appropriate for you, and it may be better to invest in bonds, whose prices fluctuate less. Of course, bond prices also fall, so you may have a similar experience; in that case, it may be better to choose bonds with even smaller price fluctuations, or a cash deposit. This human tendency cannot be overcome simply by acquiring experience and knowledge. It starts with knowing yourself.
If a stock's price falls and you can keep holding it even while feeling uneasy, you need to take a closer look at the nature of that stock, because it marks your risk tolerance. Two guidelines at that point are the rate of return and the price volatility. The rate of return measures how much the price has risen in the past; the price volatility measures how much the price fluctuates. Both can be computed from the price history, as in the sketch below.
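As a minimal sketch (assuming daily closing prices in a pandas Series named `close`; the 250-trading-day annualization convention matches the code later in this article), both measures can be computed from log returns:

```python
import numpy as np
import pandas as pd

def annualized_stats(close):
    """Annualized return and volatility from a Series of daily closing prices.

    A sketch: assumes `close` is a pandas Series and 250 trading days per year,
    the same convention used in the loops later in this article.
    """
    r = np.log(close).diff().dropna()   # daily log returns
    mean = (r.mean() + 1)**250 - 1      # annualized rate of return
    vol = r.std() * np.sqrt(250)        # annualized volatility (standard deviation)
    return mean, vol
```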
Let's actually learn how to analyze the data. Python is useful for data analysis, and we recommend installing Jupyter Notebook: this article was written in a Jupyter notebook, and the program code assumes it is run there. You also need pandas-datareader. For installing Jupyter Notebook, refer to "System trade starting with Python3: Installing Jupyter notebook", which also contains instructions for installing pandas-datareader.
Please refer to "I downloaded the stock price from Yahoo Finance US" for how to use pandas-datareader.
Stock selection is very difficult for beginners; sometimes the company whose stock you bought goes bankrupt. So let's first look at stock indexes, each of which is composed of many stocks. When analyzing long-term stock prices, it is essential to take the logarithm of the price (a small numeric illustration follows). For more information, please refer to "System Trading Starting with Python 3: The Role of Logarithms in System Trading".
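As a quick numeric illustration of why the logarithm matters for long-term charts: a rise from 100 to 200 and a rise from 200 to 400 are both doublings, and on a log scale they become the same vertical distance:

```python
import numpy as np

# A doubling from 100 to 200 and from 200 to 400 are the same relative move;
# after taking logs they become the same additive step.
print(np.log(200) - np.log(100))  # 0.693...
print(np.log(400) - np.log(200))  # 0.693... (identical)
```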
When investing in a stock index, the basic vehicle is an ETF (Exchange Traded Fund), which can be bought and sold at any time and has high price transparency.
The ETF for the Dow Jones Industrial Average, the longest-established US stock index, is "DIA".
```python
%matplotlib inline
import matplotlib.pyplot as plt      # plotting library
import pandas_datareader.data as web # data download library
import numpy as np
import pandas as pd
import seaborn as sns

tsd = web.DataReader("dia", "yahoo", "1980/1/1").dropna()  # download DIA price data
np.log(tsd.loc[:, 'Adj Close']).plot()  # plot the log of the adjusted close
```
The ETF for the NASDAQ, the emerging stock market that has grown with the US economy, is "QQQ".

```python
tsd = web.DataReader("qqq", "yahoo", "1980/1/1").dropna()
np.log(tsd.loc[:, 'Adj Close']).plot()
```
The S&P 500, the stock index used as a benchmark by US pension funds, is tracked by the ETF "SPY".

```python
tsd = web.DataReader("spy", "yahoo", "1980/1/1").dropna()
np.log(tsd.loc[:, 'Adj Close']).plot()
```
You can see that they are all rising in the long run.
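To compare them directly, you could also overlay the three log price series on one chart; a small sketch reusing the same DataReader calls:

```python
# Overlay the log-adjusted closes of the three index ETFs on one chart.
for ticker in ["dia", "spy", "qqq"]:
    tsd = web.DataReader(ticker, "yahoo", "1980/1/1").dropna()
    np.log(tsd.loc[:, 'Adj Close']).plot(label=ticker.upper())
plt.legend()
```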
Next, define the lists of ETFs to analyze:

```python
ETF = ['DIA','SPY','QQQ','IBB','XLV','IWM','EEM','EFA','XLP','XLY','ITB','XLU','XLF',
       'VGT','VT','FDN','IWO','IWN','IYF','XLK','XOP','USMV']  # stock index ETFs
ETF2 = ['BAB','GLD','VNQ','SCHH','IYR','AGG','BND','LQD','VCSH','VCIT','JNK']  # ETFs other than stock indexes
```
What kind of price movement makes you uneasy? It is important to know your own pattern. Knowing it is the starting point of stock investment. No one else can teach you this pattern; you have no choice but to find it yourself. And until you find it, you should invest as little as possible.
Let me introduce one tool for finding that pattern: the ratio of return to risk. Divide the annualized rate of return by the annualized standard deviation (volatility). Let's actually look at the movement after the Lehman shock.
```python
m = []  # annualized mean return of each ETF
v = []  # annualized standard deviation of each ETF
PORT = ETF
j = 0
for i in range(len(PORT)):
    tsd = web.DataReader(PORT[i], "yahoo", '1980/1/1')  # download price data
    tsd2 = tsd.loc['2010/1/1':]                         # after the Lehman shock
    tsd3 = tsd.loc['1980/1/1':'2009/12/31']             # before the Lehman shock
    if len(tsd3) > 1000:
        lntsd = np.log(tsd2.iloc[:, 5])  # natural logarithm of the adjusted close
        m.append((lntsd.diff().dropna().mean() + 1)**250 - 1)
        v.append(lntsd.diff().dropna().std() * np.sqrt(250))
        print('{0:02d}'.format(j + 1), '{0:7s}'.format(PORT[i]),
              'average {0:5.2f}'.format(m[j]),
              'volatility {0:5.2f}'.format(v[j]),
              'm/v {0:5.2f}'.format(m[j] / v[j]),
              'number of data {0:d}'.format(len(tsd)))
        j += 1
v_m = pd.DataFrame({'v': v, 'm': m})
plt.scatter(v_m.v, v_m.m, color="g")
plt.ylabel('return')
plt.xlabel('volatility')
```
```
01 DIA     average  0.12 volatility  0.17 m/v  0.68 number of data 5710
02 SPY     average  0.13 volatility  0.17 m/v  0.73 number of data 6966
03 QQQ     average  0.19 volatility  0.20 m/v  0.97 number of data 5424
04 IBB     average  0.16 volatility  0.24 m/v  0.65 number of data 4937
05 XLV     average  0.14 volatility  0.17 m/v  0.80 number of data 5476
06 IWM     average  0.10 volatility  0.22 m/v  0.43 number of data 5116
07 EEM     average  0.02 volatility  0.23 m/v  0.09 number of data 4395
08 EFA     average  0.04 volatility  0.19 m/v  0.20 number of data 4801
09 XLP     average  0.11 volatility  0.14 m/v  0.80 number of data 5476
10 XLY     average  0.17 volatility  0.19 m/v  0.93 number of data 5476
11 XLU     average  0.10 volatility  0.18 m/v  0.57 number of data 5476
12 XLF     average  0.11 volatility  0.25 m/v  0.45 number of data 5476
13 VGT     average  0.18 volatility  0.21 m/v  0.88 number of data 4194
14 IWO     average  0.12 volatility  0.23 m/v  0.53 number of data 5073
15 IWN     average  0.07 volatility  0.22 m/v  0.30 number of data 5073
16 IYF     average  0.09 volatility  0.21 m/v  0.42 number of data 5116
17 XLK     average  0.18 volatility  0.20 m/v  0.88 number of data 5476
```
Next, let's look at the period before the Lehman shock.
```python
m2 = []  # annualized mean return of each ETF
v2 = []  # annualized standard deviation of each ETF
PORT = ETF
j = 0
for i in range(len(PORT)):
    tsd = web.DataReader(PORT[i], "yahoo", '1980/1/1')  # download price data
    tsd2 = tsd.loc['1980/1/1':'2009/12/31']             # before the Lehman shock
    if len(tsd2) > 1000:
        lntsd = np.log(tsd2.iloc[:, 5])  # natural logarithm of the adjusted close
        m2.append((lntsd.diff().dropna().mean() + 1)**250 - 1)
        v2.append(lntsd.diff().dropna().std() * np.sqrt(250))
        print('{0:02d}'.format(j + 1), '{0:7s}'.format(PORT[i]),
              'average {0:5.2f}'.format(m2[j]),
              'volatility {0:5.2f}'.format(v2[j]),
              'm/v {0:5.2f}'.format(m2[j] / v2[j]),
              'number of data {0:d}'.format(len(tsd2)))
        j += 1
v_m2 = pd.DataFrame({'v2': v2, 'm2': m2})
plt.scatter(v_m2.v2, v_m2.m2, color="g")
plt.ylabel('return')
plt.xlabel('volatility')
```
```
01 DIA     average  0.04 volatility  0.21 m/v  0.21 number of data 3008
02 SPY     average  0.08 volatility  0.20 m/v  0.38 number of data 4264
03 QQQ     average -0.01 volatility  0.34 m/v -0.02 number of data 2722
04 IBB     average -0.03 volatility  0.30 m/v -0.09 number of data 2235
05 XLV     average  0.03 volatility  0.20 m/v  0.16 number of data 2774
06 IWM     average  0.05 volatility  0.26 m/v  0.17 number of data 2414
07 EEM     average  0.23 volatility  0.37 m/v  0.62 number of data 1693
08 EFA     average  0.05 volatility  0.25 m/v  0.21 number of data 2099
09 XLP     average  0.02 volatility  0.17 m/v  0.11 number of data 2774
10 XLY     average  0.02 volatility  0.26 m/v  0.09 number of data 2774
11 XLU     average  0.04 volatility  0.22 m/v  0.18 number of data 2774
12 XLF     average -0.02 volatility  0.36 m/v -0.06 number of data 2774
13 VGT     average  0.02 volatility  0.24 m/v  0.11 number of data 1492
14 IWO     average -0.01 volatility  0.28 m/v -0.02 number of data 2371
15 IWN     average  0.08 volatility  0.26 m/v  0.30 number of data 2371
16 IYF     average -0.04 volatility  0.33 m/v -0.11 number of data 2414
17 XLK     average -0.02 volatility  0.31 m/v -0.07 number of data 2774
```
The ratio tends to be relatively higher after the Lehman shock.
Next, let's compare the two periods.
```python
plt.scatter(v_m.m/v_m.v, v_m2.m2/v_m2.v2)
plt.ylabel('2009-now')
plt.xlabel('1980-2009')
```
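The visual impression can also be checked numerically with a correlation coefficient; a small sketch, assuming the `v_m` and `v_m2` DataFrames built above cover the same ETFs in the same order:

```python
# Quantify the relationship between the two periods' risk/return ratios.
# Assumes v_m (after 2010) and v_m2 (before 2010) hold the same ETFs in the same order.
ratio_after = v_m.m / v_m.v
ratio_before = v_m2.m2 / v_m2.v2
print(ratio_after.corr(ratio_before))  # a value near zero means no linear relationship
```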
Ideally, the points would line up along a straight line rising to the right, but unfortunately no such relationship appears. This is because the stock indexes were strongly affected by the collapse of the Internet bubble in 2000. The bursting of a bubble greatly reduces the efficiency of investment, but it cannot be avoided. Keep in mind that equity investments always carry unpredictable risks.
Next, let's analyze ETFs other than stock indexes: gold, real estate REITs, and bond indexes.
```python
m = []  # annualized mean return of each ETF
v = []  # annualized standard deviation of each ETF
PORT = ETF2
j = 0
for i in range(len(PORT)):
    tsd = web.DataReader(PORT[i], "yahoo", '1980/1/1')  # download price data
    tsd2 = tsd.loc['2010/1/1':]                         # after the Lehman shock
    tsd3 = tsd.loc['1980/1/1':'2009/12/31']             # before the Lehman shock
    if len(tsd3) > 100:
        lntsd = np.log(tsd2.iloc[:, 5])  # natural logarithm of the adjusted close
        m.append((lntsd.diff().dropna().mean() + 1)**250 - 1)
        v.append(lntsd.diff().dropna().std() * np.sqrt(250))
        print('{0:02d}'.format(j + 1), '{0:7s}'.format(PORT[i]),
              'average {0:5.2f}'.format(m[j]),
              'volatility {0:5.2f}'.format(v[j]),
              'm/v {0:5.2f}'.format(m[j] / v[j]),
              'number of data {0:d}'.format(len(tsd)))
        j += 1
v_m = pd.DataFrame({'v': v, 'm': m})
plt.scatter(v_m.v, v_m.m, color="g")
plt.ylabel('return')
plt.xlabel('volatility')
```
```
01 GLD     average  0.04 volatility  0.16 m/v  0.28 number of data 3991
02 VNQ     average  0.10 volatility  0.21 m/v  0.45 number of data 4027
03 IYR     average  0.09 volatility  0.21 m/v  0.44 number of data 5101
04 AGG     average  0.04 volatility  0.04 m/v  1.00 number of data 4279
05 BND     average  0.04 volatility  0.04 m/v  0.95 number of data 3392
06 LQD     average  0.06 volatility  0.07 m/v  0.85 number of data 4573
07 JNK     average  0.05 volatility  0.09 m/v  0.61 number of data 3226
```
![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/267055/74e510f5-49da-fc6d-2838-0d6ee52a83ee.png)
It seems the overall tendency can be captured by a straight line rising to the right: taking a larger risk brings a larger return.
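To make that tendency concrete, one way is to fit a straight line through the risk/return scatter; a minimal sketch using NumPy's least-squares `polyfit` on the `v_m` DataFrame from the loop above:

```python
# Fit return = a * volatility + b through the scatter (least squares).
# Assumes v_m was built by the loop above.
a, b = np.polyfit(v_m.v, v_m.m, 1)
xs = np.linspace(v_m.v.min(), v_m.v.max(), 100)
plt.scatter(v_m.v, v_m.m, color="g")
plt.plot(xs, a * xs + b)
plt.ylabel('return')
plt.xlabel('volatility')
print('slope {0:5.2f} intercept {1:5.2f}'.format(a, b))
```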
Next, let's look at these ETFs before the Lehman shock.

```python
m2 = []  # annualized mean return of each ETF
v2 = []  # annualized standard deviation of each ETF
PORT = ETF2
j = 0
for i in range(len(PORT)):
    tsd = web.DataReader(PORT[i], "yahoo", '1980/1/1')  # download price data
    tsd2 = tsd.loc['1980/1/1':'2009/12/31']             # before the Lehman shock
    if len(tsd2) > 100:
        lntsd = np.log(tsd2.iloc[:, 5])  # natural logarithm of the adjusted close
        m2.append((lntsd.diff().dropna().mean() + 1)**250 - 1)
        v2.append(lntsd.diff().dropna().std() * np.sqrt(250))
        print('{0:02d}'.format(j + 1), '{0:7s}'.format(PORT[i]),
              'average {0:5.2f}'.format(m2[j]),
              'volatility {0:5.2f}'.format(v2[j]),
              'm/v {0:5.2f}'.format(m2[j] / v2[j]),
              'number of data {0:d}'.format(len(tsd2)))
        j += 1
v_m2 = pd.DataFrame({'v2': v2, 'm2': m2})
plt.scatter(v_m2.v2, v_m2.m2, color="g")
plt.ylabel('return')
plt.xlabel('volatility')
```
```
01 GLD     average  0.19 volatility  0.22 m/v  0.84 number of data 1289
02 VNQ     average  0.04 volatility  0.45 m/v  0.08 number of data 1325
03 IYR     average  0.08 volatility  0.34 m/v  0.24 number of data 2399
04 AGG     average  0.04 volatility  0.06 m/v  0.72 number of data 1577
05 BND     average  0.06 volatility  0.07 m/v  0.84 number of data 690
06 LQD     average  0.06 volatility  0.10 m/v  0.62 number of data 1871
07 JNK     average  0.02 volatility  0.25 m/v  0.08 number of data 524
```
The tendency is the same as after the Lehman shock.
Next, let's compare the two periods.
```python
plt.scatter(v_m.m/v_m.v, v_m2.m2/v_m2.v2)
plt.ylabel('2009-now')
plt.xlabel('1980-2009')
```
In many cases, the tendency for higher risk to bring higher returns seems to hold both before and after the Lehman shock. This suggests that price movement patterns are easier to predict for bonds than for stocks.
We thus arrive at the guidelines noted at the beginning:

- Long-term investment is the basis
- Invest in a market that rises over the long term
- Take only as much risk as will not shake you in a down market
- Do not sell until you make a profit
- If the stock market feels too risky for you, invest in the bond market; if that still feels risky, keep bank deposits

However, this conclusion differs from person to person, so arrive at your own.