I want to use stocks as a learning subject, but I am not confident I can analyze real market data, so I decided to create some artificial data and play with it in Python. The purpose here is to learn Python itself.
The data covers trading dates from January 4, 2016 to November 8, 2019, and describes a fictitious stock whose closing price follows these trends:

Period | Trend |
---|---|
2016 | Downtrend |
2017 | Flat |
2018 | Uptrend |
2019 | Uptrend (stronger) |
I wanted to attach the data (a text file) I used, but it seems Qiita only lets you upload images...
It is a 944-line CSV file containing the following information.
SampleStock01.csv
Fictitious company 01
date,Open price,High price,Low price,closing price
2016/1/4,9,934,10,055,9,933,10,000
2016/1/5,10,062,10,092,9,942,10,015
2016/1/6,9,961,10,041,9,928,10,007
2016/1/7,9,946,10,060,9,889,9,968
2016/1/8,9,812,9,952,9,730,9,932
2016/1/12,9,912,9,966,9,907,9,940
2016/1/13,9,681,9,964,9,607,9,928
2016/1/14,9,748,9,864,9,686,9,858
(snip)
Since this is just for study, I start from a clean environment. Setting up the learning environment:
command prompt
python -m venv stock
.\stock\Scripts\Activate
After upgrading pip, install matplotlib, pandas, and seaborn:
command prompt
python -m pip install --upgrade pip
pip install matplotlib
pip install pandas
pip install Seaborn
Check the installed packages
command prompt
pip list
Package         Version
--------------- -------
cycler          0.10.0
kiwisolver      1.1.0
matplotlib      3.1.1
numpy           1.17.4
pandas          0.25.3
pip             19.3.1
pyparsing       2.4.5
python-dateutil 2.8.1
pytz            2019.3
scipy           1.3.2
seaborn         0.9.0
setuptools      40.8.0
six             1.13.0
First of all, without thinking too hard, I try reading the file with pd.read_csv().
fail_case01.py
import pandas as pd
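# First attempt: no options at all (the error below shows the file was decoded as UTF-8)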
dframe = pd.read_csv('SampleStock01.csv')
As expected, an error is returned.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
File I/O is where I stumble whenever I deal with Python; my train of thought gets interrupted here every time. Still, failure 01 was within expectations: it is just a matter of specifying the encoding.
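As a side note, here is a quick check of my own (the file name of the snippet is just for this post): if you are not sure what encoding a file uses, peeking at the raw bytes can help. 0x89 cannot start a UTF-8 sequence, but it is a legal lead byte in Shift_JIS text.
check_encoding.py
# Peek at the first bytes of the file to see what we are dealing with.
with open('SampleStock01.csv', 'rb') as f:
    head = f.read(32)

print(head)                                         # raw bytes as stored on disk
print(head.decode('shift_jis', errors='replace'))   # readable if it really is Shift_JIS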
fail_case02.py
import pandas as pd
# Specify the character code of the CSV file (SampleStock01.csv)
dframe = pd.read_csv('SampleStock01.csv', encoding="SJIS")
Yes, I knew it. This one fails too.
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 3, saw 9
The first line of the CSV file holds the company name, so the actual header is on the second line. This should just be a matter of skipping the first line and reading from the second.
fail_case03.py
import pandas as pd
# Specify the character code of the CSV file (SampleStock01.csv)
# and treat the second line (header=1) as the header row
dframe = pd.read_csv('SampleStock01.csv', encoding="SJIS", header=1)
print(dframe)
                Date  Open Price  High  Low Price  Close
2016/1/4    9  934  10   55    9  933  10    0
2016/1/5   10   62  10   92    9  942  10   15
2016/1/6    9  961  10   41    9  928  10    7
2016/1/7    9  946  10   60    9  889   9  968
2016/1/8    9  812   9  952    9  730   9  932
...        ..  ...  ..  ...   ..  ...  ..  ...
2019/11/1  13  956  15   59   13  940  14  928
2019/11/5  13  893  15   54   13  820  14  968
2019/11/6  14    3  15  155   13  919  15   47
2019/11/7  14  180  15   54   14   57  15   41
2019/11/8  14   76  15   52   13  939  15   41
[942 rows x 5 columns]
"I read it into the dataframe properly!" I was pleased with myself for a moment. But look closely at the numbers: the CSV delimiter "," and the thousands separator "," are mixed together, so the values cannot be read into the dataframe correctly (as the quick check after the listing below shows).
The CSV file I wanted to read:
SampleStock01.csv
Fictitious company 01
date,Open price,High price,Low price,closing price
2016/1/4,9,934,10,055,9,933,10,000
2016/1/5,10,062,10,092,9,942,10,015
2016/1/6,9,961,10,041,9,928,10,007
2016/1/7,9,946,10,060,9,889,9,968
2016/1/8,9,812,9,952,9,730,9,932
2016/1/12,9,912,9,966,9,907,9,940
2016/1/13,9,681,9,964,9,607,9,928
2016/1/14,9,748,9,864,9,686,9,858
(snip)
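Here is the quick check mentioned above (my own addition; the line is copied from the sample data). Splitting a data row on "," yields nine fields even though the header promises only five, which is exactly what the earlier ParserError complained about.
check_fields.py
# One data line copied from SampleStock01.csv
line = '2016/1/4,9,934,10,055,9,933,10,000'

fields = line.split(',')
print(len(fields))  # 9 -- every thousands separator is counted as a delimiter
print(fields)       # ['2016/1/4', '9', '934', '10', '055', '9', '933', '10', '000']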
To be honest, I could not see any way around this other than fixing the input file, so I changed the CSV delimiter from "," to a tab character. But what should I do if I run into this kind of thing when analyzing real business logs? If anyone knows a good way, please let me know. (One untested idea is sketched below.)
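One idea I have not actually used in this article, so treat it purely as a sketch: every price in this file happens to contain exactly one thousands separator, so each row always splits into a date plus eight numbers, and the original prices can be rebuilt arithmetically after reading with header=None. The column positions and names below are my own assumptions based on the sample rows.
rebuild_sketch.py
import pandas as pd

# Sketch only: read the comma-delimited file with no header,
# skipping the company-name line and the column-name line
raw = pd.read_csv('SampleStock01.csv', encoding='SJIS',
                  skiprows=2, header=None)

# Column 0 is the date; after it come (thousands, remainder) pairs for
# open / high / low / close -- assuming every price is 1,000 or more
dframe = pd.DataFrame({'Date': raw[0]})
for name, k in [('Open', 1), ('High', 3), ('Low', 5), ('Close', 7)]:
    dframe[name] = raw[k] * 1000 + raw[k + 1]

print(dframe.head())
With real logs this kind of positional repair gets fragile quickly, so fixing the export format upstream (or quoting the fields) is probably the safer answer.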
Anyway, I modified the CSV to be read as follows.
SampleStock01_t1.csv
Fictitious company 01
Date Open Price High Low Price Close
2016/1/4 9,934 10,055 9,933 10,000
2016/1/5 10,062 10,092 9,942 10,015
2016/1/6 9,961 10,041 9,928 10,007
2016/1/7 9,946 10,060 9,889 9,968
2016/1/8 9,812 9,952 9,730 9,932
2016/1/12 9,912 9,966 9,907 9,940
2016/1/13 9,681 9,964 9,607 9,928
2016/1/14 9,748 9,864 9,686 9,858
(snip)
For the fourth attempt, I simply added to the code so far an option specifying that the delimiter is a tab character.
Success_case.py
import pandas as pd
# Specify the character code of the CSV file (SampleStock01_t1.csv),
# skip the company-name line, and use tab as the delimiter
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
                     header=1, sep='\t')
print(dframe)
          Date Open Price    High Low Price   Close
0     2016/1/4      9,934  10,055     9,933  10,000
1     2016/1/5     10,062  10,092     9,942  10,015
2     2016/1/6      9,961  10,041     9,928  10,007
3     2016/1/7      9,946  10,060     9,889   9,968
4     2016/1/8      9,812   9,952     9,730   9,932
..         ...        ...     ...       ...     ...
937  2019/11/1     13,956  15,059    13,940  14,928
938  2019/11/5     13,893  15,054    13,820  14,968
939  2019/11/6     14,003  15,155    13,919  15,047
940  2019/11/7     14,180  15,054    14,057  15,041
941  2019/11/8     14,076  15,052    13,939  15,041
[942 rows x 5 columns]
There are still concerns left, such as specifying the index and the column types, but read_csv finally works. In a reference book this would be a few lines of work, and yet...
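For those index and type concerns, read_csv itself has options that should cover them. The following is a sketch I have not folded into the article's flow: thousands=',' turns the comma-grouped prices into integers, and index_col / parse_dates turn the date column into a datetime index.
next_step_sketch.py
import pandas as pd

# Same tab-separated file, but also convert the comma-grouped prices to
# integers and use the date column as a datetime index
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
                     header=1, sep='\t',
                     thousands=',',     # "9,934" -> 9934
                     index_col=0,       # first column (the date) becomes the index
                     parse_dates=True)  # parse that index as dates

print(dframe.dtypes)
print(dframe.head())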
For me, file I/O is the biggest hurdle in working with dataframes; do other people get through it easily? And this is not limited to dataframes, or even to Python: file I/O has been my demon since the C days.
Once the data is loaded, the rest is an ordinary programming problem, so it gets easier from here.