I want to use stocks as a learning subject, but I am not confident I can analyze real market data, so I decided to create some artificial data and play with it in Python. The purpose here is to learn Python itself.
The data covers trading dates from January 4, 2016 to November 8, 2019, and describes a fictitious stock whose closing price follows these trends:

Period | Trend |
---|---|
2016 | Downtrend |
2017 | Flat |
2018 | Uptrend |
2019 | Uptrend (stronger) |
I wanted to attach the data (a text file) I used, but it seems Qiita only lets you upload images...
It is a 944-line CSV file containing the following information.
SampleStock01.csv
Fictitious company 01
date,Open price,High price,Low price,closing price
2016/1/4,9,934,10,055,9,933,10,000
2016/1/5,10,062,10,092,9,942,10,015
2016/1/6,9,961,10,041,9,928,10,007
2016/1/7,9,946,10,060,9,889,9,968
2016/1/8,9,812,9,952,9,730,9,932
2016/1/12,9,912,9,966,9,907,9,940
2016/1/13,9,681,9,964,9,607,9,928
2016/1/14,9,748,9,864,9,686,9,858
(snip)
Since this is just for study, I start from a clean environment. Setting up the learning environment:
command prompt
python -m venv stock
.\stock\Scripts\Activate
After upgrading pip, install matplotlib, pandas, and seaborn:
command prompt
python -m pip install --upgrade pip
pip install matplotlib
pip install pandas
pip install Seaborn
Check the installed packages
command prompt
pip list
Package         Version
--------------- -------
cycler          0.10.0
kiwisolver      1.1.0
matplotlib      3.1.1
numpy           1.17.4
pandas          0.25.3
pip             19.3.1
pyparsing       2.4.5
python-dateutil 2.8.1
pytz            2019.3
scipy           1.3.2
seaborn         0.9.0
setuptools      40.8.0
six             1.13.0
First of all, without thinking too hard, I try reading the file with pd.read_csv().
fail_case01.py
import pandas as pd
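# First attempt: no options at all (the error below shows the file was decoded as UTF-8)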
dframe = pd.read_csv('SampleStock01.csv')
As expected, an error is returned.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
File I/O is where I stumble whenever I deal with Python; my train of thought gets interrupted here every time. Still, failure 01 was within expectations: it is just a matter of specifying the encoding.
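As a side note, here is a quick check of my own (the file name of the snippet is just for this post): if you are not sure what encoding a file uses, peeking at the raw bytes can help. 0x89 cannot start a UTF-8 sequence, but it is a legal lead byte in Shift_JIS text.
check_encoding.py
# Peek at the first bytes of the file to see what we are dealing with.
with open('SampleStock01.csv', 'rb') as f:
    head = f.read(32)

print(head)                                         # raw bytes as stored on disk
print(head.decode('shift_jis', errors='replace'))   # readable if it really is Shift_JIS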
fail_case02.py
import pandas as pd
# Specify the character code of the CSV file (SampleStock01.csv)
dframe = pd.read_csv('SampleStock01.csv', encoding="SJIS")
Yes, I knew it. This one fails too.
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 3, saw 9
The first line of the CSV file holds the company name, so the actual header is on the second line. This should just be a matter of skipping the first line and reading from the second.
fail_case03.py
import pandas as pd
# Specify the character code of the CSV file (SampleStock01.csv)
# and treat the second line (header=1) as the header row
dframe = pd.read_csv('SampleStock01.csv', encoding="SJIS", header=1)
print(dframe)
                Date  Open Price  High  Low Price  Close
2016/1/4    9  934  10   55    9  933  10    0
2016/1/5   10   62  10   92    9  942  10   15
2016/1/6    9  961  10   41    9  928  10    7
2016/1/7    9  946  10   60    9  889   9  968
2016/1/8    9  812   9  952    9  730   9  932
...        ..  ...  ..  ...   ..  ...  ..  ...
2019/11/1  13  956  15   59   13  940  14  928
2019/11/5  13  893  15   54   13  820  14  968
2019/11/6  14    3  15  155   13  919  15   47
2019/11/7  14  180  15   54   14   57  15   41
2019/11/8  14   76  15   52   13  939  15   41
[942 rows x 5 columns]
"I read it into the dataframe properly!" I was pleased with myself for a moment. But look closely at the numbers: the CSV delimiter "," and the thousands separator "," are mixed together, so the values cannot be read into the dataframe correctly (as the quick check after the listing below shows).
The CSV file I wanted to read:
SampleStock01.csv
Fictitious company 01
date,Open price,High price,Low price,closing price
2016/1/4,9,934,10,055,9,933,10,000
2016/1/5,10,062,10,092,9,942,10,015
2016/1/6,9,961,10,041,9,928,10,007
2016/1/7,9,946,10,060,9,889,9,968
2016/1/8,9,812,9,952,9,730,9,932
2016/1/12,9,912,9,966,9,907,9,940
2016/1/13,9,681,9,964,9,607,9,928
2016/1/14,9,748,9,864,9,686,9,858
(snip)
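Here is the quick check mentioned above (my own addition; the line is copied from the sample data). Splitting a data row on "," yields nine fields even though the header promises only five, which is exactly what the earlier ParserError complained about.
check_fields.py
# One data line copied from SampleStock01.csv
line = '2016/1/4,9,934,10,055,9,933,10,000'

fields = line.split(',')
print(len(fields))  # 9 -- every thousands separator is counted as a delimiter
print(fields)       # ['2016/1/4', '9', '934', '10', '055', '9', '933', '10', '000']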
To be honest, I could not see any way around this other than fixing the input file, so I changed the CSV delimiter from "," to a tab character. But what should I do if I run into this kind of thing when analyzing real business logs? If anyone knows a good way, please let me know. (One untested idea is sketched below.)
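One idea I have not actually used in this article, so treat it purely as a sketch: every price in this file happens to contain exactly one thousands separator, so each row always splits into a date plus eight numbers, and the original prices can be rebuilt arithmetically after reading with header=None. The column positions and names below are my own assumptions based on the sample rows.
rebuild_sketch.py
import pandas as pd

# Sketch only: read the comma-delimited file with no header,
# skipping the company-name line and the column-name line
raw = pd.read_csv('SampleStock01.csv', encoding='SJIS',
                  skiprows=2, header=None)

# Column 0 is the date; after it come (thousands, remainder) pairs for
# open / high / low / close -- assuming every price is 1,000 or more
dframe = pd.DataFrame({'Date': raw[0]})
for name, k in [('Open', 1), ('High', 3), ('Low', 5), ('Close', 7)]:
    dframe[name] = raw[k] * 1000 + raw[k + 1]

print(dframe.head())
With real logs this kind of positional repair gets fragile quickly, so fixing the export format upstream (or quoting the fields) is probably the safer answer.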
Anyway, I modified the CSV to be read as follows.
SampleStock01_t1.csv
Fictitious company 01
Date Open Price High Low Price Close
2016/1/4 9,934 10,055 9,933 10,000
2016/1/5 10,062 10,092 9,942 10,015
2016/1/6 9,961 10,041 9,928 10,007
2016/1/7 9,946 10,060 9,889 9,968
2016/1/8 9,812 9,952 9,730 9,932
2016/1/12 9,912 9,966 9,907 9,940
2016/1/13 9,681 9,964 9,607 9,928
2016/1/14 9,748 9,864 9,686 9,858
(snip)
For the fourth attempt, I simply added to the code so far an option specifying that the delimiter is a tab character.
Success_case.py
import pandas as pd
# Specify the character code of the CSV file (SampleStock01_t1.csv),
# skip the company-name line, and use tab as the delimiter
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
                     header=1, sep='\t')
print(dframe)
          Date Open Price    High Low Price   Close
0     2016/1/4      9,934  10,055     9,933  10,000
1     2016/1/5     10,062  10,092     9,942  10,015
2     2016/1/6      9,961  10,041     9,928  10,007
3     2016/1/7      9,946  10,060     9,889   9,968
4     2016/1/8      9,812   9,952     9,730   9,932
..         ...        ...     ...       ...     ...
937  2019/11/1     13,956  15,059    13,940  14,928
938  2019/11/5     13,893  15,054    13,820  14,968
939  2019/11/6     14,003  15,155    13,919  15,047
940  2019/11/7     14,180  15,054    14,057  15,041
941  2019/11/8     14,076  15,052    13,939  15,041
[942 rows x 5 columns]
There are still concerns left, such as specifying the index and the column types, but read_csv finally works. In a reference book this would be a few lines of work, and yet...
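For those index and type concerns, read_csv itself has options that should cover them. The following is a sketch I have not folded into the article's flow: thousands=',' turns the comma-grouped prices into integers, and index_col / parse_dates turn the date column into a datetime index.
next_step_sketch.py
import pandas as pd

# Same tab-separated file, but also convert the comma-grouped prices to
# integers and use the date column as a datetime index
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
                     header=1, sep='\t',
                     thousands=',',     # "9,934" -> 9934
                     index_col=0,       # first column (the date) becomes the index
                     parse_dates=True)  # parse that index as dates

print(dframe.dtypes)
print(dframe.head())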
For me, file I/O is the biggest hurdle in working with dataframes; do other people get through it easily? And this is not limited to dataframes, or even to Python: file I/O has been my demon since the C days.
Once the data is loaded, the rest is an ordinary programming problem, so it gets easier from here.