I made a log output function last time, so I will continue studying pandas.
Until last time, the only import was
import pandas as pd
but this time I also import Series, DataFrame, and numpy. As pandas practice, I will add processing that outputs the column labels and specific columns.
Success_case01.py
import pandas as pd
import logging
#[Stock price analysis] Learning pandas with fictitious data(003)Add more
from pandas import Series, DataFrame
import numpy as np
#Specifying the log format
# %(asctime)s :Human-readable time at which the LogRecord was created
# %(funcName)s :Name of the function containing the logging call
# %(levelname)s :Text logging level for the message
# %(lineno)d :Source line number where the logging call was issued
# %(message)s :The logged message, computed as msg % args
formatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')
#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
#Handler settings(output file/log level/log format)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)
logger.addHandler(handler)
#Read the CSV file(SampleStock01_t1.csv), specifying its character encoding
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
                     header=1, sep='\t')
#Output the whole DataFrame via the logger
logger.info(dframe)
#Output the column labels
logger.info(dframe.columns)
#Output only open and close prices
logger.info(dframe[['Open price','closing price']])
The column labels are recorded in the log as follows.
info_log.log
2019-11-11 20:04:13,275:<module>:INFO:33:
Index(['date', 'Open price', 'High price', 'Low price', 'closing price'], dtype='object')
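Each entry in info_log.log starts with a header such as 2019-11-11 20:04:13,275:<module>:INFO:33:, which comes directly from the Formatter string. The following is a minimal sketch of that relationship, assuming an in-memory buffer instead of the log file; the logger name format_demo and the function report are hypothetical names used only for this illustration.

```python
import io
import logging

# Same format string as the article: time, function name, level, line number, message.
formatter = logging.Formatter(
    '%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')

# Log into an in-memory buffer instead of info_log.log (an assumption for this demo).
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)

demo_logger = logging.getLogger('format_demo')  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False


def report():
    # funcName in the header becomes 'report' because the call is made here.
    demo_logger.info('hello')


report()
demo_logger.debug('not recorded')  # below INFO, so it is filtered out

header, message = buffer.getvalue().splitlines()
print(header)
print(message)
```

When the logging call is made at the top level of a script, as in the article, funcName is shown as <module> instead of a function name.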
I was able to extract only the opening and closing price data without any problems.
info_log.log
2019-11-11 20:04:13,290:<module>:INFO:35:
Open price closing price
0 9,934 10,000
1 10,062 10,015
2 9,961 10,007
3 9,946 9,968
4 9,812 9,932
.. ... ...
937 13,956 14,928
938 13,893 14,968
939 14,003 15,047
940 14,180 15,041
941 14,076 15,041
[942 rows x 2 columns]
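The difference between dframe.columns and dframe[[...]] can be tried on a tiny in-memory table. This is only a sketch: the DataFrame named sample stands in for the CSV, and its high and low values are made up for illustration.

```python
import pandas as pd

# Tiny stand-in for the CSV; the high and low values are made up.
sample = pd.DataFrame({
    'date': ['2016/01/04', '2016/01/05', '2016/01/06'],
    'Open price': ['9,934', '10,062', '9,961'],
    'High price': ['10,055', '10,092', '10,041'],
    'Low price': ['9,933', '9,942', '9,920'],
    'closing price': ['10,000', '10,015', '10,007'],
})

# .columns holds the column labels (not the row index).
print(list(sample.columns))

# A list of labels inside [] returns a DataFrame with just those columns.
open_close = sample[['Open price', 'closing price']]
print(type(open_close).__name__, open_close.shape)

# A single label returns a Series instead.
print(type(sample['Open price']).__name__)
```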
Next, I make the date column the index. The key points are just the following two lines.
Point_Code.py
#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')
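The effect of these two lines can be checked on a small sample. The sketch below assumes the dates are written like 2016/01/04; the actual format in the CSV is not shown in the article, and the DataFrame named sample is illustrative only.

```python
import pandas as pd

sample = pd.DataFrame({
    'date': ['2016/01/04', '2016/01/05', '2016/01/06'],  # assumed date format
    'closing price': ['10,000', '10,015', '10,007'],
})
# Before the conversion the rows are numbered by a default integer index.
print(type(sample.index).__name__)

# Convert the strings to datetime values, then use them as the index.
sample['date'] = pd.to_datetime(sample['date'])
sample = sample.set_index('date')
print(type(sample.index).__name__)

# With a DatetimeIndex, a row can be looked up by its date.
print(sample.loc[pd.Timestamp('2016-01-05'), 'closing price'])
```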
Just in case, the whole code, including the debug output, is as follows. (This is where a blog post is handy: a printed reference book has no room for such redundant listings.)
Success_case02.py
import pandas as pd
import logging
#[Stock price analysis] Learning pandas with fictitious data(003)Add more
from pandas import Series, DataFrame
import numpy as np
#Specifying the log format
# %(asctime)s :Human-readable time at which the LogRecord was created
# %(funcName)s :Name of the function containing the logging call
# %(levelname)s :Text logging level for the message
# %(lineno)d :Source line number where the logging call was issued
# %(message)s :The logged message, computed as msg % args
formatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')
#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
#Handler settings(output file/log level/log format)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)
logger.addHandler(handler)
#Read the CSV file(SampleStock01_t1.csv), specifying its character encoding
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
                     header=1, sep='\t')
#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')
#Output the whole DataFrame via the logger
logger.info(dframe)
#Output the column labels
logger.info(dframe.columns)
#Output only open and close prices
logger.info(dframe[['Open price','closing price']])
#Checking the index
logger.info(dframe.index)
#Type confirmation
logger.info(dframe.dtypes)
The index information is displayed as follows.
info_log.log
2019-11-11 20:31:44,825:<module>:INFO:44:
DatetimeIndex(['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07',
'2016-01-08', '2016-01-12', '2016-01-13', '2016-01-14',
'2016-01-15', '2016-01-18',
...
'2019-10-25', '2019-10-28', '2019-10-29', '2019-10-30',
'2019-10-31', '2019-11-01', '2019-11-05', '2019-11-06',
'2019-11-07', '2019-11-08'],
dtype='datetime64[ns]', name='date', length=942, freq=None)
However, the open, high, low, and closing prices are stored as object type, as shown below, so numerical calculations are not possible. I will convert them to float32 in the next step.
info_log.log
2019-11-11 20:38:35,216:<module>:INFO:44:
Open price object
High price object
Low price object
closing price object
dtype: object
I forgot to mention it, but before the date conversion with pd.to_datetime(dframe['date']) and the set_index call, the index is displayed as follows.
info_log.log
2019-11-11 20:36:22,326:<module>:INFO:37:
RangeIndex(start=0, stop=942, step=1)
Point_Code.py
dframe = dframe.apply(lambda x: x.str.replace(',','')).astype(np.float32)
I won't post the full code this time because it would be too verbose.
The columns are now a numeric type, as shown below, so they can be used for calculations and graphs from here on.
info_log.log
2019-11-11 20:53:35,326:<module>:INFO:46:
Open price float32
High price float32
Low price float32
closing price float32
dtype: object
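The conversion can be reproduced on a two-row sample, as sketched below. The DataFrame named sample and the inline tab-separated text are made up for illustration. The last part shows a possible alternative that the article does not use: the thousands parameter of read_csv, which strips the separators while parsing.

```python
import io

import numpy as np
import pandas as pd

# Made-up two-row sample with comma-separated numbers stored as strings.
sample = pd.DataFrame({
    'Open price': ['9,934', '10,062'],
    'closing price': ['10,000', '10,015'],
})

# Remove the thousands separators, then convert every column to float32.
converted = sample.apply(
    lambda col: col.str.replace(',', '', regex=False)).astype(np.float32)
print(converted.dtypes)

# Arithmetic now works, e.g. the close-minus-open difference per row.
change = converted['closing price'] - converted['Open price']
print(change.tolist())

# Possible alternative (not what the article does): let read_csv strip the
# separators while parsing, via its thousands parameter.
tsv = io.StringIO('date\tOpen price\tclosing price\n2016/01/04\t9,934\t10,000\n')
parsed = pd.read_csv(tsv, sep='\t', thousands=',')
print(parsed.dtypes)
```

Note that the read_csv alternative yields integer columns for this sample rather than float32, so an astype call would still be needed to match the article's result.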
If, at this point, you try to take a shortcut with
fail_Code01.py
dframe = dframe.astype(np.float32)
a ValueError is raised at the very first opening price.
ValueError: could not convert string to float: '9,934'
This time there was no problem because I prepared the data myself, but unknown data may contain a string such as "abcde" rather than a number with commas. Without error handling, this is an easy place to get stuck.
In a case like this I would like to output a proper log with logger.exception() or similar, but as of November 11, 2019 I do not have that skill yet, so I will leave it as a future task.
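One way this future task might look is sketched below. It is not the article's code: the logger name convert_demo, the in-memory log buffer, and the fallback to NaN through pd.to_numeric with errors='coerce' are all assumptions chosen for the illustration.

```python
import io
import logging

import numpy as np
import pandas as pd

# In-memory log target so the sketch leaves no file behind (an assumption).
log_buffer = io.StringIO()
logger = logging.getLogger('convert_demo')  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(log_buffer))
logger.propagate = False

# 'abcde' stands in for an unexpected non-numeric cell.
sample = pd.DataFrame({'Open price': ['9,934', 'abcde', '9,961']})
cleaned = sample.apply(lambda col: col.str.replace(',', '', regex=False))

try:
    converted = cleaned.astype(np.float32)
except ValueError:
    # logger.exception logs at ERROR level and appends the traceback.
    logger.exception('Conversion to float32 failed')
    # Fallback: turn unparseable cells into NaN instead of stopping.
    converted = cleaned.apply(pd.to_numeric, errors='coerce').astype(np.float32)

print(converted)
print(log_buffer.getvalue().splitlines()[0])
```

Whether to stop on bad data or to continue with NaN depends on the analysis; the fallback here only shows that both the log entry and a usable DataFrame can be obtained.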
Package installation for making candlestick charts
command prompt
pip install https://github.com/matplotlib/mpl_finance/archive/master.zip
Point_Code.py
(Omitted)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc
(Omitted)
#Creating data for plotting
#candlestick_ohlc expects each row as (time, open, high, low, close)
ohlc = zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['High price'], dframe['Low price'], dframe['closing price'])
logger.info(ohlc)
#Create the canvas(figure)
fig = plt.figure()
#Format the X-axis
ax = plt.subplot()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d'))
#Draw a candlestick chart
candlestick_ohlc(ax, ohlc, width=0.7, colorup='g', colordown='r')
#Save the image
plt.savefig('Candle_Chart.png')
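Because candlestick_ohlc reads each row as (time, open, high, low, close), it can help to check the column order and the values before plotting. The following is a rough sketch using pandas only; the sample values for the high and low prices are made up, and in the real script the time value from mdates.date2num would come first in each row.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        'Open price': [9934.0, 10062.0],
        'High price': [10055.0, 10092.0],   # made-up values
        'Low price': [9933.0, 9942.0],      # made-up values
        'closing price': [10000.0, 10015.0],
    },
    index=pd.to_datetime(['2016-01-04', '2016-01-05']),
)

# Column order expected after the time value: open, high, low, close.
ohlc_columns = ['Open price', 'High price', 'Low price', 'closing price']

# The high must be the largest and the low the smallest value of each row.
high_ok = sample['High price'] >= sample[ohlc_columns].max(axis=1)
low_ok = sample['Low price'] <= sample[ohlc_columns].min(axis=1)
print(bool((high_ok & low_ok).all()))

# Build plain (open, high, low, close) tuples in that order.
rows = list(sample[ohlc_columns].itertuples(index=False, name=None))
print(rows[0])
```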
I made a candlestick chart, but the data I prepared was so terrible that I lost the motivation to go further...
From the next article, I would like to tidy up the code, make use of pandas features, and improve the graph while preparing somewhat better data.
Study_Code.py
import pandas as pd
import logging
#[Stock price analysis] Learning pandas with fictitious data(003)Add more
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_finance import candlestick_ohlc
#Specifying the log format
# %(asctime)s :Human-readable time at which the LogRecord was created
# %(funcName)s :Name of the function containing the logging call
# %(levelname)s :Text logging level for the message
# %(lineno)d :Source line number where the logging call was issued
# %(message)s :The logged message, computed as msg % args
formatter = logging.Formatter('%(asctime)s:%(funcName)s:%(levelname)s:%(lineno)d:\n%(message)s')
#Logger settings(INFO log level)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
#Handler settings(output file/log level/log format)
handler = logging.FileHandler('info_log.log')
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)
logger.addHandler(handler)
#Read the CSV file(SampleStock01_t1.csv), specifying its character encoding
dframe = pd.read_csv('SampleStock01_t1.csv', encoding='SJIS',
                     header=1, sep='\t')
#Convert to date type
dframe['date'] = pd.to_datetime(dframe['date'])
#Specify date column as index
dframe = dframe.set_index('date')
#Convert open to close prices to numbers
dframe = dframe.apply(lambda x: x.str.replace(',','')).astype(np.float32)
#Output the whole DataFrame via the logger
logger.info(dframe)
#Output the column labels
logger.info(dframe.columns)
#Output only open and close prices
logger.info(dframe[['Open price','closing price']])
#Checking the index
logger.info(dframe.index)
#Type confirmation
logger.info(dframe.dtypes)
#Creating data for plotting
#candlestick_ohlc expects each row as (time, open, high, low, close)
ohlc = zip(mdates.date2num(dframe.index), dframe['Open price'], dframe['High price'], dframe['Low price'], dframe['closing price'])
logger.info(ohlc)
#Create the canvas(figure)
fig = plt.figure()
#Format the X-axis
ax = plt.subplot()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d'))
#Draw a candlestick chart
candlestick_ohlc(ax, ohlc, width=0.7, colorup='g', colordown='r')
#Save the image
plt.savefig('Candle_Chart.png')
Again, the data I prepared was a bit too terrible, so next time I would like to prepare different data before writing the article.