Format and display time series data with different scales and units using Python and Matplotlib

Graphing time series data from a single source is straightforward. However, I didn't know how to display time series data from different sources together, so I tried it. In short, I got the result I wanted, but the approach is not very elegant.

If there is a smarter way, please let me know.

Goal and challenges

As an example, consider two datasets that are each easy to display on their own, but troublesome to show together because their acquisition periods and units (number of cases, %, etc.) differ.

Please see here for how to get data from DB or CSV in the first place.

Matplotlib basics

Before getting into the details: I'm not used to Matplotlib in the first place, so let's take a quick look at it. The simplest code looks like the following. It would be even simpler without subplot, but I deliberately use add_subplot (the mechanism for managing multiple plots in one figure) so that it stays consistent with the later examples.

#coding:utf-8
import matplotlib.pyplot as plt

#Create a blank canvas (the figure)
fig = plt.figure()
#Prepare an area to draw the first figure in 1 row and 1 column
ax = fig.add_subplot(1,1,1)

#Prepare data; x and y must have the same number of elements
x = [0,1,2,3,4,5]
y = [54,35,32,44,74,45]

#Set data (plot)
ax.plot(x,y)

#Show figure
plt.show()

Well, it looks normal.
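
For comparison, here is a minimal sketch of the shorter form without add_subplot. It uses the same x and y values. The Agg backend line is my own assumption so that the sketch runs without a display; when working interactively you would drop it and call plt.show() as in the listing above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
y = [54, 35, 32, 44, 74, 45]

# plt.plot creates a figure and an axes implicitly and returns the drawn lines
lines = plt.plot(x, y)

# The implicitly created axes can still be retrieved afterwards
ax = plt.gca()
```

The explicit fig/ax form used in the rest of this article is longer, but it makes it clear which plot each call applies to once there are several.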


For details on the code, see the supplement and explanations at the bottom.

Confirmation of issues

Prepare two datasets with different properties. The values and dates are generated automatically with NumPy and pandas functions.

Data 1

Daily data from June 2, 2016 to June 7, 2016. There are 6 data points.

#coding:utf-8

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(1,1,1)

#Generate 6 random integers from 0 to 99 (the upper bound 100 is exclusive)
y1 = np.random.randint(0,100,6)
#Generate 6 daily datetimes starting from 2016-06-02 00:00:00
x1 = pd.date_range('2016-06-02 00:00:00',periods=6,freq='d')

ax.plot(x1,y1)

plt.show()

Like this.


Data 2

Hourly data from June 3, 2016 to June 9, 2016. There are 150 data points.

#coding:utf-8

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(1,1,1)

#Generate 150 random integers from 5 to 39 (the upper bound 40 is exclusive)
y2 = np.random.randint(5,40,150)
#Generate 150 hourly datetimes starting from 2016-06-03 12:00:00
#(pandas 2.2 and later prefer the lowercase alias 'h' over 'H')
x2 = pd.date_range('2016-06-03 12:00:00',periods=150,freq='H')

ax.plot(x2,y2)

plt.show()

Because the data is hourly, the plot is much denser than data 1.


Although the two datasets above overlap in time, they have different acquisition periods and properties. At a minimum, I want both to be displayed on a common time axis (x-axis).
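
To make the mismatch concrete, the following sketch prints the period each dataset covers, plus the combined span and the overlap. The variable names span_start, span_end, overlap_start, and overlap_end are my own. The hourly frequency is passed as a pd.Timedelta instead of the 'H' alias only so that the sketch does not depend on the pandas version.

```python
import pandas as pd

# Same date ranges as data 1 and data 2 above
x1 = pd.date_range('2016-06-02 00:00:00', periods=6, freq='D')
x2 = pd.date_range('2016-06-03 12:00:00', periods=150, freq=pd.Timedelta(hours=1))

# Combined span: earliest start to latest end
span_start = min(x1.min(), x2.min())
span_end = max(x1.max(), x2.max())

# Overlap: latest start to earliest end
overlap_start = max(x1.min(), x2.min())
overlap_end = min(x1.max(), x2.max())

print("data 1 :", x1.min(), "to", x1.max())
print("data 2 :", x2.min(), "to", x2.max())
print("span   :", span_start, "to", span_end)
print("overlap:", overlap_start, "to", overlap_end)
```

A common time axis has to cover at least the combined span, which is why the later examples use a slightly wider window from 6/1 to 6/10.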

Plot them as-is for now

First, I simply plot both as they are.

#coding:utf-8

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fig = plt.figure()
#Reserve space for two plots in 2 rows and 1 column
ax1 = fig.add_subplot(2,1,1)
ax2 = fig.add_subplot(2,1,2)

#data1
y1 = np.random.randint(0,100,6)
x1 = pd.date_range('2016-06-02 00:00:00',periods=6,freq='d')

#data2
y2 = np.random.randint(5,40,150)
x2 = pd.date_range('2016-06-03 12:00:00',periods=150,freq='H')

#plot
ax1.plot(x1,y1)
ax2.plot(x2,y2)

plt.show()

It displays, but the time axes of the two plots do not line up, so comparing them is virtually meaningless.
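
One possible way to line the panels up, which is not what this article does below, is to let Matplotlib share the x-axis between them. This is a minimal sketch assuming plt.subplots with sharex=True is acceptable in place of add_subplot. The Agg backend line and the pd.Timedelta frequency are my own choices so that the sketch runs headless and on any pandas version.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# sharex=True links the x-axes, so both panels always show the same time window
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)

#data1
y1 = np.random.randint(0, 100, 6)
x1 = pd.date_range('2016-06-02 00:00:00', periods=6, freq='D')

#data2
y2 = np.random.randint(5, 40, 150)
x2 = pd.date_range('2016-06-03 12:00:00', periods=150, freq=pd.Timedelta(hours=1))

ax1.plot(x1, y1)
ax2.plot(x2, y2)

# Both axes report identical limits covering the two datasets
print(ax1.get_xlim() == ax2.get_xlim())
```

With this form the tick labels of the upper panel are hidden and only the bottom panel shows the dates, which may or may not be what you want.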


How to deal with it

A reluctant workaround

I tried various things. For a Python beginner like me, the following is the best I can do for now. I generate dummy data (x0, y0 in this case) that spans a period common to both datasets and plot it on each graph.

Here I used the range from 6/1 to 6/10.

#coding:utf-8

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fig = plt.figure()
#Reserve space for two plots in 2 rows and 1 column
ax1 = fig.add_subplot(2,1,1)
ax2 = fig.add_subplot(2,1,2)

#data0 Generate dummy data as a reference
#y values: a list of 10 zeros
#x values: daily datetimes from 6/1 to 6/10
y0 = [0]*10
x0 = pd.date_range('2016-06-01 00:00:00',periods=10,freq='d')

#data1
y1 = np.random.randint(0,100,6)
x1 = pd.date_range('2016-06-02 00:00:00',periods=6,freq='d')

#data2
y2 = np.random.randint(5,40,150)
x2 = pd.date_range('2016-06-03 12:00:00',periods=150,freq='H')

#plot
#ax1
ax1.plot(x0,y0)
ax1.plot(x1,y1)
#ax2
ax2.plot(x0,y0)
ax2.plot(x2,y2)

plt.show()

The time axes now line up, but the tick labels make the plot hard to read.
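
As a possible alternative to plotting a dummy series, the visible window can be set directly with set_xlim. This is only a sketch under my own assumptions: start and end are hypothetical names for the 6/1 to 6/10 window, the Agg backend is used so the code runs without a display, and the hourly frequency is passed as a pd.Timedelta to avoid depending on the pandas version.

```python
import datetime as dt
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax2 = fig.add_subplot(2, 1, 2)

#data1
y1 = np.random.randint(0, 100, 6)
x1 = pd.date_range('2016-06-02 00:00:00', periods=6, freq='D')

#data2
y2 = np.random.randint(5, 40, 150)
x2 = pd.date_range('2016-06-03 12:00:00', periods=150, freq=pd.Timedelta(hours=1))

ax1.plot(x1, y1)
ax2.plot(x2, y2)

# Apply the same window to both axes instead of drawing a flat dummy line
start = dt.datetime(2016, 6, 1)
end = dt.datetime(2016, 6, 10)
for ax in (ax1, ax2):
    ax.set_xlim(start, end)

# Show the resulting window as dates rather than Matplotlib's internal numbers
print([mdates.num2date(v).date() for v in ax1.get_xlim()])
```

Compared with the dummy data approach, this leaves no extra line at y=0 in the plots and does not pull the y-axis range down to zero.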


Adjust the appearance

#coding:utf-8

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt

fig = plt.figure()
#Reserve space for two plots in 2 rows and 1 column
ax1 = fig.add_subplot(2,1,1)
ax2 = fig.add_subplot(2,1,2)

#data0
y0 = [0]*10
x0 = pd.date_range('2016-06-01 00:00:00',periods=10,freq='d')

#data1
y1 = np.random.randint(0,100,6)
x1 = pd.date_range('2016-06-02 00:00:00',periods=6,freq='d')

#data2
y2 = np.random.randint(5,40,150)
x2 = pd.date_range('2016-06-03 12:00:00',periods=150,freq='H')

#plot
#ax1
ax1.plot(x0,y0)
ax1.plot(x1,y1,'r')
#ax2
ax2.plot(x0,y0)
ax2.plot(x2,y2,'b')

#Formatting
#ax1
ax1.set_xticks(x0)
ax1.set_xticklabels(x0,rotation=90,size="small")
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax1.grid()
#ax2
ax2.set_xticks(x0)
ax2.set_xticklabels(x0,rotation=90,size="small")
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax2.grid()

#Keep the vertical tick labels from being cut off or overlapped
plt.subplots_adjust(hspace=0.7,bottom=0.2)

plt.show()

The tick labels are now vertical and show only the date. I also changed the line colors. Personally, this is good enough for me.


Merge into one graph

In many cases there is no need to split the data into two plots, so let's try displaying both in a single plot.

#coding:utf-8

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt

fig = plt.figure()
#A single plot in 1 row and 1 column
ax1 = fig.add_subplot(1,1,1)
#Add a second y-axis that shares the same x-axis (something like a layer)
ax2 = ax1.twinx()

#data0
y0 = [0]*10
x0 = pd.date_range('2016-06-01 00:00:00',periods=10,freq='d')

#data1
y1 = np.random.randint(0,100,6)
x1 = pd.date_range('2016-06-02 00:00:00',periods=6,freq='d')

#data2
y2 = np.random.randint(5,40,150)
x2 = pd.date_range('2016-06-03 12:00:00',periods=150,freq='H')

#plot
#ax1
ax1.plot(x0,y0)
ax1.plot(x1,y1,'r')

#ax2
ax2.plot(x0,y0)
ax2.plot(x2,y2,'b')

#Formatting
#ax1
ax1.set_xticks(x0)
ax1.set_xticklabels(x0,rotation=90,size="small")
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax1.grid()
#Axis format adjustment
ax1.set_ylabel('pv', color='r')
for tl in ax1.get_yticklabels():
    tl.set_color('r')
#ax2
ax2.set_xticks(x0)
ax2.set_xticklabels(x0,rotation=90,size="small")
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax2.grid()
#Axis format adjustment
ax2.set_ylabel('cpu', color='b')
for tl in ax2.get_yticklabels():
    tl.set_color('b')

#Keep the vertical tick labels from being cut off
plt.subplots_adjust(hspace=0.7,bottom=0.2)

plt.show()

Well, it looks like this. I colored each y-axis to match its line so that it is easier to tell which data is which (although it is still hard to read).
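
If the colored axes alone are hard to read, one option is a single legend that lists the lines from both y-axes. This is a sketch rather than part of the original code. It reuses the 'pv' and 'cpu' names from the axis labels. The Agg backend and the pd.Timedelta frequency are my own assumptions so that it runs headless and on any pandas version.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twinx()

#data1
y1 = np.random.randint(0, 100, 6)
x1 = pd.date_range('2016-06-02 00:00:00', periods=6, freq='D')

#data2
y2 = np.random.randint(5, 40, 150)
x2 = pd.date_range('2016-06-03 12:00:00', periods=150, freq=pd.Timedelta(hours=1))

# plot() returns a list of lines, so the trailing comma unpacks the single line
line1, = ax1.plot(x1, y1, 'r', label='pv')
line2, = ax2.plot(x2, y2, 'b', label='cpu')

# Each axes only knows its own lines, so pass both handles explicitly.
# The legend is attached to ax2 because the twin axes is drawn on top.
legend = ax2.legend(handles=[line1, line2], loc='upper left')
```

The legend then shows a red entry for pv and a blue entry for cpu in one box, independent of which y-axis each line belongs to.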


Supplement / Explanation

String to datetime conversion

In the examples above, the x data was generated with pandas' date_range(), so it is a datetime type from the start. When the data comes from a CSV or a DB, however, it is a string. Plotting may appear to work with strings as they are, but Matplotlib treats plain strings as categories rather than as points in time, so convert them to datetime when the spacing on the time axis matters.

A list of strings can be converted to a list of datetimes as follows.

import datetime as dt

y1 = [0.43,0.26,0.33]
d1 = ['2016-06-21 12:00:00','2016-06-23 09:00:00','2016-06-26 18:00:00']
#Parse each string with the matching format
x1 = [dt.datetime.strptime(d,'%Y-%m-%d %H:%M:%S') for d in d1]
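
Since pandas is already imported in the other examples, the same conversion can probably also be done with pd.to_datetime. This is a small sketch with the same three strings. It returns a DatetimeIndex instead of a plain list.

```python
import pandas as pd

d1 = ['2016-06-21 12:00:00', '2016-06-23 09:00:00', '2016-06-26 18:00:00']

# Convert the whole list at once; the format string matches the one used with strptime
x1 = pd.to_datetime(d1, format='%Y-%m-%d %H:%M:%S')

print(x1)
```

The result can be passed to ax.plot in the same way as the output of date_range.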

Editing the axis (X axis)

I will explain how to edit the x-axis.

Decide where to put the tick labels

Dates make this a little hard to follow, so take a numeric example: if the horizontal axis runs from 0 to 1000 and you want ticks only at the three positions 300, 600, and 900, use set_xticks([300,600,900]).

Here I want a tick label for each day from 6/1 to 6/10, so the dummy x0 is passed as-is.

ax1.set_xticks(x0)

Decide how to display the tick labels

After deciding where the ticks go, the next step is to decide what they display and how. For example, with the positions set by set_xticks([300,600,900]) as above, if you want them to read small, medium, and large, use set_xticklabels(['small','medium','large']). rotation specifies the angle of the text; 90 makes it vertical. size is, as the name says, the text size.

Here we want to display the dates from 6/1 to 6/10 as they are, so x0 is passed as-is.

ax1.set_xticklabels(x0,rotation=90,size="small")
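
The numeric example from the explanation above can be tried in isolation. The following sketch uses made-up y values on an axis from 0 to 1000. The Agg backend line is my own assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Made-up data on a horizontal axis from 0 to 1000
ax.plot([0, 250, 500, 750, 1000], [3, 8, 5, 9, 4])

# Tick positions first, then the text shown at those positions
ax.set_xticks([300, 600, 900])
ax.set_xticklabels(['small', 'medium', 'large'], rotation=90, size='small')

# Render once so the tick label objects are up to date, then read them back
fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

The number of labels has to match the number of tick positions, which is why set_xticks is called first.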

Further adjust the display format

If x0 is displayed as-is, the full year, month, day, hour, minute, and second are shown, which is long, so only the year, month, and day are displayed.

ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
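
As a possible alternative to passing x0 to set_xticks and set_xticklabels, a date locator can place one tick per day by itself. This sketch is not the method used in this article. It combines mdates.DayLocator with the same DateFormatter and sets the rotation through tick_params. The Agg backend and the pd.Timedelta frequency are my own assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

#data2 (hourly)
y2 = np.random.randint(5, 40, 150)
x2 = pd.date_range('2016-06-03 12:00:00', periods=150, freq=pd.Timedelta(hours=1))
ax.plot(x2, y2)

# One major tick at the start of every day, labeled with the date only
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

# Vertical, small tick labels without touching the label text itself
ax.tick_params(axis='x', labelrotation=90, labelsize='small')

# Render once so the tick labels are generated, then read them back
fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

Because the locator works from the axis limits, no dummy x0 is needed to decide where the ticks go.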

Color

Line color

If you want to color the lines of the data, do as follows. In this example it will be red.

ax1.plot(x1,y1,'r')

Axis color

To color an axis label and its tick labels, do as follows. In this example they will be blue.

ax2.set_ylabel('cpu', color='b')
for tl in ax2.get_yticklabels():
    tl.set_color('b')
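
The loop over the tick labels can probably be replaced with a single tick_params call. The sketch below uses made-up data on a twin axis. Note that colors= changes the tick marks as well as the tick labels, which differs slightly from the loop above. The Agg backend line is my own assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twinx()

# Made-up values for the right-hand axis
ax2.plot([0, 1, 2, 3], [12, 30, 25, 18], 'b')

# Axis label in blue, as in the original example
ax2.set_ylabel('cpu', color='b')

# colors= sets both the tick marks and the tick labels of the y-axis
ax2.tick_params(axis='y', colors='b')

# Render once, then collect the colors of the visible tick labels
fig.canvas.draw()
tick_colors = [tl.get_color() for tl in ax2.get_yticklabels()]
```

If only the label text should change color, tick_params also accepts labelcolor= instead of colors=.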

Margin adjustment

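hspace adjusts the vertical spacing between plots. It is expressed as a fraction of the average height of a plot (axes), so 1.0 means a gap as tall as one plot. bottom is the position of the bottom edge of the plots, as a fraction of the figure height.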

plt.subplots_adjust(hspace=0.7,bottom=0.2)
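
To check what these two values do, the resulting positions of the plots can be read back with get_position(). The following sketch uses two empty plots. The names top_box, bottom_box, and gap are my own, and the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax2 = fig.add_subplot(2, 1, 2)

# Same values as in the article, applied to this figure explicitly
fig.subplots_adjust(hspace=0.7, bottom=0.2)

# Positions are given in figure coordinates (0.0 = bottom edge, 1.0 = top edge)
top_box = ax1.get_position()
bottom_box = ax2.get_position()

# Vertical gap between the two plots
gap = top_box.y0 - bottom_box.y1

print("bottom edge of lower plot:", bottom_box.y0)
print("gap relative to plot height:", gap / bottom_box.height)
```

The lower plot starts at the bottom value, and the gap divided by the plot height comes out as the hspace value, which leaves room for the vertical tick labels of the upper plot.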
