- Change the color and shade of plotted points according to their distance from the line or curve obtained by regression analysis.
- Visualize the distribution of those distances in other figures.
example.py
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set() #seaborn styling makes the figure look nicer; I always use it
amp = 100 #amplitude
frequency = 0.02 #frequency
offset = 1000 #offset
t = np.linspace(0,100,1000)
y_ = amp*np.sin(2*np.pi*frequency*t)+offset #Theoretical value=Regression curve sin
y = np.random.poisson(y_) #Observed values=With error sin
###Main###
dis = abs( y_ - y )/ y_.max() #Scale the difference between theoretical and observed values to roughly 0.0-1.0
color_list=[ [1-9*i,0,9*i,5*i] for i in dis ] #Specify each point's color as [R,G,B,alpha]
f = plt.figure(figsize = (12,6))
f.add_subplot(121)
plt.scatter(t,y_)
plt.xlabel('t')
plt.ylabel('y')
f.add_subplot(122)
plt.scatter(t,y,color = color_list)
plt.xlabel('t')
plt.ylabel('y')
First, regarding the data: substituting the sine-wave value at each time into np.random.poisson(y_) makes **y** a sine wave with Poisson-distributed noise whose mean is **y_**. Since the variance of a Poisson distribution equals its mean, large values carry large errors and small values carry small errors.
The gradation is the main point this time: a list comprehension assigns each point its own color. In matplotlib, a color can be specified as [R, G, B, alpha], where each of r, g, b, and alpha takes a value from 0.0 to 1.0. Using this mechanism, a little arithmetic on the scaled distance makes the color lighter at short distances and darker at long distances: in this example, light red near the curve and dark blue far from it.
- The RGBA components must stay within 0.0 to 1.0, so the coefficients have to be tuned to keep them in that range.
- Naturally, many points lie near the regression curve (or line), so unless they are drawn quite faintly, overlapping points make it hard to tell whether a region is dense or simply close to the curve.
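One defensive variant (not in the original code, just a safeguard) is to clip each RGBA component into [0, 1] with np.clip, so an unusually large deviation can never push a component out of range:

```python
import numpy as np

# Hypothetical scaled distances; the last one is large enough that
# the article's coefficients would produce a negative red component.
dis = np.array([0.0, 0.05, 0.12, 0.2])

# Same [R, G, B, alpha] recipe as in example.py, but clipped so
# matplotlib never receives a component outside 0.0-1.0.
color_list = [np.clip([1 - 9*i, 0, 9*i, 5*i], 0.0, 1.0) for i in dis]
print(color_list)
```

For i = 0.2, the raw red component would be 1 - 9*0.2 = -0.8; clipping replaces it with 0.0 instead of raising an error.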
With this color specification, the distance from the regression curve can also be visualized in other figures. Consider a case where the previous code is changed slightly so that the error grows as time passes (the code is given below), and plot **y** against **t** on the left and against **y_** on the right.
From the left figure, the increase in distance (error) over time can be confirmed at a glance. From the right figure, it can be seen that the distance between **y** and **y_** correlates with **y_**. Changing the plotting axes in this way can deepen understanding of the data; the more parameters that determine the observed value **y**, the more perspectives there are to explore.
example2.py
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
amp=100 #amplitude
frequency=0.02 #frequency
offset=1000 #offset
t=np.linspace(0,100,1000)
y_=amp*np.sin(2*np.pi*frequency*t)+offset
y=np.random.poisson(y_+2.6*t) ### changed line: the error now grows with t ###
dis=abs(y_-y)/y_.max()
color_list=[[1-3*i,0,3*i,i] for i in dis] ### changed line: coefficients re-tuned for the larger distances ###
f=plt.figure(figsize=(12,6))
f.add_subplot(121)
plt.scatter(t,y,color=color_list)
plt.xlabel('t')
plt.ylabel('y')
f.add_subplot(122)
plt.scatter(y_,y,color=color_list)
plt.xlabel('y_')
plt.ylabel('y')
Comparing the plot against time with the plot against the theoretical value **y_**: the darker color around the turning points of the sine wave is probably due to the high density of points there.
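As an aside, matplotlib can also map the distances to colors automatically via the `c` and `cmap` arguments of `scatter`, with a colorbar as a legend. A minimal sketch of the same visualization with that approach (the colormap name and output filename are arbitrary choices):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt

amp, frequency, offset = 100, 0.02, 1000
t = np.linspace(0, 100, 1000)
y_ = amp * np.sin(2 * np.pi * frequency * t) + offset
y = np.random.poisson(y_ + 2.6 * t)
dis = np.abs(y_ - y) / y_.max()

# Let matplotlib map each scaled distance to a color from the
# colormap, instead of building the RGBA list by hand.
sc = plt.scatter(t, y, c=dis, cmap="coolwarm", s=10)
plt.colorbar(sc, label="scaled distance |y - y_| / max(y_)")
plt.xlabel("t")
plt.ylabel("y")
plt.savefig("distance_colormap.png")
```

This trades the fine-grained control of the hand-built RGBA list for a ready-made, perceptually ordered color scale and a labeled colorbar.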