I always just wing it! I've written code to compute and output the mean and covariance many times, so this time I'm making a note of it. The content is really simple, with no tricks. Sorry.
Situation: I have data in a pandas DataFrame and want to find the mean and covariance of particular columns X, Y, and Z.
The key points are:
- Use `.loc` to extract columns by name from the DataFrame.
- DataFrame provides `mean()`, `cov()`, and `corr()`. `mean()` returns a Series and `cov()` and `corr()` return a DataFrame, so use `.values` to get the underlying ndarray.
- To store the result as a list in a dictionary, use the ndarray's `tolist()`.
from pandas import DataFrame
from numpy import random
import json
df = DataFrame(random.randint(0,100,size=(252, 4)), columns=list('XYZW'))
output_data = dict()
# 1. extract XYZ
df_xyz = df.loc[:,list("XYZ")]
# 2-1 mean vector
u = df_xyz.mean()
output_data["mean"] = u.values.tolist()
# 2-2 covariance
s = df_xyz.cov()
output_data["covariance"] = s.values.tolist()
# 3. write the dictionary to a JSON file
with open("out.json", 'w') as f:
    json.dump(output_data, f, indent=2)
The output JSON file looks like this:
{
  "mean": [
    48.34126984126984,
    50.52777777777778,
    51.492063492063494
  ],
  "covariance": [
    [
      877.6360589388478,
      -44.88202744577245,
      -71.94548788971099
    ],
    [
      -44.88202744577245,
      876.4733289065962,
      -32.312527667109336
    ],
    [
      -71.94548788971099,
      -32.312527667109336,
      784.7768291911716
    ]
  ]
}
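For reference, here is a minimal sketch of reading such a file back and turning the nested lists into ndarrays again. The file name `out_sample.json`, the temporary directory, and the sample values are assumptions made up for illustration; they are not the output shown above.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the file written above; values are illustrative only.
sample = {
    "mean": [1.0, 2.0, 3.0],
    "covariance": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
}
path = os.path.join(tempfile.mkdtemp(), "out_sample.json")
with open(path, "w") as f:
    json.dump(sample, f, indent=2)

# Load the JSON and convert the nested lists back into ndarrays.
with open(path) as f:
    loaded = json.load(f)

mean_vec = np.array(loaded["mean"])       # 1-D array, one entry per column
cov_mat = np.array(loaded["covariance"])  # 2-D square array
```

Since `tolist()` produces plain Python lists, `np.array()` is enough to restore the arrays; no custom JSON decoder should be needed.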
It took me a bit of research to arrive at this implementation (phew). The covariance method of DataFrame, `cov()`, is described in the pandas API documentation.
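As a quick sanity check, the sketch below compares `DataFrame.cov()` against `numpy.cov`. It assumes the default settings of both, which normalize by N - 1; the seed and the random data are arbitrary choices for illustration.

```python
import numpy as np
from pandas import DataFrame

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
df = DataFrame(rng.integers(0, 100, size=(252, 3)), columns=list("XYZ"))

# pandas: sample covariance between columns, normalized by N - 1 by default.
cov_pandas = df.cov().values

# numpy: rowvar=False treats each column as a variable, matching the DataFrame layout.
cov_numpy = np.cov(df.values, rowvar=False)

same = np.allclose(cov_pandas, cov_numpy)
```

If `same` is true, the two libraries agree on this data, which suggests the values written to the JSON file are the usual unbiased sample covariance.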
(2020/05/11)
- To handle NaN (not a number) values, you can check for them with an `if` statement like this:
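from numpy import isnan, zeros
# x: mean vector (ndarray of length 3), S: covariance matrix (3x3 ndarray)
if isnan(x).any():
    x = zeros(3)  # fall back to a zero mean vector
if isnan(S).any():
    S = zeros((3, 3))  # fall back to a zero covariance matrix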
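Putting the NaN check together with the earlier steps might look like the sketch below. The helper name `mean_cov_or_zeros` and the sample frame are hypothetical, introduced only for this example; replacing the whole result with zeros follows the snippet above and is just one possible policy.

```python
import numpy as np
from pandas import DataFrame


def mean_cov_or_zeros(df_xyz):
    """Hypothetical helper: return (mean, covariance) as ndarrays.

    If either result contains NaN, that result is replaced entirely by zeros.
    """
    x = df_xyz.mean().values
    S = df_xyz.cov().values
    n = df_xyz.shape[1]
    if np.isnan(x).any():
        x = np.zeros(n)
    if np.isnan(S).any():
        S = np.zeros((n, n))
    return x, S


# Illustrative data: column Z is entirely NaN, so its mean and covariances are NaN.
df_nan = DataFrame({
    "X": [1.0, 2.0, 3.0],
    "Y": [2.0, 4.0, 6.0],
    "Z": [np.nan, np.nan, np.nan],
})
x, S = mean_cov_or_zeros(df_nan)
```

After this, `x.tolist()` and `S.tolist()` can be stored in the dictionary as before. This matters because Python's `json.dump` writes NaN as the bare token `NaN` by default, which is not valid JSON for strict parsers.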