When the array contains zeros, computing -sum(a*log(a)) the usual way returns nan.
>>> import numpy as np
>>> a = np.array([0.1,0.3,0,0.05,0.15,0.6,0])
>>> np.log(a)
array([-2.30258509, -1.2039728 , -inf, -2.99573227, -1.89711998,
-0.51082562, -inf])
>>> a*np.log(a)
array([-0.23025851, -0.36119184, nan, -0.14978661, -0.284568 ,
-0.30649537, nan])
>>> -sum(a*np.log(a))
nan
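The nan comes from the zero entries: log(0) is -inf, and 0 * -inf is nan, which then propagates through the sum. Here is a minimal sketch that checks this on the same array. The names `terms`, `has_nan`, and `naive_entropy` are only illustrative, and `np.errstate` is used here just to silence the RuntimeWarnings that NumPy would otherwise print.

```python
import numpy as np

a = np.array([0.1, 0.3, 0, 0.05, 0.15, 0.6, 0])

# Silence the divide-by-zero and invalid-value warnings for this demo only.
with np.errstate(divide="ignore", invalid="ignore"):
    terms = a * np.log(a)  # log(0) is -inf, and 0 * -inf is nan

# The two zero entries have turned into nan.
has_nan = bool(np.isnan(terms).any())

# A single nan is enough to make the whole sum nan.
naive_entropy = -terms.sum()

print(has_nan, naive_entropy)
```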
In that case, use masked arrays. np.ma.log masks the entries where the log is undefined, so they are skipped in the sum.
>>> import numpy as np
>>> a = np.array([0.1,0.3,0,0.05,0.15,0.6,0])
>>> np.ma.log(a)
masked_array(data = [-2.3025850929940455 -1.2039728043259361 -- -2.995732273553991
-1.8971199848858813 -0.5108256237659907 --],
mask = [False False True False False False True],
fill_value = 1e+20)
>>> a*np.ma.log(a)
masked_array(data = [-0.23025850929940456 -0.3611918412977808 -- -0.14978661367769955
-0.28456799773288216 -0.30649537425959444 --],
mask = [False False True False False False True],
fill_value = 1e+20)
>>> -(a*np.ma.log(a)).sum()
1.3323003362673613
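If you need this in more than one place, the masked-array steps can be wrapped in a small helper. This is only a sketch: the name `entropy_masked` is made up for illustration, and filling the masked entries with 0 before summing is my own choice (it treats 0*log(0) as 0, and also keeps the result a plain float when every entry is zero).

```python
import numpy as np

def entropy_masked(p):
    """Return -sum(p * log(p)), treating entries with p <= 0 as contributing 0."""
    p = np.asarray(p, dtype=float)
    # np.ma.log masks every entry where the log is undefined (p <= 0).
    terms = p * np.ma.log(p)
    # Replace masked entries by 0 so they add nothing to the sum.
    return float(-terms.filled(0.0).sum())

a = np.array([0.1, 0.3, 0, 0.05, 0.15, 0.6, 0])
result = entropy_masked(a)
print(result)
```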
By the way, this is what it looks like when you simply use a list comprehension (math.log needs import math).
>>> import math
>>> import numpy as np
>>> a = np.array([0.1,0.3,0,0.05,0.15,0.6,0])
>>> -sum([v*math.log(v) if v > 0 else 0 for v in a])
1.3323003362673613
Which one is better?
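One way to look into this is to check that both versions agree and then time them on your own data. The sketch below does that with `timeit`. The function names `entropy_masked` and `entropy_listcomp` are made up for illustration, and the timings depend on the machine, the NumPy version, and the array length, so treat the numbers as a rough guide rather than a verdict.

```python
import math
import timeit

import numpy as np

a = np.array([0.1, 0.3, 0, 0.05, 0.15, 0.6, 0])

def entropy_masked(p):
    # Masked-array version: zero entries are masked and skipped by sum().
    return float(-(p * np.ma.log(p)).sum())

def entropy_listcomp(p):
    # Pure-Python version: zero entries contribute 0 explicitly.
    return -sum(v * math.log(v) if v > 0 else 0 for v in p)

# Both versions should agree up to floating-point rounding.
same = math.isclose(entropy_masked(a), entropy_listcomp(a))

# Rough timings over 1000 calls each.
t_masked = timeit.timeit(lambda: entropy_masked(a), number=1000)
t_listcomp = timeit.timeit(lambda: entropy_listcomp(a), number=1000)

print(same, t_masked, t_listcomp)
```

With an array this short, fixed per-call overhead may matter more than the arithmetic itself, so it is worth repeating the measurement with arrays of the size you actually use.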