Spearman's rank correlation coefficient is an index of the correlation between two sets of ranking data. For details, refer to the URLs below.
Wiki: [Spearman's Rank Correlation Coefficient](https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%94%E3%82%A2%E3%83%9E%E3%83%B3%E3%81%AE%E9%A0%86%E4%BD%8D%E7%9B%B8%E9%96%A2%E4%BF%82%E6%95%B0)
Toki no Mori Wiki: [Spearman Rank Correlation Coefficient](http://ibisforest.org/index.php?Spearman%E9%A0%86%E4%BD%8D%E7%9B%B8%E9%96%A2%E4%BF%82%E6%95%B0)
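There are several formulas, but this time we will use the following one, where $d_i$ is the difference between the two ranks of item $i$ and $N$ is the number of items:

$$\rho = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N^3 - N}$$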
To check the answers of the program you write, I suggest using the worked example on the page below. The rank correlation coefficient is well known, so a search will turn up other samples too.
Introduction to Spearman's Rank Correlation Coefficient Statistics
spearman.py
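def spearman(list_a, list_b):
    # Both arguments are rank lists of the same length, without ties.
    N = len(list_a)
    # Sum the squared rank differences, then scale by 6 / (N^3 - N).
    return 1 - (6 * sum(map(lambda a, b: (a - b) ** 2,
                            list_a, list_b))) / float(N ** 3 - N)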
The coefficient can be computed this easily.
Each list argument should be a sequence of ranks such as [1, 2, 3, ...].
Normally, you would combine the two sequences into a single list with zip and pass that.
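As a quick check, here is a minimal sketch of that zip variant together with a call on hand-made ranks. The name `spearman_zip` and the sample ranks are made up for illustration and are not part of the original program.

```python
def spearman_zip(list_a, list_b):
    """Spearman's rho for two equal-length rank lists without ties."""
    n = len(list_a)
    # Pair up the ranks item by item, then sum the squared differences.
    d_squared = sum((a - b) ** 2 for a, b in zip(list_a, list_b))
    return 1 - 6 * d_squared / float(n ** 3 - n)


# Hypothetical ranks given to five items by two judges.
ranks_a = [1, 2, 3, 4, 5]
ranks_b = [2, 1, 4, 3, 5]

# The squared differences sum to 4, so the result is 1 - 24/120, about 0.8.
rho = spearman_zip(ranks_a, ranks_b)
print(rho)
```

Identical rankings give 1 and fully reversed rankings give -1, which makes two more easy cases to test against.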
Using numpy removes the need for the map and lambda part and makes the code simpler.
spearman_numpy.py
import numpy

def spearman(array_a, array_b):
    N = len(array_a)
    # Element-wise difference and square, so no map/lambda is needed.
    return 1 - (6 * numpy.sum((array_a - array_b) ** 2)) / float(N**3 - N)
There was a mistake here, which I have fixed following owdowt's comment. Thank you very much. [Correction date: 19/02/26]
This one is simpler and better.
Each argument should be a sequence such as numpy.array([1, 2, 3, ...]).
When working with numeric sequences in Python, numpy is usually the better choice.
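To illustrate, the following sketch calls a numpy version on hand-made ranks. The sample ranks are hypothetical, and the function is repeated here only so that the snippet runs on its own.

```python
import numpy


def spearman(array_a, array_b):
    """Spearman's rho for two equal-length rank arrays without ties."""
    n = len(array_a)
    # Subtraction and squaring are applied element-wise by numpy.
    return 1 - (6 * numpy.sum((array_a - array_b) ** 2)) / float(n ** 3 - n)


# Hypothetical ranks. Plain lists would fail here because
# list - list is not defined, so convert to numpy arrays first.
ranks_a = numpy.array([1, 2, 3, 4, 5])
ranks_b = numpy.array([5, 4, 3, 2, 1])

# A fully reversed order gives 1 - 240/120, that is -1.
print(spearman(ranks_a, ranks_b))
```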
I have not added any exception handling this time. Also, when the data contains tied ranks, a different formula is needed, so refer to the URLs introduced at the beginning.
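As a rough sketch of both points, the code below adds basic input checks and handles ties by assigning average ranks and then taking the Pearson correlation of those ranks, which is the general definition of Spearman's coefficient. The names `average_ranks` and `spearman_with_ties` are hypothetical, and the inputs are assumed to be raw scores rather than ranks.

```python
def average_ranks(values):
    """Rank values from 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_with_ties(scores_a, scores_b):
    """Pearson correlation of the average ranks of two score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both sequences must have the same length")
    n = len(scores_a)
    if n < 2:
        raise ValueError("at least two items are required")
    ranks_a, ranks_b = average_ranks(scores_a), average_ranks(scores_b)
    mean = (n + 1) / 2.0  # the mean of average ranks is always (n + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(ranks_a, ranks_b))
    var_a = sum((a - mean) ** 2 for a in ranks_a)
    var_b = sum((b - mean) ** 2 for b in ranks_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (var_a * var_b) ** 0.5
```

For data without ties this agrees with the simple formula: the scores `[10, 20, 30, 40, 50]` and `[20, 10, 40, 30, 50]` give about 0.8. With ties, `average_ranks([10, 20, 20, 30])` assigns the ranks 1, 2.5, 2.5 and 4.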