The 2019 Grade 1 statistics certification exam, in the humanities field of the Applied Statistics paper, included a problem on how the k-means method depends on its initial values. To confirm that k-means really does depend on the initial values, I wrote Python code for a super simple case.
The setup is as follows. Set to classify: a finite set of real numbers. Number of clusters: 2.
```python
print("First, enter the number of elements in the set to classify.")
n = int(input())
print("Then enter the elements of the set you want to classify.")
a = [float(input()) for _ in range(n)]
print("Next, enter two initial values.")
b = [float(input()) for _ in range(2)]

# Assignment step: each element joins the cluster whose center is nearer
# (ties go to the first cluster).
A = []
B = []
for i in range(n):
    if abs(b[0] - a[i]) <= abs(b[1] - a[i]):
        A.append(a[i])
    else:
        B.append(a[i])

if len(A) == 0 or len(B) == 0:
    # An empty cluster has no mean, so stop without updating the centers.
    print("The first cluster is")
    print(A)
    print("The second cluster is")
    print(B)
else:
    # Update step: the new centers are the cluster means.
    c = sum(A)/len(A)
    d = sum(B)/len(B)
    # Repeat assignment and update until the centers stop changing.
    while c != b[0] or d != b[1]:
        b[0] = c
        b[1] = d
        A = []
        B = []
        for i in range(n):
            if abs(b[0] - a[i]) <= abs(b[1] - a[i]):
                A.append(a[i])
            else:
                B.append(a[i])
        c = sum(A)/len(A)
        d = sum(B)/len(B)
    print("The first cluster is")
    print(A)
    print("The second cluster is")
    print(B)
```
Below are two examples of running this code with different initial values.
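A non-interactive way to try this kind of comparison is sketched here. The function name `two_means` and the data set `[0, 4, 5, 9]` are assumptions made for illustration and are not the inputs used in the original runs. The function is meant to follow the same assignment and update rules as the script, including sending ties to the first cluster.

```python
def two_means(data, init):
    """1-D k-means with 2 clusters; ties go to the first cluster."""
    centers = list(init)
    while True:
        # Assignment step: nearest center wins.
        first = [x for x in data if abs(centers[0] - x) <= abs(centers[1] - x)]
        second = [x for x in data if abs(centers[0] - x) > abs(centers[1] - x)]
        # An empty cluster has no mean, so stop as the script does.
        if not first or not second:
            return first, second
        # Update step: new centers are the cluster means.
        new_centers = [sum(first) / len(first), sum(second) / len(second)]
        if new_centers == centers:
            return first, second
        centers = new_centers


data = [0.0, 4.0, 5.0, 9.0]  # hypothetical data set
print(two_means(data, (0, 1)))  # ([0.0], [4.0, 5.0, 9.0])
print(two_means(data, (4, 5)))  # ([0.0, 4.0], [5.0, 9.0])
```

With this data, starting from `(0, 1)` leaves `0` alone in the first cluster, while starting from `(4, 5)` splits the points two and two. Both results are fixed points of the update, so the loop stops at whichever one the initial values lead to.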
This confirms that the final clusters really do change when the initial values change. ∩ (・ ω ・) ∩ You need to be careful when performing cluster analysis with the k-means method.
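One common way to be careful, offered here only as a sketch, is to run from several initial values and compare the within-cluster sum of squares of each result. The helper name `wcss` and the two partitions of the hypothetical data `[0, 4, 5, 9]` are assumptions for illustration.

```python
def wcss(clusters):
    """Sum of squared distances from each point to its own cluster mean."""
    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster contributes nothing
        mean = sum(cluster) / len(cluster)
        total += sum((x - mean) ** 2 for x in cluster)
    return total


# Two hypothetical converged results for the data [0, 4, 5, 9].
result_a = ([0.0], [4.0, 5.0, 9.0])
result_b = ([0.0, 4.0], [5.0, 9.0])
print(wcss(result_a))  # 14.0
print(wcss(result_b))  # 16.0
```

A smaller value means the points sit closer to their cluster means, so comparing these numbers gives one criterion for choosing between results obtained from different initial values.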