This problem was posted over a year ago, but I'll try solving one of the machine learning problems published on CodeIQ.
"Machine learning basics: Let's solve and understand simple problems! Part 1" http://next.rikunabi.com/tech/docs/ct_s03600.jsp?p=002315
I'll take on the first question from that article. The problem statement is as follows.
Mr. N, who took part in a PRML reading party held on a pirate ship over the weekend, was fascinated by the gold and silver treasure piled up on board. When he opened one of the treasure chests nearby, he found a number of glittering coins.
When he picked one up, it felt heavy. It must be a gold coin.
He was told he could take as many as he liked, so he packed some into his bag on his way home from the study session.
After returning home, Mr. N calmed down a little and began to wonder: "They were handing these out so generously, but are these gold coins genuine?"
There were 20 gold coins in his bag, but when he asked his friend Archimedes to measure them, the volume and weight differed from coin to coin.
Searching online, he found data on the volume, weight and authenticity of gold coins.
Please use this data to determine whether the gold coins he received are genuine.
As the original article mentions, the data looks cleanly linearly separable.
As usual, I'll solve it with scikit-learn.
import numpy as np
from sklearn.svm import LinearSVC
import matplotlib.pyplot as plt
auth = np.genfromtxt('CodeIQ_auth.txt', delimiter=' ')
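# Training features (first two columns of the labeled data)
train_X = np.array([[x[0], x[1]] for x in auth])
# Training labels (third column: 1 = genuine, 0 = fake)
labels = [int(x[2]) for x in auth]
# Test data: the coins to classify
test_X = np.genfromtxt('CodeIQ_mycoins.txt', delimiter=' ')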
First, let's plot the data.
fig = plt.figure()
ax1 = fig.add_subplot(2,1,1)
ax2 = fig.add_subplot(2,1,2)
# Extract the genuine coins (label 1) from the training data
correct = np.array([[x[0], x[1]] for x in auth if x[2] == 1]).T
# Extract the fake coins (label 0) as well
wrong = np.array([[x[0], x[1]] for x in auth if x[2] == 0]).T
# Draw both groups as a scatter plot
ax1.scatter(correct[0], correct[1], color='g', label='genuine')
ax1.scatter(wrong[0], wrong[1], color='r', label='fake')
ax1.legend(loc='best')
ax2.scatter(train_X.T[0], train_X.T[1], color='b', label='labeled data')
ax2.scatter(test_X.T[0], test_X.T[1], color='r', label='my coins')
ax2.legend(loc='best')
# Save before show(); once the window is closed the saved figure may be blank
plt.savefig("image.png")
plt.show()
In the upper panel, green points are genuine coins and red points are fakes. It looks just like the plot in the original article.
The lower panel shows where the coins I received (red) fall relative to the labeled authenticity data (blue).
Since the problem is linearly separable, I use LinearSVC.
clf = LinearSVC(C=1)
#Training
clf.fit(train_X, labels)
#Classification
results = clf.predict(test_X)
for result, feature in zip(results, test_X):
    print(result, feature)
1 [ 0.988 17.734]
0 [ 0.769 6.842]
0 [ 0.491 4.334]
1 [ 0.937 16.785]
1 [ 0.844 13.435]
0 [ 0.834 9.518]
1 [ 0.931 16.62 ]
1 [ 0.397 6.705]
1 [ 0.917 16.544]
0 [ 0.45 3.852]
0 [ 0.421 4.612]
1 [ 0.518 9.838]
1 [ 0.874 14.113]
0 [ 0.566 6.529]
0 [ 0.769 8.132]
1 [ 1.043 16.066]
0 [ 0.748 9.021]
0 [ 0.61 6.828]
0 [ 1.079 12.097]
1 [ 0.771 13.505]
The 0 or 1 at the left of each line is the predicted label (1 = genuine, 0 = fake). So I arrived at the same answer as the example.
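As a rough sanity check on these predictions, one option is to look at each coin's density (weight divided by volume), which is presumably why Archimedes was asked to take the measurements. The sketch below copies the 20 printed results by hand and assumes the first feature is volume and the second is weight; that column order is my reading of the problem statement, not something the data files confirm. It is not what LinearSVC does internally, since the learned boundary need not be a pure density threshold.

```python
# (label, volume, weight) copied from the printed predictions above.
# Assumption: the first feature is volume and the second is weight.
predictions = [
    (1, 0.988, 17.734), (0, 0.769, 6.842), (0, 0.491, 4.334),
    (1, 0.937, 16.785), (1, 0.844, 13.435), (0, 0.834, 9.518),
    (1, 0.931, 16.62), (1, 0.397, 6.705), (1, 0.917, 16.544),
    (0, 0.45, 3.852), (0, 0.421, 4.612), (1, 0.518, 9.838),
    (1, 0.874, 14.113), (0, 0.566, 6.529), (0, 0.769, 8.132),
    (1, 1.043, 16.066), (0, 0.748, 9.021), (0, 0.61, 6.828),
    (0, 1.079, 12.097), (1, 0.771, 13.505),
]

# Density = weight / volume for each coin, grouped by predicted label
genuine = [w / v for label, v, w in predictions if label == 1]
fake = [w / v for label, v, w in predictions if label == 0]

print("genuine: min %.2f, max %.2f" % (min(genuine), max(genuine)))
print("fake:    min %.2f, max %.2f" % (min(fake), max(fake)))

# If the two ranges do not overlap, a single density threshold
# would reproduce the same split on these 20 coins.
print("ranges overlap:", max(fake) >= min(genuine))
```

On these 20 coins the two density ranges do not overlap: every coin predicted genuine has a density above 15, and every coin predicted fake stays below about 12.1, so the classifier's answer agrees with a simple density argument.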