Hello, this is Sumiyama water.
I'm planning to write this as a series on Qiita, reviewing the clustering work I used to do as I go.
To begin with, I would like to explain "k-means clustering", a basic technique in the kind of data classification known as data mining or clustering, over several installments while gradually implementing it in Java (nominally as a review for myself).
In this 0th installment, I will only cover the introduction and the assumptions. The concrete explanation will probably start next time.
In this series, I will not explain the different kinds of "data mining" and "clustering" or what they are used for. The goal is simply to implement the method called k-means clustering all the way through.
The idea is to get a feel for it by working hands-on rather than building up knowledge classroom-style.
For that reason, I will implement it myself without using an existing analysis library.
Also, I learned about this area more than 10 years ago and have not kept up with it since, so please be aware that the information may be out of date.
I will assume that you have some knowledge of the language. I also have the ulterior motive of wanting to use Spring Boot, which I have recently been using at work, in my private projects too, so I will build on Spring Boot.
That said, I won't be writing business logic, so I don't expect to talk much about Spring Boot. Most of the code will be close to plain Java. If annotations show up without any introduction, please bear with me.
The detailed logic will come from the next installment onward, so here is only a very rough picture.
Data like this
It can be classified like this.
In the figure, I labeled the axes X and Y arbitrarily, but it may help to imagine something like "the purchase amount and time of day for the customers of a convenience store".
Of course, real data is rarely grouped this neatly. Even so, with this number of samples a human can see the groups at a glance, and what is needed is a technique that lets a computer make the same distinction without any prior information.
As the amount of data grows, or when there are more axes than just X and Y, you need to rely on the power of a computer.
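As a rough preview of what "letting the computer discriminate" could look like, here is a minimal sketch in plain Java. It only assigns each point to the nearest of a few centers, which is one step that k-means relies on. It is not the implementation this series will build. The class name `NearestCenterSketch`, the sample values, and the hand-picked centers are all assumptions made up for illustration, and how the centers are actually found is left for later installments. Points are plain `double[]` arrays so that the same code works when there are more axes than X and Y.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the code built later in this series.
final class NearestCenterSketch {

    // Squared Euclidean distance between two points with the same number of axes.
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("points must have the same number of axes");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the center closest to the point (ties go to the lowest index).
    static int nearestCenter(double[] point, double[][] centers) {
        if (centers.length == 0) {
            throw new IllegalArgumentException("at least one center is required");
        }
        int best = 0;
        double bestDistance = squaredDistance(point, centers[0]);
        for (int c = 1; c < centers.length; c++) {
            double distance = squaredDistance(point, centers[c]);
            if (distance < bestDistance) {
                best = c;
                bestDistance = distance;
            }
        }
        return best;
    }

    // Assigns every point to its nearest center; returns one group index per point.
    static int[] assign(double[][] points, double[][] centers) {
        int[] groups = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            groups[p] = nearestCenter(points[p], centers);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up samples: {purchase amount, hour of day}. Values are assumptions.
        double[][] points = { {300, 8}, {350, 9}, {1200, 20}, {1100, 21} };
        // Centers chosen by hand here; finding them automatically is the hard part.
        double[][] centers = { {320, 8.5}, {1150, 20.5} };
        System.out.println(Arrays.toString(assign(points, centers))); // prints [0, 0, 1, 1]
    }
}
```

Because the distance loop runs over every element of the array, adding a third or fourth axis only means using longer arrays; nothing else in the sketch has to change.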
This time, I briefly covered the assumptions and what can be done.
From next time, I would like to start implementing, explaining the parts that the classification logic actually needs as I go.