Last time I built a classifier that wasn't very useful, so this time I'll make something a little more practical.
On Facebook, click the "▼" mark in the upper right (as of July 2014) and open "Settings"; at the bottom of "General Account Settings" there is a "Download Facebook data" link. Click it, go through the various authentication steps, and you can download the data you have posted on Facebook in htm format. I downloaded it casually, but honestly it contained so much sensitive information that I was a little taken aback. I suspect a recommender built on this kind of data could be quite accurate.
Now to the main topic: the downloaded data includes an item called "security", which shows who logged in to the account, from where, and so on. I'll use this to detect unauthorized access.
As before, I'm using Jubatus. It has an API for outlier detection called jubaanomaly, so I'll use that.
The Jubatus server configuration file is here. https://github.com/chase0213/anomal_facebook_activity/blob/master/lof.json
It is basically the sample configuration as-is, so there is nothing in particular to explain. One of the nice things about Jubatus is that you can build something reasonably usable even with the sample unchanged.
This is "Jubatus de ~ (1)", so by the time I publish (2), I expect to have tuned this part properly.
The data downloaded from Facebook is marked up as htm, so it needs to be transformed into a suitable form. This time I wrote a Python script (Python 2.6) to do it. https://github.com/chase0213/anomal_facebook_activity/blob/master/data/trim.py
(You could probably do this with something like string_rules, but I needed a bit more formatting than that.)
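To give a rough idea of what this kind of transformation involves, here is a minimal sketch using only the standard library. It is not the trim.py linked above: it is written for Python 3 rather than 2.6, the markup in the usage note is invented, and the helper names (TextCollector, extract_text, to_record) and field names are hypothetical. The real layout of the "security" page may well need more careful handling.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text fragments found between tags."""

    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_data(self, data):
        # Keep only fragments that contain something other than whitespace.
        text = data.strip()
        if text:
            self.fragments.append(text)


def extract_text(html):
    """Return the text fragments of an HTML string, in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.fragments


def to_record(fragments, keys):
    """Pair text fragments with field names (the names are hypothetical)."""
    return dict(zip(keys, fragments))
```

For example, extract_text("<ul><li>Login</li><li>192.0.2.1</li></ul>") yields the two text fragments, and to_record(..., ["activity", "ip_address"]) turns them into a dict that could then be wrapped in a Datum. Note that this collector also picks up the contents of script and style elements, so real input may need filtering.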
When the data is ready, pass it to the Jubatus server to compute the anomaly score.
Start the Jubatus server, create a client with
client = jubatus.Anomaly(HOST, PORT, NAME)
and add the data with
ret = client.add(datum)
Here datum is data in jubatus.common.Datum format, and ret contains the id and the anomaly score of that datum. At first, it is a good idea to watch the return values of client.add(datum) and check that the scores are distributed around 1.0. If any of them is clearly far from 1.0, take a look at that datum: it may be an unauthorized access.
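Why 1.0? Assuming the lof.json configuration means the server computes a Local Outlier Factor, the score compares the local density around a point with the density around its neighbors, so a point that sits in a region as dense as its neighbors' scores close to 1.0. The sketch below is a brute-force textbook LOF in pure Python, meant only to illustrate that behavior. It is not Jubatus's implementation (which, as far as I know, searches neighbors approximately), and the function names and the choice k=3 are mine.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _neighbors(points, i, k):
    """Return (distance, index) pairs of the k nearest neighbors of points[i]."""
    others = [(_dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    others.sort()
    return others[:k]


def lof_scores(points, k=3):
    """Compute the Local Outlier Factor of every point.

    Assumes len(points) > k and no duplicate points (duplicates would
    make a reachability sum zero and cause a division by zero).
    """
    n = len(points)
    neigh = [_neighbors(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neigh]

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]
```

With ten evenly spaced points and one point far away, the interior points score about 1.0 while the distant point scores far above it, which is the pattern to look for in the values returned by client.add.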
Next, once a reasonable amount of data has been stored on the Jubatus server, feed it a datum that is completely different from the usual.
anomal_datum = Datum({
"activity": "DELETE",
"time": "July 15, 2014 17:59 UTC+12",
"ip_address": "127.0.0.1",
"brawser": "IE6",
"cookie": "???"
})
After defining the datum, have the Jubatus server calculate its anomaly score. Here we want to check whether it is abnormal without registering it, so we use client.calc_score instead of client.add.
anomality = client.calc_score(anomal_datum)
calc_score returns a float, so do with it whatever you like.
I would like to show the "ordinary data" as well, but I'll omit it, since publishing it would only invite more anomalous access to my own account.
In summary, it looks like this. https://github.com/chase0213/anomal_facebook_activity/blob/master/anomaly.py
So, here is the result of trying it out.
$ python anomaly.py
anomality(anomal datum): 2.33819794655
anomality(nomal datum): 0.999999880791
The second line is the score for the anomalous datum defined above, and the third line is the score for the "ordinary data". The anomalous datum clearly has a higher anomaly score. In practice it would probably be better to use a statistical test or a threshold to raise an alert.
That's it for this attempt at doing something a little more useful with Jubatus than last time. Next time I'd like to tune this a bit more seriously.
As an aside, when I Google "Jubatus Anomaly", the pages that come up seem to be for the old API. Be aware that, depending on the version, the API may not work as described (this time).
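As one possible way to turn the score into an alert, here is a minimal sketch of a threshold derived from previously observed scores: the mean plus a few standard deviations. This is my own illustration rather than anything in anomaly.py. The function names and the multiplier k=3.0 are arbitrary assumptions, and the baseline values in the usage note are made up.

```python
import statistics


def alert_threshold(baseline_scores, k=3.0):
    """Return mean + k * sample standard deviation of the baseline scores.

    baseline_scores should be scores observed for ordinary data;
    at least two values are needed to compute a standard deviation.
    """
    mean = statistics.mean(baseline_scores)
    stdev = statistics.stdev(baseline_scores)
    return mean + k * stdev


def is_anomalous(score, threshold):
    """True when the score exceeds the alert threshold."""
    return score > threshold
```

With an illustrative baseline such as [0.98, 1.0, 1.02, 0.99, 1.01], the threshold comes out a little under 1.05, so the 2.338 score above would raise an alert and the 0.9999 score would not. How well a mean-plus-deviation rule suits real score distributions is something to check against your own data.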