This time, I will use D3.js to visualize the vocabulary extracted from Aozora Bunko works in the previous article.
The completed demo application can be viewed here. (If it doesn't display properly, try reloading your browser.)
So far, focusing on the handling of text data, I have explained how to use feeds, how to extract topics of interest from a large number of documents with Bayesian classification, and how to extract characteristic vocabulary from documents using TF-IDF as an index.
As mentioned at the end of the previous article, results extracted this way come across better when shown through a visualization library than when presented as raw data such as character strings.
I previously made an interactive visualization demo using D3.js. This time I will implement the application in the same way and run it on [Heroku](https://www.heroku.com/).
First, prepare the data: each vocabulary item serves as a key, and its weight is expressed as a number.
import codecs
import json
import os

# output_dir is assumed to be defined elsewhere in the script

def write_json_data(dic):
    """A function that writes the result to JSON"""
    for k, v in dic.items():
        # JSON needs a two-dimensional array, so prepare a fresh list
        # for each document (otherwise earlier documents' words pile up)
        arr = []
        for w, s in v:
            # Add to the array while adjusting the score appropriately
            arr.append([w, str(round(s * 10000 + 100, 2))])
        # When converting a dictionary containing Japanese into JSON in Python,
        # setting ensure_ascii to False like this keeps the text from being garbled
        hash = json.dumps({'values': arr},
                          sort_keys=True,
                          ensure_ascii=False,
                          indent=2,
                          separators=(',', ': '))
        # Specify the separators explicitly to get nicely formatted JSON
        # Use codecs.open to write the file as UTF-8
        with codecs.open(os.path.join(output_dir, k), "w", "utf-8") as f:
            f.write(hash)  # Export; the with block closes the file properly
The beginning of the generated JSON looks like this:
{
  "values": [
    [
      "Back view",
      "199.26"
    ],
    [
      "Peculiar",
      "299.26"
    ],
In this way, the result is a two-dimensional array: the outer array holds inner arrays, each containing a key (the word) and its value (the score).
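As a minimal sketch of the round trip, the following builds the same `{"values": [[word, score], ...]}` structure from a small list and reads it back. The raw weights (0.009926 and 0.019926) and the helper names `to_pairs` and `from_json` are hypothetical; the weights were chosen only so that the scaled scores match the excerpt, and they are not the actual TF-IDF values from the previous article.

```python
import json


def to_pairs(weights):
    """Scale raw weights with the same formula as write_json_data."""
    return [[w, str(round(s * 10000 + 100, 2))] for w, s in weights]


def from_json(text):
    """Parse the JSON text back into (word, float score) tuples."""
    return [(w, float(s)) for w, s in json.loads(text)["values"]]


# Hypothetical raw weights, not real results
weights = [("Back view", 0.009926), ("Peculiar", 0.019926)]

text = json.dumps({"values": to_pairs(weights)},
                  ensure_ascii=False, indent=2, separators=(",", ": "))
pairs = from_json(text)
print(pairs)
```

Because the scores are stored as strings, any consumer that wants to do arithmetic on them has to convert them back to numbers, as `from_json` does here.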
To be honest, I'm not very good at JavaScript, so I would welcome guidance from experts. For now, I will write it with the goal of simply getting something displayed.
// width, height, margin, and force are assumed to be defined earlier in the script

// Add the SVG node
var svg = d3.select("body")
    .append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

// JSON data binding
d3.json('../json/novel_name.json', function(error, data) {
  if (error) throw error; // Stop if the JSON could not be loaded

  data.values.forEach(function(d) {
    d.word = String(d[0]); // Key
    d.score = d[1];        // Value (a numeric string; coerced to a number where needed)
  });

  force
      .nodes(data.values)
      .start();

  var node = svg.selectAll("g.node")
      .data(data.values)
    .enter()
      .append("g")
      .attr("class", "node")
      .call(force.drag);

  // Determine the size of the circle based on the value
  // The color also changes according to the value
  node.append("circle")
      .attr("r", function(d) { return d.score * .1; })
      .attr("opacity", .67)
      .attr("fill", function(d) {
        if (d.score <= 300) {
          return "#449944";
        } else if (d.score <= 500) {
          return "#33AA33";
        } else if (d.score <= 750) {
          return "#22CC22";
        } else if (d.score <= 1000) {
          return "#11DD11";
        }
        // Scores above 1000 match no branch
      });

  // Add the vocabulary and its value as labels
  node.append("text")
      .text(function(d) { return d.word; })
      .attr('fill', '#fff')
      .attr('font-size', 24)
      .attr('dx', -16)
      .attr('dy', -5);

  node.append("text")
      .text(function(d) { return d.score; })
      .attr('fill', '#fff')
      .attr('dx', -25)
      .attr('dy', 15);

  // Animation: on every tick, move each node while keeping it 20px inside the drawing area
  force.on("tick", function() {
    node.attr('transform', function(d) {
      return 'translate(' + Math.max(20, Math.min(width - 20, d.x)) + ','
          + Math.max(20, Math.min(height - 20, d.y)) + ')';
    });
  });
});
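The rules that map a score to a circle size and color, and the clamping used in the tick handler, can be checked offline before touching the browser. The following is a minimal Python sketch that mirrors those rules; the function names `radius`, `fill_color`, and `clamp`, and the 960-pixel width in the usage line, are hypothetical and do not appear in the original application.

```python
def radius(score):
    """Circle radius: one tenth of the score, as in the "r" attribute."""
    return float(score) * 0.1


def fill_color(score):
    """Color band for a score, using the same thresholds as the D3 code."""
    s = float(score)
    if s <= 300:
        return "#449944"
    if s <= 500:
        return "#33AA33"
    if s <= 750:
        return "#22CC22"
    if s <= 1000:
        return "#11DD11"
    return None  # no band is defined above 1000


def clamp(value, size, pad=20):
    """Keep a coordinate at least `pad` pixels inside a side of length `size`."""
    return max(pad, min(size - pad, value))


# Hypothetical drawing width of 960 pixels
print(radius("199.26"), fill_color("199.26"), clamp(-5, 960))
```

Running a generated JSON file through `fill_color` is a quick way to see how many words land in each band, and whether any score exceeds 1000 and therefore gets no color.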
Then git push to Heroku and you're done.
heroku create myapp
git push heroku master
heroku open
D3.js demo application http://d3js-data-clips.herokuapp.com/
This time, I visualized the extracted features with D3.js and ran the result on Heroku.
At this point we have a list of the words and numerical values that characterize each document, so I think it can be applied to matching against other data sources or to investigating the relationships between multiple documents.
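One possible way to investigate the relationship between two documents, sketched here purely as an illustration, is the cosine similarity between their word-score lists. This is not something the article implements; the function name `cosine_similarity`, the word "Lantern", and all the scores below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {word: score} dictionaries."""
    # Only words present in both documents contribute to the dot product
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword scores for two documents
doc_a = {"Back view": 199.26, "Peculiar": 299.26}
doc_b = {"Peculiar": 250.0, "Lantern": 180.0}

similarity = cosine_similarity(doc_a, doc_b)
print(similarity)
```

The value is 0 when the documents share no characteristic words and approaches 1 as their word weights align. Since the scores written to JSON include a fixed offset of 100, the raw TF-IDF weights would probably be a better input for this kind of comparison.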