One thing I noticed after I recently started posting to Qiita is that a post gets stocked for roughly three days to a week after it is published. So I wondered: if I could grasp which tags are popular over a span of about three days to a week, could I write articles that more people will read?
Item | Details |
---|---|
Server OS | CentOS6.6 |
Elasticsearch | 5.3.1 |
Kibana | 4.3.1 |
Qiita API | v2 |
Terminal that calls the API | Mac |
Mac OS | OS X Yosemite |
Python | 2.7.10 |
I called the Qiita API from my Mac to get the post data and sent it to Elasticsearch running on a server in my lab. After sending the data to Elasticsearch, I opened http://xxx.xxx.xxx.xxx:5601 (xxx.xxx.xxx.xxx is the address of the lab server) and created graphs and so on with Kibana.
Details can be found in the Qiita API v2 documentation. Here I list the points from the documentation that I relied on.
https://qiita.com/api/v2
As stated under Usage Restrictions in the documentation, you can send about 60 requests per hour without authentication. With authentication the limit is apparently 1000 requests per hour, but I was not sure how that works, so this time I went without authentication.
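To get a feel for what these limits mean in practice, here is a minimal sketch of the arithmetic. It assumes every request returns 100 posts (the largest `per_page` value the API accepts); the function names are hypothetical and only exist for this illustration.

```python
# Request budget under the rate limits quoted above.
UNAUTHENTICATED_LIMIT = 60    # requests per hour without authentication
AUTHENTICATED_LIMIT = 1000    # requests per hour with authentication
MAX_PER_PAGE = 100            # assumption: every request fills a 100-post page


def max_posts_per_hour(requests_per_hour, per_page=MAX_PER_PAGE):
    """Upper bound on posts fetched in one hour if every request fills a page."""
    return requests_per_hour * per_page


def min_interval_seconds(requests_per_hour):
    """Spacing between requests that spreads the budget evenly over one hour."""
    return 3600.0 / requests_per_hour


print(max_posts_per_hour(UNAUTHENTICATED_LIMIT))    # 6000
print(min_interval_seconds(UNAUTHENTICATED_LIMIT))  # 60.0
```

So even without authentication, the budget is larger than the number of requests this article ends up sending.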
$ curl -XGET 'https://qiita.com/api/v2/items?page=3&per_page=20'
If you run this, you should get a result back. The command means "get the 20 posts on page 3". In other words, note that you receive 20 posts, not 3 * 20.
You can choose page and per_page yourself, but there are limits. They are summarized in the table below.
Parameter | Minimum value | Maximum value |
---|---|---|
page | 1 | 100 |
per_page | 20 | 100 |
If you fetch 100 posts from each of the 100 pages, you can get up to 10,000 posts.
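The relationship between page, per_page and the total number of posts can be sketched as follows. This is only an illustration: `build_payloads` is a hypothetical helper, and the two upper limits are the ones from the table above.

```python
MAX_PAGE = 100      # upper limit for page (from the table above)
MAX_PER_PAGE = 100  # upper limit for per_page (from the table above)


def build_payloads(total_posts, per_page=MAX_PER_PAGE):
    """Return the query parameters needed to fetch total_posts posts."""
    if per_page < 1 or per_page > MAX_PER_PAGE:
        raise ValueError("per_page must be positive and at most %d" % MAX_PER_PAGE)
    pages = -(-total_posts // per_page)  # ceiling division
    if pages > MAX_PAGE:
        raise ValueError("page %d would be needed, but the limit is %d"
                         % (pages, MAX_PAGE))
    return [{'page': p, 'per_page': per_page} for p in range(1, pages + 1)]


payloads = build_payloads(1000)
print(len(payloads))   # 10
print(payloads[0])     # {'page': 1, 'per_page': 100}
```

Asking for 10,000 posts gives exactly 100 payloads, and anything beyond that raises an error because it would need page 101.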
forward_json.py
# coding: utf-8
import json
import requests
from elasticsearch import Elasticsearch

# Address of the server on which Elasticsearch is installed
server_address = "xxx.xxx.xxx.xxx"
# With a standard installation, the port is 9200
port = str(9200)
# Create an Elasticsearch client instance
es = Elasticsearch("%s:%s" % (server_address, port))
# Endpoint
endpoint = 'https://qiita.com/api/v2/items'

for p in range(1, 11):  # Repeat the same processing for pages 1 through 10
    payload = {'page': p, 'per_page': '100'}  # Get 100 posts per page
    r = requests.get(endpoint, params=payload).json()  # Receive the result as JSON
    '''
    # For your reference
    print type(r)
    # => <type 'list'>
    print r[0].keys()
    # => [u'body', u'group', u'rendered_body', u'url', u'created_at', u'tags', u'updated_at', u'private', u'coediting', u'user', u'title', u'id']
    '''
    for it in r:  # Loop through the list of results
        # Insert all of the data!
        # This time I named the index qiita
        es.index(index='qiita', doc_type='qiita', id=it['id'], body=it)
In server_address, write the address of the server where Elasticsearch and Kibana are installed. After you run this code, Elasticsearch should hold a total of 1000 posts.
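Since the original motivation was a span of about three days to a week, the fetched posts could also be narrowed down by the created_at key listed in the code above. The following is a minimal sketch, not what I actually ran: it is written for Python 3.7 or later (not the 2.7 used above), it assumes created_at is an ISO 8601 string with a UTC offset, and both `recent_items` and the sample data are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def recent_items(items, days, now=None):
    """Keep only the items created within the last `days` days."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # Assumption: created_at looks like '2015-12-01T09:00:00+09:00'
    return [it for it in items
            if datetime.fromisoformat(it['created_at']) >= cutoff]


# Hypothetical sample data, not real Qiita posts
sample = [
    {'id': 'a', 'created_at': '2015-12-01T09:00:00+09:00'},
    {'id': 'b', 'created_at': '2015-12-06T09:00:00+09:00'},
]
now = datetime(2015, 12, 7, 0, 0, tzinfo=timezone.utc)
print([it['id'] for it in recent_items(sample, days=3, now=now)])  # ['b']
```

Passing `now` explicitly keeps the example reproducible; in real use it can be left out so that the current time is used.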
Go to http://xxx.xxx.xxx.xxx:5601. I fetched 1000 posts from the Qiita API and stored them in Elasticsearch. This is a screenshot with numbers and graph descriptions added in red. For now, user names are hidden.
There is a time period in which the number of posts is extremely high, so I clicked on it.
Clicking the green circle in the image above changes the page.
It seems that the number of posts in that time period jumped because of a flood of posts from enthusiastic users.
Ranking | Tag name | Number of posts |
---|---|---|
1 | python | 62 |
2 | ruby | 50 |
3 | aoj | 49 |
4 | javascript | 49 |
5 | c | 45 |
6 | ios | 41 |
7 | swift | 38 |
8 | php | 38 |
9 | java | 33 |
10 | linux | 29 |
Those were the results. As expected, programming-related tags dominate. What I actually wanted to know was the ranking of tags by stock count, but I gave up because it was difficult to get the stock count from the post data without authentication. I would like to try that when I have time. This time I found that posts tagged python are the most numerous, so I would like to keep writing posts that I can tag with python.
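The ranking above was made in Kibana, but the same kind of count could probably also be done directly in Python on the list returned by the API. Here is a minimal sketch under a few assumptions: each post's tags value is a list of dicts with a name key, tag names are lowercased so that they match the lowercase names in the table, and `rank_tags` and the sample data are hypothetical.

```python
from collections import Counter


def rank_tags(items, top_n=10):
    """Count how many posts carry each tag and return the top_n (tag, count) pairs."""
    counts = Counter()
    for it in items:
        # A set, so that a tag repeated on one post is counted only once
        names = set(tag['name'].lower() for tag in it.get('tags', []))
        counts.update(names)
    # Sort by count (descending), then by name so that ties have a stable order
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical sample data, not real Qiita posts
sample = [
    {'tags': [{'name': 'Python'}, {'name': 'Elasticsearch'}]},
    {'tags': [{'name': 'python'}, {'name': 'Kibana'}]},
    {'tags': [{'name': 'Ruby'}]},
]
print(rank_tags(sample, top_n=2))  # [('python', 2), ('elasticsearch', 1)]
```

Counting posts rather than stocks is a stand-in here; a stock count ranking would need data that the post list did not give me.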