I wrote a Python script that exports all my posts using the Qiita API v2

As practice with the Qiita API v2, I wrote a script that exports all of the articles I have posted. This post summarizes what I learned.

What I learned this time

--Getting data with the Qiita API v2
--Using an access token with the Qiita API v2
--Saving GET results to a file
--Basic usage of the requests library

About Qiita API v2

For details, see the Qiita API v2 documentation - Qiita:Developer. Here are a few points worth noting.

My overall impression is that it is simple and very similar to the GitHub API v3.

General

--The rate limit is 60 requests/hour when unauthenticated and 1,000 requests/hour when authenticated
--For pagination, per_page defaults to 20 and has an upper limit of 100
--For authentication, the easiest way is to issue an access token from the settings screen after logging in and include it in the Authorization field of the request header

How to make an access token

You can issue one from the settings screen after logging in to Qiita. Just include it in the request header and you can call the API as an authenticated user.

You can choose from several token scopes, but if you only make GET requests, read_qiita is enough.
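
For example, with the requests library, sending the token looks roughly like this (a minimal sketch; it assumes the token is stored in a QIITA_ACCESS_TOKEN environment variable, as the full script below also does):

import os
import requests

# Read the token from an environment variable and send it as
# "Authorization: Bearer <token>".
token = os.getenv('QIITA_ACCESS_TOKEN')
headers = {'Authorization': 'Bearer {0}'.format(token)}

r = requests.get('https://qiita.com/api/v2/authenticated_user', headers=headers)
print r.status_code   # 200 if the token is valid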

Sample response header

As an example, here are the response headers returned when GETting https://qiita.com/api/v2/authenticated_user/items (the authenticated user's list of posts).

{
  "Rate-Reset": "1500863004",
  "X-XSS-Protection": "1; mode=block",
  "X-Content-Type-Options": "nosniff",
  "Rate-Remaining": "989",
  "transfer-encoding": "chunked",
  "Total-Count": "8",
  "Vary": "Origin",
  "X-Request-Id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "Rate-Limit": "1000",
  "Server": "nginx",
  "Connection": "keep-alive",
  "X-Runtime": "0.431045",
  "ETag": "W/\"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\"",
  "Link": "<https://qiita.com/api/v2/authenticated_user/items?page=1&per_page=100>; rel=\"first\", <https://qiita.com/api/v2/authenticated_user/items?page=1&per_page=100>; rel=\"last\"",
  "Cache-Control": "max-age=0, private, must-revalidate",
  "Date": "Mon, 24 Jul 2017 01:53:18 GMT",
  "X-Frame-Options": "SAMEORIGIN",
  "Content-Type": "application/json; charset=utf-8"
}

There is a lot in there, but the following values are the ones you will probably look at most often (a quick requests-based check follows the list).

--Rate-Limit: the maximum number of requests; 1,000 here
--Rate-Remaining: how many requests you can still make; 989 here (11 already used)
--Rate-Reset: the time at which Rate-Remaining recovers, as epoch time; here 1500863004 = Monday, July 24, 2017 11:23:24
--Link: pagination information, such as which page you are currently on and the URLs of the next/previous pages, for when you cannot fetch all records in one request
--Total-Count: the total number of records; 8 here
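
A minimal sketch for inspecting these headers with requests looks like this (again assuming the token is in QIITA_ACCESS_TOKEN; note that requests also parses the Link header into r.links):

import os
import requests

headers = {'Authorization': 'Bearer {0}'.format(os.getenv('QIITA_ACCESS_TOKEN'))}
r = requests.get('https://qiita.com/api/v2/authenticated_user/items',
                 params={'page': 1, 'per_page': 100},
                 headers=headers)

# Response headers are exposed as a case-insensitive dict.
print r.headers['Rate-Remaining']   # e.g. "989"
print r.headers['Total-Count']      # e.g. "8"
print r.links.get('next')           # parsed from the Link header; None when there is no next page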

Script implementation

I just wanted to back up my Qiita articles, so I wrote a script that "gets all the articles I posted".

The result is uploaded to GitHub. It targets Windows + Python 2.7. Since it is only for my own use, it cuts a few corners; for example, it does not handle pagination, so it cannot fetch more than 100 posts.

Full script

It ended up looking like this (also uploaded to GitHub).

# -*- coding: utf-8 -*-

import json
import os
import sys

import requests

def abort(msg):
    print 'Error!: {0}'.format(msg)
    sys.exit(1)

def ustr2filename(ustr):
    """Convert to a valid file name on Windows. """

    ret = ustr

    #Exclude characters that cannot be used as file names.
    invalid_chars = u'\\/:*?"<>|'
    for invalid_char in invalid_chars:
        ret = ret.replace(invalid_char, u'')

    #Encode with terminal encoding.
    ret = ret.encode(sys.stdout.encoding)

    return ret

def get(url, params, headers):
    r = requests.get(url, params=params, proxies=proxies, headers=headers)
    return r

def post(url, data_dict, headers_dict):
    r = requests.post(url, data=json.dumps(data_dict),
                      proxies=proxies, headers=headers_dict)
    return r

def print_response(r, title=''):
    c = r.status_code
    h = r.headers
    print '{0} Response={1}, Detail={2}'.format(title, c, h)

def assert_response(r, title=''):
    c = r.status_code
    h = r.headers
    if c<200 or c>299:
        abort('{0} Response={1}, Detail={2}'.format(title, c, h))

class Article:
    def __init__(self, d):
        self._title      = d['title']
        self._html_body  = d['rendered_body']
        self._md_body    = d['body']
        self._tags       = d['tags']
        self._created_at = d['created_at']
        self._updated_at = d['updated_at']
        self._url        = d['url']

        user = d['user']
        self._userid   = user['id']
        self._username = user['name']

    def save_as_markdown(self):

        title = ustr2filename(self._title)
        body  = self._md_body.encode('utf8')

        filename = '{0}.md'.format(title)
        fullpath = os.path.join(MYDIR, filename)
        with open(fullpath, 'w') as f:
            f.write(body)

# Utility: write a list of strings to a file, one per line (not used above).
def list2file(filepath, ls):
    with open(filepath, 'w') as f:
        f.writelines(['%s\n' % line for line in ls])

MYDIR = os.path.abspath(os.path.dirname(__file__))

proxies = {
    "http": os.getenv('HTTP_PROXY'),
    "https": os.getenv('HTTPS_PROXY'),
}
token = os.getenv('QIITA_ACCESS_TOKEN')
headers = {
    'content-type'  : 'application/json',
    'charset'       : 'utf-8',
    'Authorization' : 'Bearer {0}'.format(token)
}

#List of posts by authenticated users
url = 'https://qiita.com/api/v2/authenticated_user/items'
params = {
    'page'     : 1,
    'per_page' : 100,
}
r = get(url, params, headers)
assert_response(r)
print_response(r)

items = r.json()
print '{0} entries.'.format(len(items))
for i,item in enumerate(items):
    print '[{0}/{1}] saving...'.format(i+1, len(items))
    article = Article(item)
    article.save_as_markdown()

Qiita API used

I used the authenticated user's items endpoint: List of the authenticated user's posts - Qiita API v2 documentation (https://qiita.com/api/v2/docs#get-apiv2authenticated_useritems).

Calling this endpoint returns the posts as an array. For the details of each post object, see the post (item) schema in the Qiita API v2 documentation. The important fields are probably these:

--rendered_body: the HTML body
--body: the Markdown source
--title: the post title

This time I decided to save the value of body.
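
Picking those fields out of the JSON response might look like this (a minimal sketch, where r is a response from the items endpoint above):

items = r.json()   # the endpoint returns an array of post objects
print '{0} posts'.format(len(items))
for item in items:
    title    = item['title']           # post title
    markdown = item['body']            # Markdown source
    html     = item['rendered_body']   # rendered HTML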

Language and library used

I used the requests library with Python 2.7. True to its tagline "HTTP for Humans", it is very easy to use.

I know Python 3.x is the mainstream now, but I used 2.7 because I am used to it :sweat:

How posts are saved

There was no need to make it elaborate, so I kept it simple:

--One post is saved as one file
--The file name is (article title).md
--Files are saved in the same directory as the script

However, there are a couple of Windows-specific caveats, which are reflected in the script.

--The encoding of the Windows terminal (command prompt) is cp932, so saving Japanese titles as UTF-8 file names can garble them → get the terminal encoding with sys.stdout.encoding and encode the file name accordingly before saving (see the sketch below)
--Some half-width symbols cannot be used in Windows file names → remove them
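
As a concrete illustration of those two points, the conversion done by ustr2filename above boils down to something like this (a sketch with a made-up title):

# -*- coding: utf-8 -*-
import sys

title = u'Notes: "requests" <Python/2.7>?'   # made-up title containing invalid characters
for c in u'\\/:*?"<>|':                      # characters Windows rejects in file names
    title = title.replace(c, u'')
filename = u'{0}.md'.format(title)
print filename.encode(sys.stdout.encoding)   # encode with the terminal encoding (cp932 on a Japanese command prompt)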

In conclusion

I think I understand the basic usage now, so I would like to play around with other endpoints. Related articles have already been posted on Qiita, so I will read those as well.

--Overview of Qiita API v2 - Qiita
--[Memo] Qiita: Get the number of views of posted articles from the command line - Qiita

By the way, the rate limit of the Qiita API v2 is 1,000 requests/hour, which works out to about 16 requests per minute. That should be plenty as long as you don't abuse it.
