I wrote a Python script that exports all my posts using the Qiita API v2

As practice with the Qiita API v2, I wrote a script that exports all of the articles I have posted. This post summarizes what I learned.

What I learned this time

--Getting data with the Qiita API v2
--Using an access token with the Qiita API v2
--Saving GET results to a file
--Basic usage of the requests library

About Qiita API v2

For details, see the Qiita API v2 documentation - Qiita:Developer. Here are a few points worth noting.

My overall impression is that it is simple and very similar to the GitHub API v3.

General

--The rate limit is 60 requests/hour when unauthenticated and 1,000 requests/hour when authenticated
--For pagination, per_page defaults to 20 and has an upper limit of 100
--For authentication, the easiest way is to issue an access token from the settings screen after logging in and include it in the Authorization field of the request header

How to make an access token

You can issue one from the settings screen after logging in to Qiita. Just include it in the request header and you can call the API as an authenticated user.

You can choose from several token scopes, but if you only make GET requests, read_qiita is enough.
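
For example, with the requests library, sending the token looks roughly like this (a minimal sketch; it assumes the token is stored in a QIITA_ACCESS_TOKEN environment variable, as the full script below also does):

import os
import requests

# Read the token from an environment variable and send it as
# "Authorization: Bearer <token>".
token = os.getenv('QIITA_ACCESS_TOKEN')
headers = {'Authorization': 'Bearer {0}'.format(token)}

r = requests.get('https://qiita.com/api/v2/authenticated_user', headers=headers)
print r.status_code   # 200 if the token is valid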

Sample response header

As an example, here are the response headers returned when GETting https://qiita.com/api/v2/authenticated_user/items (the authenticated user's list of posts).

{
  "Rate-Reset": "1500863004",
  "X-XSS-Protection": "1; mode=block",
  "X-Content-Type-Options": "nosniff",
  "Rate-Remaining": "989",
  "transfer-encoding": "chunked",
  "Total-Count": "8",
  "Vary": "Origin",
  "X-Request-Id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "Rate-Limit": "1000",
  "Server": "nginx",
  "Connection": "keep-alive",
  "X-Runtime": "0.431045",
  "ETag": "W/\"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\"",
  "Link": "<https://qiita.com/api/v2/authenticated_user/items?page=1&per_page=100>; rel=\"first\", <https://qiita.com/api/v2/authenticated_user/items?page=1&per_page=100>; rel=\"last\"",
  "Cache-Control": "max-age=0, private, must-revalidate",
  "Date": "Mon, 24 Jul 2017 01:53:18 GMT",
  "X-Frame-Options": "SAMEORIGIN",
  "Content-Type": "application/json; charset=utf-8"
}

There is a lot in there, but the following values are the ones you will probably look at most often (a quick requests-based check follows the list).

--Rate-Limit: the maximum number of requests; 1,000 here
--Rate-Remaining: how many requests you can still make; 989 here (11 already used)
--Rate-Reset: the time at which Rate-Remaining recovers, as epoch time; here 1500863004 = Monday, July 24, 2017 11:23:24
--Link: pagination information, such as which page you are currently on and the URLs of the next/previous pages, for when you cannot fetch all records in one request
--Total-Count: the total number of records; 8 here
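
A minimal sketch for inspecting these headers with requests looks like this (again assuming the token is in QIITA_ACCESS_TOKEN; note that requests also parses the Link header into r.links):

import os
import requests

headers = {'Authorization': 'Bearer {0}'.format(os.getenv('QIITA_ACCESS_TOKEN'))}
r = requests.get('https://qiita.com/api/v2/authenticated_user/items',
                 params={'page': 1, 'per_page': 100},
                 headers=headers)

# Response headers are exposed as a case-insensitive dict.
print r.headers['Rate-Remaining']   # e.g. "989"
print r.headers['Total-Count']      # e.g. "8"
print r.links.get('next')           # parsed from the Link header; None when there is no next page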

Script implementation

I just wanted to back up my Qiita articles, so I wrote a script that "gets all the articles I posted".

The result is uploaded to GitHub. It targets Windows + Python 2.7. Since it is only for my own use, it cuts a few corners; for example, it does not handle pagination, so it cannot fetch more than 100 posts.

Full script

It ended up looking like this (also uploaded to GitHub).

# -*- coding: utf-8 -*-

import json
import os
import sys

import requests

def abort(msg):
    print 'Error!: {0}'.format(msg)
    sys.exit(1)

def ustr2filename(ustr):
    """Convert to a valid file name on Windows. """

    ret = ustr

    #Exclude characters that cannot be used as file names.
    invalid_chars = u'\\/:*?"<>|'
    for invalid_char in invalid_chars:
        ret = ret.replace(invalid_char, u'')

    #Encode with terminal encoding.
    ret = ret.encode(sys.stdout.encoding)

    return ret

def get(url, params, headers):
    r = requests.get(url, params=params, proxies=proxies, headers=headers)
    return r

def post(url, data_dict, headers_dict):
    r = requests.post(url, data=json.dumps(data_dict),
                      proxies=proxies, headers=headers_dict)
    return r

def print_response(r, title=''):
    c = r.status_code
    h = r.headers
    print '{0} Response={1}, Detail={2}'.format(title, c, h)

def assert_response(r, title=''):
    c = r.status_code
    h = r.headers
    if c<200 or c>299:
        abort('{0} Response={1}, Detail={2}'.format(title, c, h))

class Article:
    def __init__(self, d):
        self._title      = d['title']
        self._html_body  = d['rendered_body']
        self._md_body    = d['body']
        self._tags       = d['tags']
        self._created_at = d['created_at']
        self._updated_at = d['updated_at']
        self._url        = d['url']

        user = d['user']
        self._userid   = user['id']
        self._username = user['name']

    def save_as_markdown(self):

        title = ustr2filename(self._title)
        body  = self._md_body.encode('utf8')

        filename = '{0}.md'.format(title)
        fullpath = os.path.join(MYDIR, filename)
        with open(fullpath, 'w') as f:
            f.write(body)

# Utility: write a list of strings to a file, one per line (not used above).
def list2file(filepath, ls):
    with open(filepath, 'w') as f:
        f.writelines(['%s\n' % line for line in ls])

MYDIR = os.path.abspath(os.path.dirname(__file__))

proxies = {
    "http": os.getenv('HTTP_PROXY'),
    "https": os.getenv('HTTPS_PROXY'),
}
token = os.getenv('QIITA_ACCESS_TOKEN')
headers = {
    'content-type'  : 'application/json',
    'charset'       : 'utf-8',
    'Authorization' : 'Bearer {0}'.format(token)
}

#List of posts by authenticated users
url = 'https://qiita.com/api/v2/authenticated_user/items'
params = {
    'page'     : 1,
    'per_page' : 100,
}
r = get(url, params, headers)
assert_response(r)
print_response(r)

items = r.json()
print '{0} entries.'.format(len(items))
for i,item in enumerate(items):
    print '[{0}/{1}] saving...'.format(i+1, len(items))
    article = Article(item)
    article.save_as_markdown()

Qiita API used

I used the authenticated user's items endpoint: List of the authenticated user's posts - Qiita API v2 documentation (https://qiita.com/api/v2/docs#get-apiv2authenticated_useritems).

Calling this endpoint returns the posts as an array. For the details of each post object, see the post (item) schema in the Qiita API v2 documentation. The important fields are probably these:

--rendered_body: the HTML body
--body: the Markdown source
--title: the post title

This time I decided to save the value of body.
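
Picking those fields out of the JSON response might look like this (a minimal sketch, where r is a response from the items endpoint above):

items = r.json()   # the endpoint returns an array of post objects
print '{0} posts'.format(len(items))
for item in items:
    title    = item['title']           # post title
    markdown = item['body']            # Markdown source
    html     = item['rendered_body']   # rendered HTML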

Language and library used

I used the requests library with Python 2.7. True to its tagline "HTTP for Humans", it is very easy to use.

I know Python 3.x is the mainstream now, but I used 2.7 because I am used to it :sweat:

How posts are saved

There was no need to make it elaborate, so I kept it simple:

--One post is saved as one file
--The file name is (article title).md
--Files are saved in the same directory as the script

However, there are a couple of Windows-specific caveats, which are reflected in the script.

--The encoding of the Windows terminal (command prompt) is cp932, so saving Japanese titles as UTF-8 file names can garble them → get the terminal encoding with sys.stdout.encoding and encode the file name accordingly before saving (see the sketch below)
--Some half-width symbols cannot be used in Windows file names → remove them
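
As a concrete illustration of those two points, the conversion done by ustr2filename above boils down to something like this (a sketch with a made-up title):

# -*- coding: utf-8 -*-
import sys

title = u'Notes: "requests" <Python/2.7>?'   # made-up title containing invalid characters
for c in u'\\/:*?"<>|':                      # characters Windows rejects in file names
    title = title.replace(c, u'')
filename = u'{0}.md'.format(title)
print filename.encode(sys.stdout.encoding)   # encode with the terminal encoding (cp932 on a Japanese command prompt)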

In conclusion

I think I understand the basic usage now, so I would like to play around with other endpoints. Related articles have already been posted on Qiita, so I will read those as well.

--Overview of Qiita API v2 - Qiita
--[Memo] Qiita: Get the number of views of posted articles from the command line - Qiita

By the way, the rate limit of the Qiita API v2 is 1,000 requests/hour, which works out to about 16 requests per minute. That should be plenty as long as you don't abuse it.
