I wanted to analyze Qiita's article data, so I tried out the API. No authentication is needed this time, because we only need to read article information.
I tried two main things: getting a list of users, and getting the articles of a specific user.
I will explain them in order.
First, load the libraries.
```python
import numpy as np
import pandas as pd
import requests
import json
from pandas.io.json import json_normalize
```
The `json_normalize` imported on the last line is a convenient function that converts the JSON data returned by the API into a pandas DataFrame. In pandas 1.0 and later it is also available as `pd.json_normalize`.
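To make the flattening concrete, here is a minimal sketch. The records are made up for illustration (they are not real Qiita responses), and it assumes pandas 1.0 or later so that `pd.json_normalize` is available.

```python
import pandas as pd

# Hypothetical records shaped like a JSON API response (not real Qiita data).
records = [
    {"id": "alice", "items_count": 3, "team": {"name": "red", "size": 5}},
    {"id": "bob", "items_count": 7, "team": {"name": "blue", "size": 2}},
]

# Nested dictionaries are flattened into dot-separated column names.
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```

Each nested key becomes its own column, for example `team.name` and `team.size`, so the result can be filtered and aggregated like any other DataFrame.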
The Qiita API v2 documentation gives the following sample request:
GET /api/v2/users?page=1&per_page=20
In other words, you can get information by accessing the following URL.
https://qiita.com/api/v2/users?page=1&per_page=20
Here, `per_page` is the number of users to get per request, and `page` is the page number. For example, if you want information on 1000 users, you must send at least 10 requests with `per_page=100` (the upper limit).
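The arithmetic behind that request count can be sketched as follows. `pages_needed` is a hypothetical helper name introduced only for this illustration; it simply rounds `n / per_page` up to the next whole page.

```python
import math


def pages_needed(n: int, per_page: int = 100) -> int:
    """Number of requests needed to collect n records at per_page records each."""
    if n <= 0:
        return 0
    # Round up so that a final, partially filled page is still requested.
    return math.ceil(n / per_page)


print(pages_needed(1000))  # 10
print(pages_needed(333))   # 4
```

With 333 users, three full pages cover only 300 of them, so a fourth request is needed and the surplus rows have to be trimmed afterwards.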
So, the code looks like this:
```python
n = 333  # Number of users you want to get
per_page = 100  # Upper limit per request
df = pd.DataFrame()
# Request enough pages to cover n users (the last page may overshoot)
for page in range(1, int(n/per_page)+2):
    base_url = "https://qiita.com/api/v2/users?page={0}&per_page={1}"
    url = base_url.format(page, per_page)
    response = requests.get(url)
    res = response.json()
    tmp_df = json_normalize(res)  # Flatten the JSON list into a DataFrame
    df = pd.concat([df, tmp_df])
df.reset_index(drop=True, inplace=True)
df = df.iloc[:n,:]  # Keep only the first n rows and drop the surplus
```
The result is a data frame like this:
The Qiita API v2 documentation gives the following sample request:
GET /api/v2/items?page=1&per_page=20&query=qiita+user%3Ayaotti
In other words, you can get information by accessing the following URL.
https://qiita.com/api/v2/items?page=1&per_page=20&query=qiita+user%3Ayaotti
Now a new parameter, `query`, has appeared. It gives you the same search options as when searching in a browser. Note that `:` is encoded as `%3A` in the URL.
![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/540956/1b6f986d-87c0-8552-0c53-fedf902b54cc.png)
By using this, you can get the articles of a specific user with something like `query=user%3A〇〇〇`.
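If you would rather not write `%3A` by hand, the standard library can do the encoding. This is a minimal sketch using `urllib.parse`; it only builds the URL string and sends no request.

```python
from urllib.parse import quote, urlencode

# Percent-encode a single search term: ":" becomes "%3A".
encoded = quote("user:yaotti", safe="")
print(encoded)

# Build the whole query string; spaces become "+" and ":" becomes "%3A".
params = {"page": 1, "per_page": 20, "query": "qiita user:yaotti"}
url = "https://qiita.com/api/v2/items?" + urlencode(params)
print(url)
```

The second URL comes out identical to the sample from the documentation, which is a quick way to check that a hand-written query is encoded correctly.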
So the code looks like this:
```python
n = 125  # Number of articles you want to get
user = "yaotti"
per_page = 100  # Upper limit per request
df = pd.DataFrame()
# Request enough pages to cover n articles (the last page may overshoot)
for page in range(1, int(n/per_page)+2):
    base_url = "https://qiita.com/api/v2/items?page={0}&per_page={1}&query=user%3A{2}"
    url = base_url.format(page, per_page, user)
    response = requests.get(url)
    res = response.json()
    tmp_df = json_normalize(res)  # Flatten the JSON list into a DataFrame
    df = pd.concat([df, tmp_df])
df.reset_index(drop=True, inplace=True)
df = df.iloc[:n,:]  # Keep only the first n rows and drop the surplus
```
The result is a data frame like this:
That's all!
- Qiita API v2 documentation
- Convert a list of dictionaries to a DataFrame with pandas json_normalize