About this article

In the previous article [^ 1], I visualized Qiita's popular tags as a monthly Bar Chart Race, so in this article I will describe the procedure.

1. Get information using the Qiita API

As mentioned in the previous article, this step basically borrows the wisdom of a predecessor [^ 2].

This method retrieves the articles created within each half-month window and covers the whole period by shifting the window. However, with

query = "&query=created:>" + start_date  + "+created:<" + end_date

and start_date = ["2018-01-15","2018-01-31",...], end_date = ["2018-01-31","2018-02-15",...], both comparisons are strict, so articles created on a boundary date (e.g. 2018-01-31) fall into neither window. Therefore, I made the end of each window inclusive:

query = "&query=created:>" + start_date  + "+created:<=" + end_date
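The effect of this change can be checked offline. The following is a minimal sketch, assuming that the window boundaries are the 15th and the last day of each month (as in the lists above) and that Qiita compares the created date at day granularity. half_month_boundaries, build_query and matches are hypothetical helpers; matches only imitates the created:> / created:<= conditions locally and does not call the API.

```python
import calendar
import datetime


def half_month_boundaries(year, month, n_months):
    """Return boundary dates: the 15th and the last day of each month (assumed layout)."""
    dates = []
    for _ in range(n_months):
        last_day = calendar.monthrange(year, month)[1]
        dates.append(datetime.date(year, month, 15))
        dates.append(datetime.date(year, month, last_day))
        # Move on to the next month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return dates


def build_query(start_date, end_date):
    # Exclusive start, inclusive end: consecutive windows leave no gap
    return "&query=created:>" + start_date + "+created:<=" + end_date


def matches(day, start_date, end_date):
    """Local stand-in for created:>start and created:<=end (hypothetical)."""
    return start_date < day <= end_date


bounds = half_month_boundaries(2018, 1, 3)
start_dates = [d.isoformat() for d in bounds[:-1]]
end_dates = [d.isoformat() for d in bounds[1:]]
queries = [build_query(s, e) for s, e in zip(start_dates, end_dates)]
```

With the inclusive end, every day after the first boundary falls into exactly one window; with created:< instead, a boundary day such as 2018-01-31 matches no window at all.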

2. Process using pandas

The code is as follows. See the comments for details.

import copy
import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta
from tqdm import tqdm

# Load the summary file created in step 1
df_all = pd.read_csv("results/summary.csv")

# Sort by created_at
df_all = df_all.sort_values("created_at")

# Extract only the tag and date information
tags_list = list(df_all["tags_str"])
date_list = list(df_all["created_at"])
# Convert to Timestamps so that .year / .month and relativedelta can be used
date_list = [pd.to_datetime(one) for one in date_list]

# key: tag name, value: cumulative number of times
tags_dict = dict()
# Year and month currently being aggregated (the first article is from 2011-09)
y = date_list[0].year
m = date_list[0].month
# First day of the month currently being aggregated
ref_date = datetime.date(y, m, 1)

# Snapshot of tags_dict (cumulative counts) at the end of each month
monthly_result = []
# The month each snapshot belongs to
month = []

for one_tags, one_date in tqdm(zip(tags_list, date_list), total=len(tags_list)):
    # Processing when the month changes: store the month just finished and a
    # snapshot of the dict so far BEFORE counting this article, so that the
    # article is not attributed to the previous month.
    # The while loop also emits a snapshot for months without any article.
    while (one_date.year, one_date.month) != (y, m):
        month.append(ref_date)
        monthly_result.append(copy.deepcopy(tags_dict))
        ref_date += relativedelta(months=1)
        y = ref_date.year
        m = ref_date.month

    try:
        # Split the comma-separated text into a list
        tags = one_tags.split(",")
    except AttributeError:
        # Sometimes the value is NaN (no tag set?), so skip it
        continue
    # If the tag is already in tags_dict add 1, otherwise register it with 1
    for one_tag in tags:
        tags_dict[one_tag] = tags_dict.get(one_tag, 0) + 1

# Store the state of the last month on exit
month.append(ref_date)
monthly_result.append(copy.deepcopy(tags_dict))

# For each month's dict, register tags not yet posted by that month with 0
for one in monthly_result:
    for one_tag in tags_dict:
        one.setdefault(one_tag, 0)

# Molding: list each month's counts in tag-name order
all_tags = sorted(tags_dict)
monthly_result_num = [[one_dict[t] for t in all_tags] for one_dict in monthly_result]

# One row per tag, one "%Y-%m" column per month holding the cumulative count
df_align = pd.DataFrame(
    {one_date.strftime("%Y-%m"): one_nums for one_date, one_nums in zip(month, monthly_result_num)},
    index=pd.Index(all_tags, name="tags"),
)
# Export to csv with the tag name as index
df_align.to_csv("all_result.csv")
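For comparison, the same cumulative table can also be built without an explicit loop. The following is a minimal sketch of a different, vectorized technique (pandas str.split, explode, groupby and cumsum), not the code above. It assumes summary.csv has the created_at column as ISO 8601 strings such as 2018-01-05T10:00:00+09:00 and the comma-separated tags_str column; monthly_cumulative_tag_counts is a hypothetical helper name. The month is taken from the first seven characters of created_at, i.e. in the timestamp's own offset.

```python
import pandas as pd


def monthly_cumulative_tag_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Return a tags x months table of cumulative tag counts (hypothetical helper)."""
    # Drop articles without tags (NaN in tags_str)
    work = df.dropna(subset=["tags_str"]).copy()
    # Month label such as "2018-01", taken from the ISO 8601 string
    work["month"] = work["created_at"].str[:7]
    # One row per (article, tag) pair
    work["tag"] = work["tags_str"].str.split(",")
    work = work.explode("tag")
    # Count tags per month: tags become rows, months become columns
    counts = work.groupby(["tag", "month"]).size().unstack(fill_value=0)
    # Add months with no tagged articles so the columns are contiguous
    all_months = pd.period_range(
        counts.columns.min(), counts.columns.max(), freq="M"
    ).strftime("%Y-%m")
    counts = counts.reindex(columns=all_months, fill_value=0)
    counts.index.name = "tags"
    # Cumulative sum across months
    return counts.cumsum(axis=1)
```

For example, monthly_cumulative_tag_counts(pd.read_csv("results/summary.csv")).to_csv("all_result.csv") would write a file in the same layout as above: one row per tag and one cumulative column per month.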

3. Visualize using the Flourish Bar Chart Race

Upload the resulting all_result.csv to the Bar Chart Race template on Flourish (https://app.flourish.studio/). Now you can visualize it!

The story of visualizing popular Qiita tags with Bar Chart Race
