When downloading an app, many people first check the reviews posted for it. But there are often so many reviews that it is difficult to read them all. So this time, I would like to use a **word cloud** to visualize app reviews so that you can take them in at a glance.
A word cloud draws the words that appear in a document in various sizes and colors, typically with more frequent words drawn larger, so that the document's characteristics are visualized in a single image.
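As a rough illustration of the idea (a simplified sketch, not the wordcloud library's actual layout algorithm), the snippet below counts word frequencies and maps them proportionally to font sizes. The function name `font_sizes` and the size range are assumptions made for this example.

```python
from collections import Counter


def font_sizes(text, min_size=10, max_size=60):
    """Map each space-separated word to a font size that grows with its frequency."""
    counts = Counter(text.split())
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; the others are scaled down proportionally.
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


sizes = font_sizes("freeze account freeze timeline freeze account")
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size:.1f}")
```

Words that appear more often end up with larger sizes, which is the property a word cloud image relies on.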
App reviews on the App Store can be obtained in JSON format by inserting the ID of the target app into the URL below.
https://itunes.apple.com/jp/rss/customerreviews/id=(app ID)/page=1/json
This time, we will target the Twitter app, whose ID is 333903271. You can get up to 10 pages by changing the number after page=.
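As a small sketch of how the page URLs can be generated from this pattern, the snippet below fills in the app ID and page number. The names `RSS_URL` and `build_page_urls` are hypothetical and exist only for this example; no network request is made.

```python
RSS_URL = "https://itunes.apple.com/jp/rss/customerreviews/id={app_id}/page={page}/json"


def build_page_urls(app_id, max_page=10):
    """Return the review feed URLs for pages 1..max_page of one app."""
    return [RSS_URL.format(app_id=app_id, page=page) for page in range(1, max_page + 1)]


urls = build_page_urls("333903271")
print(urls[0])   # first page
print(urls[-1])  # last page
```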
Get the app review data with the script below.
import pandas as pd
import requests

rss_url = 'https://itunes.apple.com/jp/rss/customerreviews/id={}/sortBy=mostRecent/page={}/json'
app_id = '333903271'


def get_reviews(url):
    """
    Return a list of [score, title, text, author name] from the iOS review API response.
    """
    response = requests.get(url, timeout=3.5)
    response.raise_for_status()  # fail early on HTTP errors
    response_json = response.json()
    # A page without reviews may lack the 'entry' key, so fall back to an empty list
    entries = response_json['feed'].get('entry', [])
    reviews = [[int(entry['im:rating']['label']), entry['title']['label'],
                entry['content']['label'], entry['author']['name']['label']]
               for entry in entries]
    return reviews


review_list = []
# Collect pages 1 through 10 of reviews
for i in range(1, 11):
    page_url = rss_url.format(app_id, i)
    reviews = get_reviews(page_url)
    review_list += reviews
review_df = pd.DataFrame(review_list, columns=['point', 'title', 'review', 'name'])

![twitter_wordcloud.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/501910/8c3531e3-7cde-a60f-8d5d-d8acd8594c6b.png)
The collected data looks like this.
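Each row holds a rating, a title, the review text, and the author name. The sketch below runs the same field extraction on a hand-written payload that mimics the feed's shape; the review values are made up for illustration and are not actual App Store data, and `entry_to_row` is a hypothetical helper name.

```python
import json

# Hand-written sample mimicking the feed structure; the values are invented.
sample = json.loads("""
{"feed": {"entry": [
  {"im:rating": {"label": "4"},
   "title": {"label": "Sample title"},
   "content": {"label": "Sample review text"},
   "author": {"name": {"label": "sample_user"}}}
]}}
""")


def entry_to_row(entry):
    """Flatten one feed entry into [point, title, review, name]."""
    return [
        int(entry["im:rating"]["label"]),  # the rating arrives as a string
        entry["title"]["label"],
        entry["content"]["label"],
        entry["author"]["name"]["label"],
    ]


rows = [entry_to_row(e) for e in sample["feed"].get("entry", [])]
print(rows)
```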
Create a wordcloud image based on the collected review data. First, install MeCab for morphological analysis.
$ brew install mecab
$ brew install mecab-ipadic
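To give a feel for what morphological analysis produces: in its default output format, MeCab prints one token per line as the surface form, a tab, and comma-separated features whose first field is the part of speech, and it ends each sentence with `EOS`. The sketch below filters such output by part of speech without requiring MeCab itself. The `SAMPLE_OUTPUT` string is hand-written in the style of the IPA dictionary's output (an assumption, not captured from a real run), and `extract_words` is a hypothetical helper.

```python
# Hand-written sample in the style of MeCab + IPA dictionary output
# for the sentence "アカウントが凍結された" (not captured from a real run).
SAMPLE_OUTPUT = "\n".join([
    "アカウント\t名詞,一般,*,*,*,*,アカウント,アカウント,アカウント",
    "が\t助詞,格助詞,一般,*,*,*,が,ガ,ガ",
    "凍結\t名詞,サ変接続,*,*,*,*,凍結,トウケツ,トウケツ",
    "さ\t動詞,自立,*,*,サ変・スル,未然レル接続,する,サ,サ",
    "れ\t動詞,接尾,*,*,一段,連用形,れる,レ,レ",
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ",
    "EOS",
])

TARGET_POS = {"名詞", "形容詞", "動詞"}  # noun, adjective, verb


def extract_words(mecab_output, target_pos=TARGET_POS):
    """Return the surface forms whose top-level part of speech is in target_pos."""
    words = []
    for line in mecab_output.splitlines():
        surface, _, features = line.partition("\t")
        if not features:  # "EOS" and blank lines carry no feature field
            continue
        if features.split(",")[0] in target_pos:
            words.append(surface)
    return words


print(extract_words(SAMPLE_OUTPUT))
```

Note that the part-of-speech tags are Japanese strings, so the filter has to compare against the Japanese labels rather than English ones.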
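Then install the wordcloud library and the MeCab binding for Python (mecab-python3, which provides the MeCab module used below) with pip.
$ pip install wordcloud mecab-python3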
You can create a word cloud with the script below. For Japanese input, specify the path of a Japanese font file available in your environment.
import os

import MeCab
import wordcloud

# Path to a Japanese font file (macOS example; adjust for your environment)
FONT_PATH = '/System/Library/Fonts/ヒラギノ角ゴシック W3.ttc'


def prepare_word_list(words):
    """
    Create the input string for wordcloud.
    Args:
        words([str]): List of sentences
    Returns:
        str: The words with the specified parts of speech, extracted from all sentences and joined with spaces.
    """
    m = MeCab.Tagger('')
    parsed_words = []
    for word in words:
        items = [x.split('\t') for x in m.parse(word).splitlines()]
        for item in items:
            # Skip the end-of-sentence marker and lines without a feature field
            if item[0] in ('EOS', '') or len(item) < 2:
                continue
            # The IPA dictionary emits Japanese tags: 名詞 (noun), 形容詞 (adjective), 動詞 (verb)
            if item[1].split(',')[0] in ["名詞", "形容詞", "動詞"]:
                parsed_words.append(item[0])
    return ' '.join(parsed_words)


def make_wordcloud(words, file_name):
    """
    Create a wordcloud image file from the given sentences.
    Args:
        words([str]): List of sentences
        file_name(str): Output path of the image file
    Returns:
        None
    """
    parsed_words = prepare_word_list(words)
    wordc = wordcloud.WordCloud(
        font_path=FONT_PATH,
        background_color='white',
        contour_width=2,
        width=800,
        height=600,
    ).generate(parsed_words)
    wordc.to_file(file_name)


os.makedirs('./image', exist_ok=True)  # make sure the output directory exists
make_wordcloud(review_df['review'], './image/twitter_wordcloud.png')
Here is the completed wordcloud image!
Words typical of Twitter stand out, such as Twitter, account, freeze, follow, and timeline.
I was able to visualize the reviews of an iOS app. A word cloud may be a good choice when you want to quickly grasp the overall picture of a large number of documents.