This is the day-20 article of the All About Group (All About Co., Ltd.) Advent Calendar 2019. Just a few more sleeps until Christmas. Even as an adult, I believe Santa Claus will really come.
Nice to meet you, I'm Akidukin14. I usually handle machine learning for All About's ad distribution system and the visualization of analysis results.
This is my first time writing an Advent Calendar article, so I'll try out something I came up with.
Let's visualize the article data in each Qiita Advent Calendar (only articles posted on Qiita...).
- Background
- What I did (bulleted list)
- Result
- What I did (details)
  - Extract the article text of the Advent Calendars
  - Process the article text with natural language processing (word segmentation)
  - Visualization with WordCloud
- Reference
- Advent Calendar Index
(As of 12/n, 2019) **Number of calendars: 770** **Number of participants: 13,414**
This is my first time participating in an Advent Calendar, and the first thing I thought was:
Insane. The number of calendars. The number of participants. Insane.
How am I supposed to find the calendars I'm actually interested in...?? That was my motivation.
I'll show the results first. If the results are all you need, just check here. (I kept the processing fairly light, so the results are probably biased. Sorry...)
call_qiita_api
```python
import requests

def call_qiita_api(header, per_page=None, query=None, page=None):
    ## API endpoint: Qiita API v2 item (article) listing
    get_items_api = 'https://qiita.com/api/v2/items'
    params = {'per_page': per_page,
              'query': query,
              'page': page}
    datas = requests.get(get_items_api, params=params, headers=header)
    return datas
```
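The article doesn't show how `call_qiita_api` was driven across several pages. Here is a minimal sketch. It relies on the Qiita API v2 limits, where `page` and `per_page` each accept 1 to 100. It also assumes a hypothetical `fetch_page(page, per_page)` callable that returns the decoded JSON list for one page, for example `call_qiita_api(header, per_page=per_page, query=..., page=page).json()`.

```python
from typing import Callable, Dict, List

def fetch_all_items(fetch_page: Callable[[int, int], List[Dict]],
                    per_page: int = 100,
                    max_page: int = 100) -> List[Dict]:
    """Collect items page by page until a short (or empty) page comes back.

    fetch_page is a hypothetical callable: fetch_page(page, per_page) -> list of item dicts.
    max_page defaults to 100, the largest `page` value Qiita API v2 accepts.
    """
    items: List[Dict] = []
    for page in range(1, max_page + 1):
        batch = fetch_page(page, per_page)
        items.extend(batch)
        if len(batch) < per_page:
            # A short page means there are no more results to fetch
            break
    return items
```

Each item returned by `GET /api/v2/items` carries its Markdown source in the `body` field. That field is the text a function like `regs_body_text` would receive.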
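regs_body_text
```python
import re

# Characters removed during normalization: whitespace (including the full-width space),
# Markdown/punctuation symbols, and the full-width ! and ?
NORMALIZE_PATTERN = re.compile(r'[\n\t 　\-~`:;_*!?！？+$#\[\]]')
# Posts that mention an Advent Calendar at all (text is already lowercased and space-free)
TARGET_TYPE = re.compile('(アドベントカレンダー|adventcalendar)')
# Japanese Qiita Advent Calendar posts usually open with
# 'この記事は <calendar> Advent Calendar 2019 の N日目の記事です'
# ("This article is day N of the <calendar> Advent Calendar 2019");
# after normalization the calendar name sits between the two markers
CALENDAR_TYPE = re.compile(r'(この記事は)\w+?(adventcalendar2019)')

### The code is really dirty...
def regs_body_text(text):
    # Normalize: lowercase, then strip the characters above
    tmp = NORMALIZE_PATTERN.sub('', text.lower())
    if not TARGET_TYPE.search(tmp):
        return None, None
    match = CALENDAR_TYPE.search(tmp)
    if not match:
        return None, None
    # Keep only the calendar name between the two markers
    get_calender_type = re.sub('(この記事は|adventcalendar2019)', '', match.group())
    return get_calender_type, tmp
```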
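After each body has gone through `regs_body_text`, the posts still have to be grouped per calendar. Here is a minimal sketch of that step. It assumes `extract` is any function with the same contract as `regs_body_text`: it returns `(calendar_name, normalized_text)`, or `(None, None)` for posts to skip. It also assumes items are dicts with a Markdown `body` field, as the Qiita API returns them.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Same return contract as regs_body_text: (calendar_name, normalized_text) or (None, None)
Extractor = Callable[[str], Tuple[Optional[str], Optional[str]]]

def group_by_calendar(items: Iterable[Dict], extract: Extractor) -> Dict[str, List[str]]:
    """Map each calendar name to the normalized bodies of its posts."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        calendar, text = extract(item['body'])
        if calendar is None:
            # Not an Advent Calendar 2019 post, or the opening line didn't match
            continue
        grouped[calendar].append(text)
    return dict(grouped)
```

Calling `group_by_calendar(items, regs_body_text)` would give one list of texts per calendar, ready for tokenizing.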
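parse_text
```python
import MeCab

# '-Owakati' only changes the output of parse(); parseToNode() still exposes full features
mecab = MeCab.Tagger('-Owakati')
# Workaround for older mecab-python3 versions where node.surface can come back garbled
mecab.parse('')

def parse_text(text, parser=mecab):
    # MeCab dictionaries such as IPAdic use Japanese POS tags, so match '名詞' (noun)
    part = ['名詞']
    parsed_text = []
    t = parser.parseToNode(text)
    while t:
        parts = t.feature.split(',')
        if parts[0] in part:
            parsed_text.append(t.surface)
        t = t.next
    return parsed_text
```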
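The POS check in `parse_text` compares the first field of each node's comma-separated `feature` string. With IPAdic a noun looks like `名詞,一般,*,*,*,*,データ,データ,データ`, and the sentence-boundary nodes carry `BOS/EOS`. You can check the filtering logic without MeCab installed by feeding it `(surface, feature)` pairs. The sketch below does that. `select_by_pos` is a hypothetical helper, and the sample features are hand-written in IPAdic style, not real MeCab output.

```python
from typing import Iterable, List, Sequence, Tuple

def select_by_pos(nodes: Iterable[Tuple[str, str]],
                  pos: Sequence[str] = ('名詞',)) -> List[str]:
    """Keep the surface form of every node whose top-level POS is in `pos`."""
    return [surface for surface, feature in nodes
            if feature.split(',')[0] in pos]

# Hand-written sample in IPAdic's feature format (not real MeCab output)
sample = [
    ('', 'BOS/EOS,*,*,*,*,*,*,*,*'),
    ('データ', '名詞,一般,*,*,*,*,データ,データ,データ'),
    ('を', '助詞,格助詞,一般,*,*,*,を,ヲ,ヲ'),
    ('可視化', '名詞,サ変接続,*,*,*,*,可視化,カシカ,カシカ'),
    ('する', '動詞,自立,*,*,サ変・スル,基本形,する,スル,スル'),
    ('', 'BOS/EOS,*,*,*,*,*,*,*,*'),
]
print(select_by_pos(sample))  # ['データ', '可視化']
```

To keep verbs as well, pass `pos=('名詞', '動詞')`.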
make_wordcloud
```python
import math
import sys

import matplotlib.pyplot as plt
import wordcloud
from matplotlib.font_manager import FontProperties

## Visualize with WordCloud
def make_wordcloud(dataset, fonts, check_word):
    # dataset: {calendar name: {'parsed_text': [[word, ...], ...]}}
    # fonts: path to a font file that can render Japanese
    # check_word: returns True for words to drop (its definition is not shown in this article)
    keys = sorted(dataset.keys())
    keys_len = len(keys)
    # 9 calendars (a 3x3 grid) per image; round up so a partial last page is included
    plot_picture = math.ceil(keys_len / 9)
    fp = FontProperties(fname=fonts)
    k = 0
    for pp in range(plot_picture):
        fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(10, 10))
        for i in range(9):
            n, m = divmod(i, 3)  # row and column of the i-th panel
            if k >= keys_len:
                # Hide unused panels on the last page instead of indexing past the end
                axes[n, m].axis('off')
                continue
            sys.stdout.write('\r {}/{}'.format(k, keys_len))
            target_key = keys[k]
            wc = wordcloud.WordCloud(
                font_path=fonts,
                prefer_horizontal=1,
                max_words=300,
                background_color='white',
                colormap='RdYlBu',
                contour_color='pink',
                width=750,
                height=750)
            plot_data = ' '.join(y for x in dataset[target_key]['parsed_text']
                                 for y in x if not check_word(y))
            wc_gen = wc.generate(plot_data)
            axes[n, m].imshow(wc_gen, interpolation='bilinear')
            axes[n, m].set_title('AdventCalendar : {}'.format(target_key),
                                 fontproperties=fp, color='gray', fontsize=10)
            axes[n, m].axis('off')
            k += 1
        plt.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95)
        plt.savefig('{}_wordcloud.png'.format(pp))
        plt.close(fig)
```
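`make_wordcloud` filters tokens with `check_word`, but the article doesn't include its definition. Below is a hypothetical stand-in that returns `True` for tokens to drop from the cloud: single characters, pure numbers, and words in a small stop-word set. The stop words are illustrative assumptions, not the author's list.

```python
import re

# Hypothetical stop-word list; tune it for the corpus at hand
STOP_WORDS = {'こと', 'もの', 'ため', 'よう', 'これ', 'それ', 'http', 'https', 'com'}
# Digits (ASCII or full-width) optionally mixed with '.' and ','
NUMBER_PATTERN = re.compile(r'^[0-9０-９.,]+$')

def check_word(word: str) -> bool:
    """Return True if the word should be excluded from the word cloud (hypothetical filter)."""
    if len(word) <= 1:
        return True
    if NUMBER_PATTERN.match(word):
        return True
    return word.lower() in STOP_WORDS
```

Because it returns `True` for words to exclude, it plugs directly into the `if not check_word(y)` condition in `make_wordcloud`.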
Using Qiita API: https://qiita.com/arai-qiita/items/94902fc0e686e59cb8c5
These are the calendars I used this time, listed as an index.
Image No | Advent Calendar No | Advent Calendar name |
---|---|---|
1 | 0 | 1on1 |
1 | 1 | 2019 new graduate engineer |
1 | 2 | 3dsensor |
1 | 3 | access |
1 | 4 | airccar |
1 | 5 | aizu |
1 | 6 | akerun |
1 | 7 | alh |
1 | 8 | alibabacloud |
2 | 9 | amazoneks |
2 | 10 | amazoneks2 |
2 | 11 | android |
2 | 12 | android2 |
2 | 13 | android for beginners |
2 | 14 | angular |
2 | 15 | angular2 |
2 | 16 | ansible |
2 | 17 | ansible2 |
3 | 18 | appsscript |
3 | 19 | arduino |
3 | 20 | asoview |
3 | 21 | aws |
3 | 22 | awsamplify |
3 | 23 | aws lambda and serverless1 |
3 | 24 | aws beginner |
3 | 25 | azure |
3 | 26 | bitrise |
4 | 27 | blockchain |
4 | 28 | bosyu |
4 | 29 | brainpad |
4 | 30 | c |
4 | 31 | cakephp |
4 | 32 | calendargmo ad marketing |
4 | 33 | camphor |
4 | 34 | cbcloud |
4 | 35 | circleci |
5 | 36 | classi |
5 | 37 | clojure |
5 | 38 | codebaseokinawa |
5 | 39 | conoha |
5 | 40 | css |
5 | 41 | cyberagent20 new graduate |
5 | 42 | cyberagentdevelopers |
5 | 43 | dart |
5 | 44 | datadog |
6 | 45 | dena |
6 | 46 | advent calendar by dena20 graduate candidate engineer dena20 new graduate |
6 | 47 | dena20 new graduate |
6 | 48 | deno |
6 | 49 | discord |
6 | 50 | diverse |
6 | 51 | django |
6 | 52 | dmm group |
6 | 53 | dotfiles |
7 | 54 | dsl |
7 | 55 | dtp |
7 | 56 | eccube |
7 | 57 | elasticstack |
7 | 58 | elixir |
7 | 59 | elm |
7 | 60 | elm2 |
7 | 61 | emacs |
7 | 62 | enebular |
8 | 63 | engineeringmanager |
8 | 64 | Talk about ethercat |
8 | 65 | filemaker |
8 | 66 | firebase |
8 | 67 | flutter |
8 | 68 | flutter2 |
8 | 69 | fork |
8 | 70 | foss4g |
8 | 71 | People involved in freee data |
9 | 72 | fun |
9 | 73 | fusic |
9 | 74 | fusic part 2 |
9 | 75 | git |
9 | 76 | globis |
9 | 77 | gmo pepabo |
9 | 78 | go |
9 | 79 | go3 |
9 | 80 | go4 |
10 | 81 | go5 |
10 | 82 | go6 |
10 | 83 | go7 |
10 | 84 | goodpatch |
10 | 85 | hamee |
10 | 86 | haskell |
10 | 87 | heroku |
10 | 88 | houdiniapprentice |
10 | 89 | hrtech |
11 | 90 | ios2 |
11 | 91 | iotlt |
11 | 92 | iplug |
11 | 93 | ipv6 |
11 | 94 | The second piece of iq1 |
11 | 95 | iridge |
11 | 96 | jamstack |
11 | 97 | java |
11 | 98 | javascript |
12 | 99 | javascript2 |
12 | 100 | kaggle |
12 | 101 | kayac |
12 | 102 | kintone |
12 | 103 | kintone2 |
12 | 104 | klab |
12 | 105 | klabengineer |
12 | 106 | kubernetes |
12 | 107 | kubernetes2 |
13 | 108 | kubernetes3 |
13 | 109 | kyash |
13 | 110 | kyotouniversity |
13 | 111 | laravel |
13 | 112 | laravel2 |
13 | 113 | libreoffice |
13 | 114 | lifull |
13 | 115 | lifull part 3 |
13 | 116 | makeit |
14 | 117 | maya |
14 | 118 | microad |
14 | 119 | microsoftazuretech |
14 | 120 | microsoftpowerbi |
14 | 121 | misoca yayoi |
14 | 122 | mohikanz |
14 | 123 | mysql |
14 | 124 | ncc |
14 | 125 | nem |
15 | 126 | nervesjp |
15 | 127 | nestjs |
15 | 128 | newspicks |
15 | 129 | nijibox |
15 | 130 | nodered |
15 | 131 | northdetail |
15 | 132 | ntt communications |
15 | 133 | ntt techno cross |
15 | 134 | n high school |
16 | 135 | obniz |
16 | 136 | office365 |
16 | 137 | oicitcreateclub |
16 | 138 | openandreproduciblescience |
16 | 139 | opensaasstudio |
16 | 140 | opttechnologies |
16 | 141 | oraclecloudinfrastructure |
16 | 142 | othlotech |
16 | 143 | pandoc |
17 | 144 | pathee |
17 | 145 | perl |
17 | 146 | php |
17 | 147 | plaid |
17 | 148 | ponos |
17 | 149 | pwa |
17 | 150 | pyladiesjapan |
17 | 151 | python |
17 | 152 | python part 3 |
18 | 153 | qiitagithubactions |
18 | 154 | qt |
18 | 155 | qualiarts |
18 | 156 | r |
18 | 157 | react |
18 | 158 | react2 |
18 | 159 | reactnative |
18 | 160 | retty |
18 | 161 | rpa |
19 | 162 | ruby |
19 | 163 | runteq |
19 | 164 | rust |
19 | 165 | rust part 2 |
19 | 166 | rust part 3 |
19 | 167 | salesforceplatform |
19 | 168 | sansan |
19 | 169 | sap |
19 | 170 | satysfi |
20 | 171 | sbai |
20 | 172 | scala |
20 | 173 | sensy |
20 | 174 | sfc |
20 | 175 | sfcrg |
20 | 176 | siv3d |
20 | 177 | slack |
20 | 178 | smarthr |
20 | 179 | snowrobin |
21 | 180 | soracom |
21 | 181 | speee |
21 | 182 | splunk |
21 | 183 | sra |
21 | 184 | sre |
21 | 185 | studioztech |
21 | 186 | swift |
21 | 187 | terraform |
21 | 188 | tjbot |
22 | 189 | tokyocityuniversity |
22 | 190 | tomowarkar alone |
22 | 191 | typescript |
22 | 192 | unity |
22 | 193 | unity2 |
22 | 194 | unity3 |
22 | 195 | valu |
22 | 196 | vexperts |
22 | 197 | vim |
23 | 198 | vim2 |
23 | 199 | visualstudiocode |
23 | 200 | vrchat |
23 | 201 | vtubertech1 |
23 | 202 | vue2 |
23 | 203 | wanogroup |
23 | 204 | wano group |
23 | 205 | webgl |
23 | 206 | workflow |
24 | 207 | xamarin |
24 | 208 | yamap engineer |
24 | 209 | zeals |
24 | 210 | zlab |
24 | 211 | zozo technologies |
24 | 212 | zozo Technologies 1 |
24 | 213 | zozo Technologies 2 |
24 | 214 | zozo Technologies 3 |
24 | 215 | zozo Technologies 4 |
25 | 216 | zozo Technologies 5 |
25 | 217 | Uluru |
25 | 218 | Kufu Company |
25 | 219 | Sakura Internet |
25 | 220 | Just a group |
25 | 221 | Anything for the time being |
25 | 222 | Engineer who wants to spread something |
25 | 223 | Looking back |
25 | 224 | Puri Puri Appliance |
26 | 225 | Iridge |
26 | 226 | Aso View |
26 | 227 | Inception deck |
26 | 228 | Willgate |
26 | 229 | Web crew |
26 | 230 | M3 Career |
26 | 231 | AP Communications |
26 | 232 | Keyboard 1 |
26 | 233 | Giftee |
27 | 234 | Fucking app |
27 | 235 | Fucking app 2 |
27 | 236 | Crowdworks |
27 | 237 | Shader advent calendar |
27 | 238 | Ciscosystemsjapan by Cisco volunteers |
27 | 239 | Japan system |
27 | 240 | G's Academy |
27 | 241 | Smart speaker |
27 | 242 | Software testing |
28 | 243 | Software test tips |
28 | 244 | Dip |
28 | 245 | About Data Science by Datamix Community |
28 | 246 | Data structures and algorithms |
28 | 247 | Toreta |
28 | 248 | Domain Driven Design 1 |
28 | 249 | Dwango |
28 | 250 | Nifty Group |
28 | 251 | Non-Pro Lab |
29 | 252 | Hands Lab |
29 | 253 | Fenrir Design and Technology |
29 | 254 | Photocreate |
29 | 255 | Future |
29 | 256 | Future 2 |
29 | 257 | Fuller |
29 | 258 | Mynavi |
29 | 259 | Mixi 20 new graduate |
29 | 260 | Mixi group |
30 | 261 | Motivation cloud series |
30 | 262 | Your Meister |
30 | 263 | Unique Vision Co., Ltd. |
30 | 264 | Lux |
30 | 265 | Ray tracing |
30 | 266 | Personal development |
30 | 267 | thousand |
30 | 268 | Kure National College of Technology |
30 | 269 | Shinagawa |
31 | 270 | Muroran Institute of Technology Data Science Laboratory dsl |
31 | 271 | Miyazaki it related study session |
31 | 272 | Fujitsu Cloud Technologies |
31 | 273 | Attorney dot com |
31 | 274 | Access Co., Ltd. |
31 | 275 | Amazonai by Knowledgecom operated by Knowledge Communication Co., Ltd. |
31 | 276 | How did you learn machine learning by Nikkei xtech Business ai② |
31 | 277 | Delve into machine learning tools by Nikkei xtech Business ai③ |
31 | 278 | Spring source club |
32 | 279 | Fukuoka young sierbc |
32 | 280 | Second Dwango |
32 | 281 | Self-made os |
32 | 282 | Natural language processing |
32 | 283 | Natural language processing 2 |
32 | 284 | Ibaraki University |
32 | 285 | Certification Authorization technology |
32 | 286 | Kinki University |
32 | 287 | Suzuka National College of Technology |