- I built a network that connects VTuber channels.
- Each edge weight is the degree to which one channel's viewers overlap with another channel's.
- The offices covered are Nijisanji, hololive, 774 inc., upd8, and Nori Pro, plus independent VTubers. (The independents were chosen entirely according to my own tastes.)
- This time I only visualized the network; I have not analyzed it. There is still a lot I want to do, so I will get to it once Power Pro calms down.
- July 12 is the 3D debut of hololive's Tsunomaki Watame. Let's watch.
Below is the network that shows only the edges in the top 10% by weight among those connecting streamers. It shows that viewers overlap heavily among streamers who belong to the same office, such as Nijisanji or hololive.
The VTuber industry currently has many offices, such as Nijisanji, hololive, 774 inc., Nori Pro, and upd8, and each office has many streamers. There are also independent streamers who do not belong to any office. A streamer typically posts a video of about an hour once every one to several days. Added up per office, the total video time posted per day can easily exceed 24 hours, so it is practically impossible to watch every video from multiple offices. The following is my subscription feed on one particular day, and it is too much to watch in full. Many people are probably in the same situation, so each person picks videos and channels according to their own tastes. I thought it would be fun to visualize which channels tend to be watched together.
Many VTubers stream live, and viewers can post comments during the stream. For example, if you play a video (https://www.youtube.com/watch?v=Ypc_xKz--fY) by hololive's Luna Himemori (https://www.youtube.com/channel/UCa9Y57gfeY0Zro_noHRVrnw), you can see the comments from the live broadcast replayed alongside it, as shown below.
This comment log contains the comment timestamp, the viewer name, Super Chat information, and more.
This time, we use the viewer names to evaluate the relationships between channels. Let $ U_i $ be the set of viewers who commented on channel $ i $. The viewer overlap $ w_{ij} $ between channels $ i $ and $ j $ is defined as the share of viewers common to both channels among all viewers of either channel:

$$ w_{ij} = \frac{|U_i \cap U_j|}{|U_i \cup U_j|} $$
This is used as an edge weight when visualizing as a network.
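As a small worked example of this overlap ratio, the sketch below computes it for two made-up commenter sets. The viewer names and the helper `jaccard` are hypothetical, and returning 0 for two empty sets is an assumption of this sketch rather than something the article specifies.

```python
# Toy commenter sets for two hypothetical channels (names are made up).
viewers_a = {"alice", "bob", "carol", "dave"}
viewers_b = {"carol", "dave", "erin"}


def jaccard(u_i: set, u_j: set) -> float:
    """Share of commenters common to both channels among all commenters of either."""
    union = u_i | u_j
    if not union:
        return 0.0  # assumption: two channels without commenters have zero overlap
    return len(u_i & u_j) / len(union)


w = jaccard(viewers_a, viewers_b)
print(w)  # 2 shared viewers out of 5 distinct viewers
```

The value is symmetric in the two channels and lies between 0 (no shared commenters) and 1 (identical commenter sets).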
The data covers January 1, 2020 through June 30, 2020. Note that I have not verified that the comments were retrieved correctly for every video that has comments.
The data acquisition method is described in the following article, and I use it almost as is.
As an example, the comments of each video are saved in the following format.
AuthorName | BaseDate | ChannelId | Timestamp | VideoId | VideoLength |
---|---|---|---|---|---|
Moss Max | 2020-05-07 | UC--A2dwZW7-M2kID0N6_lfA | 2020-05-07 19:53:23 | -Alnw7B1GBo | 2953 |
Chocolate cornet | 2020-05-07 | UC--A2dwZW7-M2kID0N6_lfA | 2020-05-07 19:54:58 | -Alnw7B1GBo | 2953 |
Black dog | 2020-05-07 | UC--A2dwZW7-M2kID0N6_lfA | 2020-05-07 19:55:08 | -Alnw7B1GBo | 2953 |
Oguna | 2020-05-07 | UC--A2dwZW7-M2kID0N6_lfA | 2020-05-07 19:55:56 | -Alnw7B1GBo | 2953 |
High tension friday | 2020-05-07 | UC--A2dwZW7-M2kID0N6_lfA | 2020-05-07 19:56:05 | -Alnw7B1GBo | 2953 |
Only VideoLength is a rough value, and I do not use it this time.
First, build the list of viewers who commented on each channel during the data period by merging all the comment lists obtained above. The code below also counts the number of comments; that is there for the convenience of a separate task and is not relevant here.
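import pandas as pd
# Merge the per-video comment logs, then count comments per viewer, channel, video, and date
df = pd.concat([pd.read_pickle(path) for path in comment_paths])
counts = df.groupby(['AuthorName', 'ChannelId', 'VideoId', 'BaseDate']).size().to_frame('Count').reset_index()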
The format is as follows.
AuthorName | ChannelId | VideoId | BaseDate | Count |
---|---|---|---|---|
chro nicle | UCwrjITPwG4q71HzihV2C7Nw | H7wgvBbxo1U | 2020-06-30T00:00:00 | 1 |
Fazias | UChAnqc_AY5_I3Px5dig3X1Q | Q7DS6uaInMA | 2020-06-30T00:00:00 | 26 |
Dream eating | UCuvk5PilcvDECU7dDZhQiEw | 6uiQOEDmD6U | 2020-06-30T00:00:00 | 91 |
Fatin Thifal | UCOmjciHZ8Au3iKMElKXCF_g | ZrFJpafDKVw | 2020-06-30T00:00:00 | 3 |
Snail state of the futon | UC6oDys1BGgBsIC3WhG1BovQ | QHTLzahEiX4 | 2020-06-30T00:00:00 | 1 |
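To make the shape of this table concrete, here is a minimal sketch that runs the same `groupby(...).size()` step on a tiny made-up comment log. The column names follow the table above; the rows and the final `viewers` dictionary are illustrative assumptions, not the author's data or code.

```python
import pandas as pd

# Toy comment log; column names follow the table above, rows are made up.
log = pd.DataFrame({
    "AuthorName": ["alice", "alice", "bob", "alice"],
    "ChannelId":  ["ch1", "ch1", "ch1", "ch2"],
    "VideoId":    ["v1", "v1", "v1", "v2"],
    "BaseDate":   ["2020-05-07"] * 4,
})

# One row per (viewer, channel, video, date) with the number of comments.
counts = (
    log.groupby(["AuthorName", "ChannelId", "VideoId", "BaseDate"])
       .size()
       .to_frame("Count")
       .reset_index()
)

# Per-channel commenter sets, which is all the overlap calculation needs.
viewers = counts.groupby("ChannelId")["AuthorName"].apply(set).to_dict()

print(counts)
print(viewers)
```

Only the set of names per channel matters for the overlap; the `Count` column is carried along but ignored.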
As mentioned earlier, we now compute the viewer overlap between channels. This is easy to obtain by building the set of commenters for each channel and applying set operations.
def corr_by_author_set_union(counts, channels):
    # Empty frame indexed by channel; one column per channel is filled in below
    corr = pd.DataFrame().assign(Channel=channels).set_index('Channel')
    tmp = counts.loc[:, ['ChannelId', 'AuthorName']].drop_duplicates()
    # Set of commenters for each channel
    channelId_to_set = {ch: set(tmp[tmp.ChannelId == ch].AuthorName) for ch in channels}
    for ch1 in channels:
        # Size of the intersection divided by the size of the union
        corr[ch1] = [(len(channelId_to_set[ch1] & channelId_to_set[ch2]) /
                      len(channelId_to_set[ch1] | channelId_to_set[ch2])) for ch2 in channels]
    return corr
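The nested loop above is easy to follow but does one set operation per channel pair. As an alternative sketch (not the author's method), the same ratio can be computed with a binary channel-by-viewer matrix and one matrix product. The toy data and all variable names below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy (ChannelId, AuthorName) pairs; the values are made up for illustration.
pairs = pd.DataFrame({
    "ChannelId":  ["ch1", "ch1", "ch1", "ch2", "ch2", "ch3"],
    "AuthorName": ["alice", "bob", "carol", "bob", "carol", "dave"],
}).drop_duplicates()

# Binary channel-by-viewer matrix: 1 if the viewer commented on the channel.
m = pd.crosstab(pairs["ChannelId"], pairs["AuthorName"]).clip(upper=1)

inter = m.values @ m.values.T                     # size of each intersection
sizes = np.diag(inter)                            # number of viewers per channel
union = sizes[:, None] + sizes[None, :] - inter   # size of each union
jaccard = pd.DataFrame(inter / union, index=m.index, columns=m.index)

print(jaccard.round(3))
```

Every channel in the crosstab has at least one commenter, so the union is never zero here; a channel list that includes channels without any comments would need an explicit guard.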
Now, let's draw the graph. The code is almost the same as in the following article.

- I tried to visualize the national surname network: https://datumstudio.jp/blog/networkx
import itertools
import networkx as nx

def create_graph(df, threshold=0.5, is_directed=True):
    # df must be a square matrix whose rows and columns are the same nodes
    assert set(df.index) == set(df.columns)
    # Create a graph
    if is_directed:
        graph = nx.DiGraph()
    else:
        graph = nx.Graph()
    # Add nodes
    for col in df.columns:
        if not graph.has_node(col):
            graph.add_node(col)
    # Add edges whose absolute weight reaches the threshold
    for a, b in itertools.combinations(df.columns, 2):
        if a == b or graph.has_edge(a, b):
            continue
        val = df.loc[a, b]
        if abs(val) < threshold:
            continue
        graph.add_edge(a, b, weight=val)
    return graph
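The edge-selection step can be illustrated without networkx. The sketch below uses only the standard library and a made-up weight table; `edges_above` is a hypothetical helper, not part of the article's code. It relies on the same property as `create_graph`: `itertools.combinations` yields each unordered pair once and never pairs a node with itself.

```python
import itertools

# Toy symmetric overlap weights between three hypothetical channels.
weights = {
    ("A", "B"): 0.30,
    ("A", "C"): 0.05,
    ("B", "C"): 0.12,
}


def edges_above(nodes, weights, threshold):
    """Return (a, b, w) for each unordered pair whose |w| reaches the threshold."""
    kept = []
    for a, b in itertools.combinations(nodes, 2):  # each pair once, never a == b
        w = weights.get((a, b), weights.get((b, a), 0.0))
        if abs(w) >= threshold:
            kept.append((a, b, w))
    return kept


edges = edges_above(["A", "B", "C"], weights, threshold=0.10)
print(edges)
```

Because each pair is visited once, a directed graph built this way would get only one direction per pair; for the symmetric overlap used here the undirected graph is the natural fit.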
import matplotlib.pyplot as plt
import numpy as np

def draw_char_graph(G, fname, edge_cmap=plt.cm.Greys, figsize=(16, 8)):
    plt.figure(figsize=figsize)
    weights = [G[u][v]['weight'] for u, v in G.edges()]
    pos = nx.spring_layout(G, k=16)
    # channel_to_color is defined elsewhere and maps each node to a color
    nodes = pos.keys()
    colors = list(set([channel_to_color[n] for n in nodes]))
    color_to_id = {colors[i]: i for i in range(len(colors))}
    # One offset per color group, spread around a circle of radius rad
    angs = np.linspace(0, 2*np.pi, 1+len(colors))
    repos = []
    rad = 3.5
    for ea in angs:
        repos.append(np.array([rad*np.cos(ea), rad*np.sin(ea)]))
    # Shift every node by the offset of its color group
    for ea in pos.keys():
        posx = color_to_id[channel_to_color[ea]]
        pos[ea] += repos[posx]
    nx.draw(G,
            pos,
            node_color=[channel_to_color[n] for n in G.nodes()],
            edge_cmap=edge_cmap,
            edge_vmin=-3e4,
            width=weights,
            with_labels=True,
            font_family='Yu Gothic',
            font_size=8,
            font_color='green')
    plt.savefig(fname, dpi=128)
    plt.show()
Create and draw the graph using these functions.
Thicker lines indicate a higher share of viewers in common.
union_corr = corr_by_author_set_union(counts, channels)
# ChannelIds are hard to read, so replace them with channel names
union_corr = rename_ChannelId_to_ChannelName(union_corr)
graph = create_graph(union_corr, threshold=0, is_directed=False)
draw_char_graph(graph, 'fig/graph_author_union.png', figsize=(16, 16))
- Overall, the channels are oriented toward Nijisanji: a large share of viewers watch both Nijisanji and another office.
- Shigure Ui and Tamaki Inuyama have strong edges toward hololive as well as toward Nijisanji.

The previous graph shows too many edges, so consider reducing them. The 10% cutoff is a rule of thumb. Because only the top 10% of edges are plotted, a line drawn here can be read as a very high viewer overlap between the two channels.
# (Edge percentile threshold, betweenness_centrality); the second value is not used here
pairs = [(90, 0)]
df = union_corr.copy()
for pair in pairs:
    th = np.percentile(df.fillna(0).values.ravel(), pair[0])
    print(pair, th)
    graph = create_graph(df, threshold=th, is_directed=False)
    draw_char_graph(graph, 'fig/graph_author_union_{}.png'.format(pair), figsize=(16, 16))
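One detail of the cutoff is worth a small illustration: the percentile above is taken over every entry of the matrix, which includes the diagonal (each channel's overlap with itself, equal to 1) and counts every pair twice. With many channels the effect is small. The sketch below, on a made-up 4x4 matrix, compares that cutoff with one computed over the distinct pairs only; the second variant is an assumption about what one might want, not what the article does.

```python
import numpy as np

# Toy symmetric overlap matrix with 1.0 on the diagonal (values are made up).
w = np.array([
    [1.00, 0.30, 0.05, 0.10],
    [0.30, 1.00, 0.12, 0.02],
    [0.05, 0.12, 1.00, 0.20],
    [0.10, 0.02, 0.20, 1.00],
])

# Percentile over every matrix entry, as in the loop above.
th_all = np.percentile(w.ravel(), 90)

# Percentile over the distinct channel pairs only (strict upper triangle).
iu = np.triu_indices_from(w, k=1)
th_pairs = np.percentile(w[iu], 90)

print(th_all, th_pairs)
```

In this tiny example the diagonal dominates the upper tail, so the all-entries cutoff is much higher than the pairs-only cutoff; the gap shrinks as the number of channels grows.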
- Viewer overlap is high within the same office.
- Most of Tamaki Inuyama's edges go to hololive; the connection to hololive is stronger than the one to Nijisanji.
- The same holds for Shigure Ui.

Here, only the top 10% of the edges are plotted. My personal impression is that when the connections within an office look weak, the following explanations are possible.

- The whole office is connected, but the viewer overlap is weak across the board.
- The connections to channels outside the office are strong, so the connections look weak when only the inside of the office is visualized.
- The lines overlap too much to make anything out.
Bottom 3 and top 3 channels by mean weight of the edges connected to each node

- The first three rows are the bottom 3 channels by mean weight of their connected edges.
- The last three rows are the top 3 channels.
Channel | Mean weight | Group |
---|---|---|
Azuchi Momo | 0.02581 | Nijisanji Japan |
♥️♠️ Alice Mononobe ♦️♣️ | 0.03546 | Nijisanji Japan |
Gilzaren III Season 2 | 0.04463 | Nijisanji Japan |
Akina Saegusa / Saegusa Akina | 0.13969 | Nijisanji Japan |
Amamiya Kokoro / Kokoro Amamiya [Nijisanji affiliation] | 0.14043 | Nijisanji Japan |
Gwelu Os Gar [Nijisanji] | 0.14336 | Nijisanji Japan |
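A ranking like the one in the table above could be produced roughly as follows. This is a sketch on a made-up 4x4 overlap matrix; it assumes the mean is taken over all other channels in the matrix and that the diagonal is excluded, neither of which the article states explicitly.

```python
import numpy as np
import pandas as pd

names = ["A", "B", "C", "D"]  # hypothetical channel names
w = pd.DataFrame(
    [[1.00, 0.30, 0.05, 0.10],
     [0.30, 1.00, 0.12, 0.02],
     [0.05, 0.12, 1.00, 0.20],
     [0.10, 0.02, 0.20, 1.00]],
    index=names, columns=names,
)

# Mask the diagonal so a channel's overlap with itself does not enter the mean.
off_diag = w.mask(np.eye(len(w), dtype=bool))
mean_weight = off_diag.mean(axis=1).sort_values()

n = 2  # the article lists 3 per side; 2 keeps the toy output short
print(mean_weight.head(n))  # lowest mean overlap
print(mean_weight.tail(n))  # highest mean overlap
```

To restrict the ranking to one office, the same calculation can be applied to the sub-matrix of that office's channels.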
- As I had personally felt:
- Triangles formed by overlapping viewer bases stand out, I think.
- NoeFlare
Channel | Mean weight | Group |
---|---|---|
Mel Channel Yozora Mel Channel | 0.1314 | Hololive Japan |
SoraCh. Tokino Sora Channel | 0.1653 | Hololive Japan |
Nakiri Ayame Ch. Nakiri Ayame | 0.1859 | Hololive Japan |
Kanata Ch. Amane Kanata | 0.2664 | Hololive Japan |
Watame Ch. Tsunomaki Watame | 0.2684 | Hololive Japan |
Shion Ch. Murasaki Shion | 0.2699 | Hololive Japan |
- As I had personally felt:
- Kaoru Tsukishita is good
Channel | Mean weight | Group |
---|---|---|
Izuru Ch. Kanade Izuru | 0.1544 | Holostars |
Kira Ch. Kagami Kira | 0.1688 | Holostars |
Rikka ch. Rikka | 0.1748 | Holostars |
astel ch. Astel | 0.2173 | Holostars |
Shien Ch. Kageyama Shien | 0.2178 | Holostars |
Temma Ch. Kishido Temma | 0.2222 | Holostars |
- Is the audience split among SugarLyric, HoneyStrap, and AniMare?
Channel | Mean weight | Group |
---|---|---|
Patra Channel /Suo Patra [Honeystrap] | 0.1307 | 774 inc. |
Haneru Channel /Haneru Inaba [AniMare] | 0.1335 | 774 inc. |
CAMOMI Camomi Channel [Kamomi Camomi] | 0.1369 | 774 inc. |
Izumi Channel /Izumi Yuzuhara [AniMare] | 0.1931 | 774 inc. |
Anna Channel /Anna Torajo [Sugariri] | 0.1949 | 774 inc. |
Rene Channel /Ryugasaki Rin [Sugariri] | 0.2055 | 774 inc. |
- The lines are thin, so there is little viewer overlap.
- The lines between the babiniku ojisan channels are thick.
Channel | Mean weight | Group |
---|---|---|
Engine Kazumi | 0.03281 | upd8 |
Yuuki Channel [Fucking sex education] | 0.03323 | upd8 |
Cheri High Homecoming Department | 0.03345 | upd8 |
Nora Cat Channel | 0.04661 | upd8 |
Tomari Mari channel /Tomari Mari Channel | 0.04728 | upd8 |
Tuna channel | 0.04752 | upd8 |
- The lines disappear at the top 10% cutoff, so only here the top 25% of edges are drawn.
- Yuzuru Himesaki and Takuma Kumagaya have not posted any videos yet.
Channel | Mean weight | Group |
---|---|---|
Norio Tsukudani [Tamaki Inuyama] | 0.2353 | Noripuro |
Aimiya Milk Milk Enomiya | 0.2453 | Noripuro |
Shirayuki Mishiro | 0.2591 | Noripuro |
Norio Tsukudani [Tamaki Inuyama] | 0.2353 | Noripuro |
Aimiya Milk Milk Enomiya | 0.2453 | Noripuro |
Shirayuki Mishiro | 0.2591 | Noripuro |
- I noticed only after plotting that Yui Musubime and Shia Minase belong to the same office, so it is natural that their viewers overlap.
Channel | Mean weight | Group |
---|---|---|
Kobana | 0.08867 | Other VTubers |
Kazenomiya Festival/ Matsuri Channel | 0.09249 | Other VTubers |
Heavenly Hiyo | 0.09265 | Other VTubers |
Makio [Individual] | 0.10765 | Other VTubers |
Sia Minase [Sia Channel] | 0.11933 | Other VTubers |
Musubime Yui 〖YouTube〗 | 0.12053 | Other VTubers |
- Showing every combination would produce too many images, so only Nijisanji and hololive are compared.
- Subaru Oozora and Keisuke Maimoto rank near the top by connected edge weight; is that the influence of the Oozora family?

Among the hololive channels, the bottom 3 and top 3 by mean weight of the edges connected to Nijisanji
Channel | Mean weight | Group |
---|---|---|
Mel Channel Yozora Mel Channel | 0.02613 | Hololive Japan |
SoraCh. Tokino Sora Channel | 0.03727 | Hololive Japan |
Towa Ch. Tokoyami Towa | 0.03840 | Hololive Japan |
Kanata Ch. Amane Kanata | 0.05254 | Hololive Japan |
Marine Ch. Houshou Marine | 0.05539 | Hololive Japan |
Subaru Ch. Oozora Subaru | 0.05971 | Hololive Japan |
Among the Nijisanji channels, the bottom 3 and top 3 by mean weight of the edges connected to hololive
Channel | Mean weight | Group |
---|---|---|
Azuchi Momo | 0.003585 | Nijisanji Japan |
Harusaki Air | 0.009090 | Nijisanji Japan |
Gilzaren III Season 2 | 0.009123 | Nijisanji Japan |
[Class 3-0] Gundo Mirei's Classroom | 0.085043 | Nijisanji Japan |
Keisuke Maimoto | 0.087824 | Nijisanji Japan |
Lulu Suzuhara [Nijisanji affiliation] | 0.096265 | Nijisanji Japan |
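The cross-office figures in the two tables above could be derived by averaging each channel's weights over the channels of the other office only. The sketch below shows one way to do that; the channel names, group labels, weights, and the helper `cross_group_mean` are all made up for illustration.

```python
import pandas as pd

names = ["h1", "h2", "n1", "n2"]  # hypothetical channels
group = pd.Series(["hololive", "hololive", "Nijisanji", "Nijisanji"], index=names)
w = pd.DataFrame(
    [[1.00, 0.40, 0.06, 0.02],
     [0.40, 1.00, 0.10, 0.08],
     [0.06, 0.10, 1.00, 0.30],
     [0.02, 0.08, 0.30, 1.00]],
    index=names, columns=names,
)


def cross_group_mean(w, group, src, dst):
    """Mean weight from each channel in `src` to the channels in `dst`, ascending."""
    rows = group.index[group == src]
    cols = group.index[group == dst]
    return w.loc[rows, cols].mean(axis=1).sort_values()


holo_to_niji = cross_group_mean(w, group, "hololive", "Nijisanji")
niji_to_holo = cross_group_mean(w, group, "Nijisanji", "hololive")
print(holo_to_niji)
print(niji_to_holo)
```

Since the two groups are disjoint, the diagonal never enters these means and needs no masking.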
- Channels that collaborate more share more viewers, which makes sense.
- Core extraction and cluster analysis look like interesting next steps.