TL;DR: This is a proposal and an experiment. Running sentiment analysis on YouTube Live comments may pick out "emotional" moments and highlights more accurately than simply counting comments per unit of time.
In fact, the results suggest that accuracy is higher than when looking at the number of comments alone.
Last time, I borrowed the COTOHA API and wrote a free-thinking (~~impractical~~) article.
This time, I tried to make something a little more practical and applicable. The background comes first, so feel free to skip it.
As you often hear, the e-sports market has been growing in recent years. Live streaming on YouTube and other platforms is inseparable from this market.
Unlike traditional uploads of edited gameplay videos, live streaming has strengths such as **a lighter editing burden, communication with viewers, a tipping ("throwing money") culture, and the ability to share the experience in real time like a sport**. That makes it a good match for e-sports, and I think this is why it is expanding.
However, live streaming has some problems despite these strengths. The biggest are the increased burden on viewers and the existence of "thieves" who exploit it.
Live streaming lightens the load on the streamer, but in exchange it leaves viewers the trouble of searching for the highlights. Some people who noticed this cut clips out of famous streamers' broadcasts and profit from them.
↓ This article is about a different subject, but it is an easy read that touches on the problems in this area. https://gigazine.net/news/20191216-youtuber-hired-youtuber-leech-off/
Streaming itself already earns money by using the game company's copyrighted work, so there are various tricky points, but I still think this is a problem when considering the live-streaming market.
One solution, I think, is for the streamer themselves (the official source) to quickly cut clips and upload them. The problem exists because that has not been happening.
So I think this gives some motivation to automate the clipping process.
Various methods have been proposed for automatic highlight extraction, but those that learn from the video itself need a dataset, and a fair share of them target videos of the real world, so I think it would take some ingenuity to bring them directly into the e-sports context.
Therefore, this time I focused on the comments posted during the live stream.
https://qiita.com/okamoto950712/items/0d4736c7be251532a03f
The article above is prior work on this. Its author extracted highlights from the comments using the word "grass" (草, Japanese net slang for laughter) as a keyword. However, as the author noted, the criterion remains questionable. So this time I will update that criterion with some help from natural language processing.
If you think about what a highlight of a live stream is, you can see it as "a moment when many people's emotions are moved." The simplest signal to expect in such a scene is the number of comments.
Intuitively, a flood of comments such as "Wow" flows by at a highlight. So, first of all, I will analyze the highlights based on the number of comments. (As I explain later, this alone is not enough.)
This time, I would like to borrow EGS's smartphone! # 48. I chose it because it is e-sports (Smash Bros.) and the comment stream was turned on; it was not a particularly deliberate choice.
I will quote only comments that reveal as little of the video content as possible, so if you are interested in the content, please watch the original video.
I referred to this article for getting the comments: [Get chats (comments) from YouTube Live archives in Python (revised edition)](http://watagassy.hatenablog.com/entry/2018/10/08/132939)
It was very easy to get them. Thank you very much. I convert the timestamps obtained this way into seconds and process them from there.
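As a minimal sketch of the conversion step, assuming the chat timestamps arrive as text like "1:34:20" or "12:05" (the exact format depends on how the comments were scraped, and `to_seconds` is a hypothetical helper name):

```python
def to_seconds(timestamp: str) -> int:
    """Convert a chat timestamp such as "1:34:20" or "12:05" to seconds.

    Assumption: fields are separated by ":" and a leading "-" marks
    comments posted before the stream started.
    """
    text = timestamp.strip()
    sign = -1 if text.startswith("-") else 1
    seconds = 0
    for part in text.lstrip("-").split(":"):
        # Each field is worth 60 times the field to its right.
        seconds = seconds * 60 + int(part)
    return sign * seconds


# Example: pair each comment with its time in seconds.
raw = [("What?", "1:34:06"), ("Cool", "1:35:10")]
comments = [[text, to_seconds(ts)] for text, ts in raw]
```

Keeping each comment as `[comment, seconds]` matches the shape used later in the article.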
Since the video is about 10,000 seconds long, I plot a histogram with 500 bins (about 20 seconds per bin).
The peaks are where the number of comments rises sharply, so I think they can be called highlights.
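The binning described above could be sketched as follows. This uses `numpy.histogram` rather than a plotting call so that the counts can be inspected directly; the function names and the sample times are made up for illustration.

```python
import numpy as np


def comment_histogram(times, duration=10_000, bins=500):
    """Count comments per bin (about 20 seconds each with the defaults)."""
    counts, edges = np.histogram(times, bins=bins, range=(0, duration))
    return counts, edges


def busiest_bins(counts, edges, top=3):
    """Return (bin start in seconds, comment count) for the fullest bins."""
    order = np.argsort(counts)[::-1][:top]
    return [(float(edges[i]), int(counts[i])) for i in order]


# Illustrative input: comment times in seconds (not real data).
times = [5, 5, 5650, 5655, 5659, 5659.5, 9550, 9550, 9550]
counts, edges = comment_histogram(times)
candidates = busiest_bins(counts, edges, top=2)
```

Taking the fullest bins is the naive "comment count" criterion; the rest of the article explains why it is not sufficient on its own.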
I would like you to pay attention to this part. Something like a peak appears around 5660 seconds, but it was caused by viewers writing comments to ask whether the stream had cut out.
(One could argue that such a small peak should simply be ignored, but in general the number of comments grows as a tournament stream approaches its end, so it is not always possible to judge whether a peak is relatively small.) (Time-series data is difficult.)
To filter out sections like this, consider using the sentiment of the comments.
I've finally reached this point. COTOHA is an API from NTT that makes natural language processing easy.
If you register on the COTOHA API Portal, even a free user can call the API up to 1,000 times per day.
Registering for COTOHA and issuing a token are explained in my previous article and in other articles (/youwht/items/16e67f4ada666e679875), so please have a look there.
This time we will use the sentiment analysis of the COTOHA API.
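
    import json
    import requests

    # api_base and token are assumed to be defined elsewhere:
    # the COTOHA API base URL and the access token issued at registration.
    def get_senti(txt):
        header = {
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        }
        datas = {
            "sentence": txt,
        }
        r = requests.post(api_base + "nlp/v1/sentiment", headers=header, data=json.dumps(datas))
        parsed = json.loads(r.text)
        return parsed
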
Using a function like this, I loop over the comments and run sentiment analysis on each one.
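The loop itself is not shown in the article, so the following is only a sketch of how it might look. It assumes the parsed response carries the label and score under `result.sentiment` and `result.score`, and it takes the analysis function as a parameter so it can be tried without network access; `collect_sentiments` and the fallback to `Neutral` are my own choices, not the author's code.

```python
import time


def collect_sentiments(comments, analyze, pause=0.0):
    """Run `analyze` on every comment and keep [label, score] pairs.

    comments: list of [comment, seconds]
    analyze:  callable taking the comment text and returning a parsed
              response dict (for example, get_senti)
    pause:    optional delay between calls, to be gentle on the API quota
    """
    results = []
    for text, _seconds in comments:
        parsed = analyze(text)
        # Assumed response shape: {"result": {"sentiment": ..., "score": ...}}
        result = parsed.get("result") or {}
        results.append([result.get("sentiment", "Neutral"),
                        result.get("score", 0.0)])
        if pause:
            time.sleep(pause)
    return results


# Stand-in analyzer for a dry run (hypothetical, not the real API).
def fake_analyze(text):
    label = "Positive" if "Cool" in text else "Neutral"
    return {"result": {"sentiment": label, "score": 0.5}}


sample = collect_sentiments([["Cool", 5710], ["Cut", 5646]], fake_analyze)
```

With the real API you would pass `get_senti` as `analyze`, keeping the daily call limit in mind.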
The code is a little confusing because it is an excerpt from the whole, but kouho holds the comments to be analyzed, grouped by section. In this case, kouho[0] holds the comments from 5600 to 5800 seconds as an array of [comment, seconds].
sentis holds the sentiment analysis result for each comment in the form [sentiment label, sentiment score]. What the code does is **calculate, for each point in time, what proportion of the comments in the preceding 20 seconds carry positive or negative sentiment.**
Negative sentiment is counted as well because, by its nature, it also shows up at exciting moments, for example when the side you support loses or when the word "Yabai" is used in a good sense.
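
    import pandas as pd

    # Per-second accumulators for the 200-second section starting at 5600.
    buf = [0 for i in range(200)]      # weight of emotional comments
    doubled = [0 for i in range(200)]  # total number of comments
    target = 0
    for i in range(len(sentis[target])):
        if sentis[target][i][0] == "Positive" or sentis[target][i][0] == "Negative":
            base = 1
        else:
            # Tiny weight so a second with only neutral comments is not exactly 0.
            base = 0.0001
        buf[kouho[target][i][1]-5600] += base
        doubled[kouho[target][i][1]-5600] += 1
    # Turn each second's total into a ratio of emotional comments.
    for i in range(200):
        if doubled[i] != 0:
            buf[i] = buf[i]/doubled[i]
    buf = pd.Series(buf)
    # Average the ratio over the last 20 seconds, skipping seconds without comments.
    rolled = [0 for i in range(200)]
    for i in range(19,200):
        temp = 0
        temp_c = 0
        for j in range(20):
            temp += buf[i-j]
            if buf[i-j] != 0:
                temp_c += 1
        if temp_c == 0:
            rolled[i] = 0
        else:
            rolled[i] = temp/temp_c
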
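For readers who want to reuse the idea outside this excerpt, here is a self-contained sketch of the same calculation. It is a rewrite under my own assumptions, not the author's code: the function name and parameters are hypothetical, and it tracks comment counts explicitly instead of using the 0.0001 weight, so seconds with only neutral comments count as a ratio of exactly 0.

```python
import numpy as np


def emotional_ratio(comments, sentiments, start, length=200, window=20):
    """Rolling share of Positive/Negative comments over the last `window` seconds.

    comments:   list of [comment, seconds]
    sentiments: list of [label, score], aligned with `comments`
    start:      first second of the section (e.g. 5600)
    """
    emotional = np.zeros(length)
    total = np.zeros(length)
    for (_text, second), (label, _score) in zip(comments, sentiments):
        idx = int(second) - start
        if 0 <= idx < length:  # ignore comments outside the section
            total[idx] += 1
            if label in ("Positive", "Negative"):
                emotional[idx] += 1

    rolled = np.zeros(length)
    for i in range(window - 1, length):
        lo = i - window + 1
        seen = total[lo:i + 1] > 0  # seconds that had at least one comment
        if seen.any():
            per_second = emotional[lo:i + 1][seen] / total[lo:i + 1][seen]
            rolled[i] = per_second.mean()
    return rolled


# Tiny made-up section to show the call.
demo_comments = [["a", 5600], ["b", 5600], ["c", 5610]]
demo_sentis = [["Positive", 0.6], ["Neutral", 0.3], ["Negative", 0.7]]
demo = emotional_ratio(demo_comments, demo_sentis, start=5600, length=30)
```

Averaging only over seconds that actually had comments mirrors the `temp_c` counter in the original loop.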
Comment example

    ['Neutral', 0.5976078134278074] ['What?', 5646]
    ['Neutral', 0.2972885953707938] ['Cut', 5646]
    ['Neutral', 0.5971540523081994] ['Otsu?', 5648]
    ['Positive', 0.5905005661402767] ['Good-looking guy', 5705]
    ['Positive', 0.6376353638242411] ['Cool', 5710]
    ['Positive', 0.5408603768859751] ['I'm good at using cancer', 5766]
    ['Neutral', 0.30682113884940243] ['Umee', 5795]
    ['Positive', 0.2976389932477489] ['Understand beautiful', 5797]
    ['Neutral', 0.3140117003315527] ['Uma', 5799]

(Since the video is borrowed, I have picked comments that are as unrelated to the content as possible. If you are interested in all the comments, please watch the original video.)
**In this section the number of comments peaked around 5660 seconds, but that was not the true highlight; the area around 5800 seconds turned out to be closer to one.**
I would like to analyze another scene as well. The target is around 9500 to 9700 seconds.
This seems to be where the championship was decided. I will run sentiment analysis on the comments in this scene as well and plot the result.
The result is shown in the graph above. Apparently the most emotional moment is around 9550 seconds. Let's check the comments.

    ['Neutral', 0.3140117003315527] ['Yeah yeah yeah yeah yeah yeah yeah', 9554]
    ['Positive', 0.5873145362524327] ['Usugi', 9554]
    ['Neutral', 0.5972752547353822] ['Upper! ??', 95 5]
    ['Neutral', 0.2932173519717803] ['Oh ~ w', 9555]
    ['Neutral', 0.30920978004290844] ['Usungi', 9555]
    ['Neutral', 0.31430016829530116] ['Uoooooooo', 9555]
    ['Positive', 0.6049280950215463] ['Perfect', 9555]
    ['Neutral', 0.31322001724349124] ['Yabe yeah', 9555]
    ['Negative', 0.7934200331057456] ['Negative', 9555]

It seems that something actually happened. It's emo.
By using the COTOHA sentiment analysis API, I found that comment-based highlight detection for YouTube Live can be made more accurate.
I thought it would be interesting to see viewers' sentiment in real time. This time I targeted a video with a lot of comments, but for individual streamers with few comments, for example, you could transcribe what the streamer says and run sentiment analysis on that. I think the chain "the streamer's emotions moved" → "something interesting happened" holds to some extent, so it might be possible to cut out highlights with that approach as well.