[JAVA] I made a butler who summarizes tweets [Maximum coverage model]

Introduction

This time, I built a tweet summarization API using the maximum coverage model. Think of it as a butler: give it a keyword and it summarizes the latest topics for you. For example, you can use it when you want to know what the market is saying about a product you're interested in, the news and reputation of your own company, or which FGO-related news tweets are popular right now.

API released

- Tweet Summary Concierge
- Click here for the procedure and explanation before using
- Java sample code is here

import java.util.HashMap;
import java.util.Map;

import org.springframework.web.client.RestTemplate;

// UrlFormatter and TwitterSummarizeResponseEntity are not defined here;
// they are assumed to come from the linked Java sample code.
public class Api27TwitterSummarizeExample {

  static final String ENDPOINT     = "https://api.apitore.com/api/27/twitter-summarize/get";
  static final String ACCESS_TOKEN = "YOUR-ACCESS-TOKEN";

  public static void main(String[] args) {
    RestTemplate restTemplate = new RestTemplate();

    // Request parameters for the API call, including the search keyword "q".
    Map<String, String> params = new HashMap<>();
    params.put("access_token", ACCESS_TOKEN);
    params.put("q", "Apitore");
    params.put("iter", "1");
    params.put("num", "3");
    String url = UrlFormatter.format(ENDPOINT, params);

    TwitterSummarizeResponseEntity response =
        restTemplate.getForObject(url, TwitterSummarizeResponseEntity.class, params);

    // Print the log, then the text and score of the first returned tweet.
    System.out.println(response.getLog());
    System.out.println(response.getTweets().get(0).getText());
    System.out.println(response.getTweets().get(0).getScore());
  }

}
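
The sample passes `params` to `getForObject` a second time, which suggests that the URL is a URI template with `{name}` placeholders that RestTemplate fills in. `UrlFormatter` is not shown in this article, so the following is only a guess at what such a helper might look like. The class and method names (`UriTemplateSketch`, `toTemplate`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for a helper like UrlFormatter (an assumption, not the real class).
class UriTemplateSketch {

  /** Builds "endpoint?a={a}&b={b}" so the values can be expanded later from a map. */
  static String toTemplate(String endpoint, Map<String, String> params) {
    StringJoiner query = new StringJoiner("&");
    for (String name : params.keySet()) {
      // Only the parameter names go into the template; values stay in the map.
      query.add(name + "={" + name + "}");
    }
    return endpoint + "?" + query;
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so the output is predictable.
    Map<String, String> params = new LinkedHashMap<>();
    params.put("q", "Apitore");
    params.put("num", "3");
    // prints https://example.com/get?q={q}&num={num}
    System.out.println(toTemplate("https://example.com/get", params));
  }
}
```

Leaving the values out of the template means the HTTP client handles the encoding of keywords such as Japanese text when it expands the placeholders.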

What is a document summary?

Wikipedia's entry on automatic summarization is a helpful reference. This time, I took the approach of selecting a chosen number of important sentences (tweets) with the maximum coverage model. Very roughly, sentence-selection summarization with the maximum coverage model means "select N sentences that contain many important words, such that their total character length does not exceed M". It is usually solved with a greedy algorithm that has a performance guarantee. Word importance can be computed with something like TF-IDF, or you can build your own importance dictionary and look it up.
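
To make the "N sentences within length M" idea concrete, here is a minimal sketch of a greedy selection. It is not the code behind the API: the class name, method name, and toy data are all made up for illustration, and it uses a simplified greedy rule (best newly covered weight per character). The variants that come with a performance guarantee add further steps that are not shown here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names and data are assumptions, not the API's implementation.
class GreedyCoverageSketch {

  /** Returns the indices of the selected sentences, in the order they were picked. */
  static List<Integer> select(List<String> sentences,
                              List<Set<String>> concepts,
                              Map<String, Double> weight,
                              int maxSentences,      // N
                              int maxTotalLength) {  // M
    List<Integer> picked = new ArrayList<>();
    Set<String> covered = new HashSet<>();
    int usedLength = 0;

    while (picked.size() < maxSentences) {
      int best = -1;
      double bestRatio = 0.0;
      for (int i = 0; i < sentences.size(); i++) {
        int length = sentences.get(i).length();
        if (picked.contains(i) || length == 0 || usedLength + length > maxTotalLength) {
          continue; // already chosen, empty, or over the length budget
        }
        // Gain = total weight of the concepts this sentence would newly cover.
        double gain = 0.0;
        for (String c : concepts.get(i)) {
          if (!covered.contains(c)) {
            gain += weight.getOrDefault(c, 0.0);
          }
        }
        double ratio = gain / length;
        if (ratio > bestRatio) {
          bestRatio = ratio;
          best = i;
        }
      }
      if (best < 0) {
        break; // nothing fits the budget or adds new coverage
      }
      picked.add(best);
      covered.addAll(concepts.get(best));
      usedLength += sentences.get(best).length();
    }
    return picked;
  }

  public static void main(String[] args) {
    List<String> sentences = List.of("aaaa", "bbbb", "cccccccc");
    List<Set<String>> concepts = List.of(Set.of("x", "y"), Set.of("x"), Set.of("z"));
    Map<String, Double> weight = Map.of("x", 3.0, "y", 2.0, "z", 4.0);
    // prints [0, 2]
    System.out.println(select(sentences, concepts, weight, 2, 12));
  }
}
```

In the toy data, the second sentence is skipped because its only concept is already covered by the first one, which is exactly how the model avoids picking near-duplicate tweets.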

Algorithm

I will omit the details, but the general flow is as follows.

  1. Get T tweets containing the given keyword with the Easy Tweet Collection API
  2. Run morphological analysis with kuromoji-ipadic-neologd and keep only content words (nouns, verbs, adjectives, adjectival nouns)
  3. Build bigrams of the content words, and use each bigram's document frequency as its importance
  4. Extract important sentences with the maximum coverage model, solved by a greedy algorithm with a performance guarantee
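
Step 3 can be sketched as follows. This assumes the tweets have already been tokenized and filtered down to content words (step 2 is not reproduced here), and the class and method names are hypothetical. Document frequency counts each bigram at most once per tweet.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; input is assumed to be pre-tokenized content words.
class BigramDocumentFrequencySketch {

  /** Bigrams of adjacent content words in one tweet, de-duplicated. */
  static Set<String> bigrams(List<String> contentWords) {
    Set<String> result = new HashSet<>();
    for (int i = 0; i + 1 < contentWords.size(); i++) {
      result.add(contentWords.get(i) + " " + contentWords.get(i + 1));
    }
    return result;
  }

  /** Document frequency: the number of tweets in which each bigram appears. */
  static Map<String, Integer> documentFrequency(List<List<String>> tweets) {
    Map<String, Integer> df = new HashMap<>();
    for (List<String> words : tweets) {
      // bigrams() returns a set, so a repeated bigram counts once per tweet.
      for (String bigram : bigrams(words)) {
        df.merge(bigram, 1, Integer::sum);
      }
    }
    return df;
  }

  public static void main(String[] args) {
    List<List<String>> tweets = List.of(
        List.of("FGO", "gacha", "spin"),
        List.of("FGO", "gacha", "hot"),
        List.of("gacha", "spin", "gacha", "spin"));
    Map<String, Integer> df = documentFrequency(tweets);
    System.out.println(df.get("FGO gacha"));  // prints 2
    System.out.println(df.get("gacha spin")); // prints 2
  }
}
```

The resulting map can serve as the importance weights for the sentence selection in step 4, with each tweet represented by its set of bigrams.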

Let's summarize

I tried summarizing 100 tweets containing an arbitrary keyword. First, I entered "A Certain Magical Index". Apparently a collaboration is under way with a social network game called Divine Gate, and Famitsu seems to be running a feature on it too.

  "tweets": [
    {
      "statusId": 819472977976660000,
      "text": "The performance of a certain magical index collaboration unit has been adjusted! !!\n Click here for details\nhttps://t.co/nCtI6L2lEB\n#Divine https://t.co/euWj6SVOlU",
      "createdAt": 1484212557000,
      "userId": 1651752007,
      "userName": "Mr. Divine@Divine Official",
      "userScreenName": "divine_gate",
      "userProfileImageURL": "http://pbs.twimg.com/profile_images/545168403653083137/tPjlbZBx_normal.png",
      "score": 15.556059143488126
    },
    {
      "statusId": 819379554334576600,
      "text": "[Divine Gate Laboratory] Theatrical version \"A Certain Magical Index\" collaboration is reprinted from 1/13! The status and skills of the new unit are also revealed-Famitsu App https://t.co/i8B6GkYabA https://t.co/mmNxqtB0RR",
      "createdAt": 1484190283000,
      "userId": 5921162,
      "userName": "Famitsu.com",
      "userScreenName": "famitsu",
      "userProfileImageURL": "http://pbs.twimg.com/profile_images/378800000675754960/d98b55583c072a904463b4a625655c70_normal.png",
      "score": 13.692591672735654
    },

Next, "Gacha". FGO gacha seems to be hot right now!

  "tweets": [
    {
      "statusId": 819441067925782500,
      "text": "I think FGO is really conscientious in the sense that \"Gacha is self-responsibility\" because there is no silent pressure from the system that \"I can not clear without this or I can not beat other players\"",
      "createdAt": 1484204949000,
      "userId": 113922292,
      "userName": "Pumpkin@Tiamat mackerel",
      "userScreenName": "Lantern_pumpkin",
      "userProfileImageURL": "http://pbs.twimg.com/profile_images/787644896349872128/67tAY3Cp_normal.jpg",
      "score": 21.51741612946915
    },
    {
      "statusId": 819164551534870500,
      "text": "For those who are worried about spinning gacha, I'll make it up with Just do it lie subtitles and leave it there https://t.co/YrKBV4CmQp",
      "createdAt": 1484139023000,
      "userId": 3002869106,
      "userName": "Naramaru(Nekamaru)",
      "userScreenName": "NakedAmarl",
      "userProfileImageURL": "http://pbs.twimg.com/profile_images/732055255278452736/gFzwi9NU_normal.jpg",
      "score": 12.289830977543446
    },

Finally, I tried "Prime Minister Abe", expecting something more serious. Renho's remarks and 1 trillion yen in support for the Philippines seem to be the hot topics. I didn't know Japan was going to provide 1 trillion yen in support to the Philippines. Hmmmm.

  "tweets": [
    {
      "statusId": 819510275959181300,
      "text": "Democratic Party Renho \"I haven't been able to share consciousness with Prime Minister Shinzo Abe\" -The reaction of the net \"Mr. Renho who can't share consciousness with voters talks a lot\"\nhttps://t.co/oBChp3nIzL",
      "createdAt": 1484221450000,
      "userId": 3147651630,
      "userName": "Anonymous Post",
      "userScreenName": "anonymous201504",
      "userProfileImageURL": "http://pbs.twimg.com/profile_images/713181990690775040/BWrea93U_normal.jpg",
      "score": 26.90972898141721
    },
    {
      "statusId": 819496986638315500,
      "text": "TBS ◆ Prime Minister Abe announced support for 1 trillion yen at the Japan-Philippines summit meeting https://t.co/eabtk7unzH \"For the further development of the Philippines, we will combine ODA and private investment to create businesses and opportunities on the scale of 1 trillion yen over the next five years\" (Prime Minister Abe) * Starting this year as well.",
      "createdAt": 1484218281000,
      "userId": 57184966,
      "userName": "deepthroat",
      "userScreenName": "gloomynews",
      "userProfileImageURL": "http://pbs.twimg.com/profile_images/315728613/deepthroat_normal.gif",
      "score": 23.325985203733698
    },

In conclusion

I looked for a tweet summarization service but couldn't find one, and I think this is quite convenient. The API is public, so please try it. I think it will be useful in situations such as "when a new product comes out", "when a new character appears in a social network game", "I want to know the reputation of my company", "I want to keep up with current affairs", or "I want recommendations for a humidifier".
