1. 1. At the beginning

Qiita debuted with things lol. Previously, I tried to automatically generate a quiz QA using Natural Language Processing API / COTOHA provided by NTT Communications. Every time, a gift project with Mr. Qiita was held, so I'm really disappointed. That's why I will write about the automatic generation of QA quizzes using news articles.

2. Development environment

*Google Colaboratory *Python3

COTOHA api (parsing)

3. 3. Overview

What I want to do is, for example

A demonstration of the Yellow Vest movement protesting the Macron administration in France was held on the 18th for 27 consecutive weeks. According to the Ministry of Interior ...

If you can read information such as "who" and "where" well when there is an article like

Question: Where did the Yellow Vest movement protest against Macron's administration take place on the 18th and 27th consecutive week? Answer: France

The motivation is that you can automatically create question sentences and answers. Let COTOHA take charge of this "good" part.

Four. Sentence analysis

Let's throw the example sentence article from earlier into COTOHA's parsing.

curl -H "Content-Type:application/json;charset=UTF-8" -H "Authorization:Bearer **Own Token**" -X POST -d '{"sentence":"A demonstration of the Yellow Vest movement protesting the Macron administration in France was held on the 18th for 27 consecutive weeks."}' "https://api.ce-cotoha.com/api/dev/nlp/v1/parse"

The response looks like this. The morphological analysis was carried out firmly, and it was possible to extract that "France" is "[" unique "," earth "]".

{
  "result" : [ {
    "chunk_info" : {
      "id" : 0,
      "head" : 8,
      "dep" : "D",
      "chunk_head" : 0,
      "chunk_func" : 1,
      "links" : [ ]
    },
    "tokens" : [ {
      "id" : 0,
      "form" : "France",
      "kana" : "France",
      "lemma" : "France",
      "pos" : "noun",
      "features" : [ "Unique", "Ground" ],
      "dependency_labels" : [ {
        "token_id" : 1,
        "label" : "case"
      } ],
      "attributes" : { }
    }, {
      "id" : 1,
      "form" : "so",
      "kana" : "De",
      "lemma" : "so",
      "pos" : "Case particles",
      "features" : [ "Continuous use" ],
      "attributes" : { }
    } ]
  }, {
    "chunk_info" : {
      "id" : 1,
      "head" : 2,
      "dep" : "D",
      "chunk_head" : 1,
      "chunk_func" : 2,
      "links" : [ ]
    },
    "tokens" : [ {
      "id" : 2,
      "form" : "Macron",
      "kana" : "Macron",
      "lemma" : "Macron",
      "pos" : "noun",
      "features" : [ ],
      "attributes" : { }
    }, {
      "id" : 3,
      "form" : "administration",
      "kana" : "Seiken",
      "lemma" : "administration",
      "pos" : "Noun suffix",
      "features" : [ "noun" ],
      "dependency_labels" : [ {
        "token_id" : 2,
        "label" : "compound"
      }, {
        "token_id" : 4,
        "label" : "case"
      } ],
      "attributes" : { }
    }, {
      "id" : 4,
      "form" : "To",
      "kana" : "D",
      "lemma" : "To",
      "pos" : "Case particles",
      "features" : [ "Continuous use" ],
      "attributes" : { }
    } ]
  }

On the other hand, it seems that it is difficult at present until Macron is a person's name. In addition, since the phrase information is taken in a block called "chunk_info" (great !!), if a place name etc. is included in a certain phrase, rewrite the original article so that the phrase (place name) is the answer. If so, it looks like a quiz.

5. Implementation

This time, I tried to input two patterns of place name and person.

import collections
import requests
import json

#Input statement
sentence = 'A demonstration of the Yellow Vest movement protesting the Macron administration in France was held on the 18th for 27 consecutive weeks.'
url = 'https://api.ce-cotoha.com/api/dev/nlp/v1/parse'
headers = {'Content-Type':'application/json;charset=UTF-8','Authorization':'Bearer **Token**'}
payload={'sentence':sentence}

#Request to COTOHA
r = requests.post(url, data=json.dumps(payload), headers=headers)

#Store parsing results
data = r.json()
j = json.dumps(data["result"])
chunk_dic = json.loads(j, object_pairs_hook=collections.OrderedDict)

quiz_flug = 100
chunk_dic_len = len(chunk_dic)
token = ""

#Judge whether a person's name or place name is included for each clause. If found, make it a quiz candidate
for i in range(chunk_dic_len):
    dic = chunk_dic[i]
    dic_len = len(dic["tokens"])
    for j in range(dic_len):
      if "Name" in dic["tokens"][j]["features"] and "Surname" in dic["tokens"][j]["features"] and "Unique" in dic["tokens"][j]["features"]:
        #Answer the quiz
        key_word = dic["tokens"][j]["form"]
        #Variables used to extract clauses from the original article
        for s in range(dic_len):
          token = token + dic["tokens"][s]["form"]
        quiz_flug = "0"

      elif "Ground" in dic["tokens"][j]["features"] and "Unique" in dic["tokens"][j]["features"]:
        key_word = dic["tokens"][j]["form"]
        for s in range(dic_len):
          token = token + dic["tokens"][s]["form"]
        quiz_flug = "1"
          
question_sentence = sentence[sentence.find(token):]
question_sentence = question_sentence.replace(token, '')

#Change the quiz sentence for each extracted noun
if quiz_flug == "0":
  question_sentence = question_sentence[:-1] + "Who is"
elif quiz_flug == "1":
  question_sentence = question_sentence[:-1] + "Where is it?"
  
print("problem:",question_sentence)
print("answer:",key_word)

Execution result

Question: Where did the Yellow Vest movement protest against Macron's administration take place on the 18th and 27th consecutive week?
Answer: France

For the time being, the target is done.

6. test

Experiment with various articles.

・ Original article ①
###
#Former Morning Musume. In Talent Private, "Mom of 4 children" Nozomi Tsuji announced on the 17th that she will open "Tsuji-chan Nell" and make her debut on YouTube."
###
Question: Who announced that "Tsuji-chan Nell" will be opened and debuted on YouTube on the 17th?
Answer: Nozomi Tsuji
・ Original article ②
###
#President Donald Trump was not informed in advance that he had returned 14 American passengers on a cruise ship who had been confirmed infected with the new coronavirus on a charter plane, and a leading newspaper reported that he was angry. It was.
###
Question: Where did the leading newspaper report that President Trump was angry without being informed in advance of returning 14 people on a charter plane?
Answer: America
・ Original article ③
###
#Regarding Gifu Castle, which is known as a mountain castle that Nobunaga Oda captured during the Warring States period, Gifu City announced on the 7th that it discovered the stone wall of the castle tower (the base of the castle tower) that Nobunaga seems to have built during the excavation survey.
###
Question: On the 7th, where did you announce that you first discovered the stone wall of the castle tower (the base of the castle tower) that Nobunaga seems to have built in the excavation survey?
Answer: Gifu City

It's hard to say that it's a perfect output, but it's a result that feels possible! I would like to improve the accuracy while scrutinizing the rules a little more! !!

7. Quote

French demonstration half a year, scale down (Kyodo News) Nozomi Tsuji announces her debut on YouTube "Using my own experience" to deliver information and beauty information for moms (Oricon News) Infected person returns home, Mr. Trump gets angry without prior report (Nippon News Network (NNN)) Characteristics built by Nobunaga, the first stone wall of the castle tower confirmed at Gifu Castle (Asahi Shimbun)

Automatic quiz generation with COTOHA