summpy is an automatic text summarization API published by Recruit Technologies. It summarizes the input text into a specified number of lines.
GitHub repository: https://github.com/recruit-tech/summpy
In this post, I feed various texts into this API to see what the results look like.
Environment: EC2 (Amazon Linux release 2), Python 2.7
Install pip, summpy, and mecab-python3.
For mecab-python3, the version is pinned to 0.996.5, because otherwise the error "no such file or directory: /usr/local/etc/mecabrc" is displayed.
$ sudo easy_install pip
$ sudo pip install summpy
$ sudo pip install mecab-python3==0.996.5
Also pin networkx to version 1.11. If you do not, you will get the error "add_edge() takes exactly 3 arguments (4 given)" at runtime.
$ sudo pip install multiqc==1.2
$ sudo pip install networkx==1.11
Start the server on port 8080. nohup is used so that it keeps running in the background.
$ nohup python -m summpy.server -h 127.0.0.1 -p 8080 &
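Because the command returns immediately, the first request can arrive before the server is listening. The following is a minimal sketch of a readiness check. `wait_for_port` is a hypothetical helper, not part of summpy, and the host and port are simply the ones from the command above.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            sock = socket.create_connection((host, port), timeout=interval)
        except (socket.error, OSError):
            # Nothing is listening yet: wait a little and try again.
            time.sleep(interval)
        else:
            sock.close()
            return True
    return False
```

Calling `wait_for_port("127.0.0.1", 8080)` before running the test script avoids a connection error while the server is still starting.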
summpy_test.py
#!/usr/bin/env python2
# coding:utf-8
import requests
limit = 3  # number of lines (sentences) to return in the summary
text = 'Enter the text you want to summarize here.'
p = {'sent_limit':limit, 'text':text}
r = requests.get('http://localhost:8080/summarize', params=p)
print(r.text)
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"Enter the text you want to summarize here."
]
}
Since the input is a single sentence, the result is also a single line. From here on, I will vary the input text.
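The request and response handling in summpy_test.py can be factored into two small functions so that only the text and the limit change between experiments. This is a sketch that uses only the standard library. The function names are hypothetical; the endpoint, the `sent_limit` and `text` parameters, and the `summary` key are the ones shown above.

```python
import json

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7, as used in this article


def build_summarize_url(text, sent_limit, base="http://localhost:8080/summarize"):
    """Build the GET URL with the same two parameters as summpy_test.py."""
    query = urlencode({"sent_limit": sent_limit, "text": text})
    return base + "?" + query


def parse_summary(body):
    """Return the list stored under "summary" in a JSON response body."""
    return json.loads(body)["summary"]
```

The URL can be passed to `requests.get` as in the script above, and `parse_summary(r.text)` then yields the summary as a Python list.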
From here, I will summarize various texts, one theme at a time. The text to be summarized is taken from the following article.
How do you interpret "Adler Psychology" from an engineer's perspective? https://qiita.com/keki/items/0542d9d121cf89d6154e
First, let's summarize the following passage. Note that I removed the line breaks before sending it to the summarization API.
At this age, people become more interested in feelings, feelings, and ways of thinking.
Meanwhile, I came across "Adler Psychology" in this title a few years ago.
Engineers sometimes concentrate on the specialized work of programming, and it is often said that they are not good at communicating with people and that they are not good at joining the circle of teams.
Also, I'm worried about relationships and suffering from depression....I think that there are many cases of this.
I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.
This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.
By the way, it is quite a long sentence.
I hope you will see it with the intention of reading a small book.
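Removing the line breaks from a passage like the one above can be done as in the sketch below. `join_lines` is a hypothetical helper, and the choice of separator is an assumption: a space suits English, while an empty string suits Japanese text.

```python
def join_lines(text, sep=" "):
    """Collapse a multi-line passage into a single line.

    Blank lines are dropped and each line is stripped before joining.
    """
    lines = [line.strip() for line in text.splitlines()]
    return sep.join(line for line in lines if line)
```

The result can then be assigned to `text` in summpy_test.py.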
summpy_test.py
#!/usr/bin/env python2
# coding:utf-8
import requests
limit = 3  # number of lines (sentences) to return in the summary
text = 'At this age, people become more interested in feelings, feelings, and ways of thinking. Meanwhile, I came across "Adler Psychology" in this title a few years ago. Engineers sometimes concentrate on the specialized work of programming, and it is often said that they are not good at communicating with people and that they are not good at joining the circle of teams. Also, I\'m worried about relationships and suffering from depression....I think that there are many cases of this. I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships. This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field. By the way, it is quite a long sentence. I hope you will see it with the intention of reading a small book.'
p = {'sent_limit':limit, 'text':text}
r = requests.get('http://localhost:8080/summarize', params=p)
print(r.text)
$ python summpy_test.py
{
"debug_info": {},
"summary": [
"At this age, people become more interested in feelings, feelings, and ways of thinking.",
"I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
"This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field."
]
}
Hmm. The connection between the first and second lines is hard to follow, but the text has indeed been summarized in three lines. It also seems that the original text is not rewritten: sentences are simply selected and extracted as they are.
To investigate how summpy treats punctuation, I deliberately removed all punctuation.
At this age, I'm interested in people's feelings and emotional thinking.
A few years ago, I met "Adler Psychology," which is also in this title.
Engineers are not good at communicating with people because they concentrate on the specialized work of programming, and I think it is sometimes said that they are not good at joining the circle of teams.
Also, I'm worried about relationships and suffering from depression....I think that there are many cases of
I personally feel that "Adler Psychology" is the idea itself for solving such problems of human relationships.
This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, who is actually an engineer in the field.
By the way, it will be quite a long sentence
I hope you will see it with the intention of reading a little book.
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"At this age, I'm interested in people's feelings and emotional thinking. I met a few years ago in "Adler Psychology," which is also in this title. Engineers specialize in programming. I'm not good at communicating with people because I concentrate on my work. I think it's sometimes said that I'm not good at joining the circle of teams. Also, I'm worried about relationships and suffering from depression....I think that there are many cases of this, and I feel that "Adler Psychology" is the idea itself to solve such problems of human relationships. I would like to write an article about what it would be like if I interpret it from an engineer's point of view. By the way, it will be quite a long sentence. I hope you will read a little book."
]
}
The result is a single line. Apparently, summpy treats punctuation as sentence boundaries.
Then what happens if we increase the number of lines? I summarized the original (punctuated) passage with a limit of 100 lines.
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"At this age, people become more interested in feelings, feelings, and ways of thinking.",
"Meanwhile, I came across "Adler Psychology" in this title a few years ago.",
"Engineers sometimes concentrate on the specialized work of programming, and it is often said that they are not good at communicating with people and that they are not good at joining the circle of teams.",
"Also, I'm worried about relationships and suffering from depression....I think that there are many cases of this.",
"I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
"This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.",
"By the way, it is quite a long sentence.",
"I hope you will see it with the intention of reading a small book."
]
}
The original text is returned as it is. You can see that the comma is not treated as a sentence break, while the kuten (the Japanese full stop, 。) is. Sentences also seem to be split at periods (.), question marks (?), and exclamation marks (!).
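The splitting behavior observed here can be approximated with a regular expression. This is only a sketch of the observation, not summpy's actual splitter. The delimiter set, including the full-width ？ and ！, and the rule that an ASCII terminator must be followed by whitespace or the end of the text are assumptions. It is written for Python 3.

```python
import re

# A sentence is the shortest run of characters that ends with:
#   - a Japanese terminator (。？！), or
#   - an ASCII terminator (. ? !) followed by whitespace or end of text, or
#   - the end of the text (when no terminator is present).
_SENTENCE = re.compile(r".+?(?:[。？！]+|[.?!]+(?=\s|$)|$)", re.S)


def split_sentences(text):
    """Split text into sentences; commas never end a sentence."""
    parts = (m.group(0).strip() for m in _SENTENCE.finditer(text))
    return [p for p in parts if p]
```

With this rule, text that contains no terminator at all comes back as a single sentence, which matches the one-line result seen when the punctuation was removed.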
Now, let's gradually reduce the number of lines. The outputs below contain 7, 6, 5, 4, 3, 2, and 1 sentences in turn.
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"At this age, people become more interested in feelings, feelings, and ways of thinking.",
"Meanwhile, I came across "Adler Psychology" in this title a few years ago.",
"Engineers sometimes concentrate on the specialized work of programming, and it is often said that they are not good at communicating with people and that they are not good at joining the circle of teams.",
"Also, I'm worried about relationships and suffering from depression....I think that there are many cases of this.",
"I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
"This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.",
"By the way, it is quite a long sentence."
]
}
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"At this age, people become more interested in feelings, feelings, and ways of thinking.",
"Meanwhile, I came across "Adler Psychology" in this title a few years ago.",
"Also, I'm worried about relationships and suffering from depression....I think that there are many cases of this.",
"I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
"This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.",
"By the way, it is quite a long sentence."
]
}
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"At this age, people become more interested in feelings, feelings, and ways of thinking.",
"Meanwhile, I came across "Adler Psychology" in this title a few years ago.",
"I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
"This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.",
"By the way, it is quite a long sentence."
]
}
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"At this age, people become more interested in feelings, feelings, and ways of thinking.",
"I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
"This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.",
"By the way, it is quite a long sentence."
]
}
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"At this age, people become more interested in feelings, feelings, and ways of thinking.",
"I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
"This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field."
]
}
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
"This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field."
]
}
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field."
]
}
As the limit shrinks, the sentences judged least important are dropped one at a time. I don't know what the criteria are, but there seems to be no doubt that this is how it behaves.
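Each shorter summary above is a subset of the longer one, and the surviving sentences keep their original order. That is what you would get from a fixed importance score per sentence. The sketch below illustrates the selection step only. `select_sentences` and the score values are hypothetical, and how summpy computes its scores is not shown here.

```python
def select_sentences(sentences, scores, limit):
    """Keep the `limit` highest-scoring sentences, in their original order."""
    # Rank sentence indices from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Take the top `limit` indices and restore document order.
    keep = sorted(ranked[:limit])
    return [sentences[i] for i in keep]
```

A limit larger than the number of sentences simply returns every sentence, which is consistent with the 100-line result.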
Looking at the results, I personally find the three-line summary the easiest to read and the most to the point. However, as the amount of text grows, I suspect three lines will no longer be enough to convey what the text is about. The line limit probably needs to be tuned to the length of the input.
To see what happens when unconnected sentences are summarized, let's summarize the "table of contents" of the article above.
Reference book
Premise
1.People can change
1-1.There is no trauma
1-2.Don't be afraid to get hurt
1-3.Harm that occurs when the feeling of inferiority becomes too strong
1-4.Accept self
2.Separation of issues
2-1.You don't have to meet the expectations of others
2-2.Don't step into the challenges of others
2-3.Separation of issues
3.How to interact with others
3-1.Do not compete with others
3-2.Admit non-defeat = not lose
4.About raising people
4-1 Don't be scolded, don't praise
4-2 Thank you, not evaluate
5.Community sense
Finally
$ python ./summpy_test.py
{
"debug_info": {},
"summary": [
"2-1.You don't have to meet the expectations of others.",
"2-2.Don't step into the challenges of others.",
"3.How to relate to others."
]
}
The input has little context to begin with, so it is natural that the summary is also disjointed. Still, it is interesting that not only top-level headings but also second-level ones (2-1 and 2-2) were selected.
I was curious about the logic behind the summaries, so I took a quick look at the API's source code on GitHub.
The part that does the summarizing (the core logic) appears to be the following: https://github.com/recruit-tech/summpy/blob/master/summpy/lexrank.py
It uses DictVectorizer and pairwise_distances, so it looks like the text is split into sentences, features are extracted from each sentence, a distance matrix between the feature vectors is computed, and the sentences are scored from the result ...
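To make that reading concrete, here is a simplified LexRank-style scorer in pure Python. It does not reproduce lexrank.py. The whitespace tokenization (instead of MeCab), the cosine similarity, the damping factor, and the iteration count are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by PageRank-style power iteration on a similarity graph."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(vectors)
    if n == 0:
        return []
    # Pairwise similarity matrix with a zero diagonal (no self-links).
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Score flowing into sentence i from every sentence j similar to it.
            incoming = sum(sim[j][i] / row_sums[j] * scores[j]
                           for j in range(n) if row_sums[j])
            new_scores.append((1.0 - damping) / n + damping * incoming)
        scores = new_scores
    return scores
```

Sentences that share vocabulary with many other sentences end up with higher scores, while a sentence with no overlap keeps only the baseline value.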
- summpy does not rewrite the text. It only splits the original text at sentence-ending punctuation and extracts the sentences it rates as most important.
- As long as the amount of text and the number of lines are balanced, the summary comes out reasonably well (though I admit that is vague ...).
- It does not cope well with disjointed text. For example, if the text contains bulleted information such as a table of contents, it seems better to remove it first.
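One rough way to strip table-of-contents entries before summarizing is to drop lines that begin with numbering such as "1.", "1-1." or "4-1 ". The pattern below is a hypothetical heuristic based on the table of contents used in this post. It would also drop an ordinary sentence that starts with a one- or two-digit number, so treat it as a starting point only.

```python
import re

# Lines starting with numbering like "1.", "1-1." or "4-1 " (assumed format).
_NUMBERED = re.compile(r"^\s*\d{1,2}(?:-\d{1,2})*[.\s]")


def drop_numbered_lines(text):
    """Remove lines that look like numbered table-of-contents entries."""
    kept = [line for line in text.splitlines() if not _NUMBERED.match(line)]
    return "\n".join(kept)
```

This has to run while the line breaks are still present, that is, before the text is flattened into one line for the API.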
Thank you for reading to the end.
If you have a request such as "What happens if you summarize this text?" or "I want you to summarize this text!", I would be grateful if you left a comment.