Use polyglot (documentation).
The following has been confirmed to work with Python 3.8.5. First, install the packages below in this order:
pip install numpy
pip install polyglot
pip install six
pip install pycld2
pip install morfessor
pip install pyicu
However, when a ModuleNotFoundError tells you that icu is missing, do not run
pip install icu
Instead, run
pip install pyicu
If you install icu and try to use it, you should get the error cannot import name xxx. Note that icu and pyicu are different packages.
If that doesn't work, see Error installing pip pyicu.
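Because the pip package name and the import name can differ (pyicu is imported as icu), it can help to check which modules are importable before running polyglot. The following is a minimal sketch that uses only the standard library. The REQUIRED mapping and the missing_packages helper are names introduced here for illustration and are not part of polyglot.

```python
import importlib.util

# Import name -> pip package name, for the packages installed above.
# The PyICU bindings are installed as "pyicu" but imported as "icu".
REQUIRED = {
    "numpy": "numpy",
    "polyglot": "polyglot",
    "six": "six",
    "pycld2": "pycld2",
    "morfessor": "morfessor",
    "icu": "pyicu",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    for pip_name in missing_packages():
        print(f"missing: pip install {pip_name}")
```

This only checks that a module with the given name exists. It cannot tell whether a module named icu comes from pyicu or from the unrelated icu package, so the cannot import name error described above may still appear if the wrong one is installed.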
Following the official Part of Speech Tagging documentation, look up the part of speech of each word.
from polyglot.text import Text
blob = "You never fail until you stop trying."
tokens = Text(blob)
print(tokens.pos_tags)
This should give you the part of speech of every word in the sentence, but at this point it raises an error instead:
ValueError: This resource is available in the index but not downloaded, yet. Try to run
polyglot download embeddings2.en
So, clone the following repository:
git clone https://github.com/web64/nlpserver.git
Then add the following on line 14 of nlpserver.py:
app.config['JSON_AS_ASCII'] = False
After adding it, run
polyglot download embeddings2.en
polyglot download pos2.en
to install the models. This part is described in Not able to pull polyglot files.
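If you would rather script the two download commands, the sketch below builds them as argument lists and, by default, only prints them. The helper names download_commands and run_downloads are hypothetical. The `<resource>.<lang>` naming is inferred from the two commands above, so passing a language code other than en is an assumption and may not correspond to an available model.

```python
import shutil
import subprocess


def download_commands(lang="en", resources=("embeddings2", "pos2")):
    """Build one `polyglot download <resource>.<lang>` command per resource."""
    return [["polyglot", "download", f"{name}.{lang}"] for name in resources]


def run_downloads(dry_run=True):
    """Print each command; execute it only when dry_run is False
    and the polyglot command is found on PATH."""
    for cmd in download_commands():
        print(" ".join(cmd))
        if not dry_run and shutil.which("polyglot"):
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_downloads(dry_run=True)
```

Call run_downloads(dry_run=False) to actually execute the commands once polyglot is installed.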
Now that English can be analyzed, the previous code works:
from polyglot.text import Text
blob = "You never fail until you stop trying."
tokens = Text(blob)
print(tokens.pos_tags)
It gives the following result:
[('You', 'PRON'), ('never', 'ADV'), ('fail', 'VERB'), ('until', 'SCONJ'), ('you', 'PRON'), ('stop', 'VERB'), ('trying', 'VERB'), ('.', 'PUNCT')]
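Since pos_tags is a plain list of (word, tag) tuples, it can be processed with ordinary Python. As a small sketch, the code below groups the words by tag and counts the tags. The list is copied from the result shown here so that the example runs without polyglot, and words_by_tag is a helper name introduced for illustration.

```python
from collections import Counter, defaultdict

# Result of tokens.pos_tags, copied from the output for the example sentence.
pos_tags = [('You', 'PRON'), ('never', 'ADV'), ('fail', 'VERB'),
            ('until', 'SCONJ'), ('you', 'PRON'), ('stop', 'VERB'),
            ('trying', 'VERB'), ('.', 'PUNCT')]


def words_by_tag(tagged):
    """Group words by their part-of-speech tag, keeping sentence order."""
    grouped = defaultdict(list)
    for word, tag in tagged:
        grouped[tag].append(word)
    return dict(grouped)


grouped = words_by_tag(pos_tags)
tag_counts = Counter(tag for _, tag in pos_tags)

print(grouped["VERB"])            # ['fail', 'stop', 'trying']
print(tag_counts.most_common(1))  # [('VERB', 3)]
```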
The result is hard to read on a single line, so you can use pprint for the last line instead:
import pprint
pprint.pprint(tokens.pos_tags)
This prints:
[('You', 'PRON'),
 ('never', 'ADV'),
 ('fail', 'VERB'),
 ('until', 'SCONJ'),
 ('you', 'PRON'),
 ('stop', 'VERB'),
 ('trying', 'VERB'),
 ('.', 'PUNCT')]
You can make small improvements like this. The part-of-speech tag names are as follows. The abbreviations and English descriptions are taken from Part of Speech Tagging.
Abbreviation | Description (English) | Description (translated from Japanese) |
---|---|---|
ADJ | adjective | Adjective |
ADP | adposition | Preposition |
ADV | adverb | Adverb |
AUX | auxiliary verb | Auxiliary verb |
CONJ | coordinating conjunction | Coordinating conjunction |
DET | determiner | Determiner |
INTJ | interjection | Interjection |
NOUN | noun | Noun |
NUM | numeral | Numeral |
PART | particle | Particle |
PRON | pronoun | Pronoun |
PROPN | proper noun | Proper noun |
PUNCT | punctuation | Punctuation |
SCONJ | subordinating conjunction | Subordinating conjunction |
SYM | symbol | Symbol |
VERB | verb | Verb |
X | other | Other |
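The table above can also be kept as a dictionary, so that each tag in the output is printed together with its description. This is only a sketch: TAG_DESCRIPTIONS and describe are names introduced here, the descriptions are copied from the English column of the table, and the pos_tags list is copied from the earlier result so the example runs without polyglot.

```python
# Abbreviation -> English description, copied from the table above.
TAG_DESCRIPTIONS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary verb",
    "CONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe(tagged):
    """Attach the description to each (word, tag) pair."""
    return [(word, tag, TAG_DESCRIPTIONS.get(tag, "unknown tag"))
            for word, tag in tagged]


# Result of tokens.pos_tags for the example sentence.
pos_tags = [('You', 'PRON'), ('never', 'ADV'), ('fail', 'VERB'),
            ('until', 'SCONJ'), ('you', 'PRON'), ('stop', 'VERB'),
            ('trying', 'VERB'), ('.', 'PUNCT')]

for word, tag, description in describe(pos_tags):
    print(f"{word:<8}{tag:<7}{description}")
```

Tags that are not in the table fall back to "unknown tag" instead of raising a KeyError.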
Installation reference: https://qiita.com/sawada/items/528da0b22546045122b2
Reference on the features of polyglot: http://lab.astamuse.co.jp/entry/try-polyglot