I packed software that is useful for Japanese natural language processing, including MeCab, CaboCha, and Japanese WordNet, into a single Docker image.
To start a container and get a shell inside it, run:
$ docker run --rm -it ototadana/nlp-jp bash
To start a Python interpreter in the container:
$ docker run --rm -it ototadana/nlp-jp python
MeCab
An example of running the mecab command:
$ echo "The UFO crash that became a hot topic last year is now just a tourism resource. City specialties" | docker run --rm -i ototadana/nlp-jp mecab
Last year	Noun,Adverbial,*,*,*,*,last year,Sakunen,Sakunen
Topic	Noun,General,*,*,*,*,topic,Wadai,Wadai
To	Particle,Case particle,General,*,*,*,To,Ni,Ni
Became	Verb,Independent,*,*,Godan ra-row,Continuative ta-connection,Become,Nat,Nat
Ta	Auxiliary verb,*,*,*,Special,Base form,Ta,Ta,Ta
UFO	Noun,Proper noun,General,*,*,*,UFO,UFO,UFO
Crash	Noun,Sa-hen connection,*,*,*,*,Crash,Tsuiraku,Tsuiraku
Incident	Noun,General,*,*,*,*,Incident,Jiken,Jiken
、	Symbol,Comma,*,*,*,*,、,、,、
Now	Noun,Adverbial,*,*,*,*,now,Ima,Ima
Is	Particle,Binding particle,*,*,*,*,Is,Ha,Wa
Just	Noun,General,*,*,*,*,Just,Tada,Tada
of	Particle,Attributive,*,*,*,*,of,No,No
Tourism resources	Noun,Proper noun,General,*,*,*,Tourism resources,Kankoushigen,Kanko Shigen
。	Symbol,Period,*,*,*,*,。,。,。
City	Noun,General,*,*,*,*,City,Machi,Machi
of	Particle,Attributive,*,*,*,*,of,No,No
Specialty	Noun,General,*,*,*,*,Specialty,Meibutsu,Meibutsu
EOS
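Each line of mecab's default output is a surface form, a tab, and a comma-separated feature string, and the sentence ends with an EOS line. If you capture that output and want to work with it in Python, something like the following should do. This is only a minimal sketch that assumes the default output format; parse_mecab_output and the sample text are made up for illustration.

```python
def parse_mecab_output(text):
    """Split default-format mecab output into (surface, features) pairs."""
    tokens = []
    for line in text.splitlines():
        if line == 'EOS' or not line:
            continue  # EOS marks the end of a sentence; blank lines carry nothing
        # Everything before the first tab is the surface form
        surface, _, feature = line.partition('\t')
        tokens.append((surface, feature.split(',')))
    return tokens


# Illustrative sample in the same shape as mecab's output (not captured from a real run)
sample = "Now\tNoun,Adverbial,*,*,*,*,now,Ima,Ima\nEOS\n"
tokens = parse_mecab_output(sample)
for surface, features in tokens:
    # features[0] is the part of speech, features[6] the base form
    print(surface, features[0], features[6])
```

Splitting on the first tab only keeps the parser safe even if a feature field were to contain a tab-free but otherwise odd value.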
CaboCha
An example of running the cabocha command:
$ echo "The UFO crash that became a hot topic last year is now just a tourism resource. City specialties" | docker run --rm -i ototadana/nlp-jp cabocha
last year---D
To the topic-D
became-D
UFO crash,-----D
now---D
Just-D
Tourism resources.---D
the town's-D
Specialty
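The tree view is meant for human eyes. For processing in code, CaboCha can also print a lattice format when given the -f1 option. As far as I know, each chunk then starts with a header line of the form "* <chunk id> <head chunk id>D <head/function positions> <score>", followed by MeCab-style token lines. Below is a rough sketch of reading that format under that assumption; parse_cabocha_lattice and the sample text are hypothetical and were not captured from this image.

```python
def parse_cabocha_lattice(text):
    """Parse `cabocha -f1` style output into a list of chunks.

    Each chunk is a dict holding its id, the id of the chunk it depends on
    (-1 for the root), and the surface forms of its tokens.
    """
    chunks = []
    for line in text.splitlines():
        if line == 'EOS' or not line:
            continue
        if line.startswith('* '):
            # Assumed header layout: "* <id> <head>D <head/func> <score>"
            _, chunk_id, link, *_ = line.split(' ')
            chunks.append({'id': int(chunk_id),
                           'link': int(link.rstrip('D')),
                           'tokens': []})
        elif chunks:
            # Token line: surface, tab, feature string
            chunks[-1]['tokens'].append(line.split('\t')[0])
    return chunks


# Made-up sample: chunk 0 ("City of") depends on chunk 1 ("Specialty"), the root
sample = (
    "* 0 1D 0/1 0.000000\n"
    "City\tNoun,General,*,*,*,*,City,Machi,Machi\n"
    "of\tParticle,Attributive,*,*,*,*,of,No,No\n"
    "* 1 -1D 0/0 0.000000\n"
    "Specialty\tNoun,General,*,*,*,*,Specialty,Meibutsu,Meibutsu\n"
    "EOS\n"
)
chunks = parse_cabocha_lattice(sample)
for chunk in chunks:
    print(chunk['id'], ' '.join(chunk['tokens']), '->', chunk['link'])
```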
Japanese WordNet
Japanese WordNet is stored in the image as a database in SQLite format. You can query it from Python code like this:
example-wordnet.py:
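import sqlite3

# Fetch the Japanese definitions of every synset that contains the given lemma.
query = """
select c.def from sense a, word b, synset_def c
where b.lemma = ? and c.lang = 'jpn'
and a.wordid = b.wordid and a.synset = c.synset
"""

# Path of the Japanese WordNet database inside the container.
with sqlite3.connect('/dictionary/wnjpn.db') as conn:
    print([row[0] for row in conn.cursor().execute(query, ['topic'])])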
If you write this code on the host, mount the host's current directory with the -v option and run it as follows:
$ docker run --rm -i -v $PWD:/app ototadana/nlp-jp python /app/example-wordnet.py
['Subject of conversation or discussion']
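To make the three-table join in that query easier to follow, here is a small self-contained sketch that runs the same SQL against an in-memory SQLite database. The table definitions are reduced to the columns the query touches, and the rows and synset id are invented for illustration; the real wnjpn.db has more columns and real data.

```python
import sqlite3

QUERY = """
select c.def from sense a, word b, synset_def c
where b.lemma = ? and c.lang = 'jpn'
and a.wordid = b.wordid and a.synset = c.synset
"""


def build_toy_db():
    """Create a tiny stand-in for wnjpn.db (reduced schema, hypothetical rows)."""
    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        create table word (wordid integer primary key, lemma text);
        create table sense (synset text, wordid integer);
        create table synset_def (synset text, lang text, def text);
        insert into word values (1, 'topic');
        insert into sense values ('s-001', 1);
        insert into synset_def values
            ('s-001', 'jpn', 'Subject of conversation or discussion');
        insert into synset_def values
            ('s-001', 'eng', 'the subject matter of a conversation');
    """)
    return conn


conn = build_toy_db()
# word -> sense (via wordid) -> synset_def (via synset), keeping only lang = 'jpn'
definitions = [row[0] for row in conn.execute(QUERY, ['topic'])]
print(definitions)
conn.close()
```

The word table maps a lemma to a wordid, sense links that wordid to one or more synsets, and synset_def holds one definition per synset and language, which is why the query filters on c.lang.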
MeCab can also be called from Python code, so you can combine it with Japanese WordNet as follows:
example-mecab+wordnet.py:
import MeCab, sqlite3

def get_definition(word):
    """Return the Japanese WordNet definitions for the given lemma."""
    query = """
    select c.def from sense a, word b, synset_def c
    where b.lemma = ? and c.lang = 'jpn'
    and a.wordid = b.wordid and a.synset = c.synset
    """
    with sqlite3.connect('/dictionary/wnjpn.db') as conn:
        return [row[0] for row in conn.cursor().execute(query, [word])]

tagger = MeCab.Tagger()
# Workaround: parse once first, otherwise node.surface may come back garbled
tagger.parse('')
# parseToNode returns the BOS node first; .next moves to the first real token
node = tagger.parseToNode('The UFO crash that became a hot topic last year is now just a tourism resource. City specialties').next
while node:
    print('%s:' % node.surface)
    print(' - %s' % node.feature)
    # Field 6 of the feature string is the base form, used here as the WordNet lemma
    for definition in get_definition(node.feature.split(',')[6]):
        print(' - %s' % definition)
    print()
    node = node.next
Running it produces the following:
$ docker run --rm -i -v $PWD:/app ototadana/nlp-jp python /app/example-mecab+wordnet.py
Last year:
 - Noun,Adverbial,*,*,*,*,last year,Sakunen,Sakunen
Topic:
 - Noun,General,*,*,*,*,topic,Wadai,Wadai
 - Subject of conversation or discussion
To:
 - Particle,Case particle,General,*,*,*,To,Ni,Ni
Became:
 - Verb,Independent,*,*,Godan ra-row,Continuative ta-connection,Become,Nat,Nat
 - Accept change or development
 - Loud and cheerful
 - Get sick and be a victim of illness
 - Officially take a year
 - Appropriate
 - To make or represent:
 - To exist
 - Number or quantity calculation fits
 - Develop and reach maturity
 - To mature
 - Happens in a particular way
 - Reach or enter a state, relationship, condition, use or status
 - Can, change, be made, or they are possible
 - Gradually shifts to a state and exhibits a particular property or attribute
 - Become
 - To be in or to be in a particular state or state
 - Deformed or subject to changes in position or behavior
 - Direct or distract a person's attention, interests, thoughts, or interests from something
 - Develop
Ta:
 - Auxiliary verb,*,*,*,Special,Base form,Ta,Ta,Ta
UFO:
 - Noun,Proper noun,General,*,*,*,UFO,UFO,UFO
Crash:
 - Noun,Sa-hen connection,*,*,*,*,Crash,Tsuiraku,Tsuiraku
 - Rapid free fall due to gravity
 - When under the influence of gravity, it falls without stopping
 - Fall or fall sharply
Incident:
 - Noun,General,*,*,*,*,Incident,Jiken,Jiken
 - Public uproar
 - Issues that need to be investigated
 - A single notable event
 - Something happened
、:
 - Symbol,Comma,*,*,*,*,、,、,、
Now:
 - Noun,Adverbial,*,*,*,*,now,Ima,Ima
 - Current or modern
 - Momentary present
 - Time currently happening
 - A series of hours including the moment of speech
 - Just a little bit before
 - Historical present
 - At this point in the narration of a series of past events
 - In the current time, the time pattern
 - Just now
 - At the moment
 - Current
 - At the moment
Is:
 - Particle,Binding particle,*,*,*,*,Is,Ha,Wa
Just:
 - Noun,General,*,*,*,*,Just,Tada,Tada
 - Without anything else included or related
 - And many are nothing
of:
 - Particle,Attributive,*,*,*,*,of,No,No
Tourism resources:
 - Noun,Proper noun,General,*,*,*,Tourism resources,Kankoushigen,Kanko Shigen
。:
 - Symbol,Period,*,*,*,*,。,。,。
City:
 - Noun,General,*,*,*,*,City,Machi,Machi
 - Situations that give opportunities
 - A region of the town with distinctive characteristics
of:
 - Particle,Attributive,*,*,*,*,of,No,No
Specialty:
 - Noun,General,*,*,*,*,Specialty,Meibutsu,Meibutsu
 - Entertainment offered to the masses
:
 - BOS/EOS,*,*,*,*,*,*,*,*
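The script above uses field 6 of the feature string (the base form) as the WordNet lemma. As the final BOS/EOS entry shows, that field can be just "*", and depending on the dictionary a feature string might not have seven fields at all. A slightly more defensive variant could look like the sketch below; base_form is a hypothetical helper of my own, not part of MeCab, and falling back to the surface form is only one possible choice.

```python
def base_form(surface, feature):
    """Return the base form from a MeCab feature string.

    Falls back to the surface form when field 6 is missing or is the
    '*' placeholder, so the caller always gets something to look up.
    """
    fields = feature.split(',')
    if len(fields) > 6 and fields[6] != '*':
        return fields[6]
    return surface


# Inflected token: the base form differs from the surface
print(base_form('Became', 'Verb,Independent,*,*,Godan,Continuative,Become,Nat,Nat'))
# Placeholder base form: fall back to the (empty) surface of BOS/EOS
print(repr(base_form('', 'BOS/EOS,*,*,*,*,*,*,*,*')))
```

In the loop you would then call get_definition(base_form(node.surface, node.feature)) and could skip the lookup entirely when the result is an empty string.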
The sample sentence used in the examples above is the opening of the lyrics to "Oedo Controller" by Yunomi feat. TORIENA.
**Yunomi is the best!** (In short, I wrote this entry just to say that...)