I couldn't find many easy ways to get Japanese synonyms when doing natural language processing in Python, so here is a summary of one approach.
This time, we will use NLTK's WordNet interface together with Open Multilingual Wordnet, which provides the Japanese data.
pip install nltk
python -c "import nltk;nltk.download('wordnet')"
python -c "import nltk;nltk.download('omw')"
Note: on recent NLTK releases the multilingual data may be distributed as 'omw-1.4'. If a later lookup reports a missing resource, try nltk.download('omw-1.4') instead.
A Synset is a unit of meaning (a concept) defined in WordNet. Let's get the Synsets for the word "米" (rice) and look at their definitions.
from nltk.corpus import wordnet
# Look up every Synset that contains the Japanese word 米
synsets = wordnet.synsets("米", lang='jpn')
for syn in synsets:
    print(syn, ":", syn.definition())
# Synset('rice.n.01') : grains used as food either unpolished or more often polished
# Synset('united_states.n.01') : North American republic containing 50 states - 48 conterminous states in North America plus Alaska in northwest North America and the Hawaiian Islands in the Pacific Ocean; achieved independence in 1776
# Synset('meter.n.01') : the basic unit of length adopted under the Systeme International d'Unites (approximately 1.094 yards)
This confirms that three concepts are registered for the word: rice as a food grain, the United States, and the meter.
Because the words belonging to a concept are registered on its Synset, they can be retrieved as synonyms. Let's get the synonyms for the food sense.
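To see which words belong to which sense, it can help to list the lemma names of every Synset side by side. The following is a minimal sketch, assuming the WordNet and OMW data are already downloaded. `synonyms_by_sense` and `print_senses` are hypothetical helper names, not part of NLTK; they rely only on the `name()` and `lemma_names(lang)` methods of NLTK's Synset objects.

```python
def synonyms_by_sense(synsets, lang="jpn"):
    """Map each Synset's name to its lemma names in the given language."""
    # Works on any iterable of objects exposing name() and lemma_names(lang),
    # such as the list returned by nltk.corpus.wordnet.synsets(...).
    return {syn.name(): syn.lemma_names(lang) for syn in synsets}


def print_senses(word, lang="jpn"):
    """Print every sense of `word` with its synonyms (needs the NLTK data)."""
    # Imported here so that synonyms_by_sense itself has no NLTK dependency.
    from nltk.corpus import wordnet

    senses = synonyms_by_sense(wordnet.synsets(word, lang=lang), lang)
    for name, lemmas in senses.items():
        print(name, ":", lemmas)
```

Calling `print_senses(word)` with a Japanese word should print one line per sense, which makes it easier to spot the sense you actually want.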
# synsets[0] is Synset('rice.n.01'), the food sense
rice_synset = synsets[0]
synonyms = rice_synset.lemma_names("jpn")
print(synonyms)
# ['Rice', 'rice', 'I'm sorry', 'U.S.A.', 'Raised rice', 'rice offered to a god', 'Yagi', 'rice', 'Pillow rice', 'Rice production', 'Rice field', 'White rice', 'God rice', 'Valley', 'Rice', 'Rice孫', 'Grain', 'Rice', 'RiceGrain', 'Ricefood', '粮Rice', '糧Rice', 'Sari', '褻Rice', 'Silver rice', 'rice', 'food', 'foodRice']
The result contains good synonyms for the food sense, such as "White rice".
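The list returned by `lemma_names` can include the query word itself, and merging the lists of several Synsets can produce repeats. As a rough sketch, a small post-processing step could drop both while keeping the original order. `clean_synonyms` is a hypothetical helper, not an NLTK function.

```python
def clean_synonyms(lemmas, query):
    """Drop the query word and repeated entries, keeping the original order."""
    seen = {query}  # treat the query word as already seen so it is skipped
    result = []
    for lemma in lemmas:
        if lemma not in seen:
            seen.add(lemma)
            result.append(lemma)
    return result
```

With the variables from the example, this would be called as `clean_synonyms(synonyms, word)`, where `word` is the string originally passed to `wordnet.synsets`.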
With NLTK's Open Multilingual Wordnet, it was easy to look up Japanese synonyms from Python. One caveat: some words have multiple concepts registered, so you need to choose the appropriate Synset to avoid getting synonyms for a sense you did not intend.
That's all.
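One simple way to choose a Synset without relying on list position is to filter on the English definition. The sketch below is only one possible heuristic, assuming a keyword that appears in the definition of the sense you want. `pick_synset` is a hypothetical helper and uses only the `definition()` method shown earlier.

```python
def pick_synset(synsets, keyword):
    """Return the first Synset whose definition mentions `keyword`, or None."""
    keyword = keyword.lower()
    for syn in synsets:
        # definition() returns the English gloss of the concept
        if keyword in syn.definition().lower():
            return syn
    return None
```

For the example above, `pick_synset(synsets, "food")` would select `Synset('rice.n.01')`, since only that definition contains "food". Keyword matching is crude, so it is worth printing the definitions and checking the match by eye.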