https://cloud.google.com/sdk/docs/quickstart-mac-os-x?hl=ja
https://cloud.google.com/natural-language/docs/common/auth?hl=ja#set_up_a_service_account
export GOOGLE_APPLICATION_CREDENTIALS="/Users/users/hoge/key.json"
https://cloud.google.com/natural-language/docs/getting-started?hl=ja
https://github.com/GoogleCloudPlatform/google-cloud-python (Python 3.6 is also supported)
pip install --upgrade google-cloud
gcloud auth application-default login
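Before calling the API, it can help to confirm that the GOOGLE_APPLICATION_CREDENTIALS variable exported above really points at a key file. This is only a minimal sketch: `check_credentials` is a hypothetical helper name of my own, not part of google-cloud.

```python
import os


def check_credentials(env=os.environ):
    """Return the key path if GOOGLE_APPLICATION_CREDENTIALS names an existing file."""
    # `check_credentials` is a hypothetical helper, not a google-cloud API.
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError("key file not found: {}".format(path))
    return path
```

Calling `check_credentials()` once at startup fails early with a readable message instead of an authentication error in the middle of a request.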
https://googlecloudplatform.github.io/google-cloud-python/stable/language-responses.html#google.cloud.language.entity.Entity
https://googlecloudplatform.github.io/google-cloud-python/stable/language-usage.html
Referring to the two pages above, create a convenience class as follows:
from google.cloud import language


class GCNaturalLanguage(object):
    def __init__(self, upper=10000):
        # Instantiate a client; texts longer than `upper` characters are skipped
        self.client = language.Client()
        self.upper = upper

    def get_entity(self, text):
        length = len(text)
        if length > self.upper:
            print("{} .. too long".format(length))
            return {}
        document = self.client.document_from_text(text, language='ja')
        # Detect the entities in the text
        res = document.analyze_entities()
        print("{} characters => done!".format(len(text)))
        # Map the begin offset of each mention to its text
        dic = {}
        for entity in res.entities:
            for m in entity.mentions:
                dic[m.text.begin_offset] = m.text.content
        return dic
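The class above returns an empty dict when the text is longer than `upper`. One possible alternative, sketched here under assumptions, is to cut the text into chunks and shift each offset back into the coordinates of the full text. `split_text` and `get_entity_chunked` are hypothetical helpers, and `analyze` stands for any callable that returns `{offset: mention}`, such as the `get_entity` method above. The sketch assumes that offsets count characters, and a term that straddles a chunk boundary would be missed.

```python
def split_text(text, upper=10000):
    """Split text into consecutive (start, chunk) pairs of at most `upper` characters."""
    return [(start, text[start:start + upper])
            for start in range(0, len(text), upper)]


def get_entity_chunked(text, analyze, upper=10000):
    """Run `analyze` on each chunk and shift offsets back to the full text.

    `analyze` is assumed to return {offset_in_chunk: mention}.
    """
    merged = {}
    for start, chunk in split_text(text, upper):
        for offset, mention in analyze(chunk).items():
            merged[start + offset] = mention
    return merged
```

Splitting at sentence boundaries instead of fixed positions would reduce the risk of cutting a term in half, at the cost of a slightly more involved splitter.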
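Example (the original input was a 21-character Japanese sentence, so the character count and offsets in the output refer to that sentence, not to the English translation shown here):
# Assumes the class above is saved as GCNaturalLanguage.py
from GCNaturalLanguage import GCNaturalLanguage

gcn = GCNaturalLanguage()
dic = gcn.get_entity("I tried setting cross domain with access analysis")
print(dic)
# 21 characters => done!
# {0: 'access analysis', 7: 'Cross domain'}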
For comparison, MeCab analyzes the same sentence as follows:
# Using mecab-ipadic-neologd, a popular extended dictionary for MeCab
$ mecab -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd
I tried setting cross domain with access analysis
access analysis	Noun,Proper noun,General,*,*,*,access analysis,Access Kaiseki,Access Kaiseki
de	Particle,Case particle,General,*,*,*,de,De,De
cross	Noun,Sahen connection,*,*,*,*,cross,cross,cross
domain	Noun,General,*,*,*,*,domain,domain,domain
wo	Particle,Case particle,General,*,*,*,wo,Wo,Wo
setting	Noun,Sahen connection,*,*,*,*,setting,setting,setting
shi	Verb,Independent,*,*,Sahen Suru,Continuative form,suru,Shi,Shi
te	Particle,Conjunctive particle,*,*,*,*,te,Te,Te
mi	Verb,Non-independent,*,*,Ichidan,Continuative form,miru,Mi,Mi
ta	Auxiliary verb,*,*,*,Special,Basic form,ta,Ta,Ta
EOS
You can see that both the GCP Natural Language API and MeCab extract "access analysis" as a single term. (Incidentally, if you run plain mecab without passing mecab-ipadic-neologd as the -d argument, "access" and "analysis" are split into separate tokens.)
However, if you want to extract a technical term such as "cross-domain" as a single unit, MeCab alone cannot do it, because it splits the term into "cross" and "domain". The GCP Natural Language API, an external tool, does extract it. As a way of using this in the future, a good approach is to register the terms extracted by GCP as new words in the MeCab user dictionary and then run MeCab again [^add].
[^add]: I wrote up the details at http://qiita.com/knknkn1162/items/8c12f42dd167aae01c02#_reference-aa421a94c959d84ff7fb.
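As a rough idea of what registering the extracted terms could look like, the sketch below turns a list of terms into CSV rows in the 13-column ipadic user-dictionary layout. Everything specific is an assumption on my part: `user_dic_rows` and `to_csv` are hypothetical helpers, the part of speech is fixed to a general proper noun, the cost of 10 is arbitrary, the readings are left as `*`, and the left and right context IDs are left empty on the understanding that mecab-dict-index assigns them when it compiles the CSV.

```python
import csv
import io


def user_dic_rows(terms, cost=10):
    """Build ipadic-format user-dictionary rows for the given surface forms."""
    rows = []
    for term in terms:
        # Columns: surface, left-id, right-id, cost, POS, POS1, POS2, POS3,
        #          conjugation type, conjugation form, base form, reading, pronunciation
        # Assumed values: proper noun / general, unknown reading ("*"), arbitrary cost
        rows.append([term, "", "", cost, "名詞", "固有名詞", "一般",
                     "*", "*", "*", term, "*", "*"])
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text, one dictionary entry per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


print(to_csv(user_dic_rows(["クロスドメイン"])), end="")
# クロスドメイン,,,10,名詞,固有名詞,一般,*,*,*,クロスドメイン,*,*
```

The resulting CSV still has to be compiled with mecab-dict-index and passed to MeCab as a user dictionary before it takes effect.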