[Introduction to RasPi4] Environment setup: natural language processing with mecab, etc. ♪

To run the conversation app, I install the natural language processing libraries. ~~It's almost the same as -nano, but~~ **I had a hard time**, so I will describe the steps carefully. They mostly follow the reference, but some directories are different on the Raspberry Pi, so I adjust for that. 【reference】 Install mecab on ubuntu 18.10

Install mecab

$ sudo apt install mecab
$ sudo apt install libmecab-dev
$ sudo apt install mecab-ipadic-utf8

Having done this much, run mecab and enter a sentence.

$ mecab
Limited express Hakutaka
Limited express noun,General,*,*,*,*,Limited express,Tokyu,Tokkyu
Is a particle,Particle,*,*,*,*,Is,C,Wow
Phrasal verb,Independence,*,*,Kuru,Word connection special 2,come,Ku,Ku
Auxiliary verb,*,*,*,Special,Uninflected word,Ta,Ta,Ta
Ka particle,Sub-particles / parallel particles / final particles,*,*,*,*,Or,Mosquito,Mosquito
EOS

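You should get output like the above. With the default dictionary, "Hakutaka" is not recognized as a single word and is broken into pieces.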

Install neologd

$ git clone https://github.com/neologd/mecab-ipadic-neologd.git
$ cd mecab-ipadic-neologd
$ sudo bin/install-mecab-ipadic-neologd

Everything up to this point installed without any problems, although downloading the dictionary took a long time (about 30 minutes).

Edit /etc/mecabrc

This is where I hit a problem. On Ubuntu the dictionary is installed in the following directory, but on Raspbian the location is different.

dicdir = /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd

So I searched for the directory where the dictionary actually exists. 【reference】 Find files [find and locate]

$ sudo find / -name '*mecab-ipadic-neologd*'
/usr/lib/arm-linux-gnueabihf/mecab/dic/mecab-ipadic-neologd
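Only the architecture-specific part of the path differs (x86_64-linux-gnu on Ubuntu, arm-linux-gnueabihf here), so the same search can also be sketched in Python. This is a minimal sketch and not part of the original procedure; the function name find_neologd_dirs and the default search root /usr/lib are my own assumptions.

```python
from pathlib import Path


def find_neologd_dirs(root="/usr/lib"):
    """Return every mecab-ipadic-neologd dictionary directory under root.

    The '*' matches the architecture directory, for example
    x86_64-linux-gnu or arm-linux-gnueabihf.
    """
    pattern = "*/mecab/dic/mecab-ipadic-neologd"
    return sorted(str(p) for p in Path(root).glob(pattern) if p.is_dir())


if __name__ == "__main__":
    for path in find_neologd_dirs():
        print(path)
```

Unlike `find /`, this only looks one level below the given root, so it is fast but will miss a dictionary installed somewhere unusual.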

Now that the path is known, edit the configuration file with the following command. For how to use vi, see the reference. 【reference】 Basic operation of vi

$ sudo vi /etc/mecabrc

So, I rewrote it as follows.

$ cat /etc/mecabrc
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
;dicdir = /var/lib/mecab/dic/debian
dicdir =/usr/lib/arm-linux-gnueabihf/mecab/dic/mecab-ipadic-neologd 
; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
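In this file a line starting with ';' is a comment, so only the uncommented dicdir line takes effect. As a quick sanity check after editing, the active value can be read back with a few lines of Python. This is a minimal sketch; the helper name active_dicdirs is hypothetical, and it only handles the simple `key = value` lines shown above.

```python
def active_dicdirs(mecabrc_text):
    """Return the values of all uncommented dicdir lines in mecabrc text."""
    values = []
    for line in mecabrc_text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and ';' comments
        key, sep, rest = line.partition("=")
        if sep and key.strip() == "dicdir":
            values.append(rest.strip())
    return values
```

For example, `active_dicdirs(open("/etc/mecabrc").read())` should return a single entry ending in mecab-ipadic-neologd if the edit above was saved correctly.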

Then confirm that the dictionary has changed. "Hakutaka" is now kept together as a single proper noun.

$ mecab
Limited express Hakutaka
Limited express noun,General,*,*,*,*,Limited express,Tokyu,Tokkyu
Hakutaka noun,Proper noun,General,*,*,*,Hakutaka,Hakutaka,Hakutaka
EOS

Make it available in Python 3

$ sudo apt install swig
$ sudo apt install python3-pip
$ sudo pip3 install mecab-python3

Now the sample from the reference runs as shown below.

$ python3 mecab_sample.py
noun,固有noun,General,*,*,*,Hakutaka,Hakutaka,Hakutaka
Hakutaka
noun,固有noun,area,General,*,*,Toyama,Toyama,Toyama
Toyama
noun,固有noun,area,General,*,*,Kanazawa,Kanazawa,Kanazawa
Kanazawa
noun,固有noun,area,General,*,*,Kenrokuen,Kenrokuen,Kenrokuen
Kenrokuen
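mecab_sample.py itself is not reproduced in this post. As a rough sketch of how output like the above can be produced, the helper below picks the proper nouns out of MeCab's default text output, where each line is the surface form, a tab, and then the comma-separated features. The function name is my own, and this is not the reference sample's actual code.

```python
def proper_nouns(mecab_output):
    """Return (features, surface) pairs for proper nouns in MeCab text output."""
    found = []
    for line in mecab_output.splitlines():
        if line == "EOS" or "\t" not in line:
            continue  # skip the end-of-sentence marker and malformed lines
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        # In the IPA dictionary layout, field 0 is the part of speech and
        # field 1 its subcategory; 名詞 = noun, 固有名詞 = proper noun.
        if len(fields) > 1 and fields[0] == "名詞" and fields[1] == "固有名詞":
            found.append((features, surface))
    return found
```

With mecab-python3 installed, the input string would come from `MeCab.Tagger().parse(text)`, which returns text in this format.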

Install pyaudio

The conversation app uses PyAudio because it outputs the conversation as speech. 【reference】 Install PyAudio | Python memorandum

$ sudo apt-get install python3-pyaudio

I was able to install it successfully.

Install Pykakasi

Pykakasi is used when generating the recorded audio files (to give them alphabetic file names) and when converting the generated speech to text.

$ pip3 install pykakasi --user

Check that it works with the code below.

# coding: utf-8
from pykakasi import kakasi

# pykakasi 1.x API: romanize hiragana (H), katakana (K) and kanji (J)
kks = kakasi()
kks.setMode('H', 'a')
kks.setMode('K', 'a')
kks.setMode('J', 'a')
conv = kks.getConverter()

filename = '本日は晴天なり.jpg'
print(filename)  # 本日は晴天なり.jpg
print(type(filename))
print(conv.do(filename))

Output example.

$ python3 pykakasi_ex.py
本日は晴天なり.jpg
<class 'str'>
honjitsuhaseitennari.jpg
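The setMode()/getConverter() style matches the pykakasi 1.2 release installed here. Newer pykakasi releases (2.x, as far as I know) document a convert() method instead, which returns one dictionary per segment with keys such as 'orig' and 'hepburn'. The following is a hedged sketch of the same romanization in that style; the helper names are my own, and the import is guarded so the file still loads where pykakasi is missing.

```python
try:
    import pykakasi
except ImportError:  # pykakasi is not installed on this machine
    pykakasi = None


def join_hepburn(segments):
    """Concatenate the Hepburn romanization of each converted segment."""
    return "".join(segment["hepburn"] for segment in segments)


def romanize(text):
    """Romanize Japanese text with the convert() API (assumes pykakasi 2.x)."""
    if pykakasi is None:
        raise RuntimeError("pykakasi is required: pip3 install pykakasi --user")
    return join_hepburn(pykakasi.kakasi().convert(text))
```

On the 1.2 release pinned in this post, the setMode() version is the one to use.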

Environment

$ uname -a
Linux raspberrypi 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux

$ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

Run the conversation app

gensm_ex1.py

$ python3 gensm_ex1.py

Start training
Epoch: 1
gensm_ex1.py:16: DeprecationWarning: Call to deprecated `iter` (Attribute will be removed in 4.0.0, use self.epochs instead).
  model.train(sentences, epochs=model.iter, total_examples=model.corpus_count)
Epoch: 2
Epoch: 3
Epoch: 4
Epoch: 5
Epoch: 6
Epoch: 7
Epoch: 8
Epoch: 9
Epoch: 10
Epoch: 11
Epoch: 12
Epoch: 13
Epoch: 14
Epoch: 15
Epoch: 16
Epoch: 17
Epoch: 18
Epoch: 19
Epoch: 20
SENT_0
[('SENT_2', 0.08270145207643509), ('SENT_3', 0.0347767099738121), ('SENT_1', -0.08307887613773346)]
SENT_3
[('SENT_0', 0.0347767099738121), ('SENT_1', 0.02076556906104088), ('SENT_2', -0.003991239238530397)]
SENT_1
[('SENT_3', 0.02076556347310543), ('SENT_2', 0.010350690223276615), ('SENT_0', -0.08307889103889465)]
gensm_ex1.py:33: DeprecationWarning: Call to deprecated `similar_by_word` (Method will be removed in 4.0.0, use self.wv.similar_by_word() instead).
  print (model.similar_by_word(u"fish"))
[('now', 0.15166150033473969), ('Sea', 0.09887286275625229), ('tomorrow', 0.03284810855984688), ('Cat', 0.019402338191866875), ('Barked', -0.0008345211390405893), ('swim', -0.02624458074569702), ('now日', -0.05557712912559509), ('dog', -0.0900348424911499)]
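The two DeprecationWarnings in this output name their own replacements: `model.epochs` instead of `model.iter`, and `model.wv.similar_by_word()` instead of `model.similar_by_word()`. gensm_ex1.py is not shown here, so the following is only a sketch of a Doc2Vec training call written with the non-deprecated names. The function names, the vector size and the SENT_n tagging scheme (inferred from the output) are assumptions.

```python
try:
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
except ImportError:  # gensim is not installed on this machine
    Doc2Vec = TaggedDocument = None


def tag_sentences(token_lists):
    """Pair each token list with a tag of the form SENT_<index>."""
    return [(tokens, "SENT_%d" % i) for i, tokens in enumerate(token_lists)]


def train_doc2vec(token_lists, epochs=20):
    """Train a small Doc2Vec model without touching deprecated attributes."""
    if Doc2Vec is None:
        raise RuntimeError("gensim is required: pip3 install --user gensim")
    docs = [TaggedDocument(words=words, tags=[tag])
            for words, tag in tag_sentences(token_lists)]
    model = Doc2Vec(vector_size=50, min_count=1, epochs=epochs)
    model.build_vocab(docs)
    # model.epochs replaces the deprecated model.iter
    model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)
    return model
```

Word lookups then go through the word-vector object, for example `model.wv.similar_by_word("fish")`.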

RaspberryPi4_conversation/model_skl.py

$ python3 model_skl.py
TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.float64'>, encoding='utf-8',
                input='content', lowercase=True, max_df=1.0, max_features=None,
                min_df=1, ngram_range=(1, 1), norm='l2', preprocessor=None,
                smooth_idf=True, stop_words=None, strip_accents=None,
                sublinear_tf=False, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, use_idf=True, vocabulary=None)
{'I': 5, 'Soy sauce': 6, 'ramen': 2, 'Tonkotsu': 1, 'Like': 4, 'is': 0, 'miso': 3}
{'Soy sauce': 4, 'ramen': 1, 'Tonkotsu': 0, 'Like': 3, 'miso': 2}
Soy sauce 4
Ramen 1
Tonkotsu 0
Like 3
Miso 2
['Tonkotsu', 'ramen', 'miso', 'Like', 'Soy sauce']
  (0, 4)	0.4976748316029239
  (0, 1)	0.7081994831914716
  (0, 0)	0.3540997415957358
  (0, 3)	0.3540997415957358
  (1, 1)	0.7081994831914716
  (1, 0)	0.3540997415957358
  (1, 3)	0.3540997415957358
  (1, 2)	0.4976748316029239
{'Soy sauce': 6, 'ramen': 3, 'Tonkotsu': 2, 'Like': 5, 'miso': 4, 'Katsudon': 1, 'Okonomiyaki': 0}
  (0, 6)	0.5486117771118656
  (0, 3)	0.6480379064629606
  (0, 2)	0.4172333972107692
  (0, 5)	0.3240189532314803
  (1, 3)	0.6480379064629607
  (1, 2)	0.41723339721076924
  (1, 5)	0.32401895323148033
  (1, 4)	0.5486117771118657
  (2, 3)	0.35959372325985667
  (2, 5)	0.35959372325985667
  (2, 1)	0.6088450986844796
  (2, 0)	0.6088450986844796
[[1.         0.69902512 0.34954555]
 [0.69902512 1.         0.34954555]
 [0.34954555 0.34954555 1.        ]]
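To make the numbers above less opaque, here is a pure-Python sketch of the calculation that the printed TfidfVectorizer settings imply (smooth_idf=True, norm='l2'): idf = ln((1 + n) / (1 + df)) + 1, each vector scaled to unit length, then cosine similarity as a dot product. model_skl.py and its input sentences are not shown, so the token lists below are hypothetical, chosen only to be consistent with the printed weights.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """L2-normalized TF-IDF vectors using idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def cosine_matrix(vectors):
    """Pairwise dot products; equal to cosine similarity for unit vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


# Hypothetical token lists, chosen to be consistent with the weights above.
docs = [
    ["soy_sauce", "ramen", "tonkotsu", "ramen", "like"],
    ["miso", "ramen", "tonkotsu", "ramen", "like"],
    ["okonomiyaki", "katsudon", "ramen", "like"],
]
vocab, vectors = tfidf_vectors(docs)
for row in cosine_matrix(vectors):
    print(["%.4f" % x for x in row])
```

The first two sentences share most of their tokens and score about 0.70 against each other, while the third shares only two common tokens with them and scores about 0.35, which matches the matrix printed by model_skl.py.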

With that, I finally got as far as the conversation app itself.

RaspberryPi4_conversation/auto_conversation_.py

$ python3 auto_conversation_.py -i data/conversation_n.txt  -s data/stop_words.txt
data/conversation_n.txt
>It's nice weather today
(0.41):That's right.
>What is that
(0.55):What is a dog
>The dog is a dog
(0.41):Where was the dog
>It's an alley over there
(0.46):* * * What is it?
>Near the park
(0.00): """"Why do I have to do it alone? Please help me a little.","""
>What should i help
(0.46):(Yes, that's good)
>Somehow it's crazy
(0.38):Who are you talking about
>You see
(0.00):Say "Awanchan", and by the way, these people went over here, and we went over here.
>I don't remember that
(0.35):I'm not an idiot
>I think it's stupid
(0.33):He looks good.
> 

Below are the installs needed to run the scripts above.

【reference】 How to install scipy and numpy on Ubuntu 16.04?

$ sudo apt update
$ sudo apt upgrade
$ sudo apt install python3-numpy python3-scipy
$ sudo pip3 install numpy scipy
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (1.16.2)
Requirement already satisfied: scipy in /usr/lib/python3/dist-packages (1.1.0)
$ pip3 install --user gensim

Successfully installed boto-2.49.0 boto3-1.11.14 botocore-1.14.14 gensim-3.8.1 jmespath-0.9.4 s3transfer-0.3.3 smart-open-1.9.0

【reference】 Install scikit-learn in Ubuntu

$ sudo pip3 install scikit-learn
...
Requirement already satisfied: scipy>=0.17.0 in /usr/lib/python3/dist-packages (from scikit-learn) (1.1.0)
Requirement already satisfied: numpy>=1.11.0 in /usr/lib/python3/dist-packages (from scikit-learn) (1.16.2)
Installing collected packages: joblib, scikit-learn
Successfully installed joblib-0.14.1 scikit-learn-0.22.1
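After all of these installs, a quick way to confirm that nothing is missing is to ask Python whether each module can be found. This is a small sketch using only the standard library; the function name and the REQUIRED list are my own, with the list holding the import names of the packages installed in this post.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the packages installed in this post.
REQUIRED = ["MeCab", "pyaudio", "pykakasi", "numpy", "scipy", "gensim", "sklearn"]

if __name__ == "__main__":
    print("missing:", missing_modules(REQUIRED) or "none")
```

Note that find_spec only checks that the module can be located; it does not prove that the native libraries behind it load correctly.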

Summary

・ Installed the libraries required for natural language processing on the RasPi4
・ For the time being, I was able to run a natural language application

・ I want to make the conversation app a little more decent

Bonus

For reference, this is everything that ended up installed. 【reference】 Difference between pip list and freeze

$ pip3 freeze > requirements.txt

RaspberryPi4_conversation/requirements.txt

$ pip3 freeze
absl-py==0.9.0
arrow==0.15.5
asn1crypto==0.24.0
astor==0.8.1
astroid==2.1.0
asttokens==1.1.13
attrs==19.3.0
automationhat==0.2.0
backcall==0.1.0
beautifulsoup4==4.7.1
bleach==3.1.0
blinker==1.4
blinkt==0.1.2
boto==2.49.0
boto3==1.11.14
botocore==1.14.14
buttonshim==0.0.2
Cap1xxx==0.1.3
certifi==2018.8.24
chardet==3.0.4
Click==7.0
colorama==0.3.7
colorzero==1.1
cookies==2.2.1
cryptography==2.6.1
cycler==0.10.0
Cython==0.29.14
decorator==4.4.1
defusedxml==0.6.0
dill==0.3.1.1
docutils==0.14
drumhat==0.1.0
entrypoints==0.3
envirophat==1.0.0
ExplorerHAT==0.4.2
Flask==1.0.2
fourletterphat==0.1.0
gast==0.3.3
gensim==3.8.1
google-pasta==0.1.8
gpiozero==1.5.1
grpcio==1.27.1
h5py==2.10.0
html5lib==1.0.1
idna==2.6
importlib-metadata==1.5.0
ipykernel==5.1.4
ipython==7.12.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
isort==4.3.4
itsdangerous==0.24
jedi==0.13.2
jinja2-time==0.2.0
jmespath==0.9.4
joblib==0.14.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.1.0
jupyter-core==4.6.1
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
keyring==17.1.1
keyrings.alt==3.1.1
kiwisolver==1.1.0
klepto==0.1.8
lazy-object-proxy==1.3.1
logilab-common==1.4.2
lxml==4.3.2
make==0.1.6.post1
Markdown==3.2
MarkupSafe==1.1.0
matplotlib==3.1.3
mccabe==0.6.1
mecab-python3==0.996.3
microdotphat==0.2.1
mistune==0.8.4
mote==0.0.4
motephat==0.0.2
mypy==0.670
mypy-extensions==0.4.1
nbconvert==5.6.1
nbformat==5.0.4
notebook==6.0.3
numpy==1.16.2
oauthlib==2.1.0
olefile==0.46
opencv-python==3.4.6.27
pandocfilters==1.4.2
pantilthat==0.0.7
parso==0.3.1
pexpect==4.8.0
pgzero==1.2
phatbeat==0.1.1
pianohat==0.1.0
picamera==1.13
pickleshare==0.7.5
piglow==1.2.5
pigpio==1.44
pox==0.2.7
prometheus-client==0.7.1
prompt-toolkit==3.0.3
protobuf==3.11.3
psutil==5.5.1
ptyprocess==0.6.0
PyAudio==0.2.11
pygame==1.9.4.post1
Pygments==2.3.1
PyGObject==3.30.4
pyinotify==0.9.6
PyJWT==1.7.0
pykakasi==1.2
pylint==2.2.2
pyOpenSSL==19.0.0
pyparsing==2.4.6
pyrsistent==0.15.7
pyserial==3.4
python-apt==1.8.4.1
python-dateutil==2.8.1
PyYAML==5.3
pyzmq==18.1.1
qtconsole==4.6.0
rainbowhat==0.1.0
requests==2.21.0
requests-oauthlib==1.0.0
responses==0.9.0
roman==2.0.0
RPi.GPIO==0.7.0
RTIMULib==7.2.1
s3transfer==0.3.3
scikit-learn==0.22.1
scipy==1.1.0
scrollphat==0.0.7
scrollphathd==1.2.1
SecretStorage==2.3.1
Send2Trash==1.5.0
sense-hat==2.2.0
simplejson==3.16.0
six==1.12.0
skywriter==0.0.7
smart-open==1.9.0
sn3218==1.2.7
soupsieve==1.8
spidev==3.4
ssh-import-id==5.7
tensorboard==1.13.1
tensorflow-estimator==1.14.0
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
thonny==3.2.6
tornado==6.0.3
touchphat==0.0.1
traitlets==4.3.3
twython==3.7.0
unicornhathd==0.0.4
wcwidth==0.1.8
webencodings==0.5.1
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==2.2.0
