What is PocketSphinx?

PocketSphinx is a speech recognition engine that performs continuous speech recognition. Out of the box it only supports English, so it does not seem suitable for Japanese. (There appear to be ways to force it to handle Japanese, but I will skip them here because they would take this article off topic.)

Execution environment

I verified the behavior in the following environment.

- Windows 10
- Anaconda (Python 3.7.9)

Package installation

Running pip install pocketsphinx on its own is not enough. You need SWIG and the Visual Studio Build Tools (C++) before PocketSphinx can be installed.

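I installed SWIG by following the guide at http://rinatz.github.io/swigdoc/install.html. The Build Tools can be downloaded from https://visualstudio.microsoft.com/ja/thank-you-downloading-visual-studio/?sku=BuildTools.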

When you're ready, use pip to install PocketSphinx.

pip install pocketsphinx
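
Before running pip, it can help to confirm that SWIG is reachable from the current shell. The following is a minimal sketch using only the standard library. The helper name missing_build_tools is hypothetical, and it only looks at PATH, so it says nothing about whether the C++ Build Tools are set up correctly.

```python
import shutil


def missing_build_tools(tools=("swig",)):
    """Return the tool names from `tools` that are not found on PATH.

    Hypothetical helper for illustration; it is not part of PocketSphinx.
    """
    return [name for name in tools if shutil.which(name) is None]


missing = missing_build_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("swig was found on PATH; pip install pocketsphinx can be tried.")
```

If swig is reported as missing even though it is installed, the folder containing swig.exe has probably not been added to PATH yet.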

Sample code execution

The PocketSphinx sample code is shown below. Running it as is detects spoken English words from the microphone.

`sample.py`


import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = get_model_path()

speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)

for phrase in speech:
    print(phrase)

The default values of LiveSpeech appear to be as follows.

verbose = False
logfn = /dev/null or nul
audio_file = site-packages/pocketsphinx/data/goforward.raw
audio_device = None
sampling_rate = 16000
buffer_size = 2048
no_search = False
full_utt = False
hmm = site-packages/pocketsphinx/model/en-us
lm = site-packages/pocketsphinx/model/en-us.lm.bin
dict = site-packages/pocketsphinx/model/cmudict-en-us.dict

Understanding the referenced directory

While using PocketSphinx, I ran into the following error. Let's work out what causes it.

RuntimeError: new_Decoder returned -1

The model_path used for hmm, lm, and dic is set as follows.

model_path = get_model_path()

Printing this path gives the following, and the directory has the contents shown after it.

C:\Users\<UserName>\.conda\envs\pocketsphinx\lib\site-packages\pocketsphinx\model

(pocketsphinx) C:\Users\<UserName>\.conda\envs\pocketsphinx\Lib\site-packages\pocketsphinx\model>tree /f
C:.
│  cmudict-en-us.dict
│  en-us.lm.bin
│
└─en-us
        feat.params
        mdef
        means
        noisedict
        README
        sendump
        transition_matrices
        variances

Apparently the default model and dictionary data are included in the package that pip installs. The RuntimeError seems to occur when any of these paths is wrong.
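
Because a wrong path only surfaces as the opaque new_Decoder error, one option is to check the three paths before creating LiveSpeech. The sketch below relies only on os.path. The helper names find_missing_model_files and check_default_model are hypothetical, and the file names are the ones shown in the tree output above.

```python
import os


def find_missing_model_files(hmm, lm, dic):
    """Return (argument name, path) pairs whose path does not exist on disk.

    hmm is expected to be a directory; lm and dic are expected to be files.
    Hypothetical helper for illustration, not part of PocketSphinx.
    """
    missing = []
    if not os.path.isdir(hmm):
        missing.append(("hmm", hmm))
    if not os.path.isfile(lm):
        missing.append(("lm", lm))
    if not os.path.isfile(dic):
        missing.append(("dic", dic))
    return missing


def check_default_model():
    """Check the files bundled with the pip package (needs pocketsphinx)."""
    from pocketsphinx import get_model_path  # imported lazily on purpose

    model_path = get_model_path()
    return find_missing_model_files(
        hmm=os.path.join(model_path, "en-us"),
        lm=os.path.join(model_path, "en-us.lm.bin"),
        dic=os.path.join(model_path, "cmudict-en-us.dict"),
    )
```

Printing the result of check_default_model() before constructing LiveSpeech should show which argument, if any, points at a missing path. An empty list means all three paths exist.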

Digression: Searching for this error turns up plenty of information, but most of it just says to download en-us, and some of the suggested code could never have avoided the error in the first place. (In the end, some of it was simply broken Python that failed to substitute the path ...) I also neglected to check the path at first, so the problem only became more mysterious.

Application of self-made dictionary data

This time the directory structure is as follows. model\sample.dict is the custom dictionary.

(pocketsphinx) C:\Users\<UserName>\Documents\pocketsphinx_sample>tree /f
C:.
│  exmaple.py
└─model
       sample.dict
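
The contents of sample.dict are not shown here. One way to produce such a file, assuming it follows the same plain-text layout as the bundled cmudict-en-us.dict (one entry per line, the word followed by its phonemes, with alternative pronunciations written as word(2)), is to copy only the entries you need. The helper name build_subset_dict and the example words are hypothetical.

```python
def build_subset_dict(source_dict, words, dest_dict):
    """Copy the entries for `words` from a CMU-style dictionary file.

    Assumes each line looks like "hello HH AH L OW" and that alternative
    pronunciations are written as "hello(2) ...". Returns the number of
    lines written. Hypothetical helper, not part of PocketSphinx.
    """
    wanted = {w.lower() for w in words}
    written = 0
    with open(source_dict, encoding="utf-8") as src, \
            open(dest_dict, "w", encoding="utf-8") as dst:
        for line in src:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            # "hello(2)" -> "hello", so alternative pronunciations are kept
            base_word = parts[0].split("(")[0].lower()
            if base_word in wanted:
                dst.write(line if line.endswith("\n") else line + "\n")
                written += 1
    return written
```

For example, passing the path of cmudict-en-us.dict as source_dict, ["hello", "go"] as words, and the path of model\sample.dict as dest_dict would write only the entries for those two words.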

Next, looking at the arguments of LiveSpeech, the following three seem to be the ones to set. The documentation also describes setting lm=False, but for some reason that raises a RuntimeError, so I specify the language model explicitly.

hmm=os.path.join(model_path, 'en-us'),
lm=os.path.join(model_path, 'en-us.lm.bin'),
dic=os.path.join(model_path, 'cmudict-en-us.dict')

`exmaple.py`


import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = get_model_path()
my_model_path = 'C:\\Users\\<UserName>\\Documents\\pocketsphinx_sample\\model'

speech = LiveSpeech(
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(my_model_path, 'sample.dict'),
)

for phrase in speech:
    print(phrase)
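
The hard-coded my_model_path above only works for one user account. As a small sketch, the same directory can be derived from the script's own location with pathlib. The helper names local_model_dir and local_dict_path are hypothetical, and they assume the model folder sits next to the script, as in the tree shown earlier.

```python
from pathlib import Path


def local_model_dir(script_file, folder="model"):
    """Return the `folder` directory located next to `script_file`."""
    return Path(script_file).resolve().parent / folder


def local_dict_path(script_file, name="sample.dict"):
    """Return the path of the custom dictionary as a string."""
    return str(local_model_dir(script_file) / name)
```

Inside the script, dic=local_dict_path(__file__) could then take the place of the os.path.join(my_model_path, 'sample.dict') argument.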

Remaining questions

All I needed was to load a custom dictionary, so I achieved my goal. However, the cause of the error with lm=False remains unknown. I also came across sample code that passes an argument called jsgf to load a .gram grammar file, but I could not find such an argument when I looked at the code on PocketSphinx's GitHub. If I need to load a grammar file in the future, this will have to be checked.

References

- http://rinatz.github.io/swigdoc/install.html
- https://pypi.org/project/pocketsphinx/
- https://visualstudio.microsoft.com/ja/thank-you-downloading-visual-studio/?sku=BuildTools
- https://stackoverflow.com/questions/44339312/new-decoder-returned-1-when-trying-to-run-pocketsphinx-on-raspberry-pi/51346264#51346264

The story of my struggle with the PocketSphinx Python package