Let's implement English voice dialogue in Python [offline]

About pocketsphinx

pocketsphinx is a module that enables offline English speech recognition. How to install and use pocketsphinx is covered in a separate article, but this page also walks through setting up the environment, so you can probably skip it.

Speech recognition using your own dictionary (speech to text)

Environment

Ubuntu 18.04, Python 3

Setup

I have put a sample on GitHub, so please clone it and use it: https://github.com/hir-osechi/pocketsphinx_sample

`shell`


git clone https://github.com/hir-osechi/pocketsphinx_sample.git

The repository contains code that uses pocketsphinx and svoxpico, so if these are not installed yet, run the following:

`shell`


cd pocketsphinx_sample/
sh setup.sh

(How to use svoxpico is covered in a separate article, if you are interested.)

pocketsphinx with default settings can be run with the following code.

`pocket_test.py`


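from pocketsphinx import LiveSpeech

# LiveSpeech() listens to the default microphone using the default
# en-us model and yields each recognized phrase
for phrase in LiveSpeech():
    print(phrase)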

You can add options inside the parentheses of LiveSpeech(). To use your own dictionary, add the following:

lm=False
dic=path to the .dict file of your dictionary
jsgf=path to the .gram file of your dictionary

lm selects the language model; setting it to False turns off the default language model so that your own dictionary and grammar are used instead.
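A minimal sketch, assuming the pocketsphinx 0.1.x Python package, where `LiveSpeech` passes keyword arguments through as decoder options. The helper names and the `dictionary/test/test.dict` / `test.gram` layout are hypothetical; point them at wherever your generated files actually are.

```python
import os


def custom_dictionary_options(dict_dir, name):
    """Build LiveSpeech() keyword arguments that replace the default
    language model with a custom dictionary and JSGF grammar.

    Hypothetical layout: <dict_dir>/<name>.dict and <dict_dir>/<name>.gram
    """
    return {
        "lm": False,  # turn off the default language model
        "dic": os.path.join(dict_dir, name + ".dict"),  # pronunciation dictionary
        "jsgf": os.path.join(dict_dir, name + ".gram"),  # grammar of allowed sentences
    }


def listen(options):
    """Print every phrase recognized from the microphone."""
    # Imported here so the helper above can be used without pocketsphinx
    from pocketsphinx import LiveSpeech  # pocketsphinx 0.1.x Python API

    for phrase in LiveSpeech(**options):
        print(phrase)
```

Usage: `listen(custom_dictionary_options("dictionary/test", "test"))`.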

Creating your own dictionary

About dict files

pocketsphinx has a word dictionary file (".dict") that contains tens of thousands of words and their pronunciations, written as phoneme symbols.

Example:

weather W EH DH ER
were W ER
what W AH T
what(2) HH W AH T
where W EH R
where(2) HH W EH R
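Each line is a word followed by its phonemes, and a suffix such as `(2)` marks an alternate pronunciation. As an illustration (a hypothetical helper, not part of pocketsphinx), the entries can be read into a Python dictionary like this; `str.split()` also treats full-width spaces as separators.

```python
import re

# Matches the "(2)", "(3)", ... suffix that marks an alternate pronunciation
ALT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_dict(lines):
    """Map each word to its list of pronunciations (lists of phonemes)."""
    entries = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, phones = parts[0], parts[1:]
        base = ALT_SUFFIX.sub("", word)  # "what(2)" -> "what"
        entries.setdefault(base, []).append(phones)
    return entries


sample = [
    "weather W EH DH ER",
    "were W ER",
    "what W AH T",
    "what(2) HH W AH T",
    "where W EH R",
    "where(2) HH W EH R",
]
pronunciations = parse_dict(sample)
print(pronunciations["what"])  # [['W', 'AH', 'T'], ['HH', 'W', 'AH', 'T']]
```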

All of these words are stored in the dict file at the following path:

/usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict

By default, pocketsphinx searches all of these tens of thousands of words for a match, so narrowing down the vocabulary improves recognition accuracy.
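One way to narrow the vocabulary is to copy only the entries you need out of the full dictionary into a smaller .dict file. The helpers below are a hypothetical sketch of that idea, not the tools from the sample repository.

```python
import re

# Matches the "(2)", "(3)", ... suffix that marks an alternate pronunciation
ALT_SUFFIX = re.compile(r"\(\d+\)$")


def filter_dict(dict_lines, vocabulary):
    """Keep only the dictionary lines whose word (ignoring "(2)"-style
    suffixes) is in vocabulary, preserving their original order."""
    wanted = {w.lower() for w in vocabulary}
    kept = []
    for line in dict_lines:
        parts = line.split()
        if parts and ALT_SUFFIX.sub("", parts[0]) in wanted:
            kept.append(line)
    return kept


def filter_dict_file(src_path, dst_path, vocabulary):
    """Write a reduced .dict file, e.g. taken from cmudict-en-us.dict."""
    with open(src_path, encoding="utf-8") as src:
        kept = filter_dict(src, vocabulary)  # lines keep their "\n"
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(kept)
```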

About gram files

In a gram file, you can specify a grammar (in JSGF format). For example, with the following gram file, only the two sentences "what food do you like" and "where do you live in" will be recognized:

#JSGF V1.0;
grammar test;
public <rule> = <command>;
<command> = what food do you like | where do you live in;
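The format is regular enough to generate. The function below is a sketch (not the author's gram_maker_by_input.py) that produces a gram file in exactly the layout above from a list of sentences; lowercasing and dropping "?" are assumptions about how the sentences should be normalized.

```python
def make_jsgf(grammar_name, sentences):
    """Return JSGF text with one public rule that accepts exactly one
    of the given sentences, laid out like the test.gram example."""
    # Lowercase and drop "?" so the alternatives match dictionary words
    alternatives = [s.lower().replace("?", "").strip() for s in sentences]
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        "public <rule> = <command>;\n"
        f"<command> = {' | '.join(alternatives)};\n"
    )


print(make_jsgf("test", ["What food do you like ?", "Where do you live in ?"]))
```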

Creating dict and gram files by hand every time is tedious, so I wrote a script that automatically generates both files from sentences you type in. It is included in the repository.

cd pocketsphinx_sample/tools
python3 gram_maker_by_input.py

Enter the input as follows:

Enter the name of the dictionary you want to create:test
Please enter the text + Enter
(Ctrl-C to exit)
===============================================================
do you like apple
i want to play tennis
please tell me the way to the kyoto station
let me know what i can do for you
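Every word in these sentences needs an entry in the generated .dict file, because pocketsphinx generally cannot use a grammar word that has no pronunciation. The helpers below are a hypothetical sketch of that bookkeeping, not the code of gram_maker_by_input.py.

```python
def collect_vocabulary(sentences):
    """Return the sorted set of lowercase words used in the sentences."""
    words = set()
    for sentence in sentences:
        words.update(sentence.lower().replace("?", " ").split())
    return sorted(words)


def missing_words(vocabulary, dictionary_words):
    """Words with no dictionary entry; they need a pronunciation added
    to the .dict file before the grammar can use them."""
    known = set(dictionary_words)
    return [w for w in vocabulary if w not in known]


vocab = collect_vocabulary(["do you like apple", "i want to play tennis"])
print(vocab)
```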

Now speech recognition responds only to these 4 sentences. However, as it stands, even a small amount of noise may be recognized as one of these four sentences, so add noise to the dictionary.

cd pocketsphinx_sample/tools
python3 gram_noise_changer.py

Enter the input as follows:

Enter the name of the dictionary for which you want to change the noise: test
Enter the name of the one-column noise txt file you want to change to (without .txt): noise_sample
===============================================================
Change the noise in this dictionary.
===============================================================
End of change
===============================================================

If you are curious about what this does, take a look at test.gram. (The noise file contains words that tended to be recognized when nobody was speaking; adjust them as appropriate.)
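I have not reproduced exactly what gram_noise_changer.py writes. The sketch below assumes the noise entries simply become extra alternatives of `<command>`, so background sound is matched to a noise entry rather than to a real sentence, and the program then ignores those matches. The helper names and the noise words `ah` and `the` are hypothetical.

```python
def load_noise(lines):
    """Read a one-entry-per-line noise list, ignoring blank lines."""
    return [line.strip().lower() for line in lines if line.strip()]


def command_rule(sentences, noise):
    """JSGF <command> line whose alternatives are the target sentences
    followed by the noise entries (duplicates removed, order kept)."""
    alternatives = list(dict.fromkeys(s.lower() for s in sentences + noise))
    return "<command> = " + " | ".join(alternatives) + ";"


def is_target(phrase, sentences):
    """True only when the recognized phrase is one of the real sentences,
    so anything matched to a noise entry can be discarded."""
    return str(phrase).strip().lower() in {s.lower() for s in sentences}


noise = load_noise(["ah\n", "\n", "the\n"])
print(command_rule(["do you like apple"], noise))
```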

This completes the preparation!

Run

Run the following commands. If only the sentences specified earlier are recognized, the setup succeeded.

cd pocketsphinx_sample/
python3 dic_test.py

Voice dialogue

As an example application, I created a question-and-answer program. The questions and answers are stored in pocketsphinx_sample/dictionary/QandA/QandA.txt, with each question separated from its answer by ",".
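A minimal sketch of reading such a file and looking up answers, assuming one question-answer pair per line and splitting at the first comma only. The two sample lines are modeled on the execution result later in this article, not copied from the real QandA.txt, and the helper names are hypothetical.

```python
def load_qa(lines):
    """Build a question -> answer mapping from "question,answer" lines.
    Only the first comma splits, so answers may contain commas."""
    qa = {}
    for line in lines:
        line = line.strip()
        if not line or "," not in line:
            continue  # skip blank or malformed lines
        question, answer = line.split(",", 1)
        qa[question.strip().lower()] = answer.strip()
    return qa


def answer_for(phrase, qa):
    """Look up the answer for a recognized phrase (None if unknown)."""
    return qa.get(str(phrase).strip().lower())


qa = load_qa(["are you happy ?,yes\n", "what food do you like ?,I like apples.\n"])
print(answer_for(" what food do you like ?", qa))
```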

To create your own dictionary from QandA.txt, run gram_maker_from_txt.py.

cd pocketsphinx_sample/tools
python3 gram_maker_from_txt.py

Enter the input as follows:

Enter the name of the dictionary you want to create: QA_sample
Enter the name of the txt file to build the dictionary from (without .txt): QandA
Enter the name of the one-column noise txt file you want to add (without .txt): noise_sample
End of dictionary

Once the dictionary is created, you should be able to run a question-and-answer session:

cd pocketsphinx_sample/
python3 QA_test.py

Execution result:


[*] START RECOGNITION
----------------------------------
 are you happy ?
[*] SPEAK : yes
----------------------------------

[*] START RECOGNITION
----------------------------------
 what food do you like ?
[*] SPEAK : I like apples.
----------------------------------
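The answers are spoken with svoxpico. I have not checked how QA_test.py calls it; a sketch, assuming the `pico2wave` command-line tool (package `libttspico-utils` on Ubuntu) and ALSA's `aplay` are installed, could look like this. The function names are hypothetical.

```python
import os
import subprocess
import tempfile


def pico_command(text, wav_path, lang="en-US"):
    """Command line for SVOX Pico's pico2wave: synthesize text to a WAV file."""
    return ["pico2wave", "-l", lang, "-w", wav_path, text]


def speak(text):
    """Synthesize text with pico2wave, then play the result with aplay."""
    with tempfile.TemporaryDirectory() as tmp:
        wav = os.path.join(tmp, "speech.wav")  # pico2wave expects a .wav name
        subprocess.run(pico_command(text, wav), check=True)
        subprocess.run(["aplay", wav], check=True)
```

Usage: `speak("I like apples.")`.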

To further improve accuracy

To improve recognition accuracy further, specify the noise more strictly. For example, if "what food do you like" is easily misrecognized, add its partial phrases to noise_sample.txt:

what
what food
what food do
what food do you

Partial utterances then match these noise entries instead, so the sentence is output only when it matches exactly.
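Writing these partial phrases by hand gets tedious for longer sentences. A small hypothetical helper can generate every word prefix of a sentence except the full sentence, and append them to the noise file, assuming one entry per line.

```python
def partial_phrases(sentence):
    """All word prefixes of sentence except the full sentence itself."""
    words = sentence.split()
    return [" ".join(words[:i]) for i in range(1, len(words))]


def append_noise(path, sentence):
    """Append the partial phrases to a noise list file, one per line."""
    with open(path, "a", encoding="utf-8") as f:
        for phrase in partial_phrases(sentence):
            f.write(phrase + "\n")


print(partial_phrases("what food do you like"))
# ['what', 'what food', 'what food do', 'what food do you']
```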

Official pocketsphinx samples: https://pypi.org/project/pocketsphinx/