Until you can use the Google Speech API

I recently wanted to write a program that uses speech recognition, and while looking into it I heard from several people that Google's is "accurate", so I decided to try the Google Speech API. I got stuck in a few places while getting it to work, so I am writing them down here.

What can it do?

Google Speech API is an API for using Google's speech recognition technology.

This API takes an audio file as input and outputs plausible natural-language text for the speech, along with a confidence score. Version 2 appears to be the one currently running.

API user registration

Registration on the Google Developer Console is required to use the Google Speech API.

Google Developer Console: https://console.developers.google.com/project

The interface here seems to differ considerably depending on whether you have used Google APIs before, so please treat the steps below only as a rough guide.

Click "Use Google API".

Create a project by entering an appropriate project name and project ID. Click the created project name to move to the dashboard.

I assumed the Speech API would be listed under the "Google API" tab and searched for it there, but I could not find it (this is where I got stuck).

Join the Chrome-dev group

http://eyepodtouch.net/?p=81

According to this article, you need to join the Chrome-dev group to enable the Speech API from Japan.

https://groups.google.com/a/chromium.org/forum/?fromgroups#!forum/chromium-dev

Click "Join the group to post" to join (note that you will probably start receiving mail from the mailing list).

API activation and key acquisition

Once you have joined the group, the Google Speech API shows up in search, so click "Enable API" to enable it (if the API is already enabled, the button reads "Disable API" instead).

From the "Credentials" tab, select "New Credentials" -> "API Key" -> "Server Key".

Enter a suitable name for the server key and click "Create". The API key (a character string) is then displayed, so make a note of it.

Now you are ready to use the API.

Try using the API

The environment is:
OS: Mac OS X 10.11
Microphone: MacBook Air (Mid 2013) built-in microphone

Follow the usage example for the time being

First, let's call the API once by following the usage example in a GitHub repository (a little scary, since it says "Reverse Engineering"): https://github.com/gillesdemey/google-speech-v2 This page also contains what appears to be an API specification, so it is worth consulting.

$ cd
$ mkdir src/
$ cd src/
$ git clone https://github.com/gillesdemey/google-speech-v2.git
$ cd google-speech-v2/

$ curl -X POST \
--data-binary @'audio/hello (16bit PCM).wav' \
--header 'Content-Type: audio/l16; rate=16000;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=<your api key>'
# {"result":[]}
# {"result":[{"alternative":[{"transcript":"hello Google","confidence":0.98762906},{"transcript":"hello Google Now"},{"transcript":"hello Google I"}],"final":true}],"result_index":0}
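The response above arrives as two JSON objects, one per line, and the first has an empty result. Here is a minimal Python sketch for pulling out the most likely transcript. It assumes the response always has the shape shown in the sample output; the function name best_transcript is my own and not part of the API.

```python
import json


def best_transcript(body):
    """Return (transcript, confidence) of the top alternative, or None.

    `body` is the raw response text: one JSON object per line.
    Assumes the shape seen in the sample output above.
    """
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines between objects
        for result in json.loads(line).get("result", []):
            alternatives = result.get("alternative", [])
            if alternatives:
                top = alternatives[0]
                # In the sample, only the first alternative has a confidence
                return top["transcript"], top.get("confidence")
    return None


# The sample response from the curl example, one JSON object per line
sample = (
    '{"result":[]}\n'
    '{"result":[{"alternative":[{"transcript":"hello Google","confidence":0.98762906},'
    '{"transcript":"hello Google Now"},{"transcript":"hello Google I"}],"final":true}],'
    '"result_index":0}\n'
)
print(best_transcript(sample))
```

Splitting on line boundaries rather than on whitespace matters here, because transcripts such as "hello Google" contain spaces.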

Try to load the sound source you recorded yourself

To prepare a WAV file, you can record one as shown on the page above:

# In an environment where Homebrew is available
$ brew install sox
$ rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav
rec WARN formats: can't set sample rate 16000; using 44100
rec WARN formats: can't set 1 channels; using 2
# ↑ Despite these warnings, the file that is actually created has a 16,000 Hz sampling rate and 1 channel ...

Input File     : 'default' (coreaudio)
Channels       : 2
Sample Rate    : 44100
Precision      : 32-bit
Sample Encoding: 32-bit Signed Integer PCM

In:0.00% 00:00:11.15 [00:00:00.00] Out:177k  [      |      ]        Clip:0
# Records from the built-in microphone. Press Ctrl+C to stop.

This records using SoX (http://sox.sourceforge.net/). (Of course, you don't have to use it; any way of preparing a 16-bit PCM audio file is fine.)

As a test, I recorded myself saying "Hello".

Item                Value
File name           test.wav
Format              WAVE
Bit depth           16
Modulation method   PCM
Number of channels  2
Send this test.wav to the API:

$ curl -X POST \
--data-binary @'test.wav' \
--header 'Content-Type: audio/l16; rate=16000;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=ja-JP&key=<your api key>'
{"result":[]}
{"result":[{"alternative":[{"transcript":"Hello","confidence":0.95324326},{"transcript":"Hello"},{"transcript":"Hello"},{"transcript":"Hello"}],"final":true}],"result_index":0}

This returned the text corresponding to the recorded speech.

The request header must state the MIME type and sampling rate of the audio. Also note that the lang parameter was changed from 'en-us' to 'ja-JP' to support Japanese.
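The two things that change between the requests above, the lang query parameter and the rate in the Content-Type header, can be gathered into one small helper. This is a sketch based only on the curl commands shown here; build_request and its defaults are my own naming, and no request is actually sent.

```python
import urllib.parse

ENDPOINT = "https://www.google.com/speech-api/v2/recognize"


def build_request(api_key, lang="ja-JP", rate=16000):
    """Return (url, headers) for a recognition request.

    Mirrors the curl examples: the language goes in the query string,
    the sampling rate goes in the Content-Type header.
    """
    query = urllib.parse.urlencode(
        {"output": "json", "lang": lang, "key": api_key}
    )
    headers = {"Content-Type": "audio/l16; rate={0}".format(rate)}
    return "{0}?{1}".format(ENDPOINT, query), headers


# "YOUR_API_KEY" is a placeholder for the key obtained earlier
url, headers = build_request("YOUR_API_KEY", lang="en-us")
print(url)
print(headers)
```

Keeping the rate in one place makes it harder to declare a rate in the header that differs from the one the file was recorded at.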

Try to write a script

From here on it's purely a bonus. Since the API works over HTTP, you can just as well send the request from a script.

I referred to the following: Pepper Hands-on Basic B, Speech Recognition: Get Pepper to know your name using Google Speech API [Tech-Circle #7]

Prepare the voice data you want to recognize in WAV format as before (test.wav). Suppose this file is in the same directory as the script below.

Below is the Python 3 code, saved here as test.py.

test.py

import sys
import json
import urllib.parse
import urllib.request

apikey = 'your api key'
endpoint = 'https://www.google.com/speech-api/v2/recognize'
query_string = {'output': 'json', 'lang': 'ja-JP', 'key': apikey}

url = '{0}?{1}'.format(endpoint, urllib.parse.urlencode(query_string))

# The header must state the MIME type and sampling rate of the audio
headers = {'Content-Type': 'audio/l16; rate=16000'}
with open(sys.argv[1], 'rb') as f:
    voice_data = f.read()

# Passing data makes this a POST request
request = urllib.request.Request(url, data=voice_data, headers=headers)
response = urllib.request.urlopen(request).read()

# The response is several lines of JSON, one object per line;
# skip blank lines and objects whose result is empty
for line in response.decode('utf-8').splitlines():
    if not line.strip():
        continue
    res = json.loads(line)
    if res['result'] == []:
        continue
    print(res)

$ python test.py test.wav
{'result_index': 0, 'result': [{'final': True, 'alternative': [{'confidence': 0.95324326, 'transcript': 'Hello'}, {'transcript': 'Hello'}, {'transcript': 'Hello'}, {'transcript': 'Hello'}]}]}

Other notes

Impressions after trying it

I think the accuracy is quite high after all. It picks up even long sentences and speech that is a little hard to make out. The downsides are that each request takes a long time and that accuracy varies from file to file, so it seems somewhat difficult to use in applications that need real-time performance and consistent accuracy. I think the version running on smartphones is faster and more accurate. Even allowing for that, I found it simply amazing that this can be used for free.

If I may be greedy: the API registration interface is completely different from the one in the reference article from about a year ago that I followed, the API cannot be used at all unless you join a mailing list, and I could not find an official specification. Quite a few things are rough, so I hope they get improved...

Other reference materials

On the use of Google’s Speech Recognition API Version 2 | Similar Sounds
How to use Google Speech API ver.2
Using Google Speech API from Ruby - Qiita
How to get API key for Google Speech API | Eyepod touch
