I recently wanted to build a program that uses voice recognition, and after hearing from several people that the Google Speech API was "accurate", I decided to try it. I ran into a few snags getting it to work, so here are my notes.
The Google Speech API is an API for using Google's speech recognition technology.
It takes an audio file as input and outputs plausible natural-language transcriptions of the audio, each with a confidence score. Version 2 seems to be the one currently running.
Registration on the Google Developer Console is required to use the Google Speech API.
Google Developer Console https://console.developers.google.com/project
The interface here seems to differ quite a bit depending on whether you have used Google APIs before, so please treat the steps below as a rough guide only.
Click "Use Google API".
Create a project by entering an appropriate project name and project ID. Click the created project name to move to the dashboard.
I assumed the Speech API would be listed under the "Google APIs" tab and searched for it there, but couldn't find it (this is where I got stuck).
http://eyepodtouch.net/?p=81
According to this article, you need to join the chromium-dev group to enable the Speech API from Japan.
https://groups.google.com/a/chromium.org/forum/?fromgroups#!forum/chromium-dev
Click "Join group to post" to join (note that you may start receiving mailing-list notifications).
Once you have joined the group, the Google Speech API can be found by searching, so click "Enable API" to enable it (in the second screenshot it has already been enabled, so the button shows the option to disable it instead).
From the "Credentials" tab, select "New Credentials" -> "API Key" -> "Server Key".
Enter an appropriate server key name and click "Create"; you will be given the API key (a character string), so make a note of this value.
Now you are ready to use the API.
The environment:
OS: Mac OS X 10.11
Microphone: MacBook Air (Mid 2013) built-in microphone
Here, let's try the API once, following the usage example in a certain GitHub repository (a little unnerving, since it describes itself as "reverse engineering"): https://github.com/gillesdemey/google-speech-v2 This page also contains what appears to be the API specification, so it is worth referring to.
$ cd
$ mkdir src/
$ cd src/
$ git clone https://github.com/gillesdemey/google-speech-v2.git
$ cd google-speech-v2/
$ curl -X POST \
--data-binary @'audio/hello (16bit PCM).wav' \
--header 'Content-Type: audio/l16; rate=16000;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=<your api key>'
# {"result":[]}
# {"result":[{"alternative":[{"transcript":"hello Google","confidence":0.98762906},{"transcript":"hello Google Now"},{"transcript":"hello Google I"}],"final":true}],"result_index":0}
For WAV files, as shown on the above page,
# In an environment where Homebrew is available
$ brew install sox
$ rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav
rec WARN formats: can't set sample rate 16000; using 44100
rec WARN formats: can't set 1 channels; using 2
# ↑ Despite the warnings above, the file actually created had a sampling rate of 16,000 Hz and 1 channel...
Input File : 'default' (coreaudio)
Channels : 2
Sample Rate : 44100
Precision : 32-bit
Sample Encoding: 32-bit Signed Integer PCM
In:0.00% 00:00:11.15 [00:00:00.00] Out:177k [ | ] Clip:0
# Recording from the built-in microphone; stop recording with Ctrl+C
It seems you can record using SoX ( http://sox.sourceforge.net/ ). (Of course, you don't have to use SoX; anything is fine as long as you can prepare a 16-bit PCM sound source.)
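If you want to double-check a recorded file's actual parameters without relying on sox's console output, Python's standard-library `wave` module can read the WAV header. A minimal sketch (the helper name is my own):

```python
import wave

def wav_params(path):
    """Return (channels, sample width in bytes, sample rate) of a WAV file."""
    with wave.open(path, 'rb') as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()

# The API expects 16-bit PCM, i.e. a sample width of 2 bytes, and the
# sample rate must match the rate= value in the request's Content-Type header.
```

For example, `wav_params('test.wav')` lets you confirm what sox actually wrote, regardless of what its warnings claimed.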
Here, I recorded myself saying "Hello" as a test.
| Item | Value |
|---|---|
| File name | test.wav |
| Format | WAVE |
| Bit depth | 16 |
| Encoding | PCM |
| Channels | 2 |
For this test.wav
$ curl -X POST \
--data-binary @'test.wav' \
--header 'Content-Type: audio/l16; rate=16000;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=ja-JP&key=<your api key>'
{"result":[]}
{"result":[{"alternative":[{"transcript":"Hello","confidence":0.95324326},{"transcript":"Hello"},{"transcript":"Hello"},{"transcript":"Hello"}],"final":true}],"result_index":0}
With this, I correctly obtained the text corresponding to the recorded audio.
Note that you must specify the sound source's MIME type and sampling rate in the request header. Also note that the lang parameter is subtly changed from 'en-us' to 'ja-JP' to handle Japanese.
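Since the rate in the Content-Type header has to match the file's actual sampling rate, it may be safer to build the header from the file itself rather than hard-coding 16000. A small sketch (the helper name is my own invention):

```python
import wave

def l16_headers(path):
    """Build request headers for a 16-bit PCM WAV, reading the
    sampling rate from the file instead of hard-coding it."""
    with wave.open(path, 'rb') as w:
        rate = w.getframerate()
    return {'Content-Type': 'audio/l16; rate={}'.format(rate)}
```

For a 16,000 Hz file this produces the same header as the curl example above.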
From here on is purely a bonus. Since this is plain HTTP, you can also send requests from a script.
Pepper Hands-on, Basic B: Speech Recognition (getting Pepper to learn your name using the Google Speech API) [Tech-Circle #7]
↑ I used this as a reference.
Prepare the audio you want to recognize in WAV format as before (test.wav), and place it in the same directory as the script below.
Below is Python 3 code; save it as, for example, test.py.
test.py
import sys
import json
import urllib.parse
import urllib.request

apikey = 'your api key'
endpoint = 'https://www.google.com/speech-api/v2/recognize'
query_string = {'output': 'json', 'lang': 'ja-JP', 'key': apikey}
url = '{0}?{1}'.format(endpoint, urllib.parse.urlencode(query_string))
headers = {'Content-Type': 'audio/l16; rate=16000'}

with open(sys.argv[1], 'rb') as f:
    voice_data = f.read()

request = urllib.request.Request(url, data=voice_data, headers=headers)
response = urllib.request.urlopen(request).read()

# The response is JSON spread over multiple lines;
# skip blank lines and lines with an empty result
for line in response.decode('utf-8').splitlines():
    if not line:
        continue
    res = json.loads(line)
    if res['result'] == []:
        continue
    print(res)
$ python test.py test.wav
{'result_index': 0, 'result': [{'final': True, 'alternative': [{'confidence': 0.95324326, 'transcript': 'Hello'}, {'transcript': 'Hello'}, {'transcript': 'Hello'}, {'transcript': 'Hello'}]}]}
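If all you need is the top hypothesis rather than the whole response, a small helper can pull it out of the multi-line JSON (a sketch based on the response format shown above; the function name is my own):

```python
import json

def best_transcript(raw):
    """Return (transcript, confidence) of the first non-empty result in
    the API's multi-line JSON response, or (None, None) if none matched.
    The first line is typically an empty {"result":[]} stub, so skip it."""
    for line in raw.splitlines():
        if not line.strip():
            continue
        res = json.loads(line)
        if res.get('result'):
            alt = res['result'][0]['alternative'][0]
            return alt['transcript'], alt.get('confidence')
    return None, None
```

Note that only the first alternative usually carries a confidence value, which is why `alt.get('confidence')` is used rather than direct indexing.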
The accuracy really does seem high.
Even long sentences, or speech that is a little hard to make out, get picked up reasonably well.
The downsides are that it takes a long time to execute and the accuracy varies from file to file, so it seems a little difficult to use in applications that demand real-time performance and consistent accuracy. (I suspect the recognizer running on smartphones is faster and more accurate.)
Even allowing for that, I thought it was genuinely amazing that this can be used for free.
If I may be greedy: the registration interface is completely different from reference articles from only about a year ago, you can't use the API at all without joining a mailing list, and I couldn't find an official specification. It's all rather rough, so I hope it gets improved...
References:
On the use of Google's Speech Recognition API Version 2 | Similar Sounds
How to use Google Speech API ver.2
Using Google Speech API from Ruby - Qiita
How to get API key for Google Speech API | Eyepod touch