*** Information as of August 2016 ***
Notes from trying out speech recognition of WAV files with the Google Cloud Speech API Beta.
## Cloud Speech API
As described on the Google Cloud Speech API Beta page, this speech-recognition API:

- Supports 80 languages
- Is resistant to noise
- Supports contextual recognition
- Is device independent
- Supports both real-time and recorded audio
It looks like an easy-to-use, high-performance ASR service.
The official documentation includes Python sample code.
## Following the Quickstart
Generate a Service Account key file (json) containing the private key and use it to get an authentication token each time.
Follow the Set Up Your Project section of the Quickstart.
However, when creating a new service account in step 6, there is a Role field that does not appear in the documentation, which was confusing.
After registering the service account you can download the JSON file, so save it somewhere convenient. Do not expose it publicly, as it contains a private key.
```
$ gcloud auth print-access-token
```

Note the authentication token that is returned.
Create `sync-request.json` as described in the Make a Speech API Request section of the Quickstart:
**`sync-request.json`**

```json
{
  "config": {
    "encoding": "FLAC",
    "sample_rate": 16000
  },
  "audio": {
    "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
  }
}
```
Then, from the directory containing `sync-request.json`:

```
$ curl -s -k -H "Content-Type: application/json" \
    -H "Authorization: Bearer <authentication token obtained above>" \
    https://speech.googleapis.com/v1beta1/speech:syncrecognize \
    -d @sync-request.json
```
If all goes well, the recognition result comes back as JSON.
The location and format of the input audio are specified in the JSON request body (`sync-request.json` in the example above). The example uses a sample FLAC file already stored in Google Cloud Storage, but of course you can also send local audio data, and encodings other than FLAC are supported.
As described in the SyncRecognize section of the REST API reference, the recognition settings go in the `config` field of the request body and the audio data goes in the `audio` field.
As shown in [RecognitionAudio](https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionAudio), the audio is specified either as a `uri` or as `content`; to send a local audio file, Base64-encode it into a string and send it as `content`.
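As a sketch of the `content` route, a local audio file could be Base64-encoded into the request body like this (`build_sync_request` is a hypothetical helper for illustration, not part of the official sample):

```python
import base64

def build_sync_request(audio_bytes, encoding="FLAC", sample_rate=16000):
    """Build a syncrecognize request body that inlines the audio as Base64.

    Mirrors sync-request.json above, but uses "content" instead of "uri".
    """
    return {
        "config": {"encoding": encoding, "sample_rate": sample_rate},
        "audio": {"content": base64.b64encode(audio_bytes).decode("utf-8")},
    }

# Usage: build_sync_request(open("audio.flac", "rb").read())
```

The resulting dict can be dumped with `json.dump()` to produce a request file for the curl command above.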
The sample uses FLAC encoding with a 16000 Hz (16 kHz) sampling rate; make sure these settings match the audio data you actually send.
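To verify what a local WAV file actually contains before choosing the `encoding`/`sample_rate` settings, Python's standard `wave` module is enough (a small sketch; note that 16-bit PCM WAV corresponds to the API's LINEAR16 encoding):

```python
import wave

def wav_params(path_or_file):
    """Return (channels, sample_width_bytes, frame_rate_hz) of a WAV file."""
    with wave.open(path_or_file, "rb") as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()

# A 16 kHz mono 16-bit file should yield (1, 2, 16000).
```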
## Using the Speech API from Python
As shown in the [Tutorial](https://cloud.google.com/speech/docs/rest-tutorial), you can call the Speech API from Python instead of using the `gcloud` command + curl. (There is also a Node.js sample.)
This procedure doesn't require the Google Cloud SDK, but it does require the [Google API Client Library](https://developers.google.com/api-client-library/python/start/installation). I initially thought a library was unnecessary since curl works, but the [API Discovery Service](https://developers.google.com/discovery/) and the Google API Client Library are used to obtain authentication tokens. If you don't need those, you can go without a library using the curl approach above.
### Get Service Account key file
Same as steps 1-3 of the CLI procedure above.
### Application Default Credentials settings
The procedure follows the [Sample Code](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api/speech_rest.py), but the Service Account key file used to obtain the authentication token must first be set in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable:

```
$ export GOOGLE_APPLICATION_CREDENTIALS=<Service Account key file path>
```
The authentication token is then obtained by referencing this file as [Application Default Credentials](https://cloud.google.com/speech/docs/common/auth#authenticating_with_application_default_credentials) via the `GoogleCredentials.get_application_default().create_scoped()` method.
### API call
As per [Sample Code](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api/speech_rest.py):
```
$ python speech_rest.py <audio file>.wav
```

The recognition result is printed.
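The core of what the sample does can be sketched as a function that takes an already-built discovery `service` object (a sketch only; the field names follow the v1beta1 docs, and `recognize` is my own helper name, so check the actual sample if anything differs):

```python
import base64

def recognize(service, audio_bytes, language_code="en-US"):
    """Send one audio buffer to speech.syncrecognize and return the response.

    `service` is assumed to have been built via
    googleapiclient.discovery.build('speech', 'v1beta1', ...).
    """
    body = {
        "config": {
            "encoding": "LINEAR16",      # raw 16-bit PCM, e.g. a WAV payload
            "sampleRate": 16000,
            "languageCode": language_code,
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("utf-8")},
    }
    return service.speech().syncrecognize(body=body).execute()
```

Building the `service` once and passing it in keeps the discovery step out of the per-file work, which matters for the multi-file case discussed below.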
### Caution
- When recognizing Japanese audio, change the `languageCode` in the body from `en-US` to `ja-JP`.
- If you want to send FLAC-encoded data, set the `encoding` in the body to `FLAC`.
- The sample just prints the result with `json.dumps()`, so you need to handle the output so that recognized Japanese displays correctly.
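For the Japanese-display issue, passing `ensure_ascii=False` to `json.dumps()` keeps the characters readable instead of emitting `\uXXXX` escapes:

```python
import json

result = {"transcript": "こんにちは"}
# ensure_ascii=False prints the Japanese text itself, not \uXXXX escapes
print(json.dumps(result, ensure_ascii=False))  # {"transcript": "こんにちは"}
```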
This sample processes a single input file; if you want to recognize multiple files, it seems better not to repeat the API discovery and token acquisition for each one.
The authentication token also seems to be refreshed fairly frequently, so you need to handle re-acquiring it. During testing I suddenly started getting 401s (after 15-30 minutes, in my experience?); it turned out the token had been rotated.
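One simple way to cope with mid-run token expiry is to catch the 401 once, refresh, and retry (a generic sketch; `Unauthorized`, `make_request`, and `refresh_token` are placeholder names, not from the client library or the sample):

```python
class Unauthorized(Exception):
    """Placeholder for the client library's 401 error type."""

def call_with_reauth(make_request, refresh_token):
    """Try the request; on a 401, refresh the token once and retry."""
    try:
        return make_request()
    except Unauthorized:
        refresh_token()
        return make_request()
```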
Some impressions (sorry they're not quantitative):

- Recognition takes some time (about 2-4 seconds?).
- Recognition accuracy is quite high. Even with fairly loud noise (music playing near the microphone), it transcribed correctly. Getting this accuracy without tuning anything is amazing.
- I'd like to try what happens when the noise is a human voice.
- I haven't tried the context-related options yet, so I'd like to use them in the future.
- The Quickstart says **Learn in 5 minutes**, but 5 minutes was completely impossible for me, which made me sad.