Use Google's services to run speech recognition on audio files.

Environment

· Python 3.5 (64-bit) via Anaconda
· Windows 10
· The audio file is WAV. Other formats just need to be converted separately with sox.
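
The sox conversion mentioned above could be driven from Python roughly as follows. This is only a sketch: the helper names are hypothetical, and the 16 kHz mono output is my assumption about a reasonable target, not something the original setup specifies. It requires the sox command to be on the PATH.

```python
import subprocess


def build_sox_command(src, dst, rate=16000, channels=1):
    """Build a sox command line that converts src to dst.

    rate and channels are assumed defaults (16 kHz, mono);
    adjust them to whatever the recognizer handles best.
    """
    return ["sox", src, "-r", str(rate), "-c", str(channels), dst]


def convert_to_wav(src, dst):
    """Run sox; raises CalledProcessError if the conversion fails."""
    subprocess.check_call(build_sox_command(src, dst))
```

Calling convert_to_wav('input.flac', 'filename.wav') would then produce the WAV file that the recognition script reads.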

Packages to install

· SpeechRecognition (https://github.com/Uberi/speech_recognition): a package that makes it easy to use various cloud speech recognition services. It is feature-rich.
· pyaudio: seems to be required for SpeechRecognition to work.
· google-api-python-client: used when adapting the SpeechRecognition sample source, so install it.
· pydub: used to split audio files at silent sections. Install it with pip install pydub.
· FFMPEG: I am not sure why it has to be installed; I simply followed the instructions at http://chachay.hatenablog.com/entry/2016/10/03/215841.
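
To confirm that everything in the list is actually installed, a small check like the following may help. It is a sketch using only the standard library; the helper names are hypothetical, and the module names are the import names of the packages above (for example, SpeechRecognition is imported as speech_recognition).

```python
import importlib.util
import shutil

# Import names of the packages listed above.
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "googleapiclient", "pydub"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True if an ffmpeg executable is on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
    print("ffmpeg found:", ffmpeg_available())
```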

To use Google's Speech API v2

Roughly, the steps described at http://qiita.com/lethe2211/items/7c9b1b82c7eda40dafa9 should be correct. Annoyingly, the API does not show up unless you join the mailing list (ML).

Important point

If the audio file is too long, an error occurs; I do not know the cause (as of January 11, 2017). In my environment, a result is returned for an audio file of about 10 seconds, but an error occurs at around 20 seconds.

When using the SpeechRecognition sample, try limiting the length with duration, as in `audio = r.record(source, duration=10)`, and check the result. You should see an error once the duration gets long.
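
Before sending a file, its length can be checked with the standard wave module. The sketch below assumes the roughly 10-second limit observed in my environment; the real limit is not documented here and may differ, and the function names are hypothetical.

```python
import wave

# Assumed limit, based only on the behaviour observed above.
MAX_SECONDS = 10


def wav_duration_seconds(wav_path):
    """Return the length of a WAV file in seconds."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getnframes() / float(wav_file.getframerate())


def is_short_enough(wav_path, max_seconds=MAX_SECONDS):
    """Return True if the file is no longer than max_seconds."""
    return wav_duration_seconds(wav_path) <= max_seconds
```

A file for which is_short_enough returns False is a candidate for splitting before recognition.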

Splitting the file at silent sections

Basically, this follows http://chachay.hatenablog.com/entry/2016/10/03/215841.

When recognizing speech with Google Speech API v2, an error occurs if the file is large (I do not know the cause), so this is an attempt to split the file at silent sections and recognize each piece.
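
The idea of splitting at silence can be illustrated without any audio library. The following is a simplified sketch of the concept, not pydub's actual implementation: it takes a list of loudness values (one per time step, in dB) and returns the index ranges of the non-silent parts. The function names are hypothetical; the parameter names mirror those of pydub's split_on_silence for readability only.

```python
def find_silent_ranges(loudness_db, min_silence_len, silence_thresh):
    """Return (start, end) index pairs of quiet runs.

    A run counts as silence when every value is below silence_thresh
    and the run is at least min_silence_len entries long.
    """
    ranges = []
    run_start = None
    for i, level in enumerate(loudness_db):
        if level < silence_thresh:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_silence_len:
                ranges.append((run_start, i))
            run_start = None
    # Handle a quiet run that lasts until the end of the input.
    if run_start is not None and len(loudness_db) - run_start >= min_silence_len:
        ranges.append((run_start, len(loudness_db)))
    return ranges


def split_at_silence(loudness_db, min_silence_len, silence_thresh):
    """Return (start, end) index pairs of the parts between silent runs."""
    chunks = []
    prev_end = 0
    for start, end in find_silent_ranges(loudness_db, min_silence_len, silence_thresh):
        if start > prev_end:
            chunks.append((prev_end, start))
        prev_end = end
    if prev_end < len(loudness_db):
        chunks.append((prev_end, len(loudness_db)))
    return chunks
```

Short dips below the threshold are ignored, which is why a larger min_silence_len yields fewer, longer chunks.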

Source code for performing speech recognition

I use several libraries and pass data between them via WAV files, so I think there is a lot of waste, but here is the source.

Some of the imports are probably unnecessary, so omit them as appropriate.

import speech_recognition as sr
from os import path
from googleapiclient import discovery
import httplib2
import base64, json
import urllib
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence


if __name__ == '__main__':
    r = sr.Recognizer()
    audio_data = []
    # Load the WAV file and split it wherever there is at least 1.5 s of silence.
    # Anything quieter than -30 dBFS counts as silence; 500 ms of silence is kept at the chunk edges.
    sound = AudioSegment.from_file('./filename.wav', format='wav')
    chunks = split_on_silence(sound, min_silence_len=1500, silence_thresh=-30, keep_silence=500)

    # Write and read the temporary file at the same path (next to this script),
    # so the script also works when run from another working directory.
    AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "temp.wav")
    for chunk in chunks:
        chunk.export(AUDIO_FILE, format='wav')
        with sr.AudioFile(AUDIO_FILE) as source:
            audio = r.record(source)
            audio_data.append(audio)

    # Send each chunk to the recognizer separately.
    for audio in audio_data:
        try:
            print("Google Speech Recognition thinks you said " + r.recognize_google(audio, key='your API key', language='ja'))
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))
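
Splitting at silence does not guarantee short chunks: if someone speaks for 20 seconds without a pause, that chunk may still trigger the error. One possible workaround, sketched below, is to cut any over-long chunk into fixed-size pieces. The helper name cap_length is hypothetical. It relies only on len() and slicing, which a pydub AudioSegment supports in milliseconds, so it can also be tried on a plain list. Note that a hard cut like this may fall in the middle of a word.

```python
def cap_length(segment, max_len):
    """Cut segment into consecutive pieces no longer than max_len.

    Works on anything that supports len() and slicing. For a pydub
    AudioSegment both are measured in milliseconds.
    """
    return [segment[start:start + max_len]
            for start in range(0, len(segment), max_len)]


if __name__ == "__main__":
    # Plain-list demonstration: 25 items cut into pieces of at most 10.
    pieces = cap_length(list(range(25)), 10)
    print([len(piece) for piece in pieces])
```

In the script above, this could be applied as chunks = [piece for chunk in chunks for piece in cap_length(chunk, 10000)], where 10000 ms reflects the roughly 10-second limit seen in my environment.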

Reference sites

· http://chachay.hatenablog.com/entry/2016/10/03/215841
· https://pypi.python.org/pypi/SpeechRecognition/

Speech file recognition by Google Speech API v2 using Python