Automatic voice transcription with Google Cloud Speech API

The dream of automatically transcribing meeting and interview recordings

The API was updated in August 2017 to accept audio up to 3 hours long, so I tried converting voice data to a text file. I used GCP's cloud console as the environment because it can be used on the go, which means an interview can be transcribed automatically as soon as it has been recorded.

Reference: http://jp.techcrunch.com/2017/08/15/20170814google-updates-its-cloud-speech-api-with-support-for-more-languages-word-level-timestamps/

Environment, language, etc.

Enable Speech API

Enable the Speech API by following the guide at the URL below. The first 60 minutes of audio are free; after that you are charged 0.6 cents per 15 seconds. If you are using Google Cloud Platform for the first time, you also receive a $300 credit that is valid for one year (as of August 2017). https://cloud.google.com/speech/docs/getting-started
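To get a feel for what a long recording costs, the pricing above can be turned into a small calculation. This is only a rough sketch in Python 3: the function name and constants are my own, and I am assuming that billable audio is rounded up to whole 15-second units and that the full 60 free minutes are still available.

```python
import math

FREE_SECONDS = 60 * 60        # free tier: 60 minutes of audio
PRICE_PER_INCREMENT = 0.006   # USD per 15-second unit (0.6 cents)
INCREMENT_SECONDS = 15


def estimate_cost_usd(audio_seconds, free_seconds_left=FREE_SECONDS):
    """Estimate the charge for one file (assumes rounding up to 15 s units)."""
    billable = max(0, audio_seconds - free_seconds_left)
    increments = math.ceil(billable / INCREMENT_SECONDS)
    return round(increments * PRICE_PER_INCREMENT, 3)


# A 3-hour recording with the whole free tier still unused.
print(estimate_cost_usd(3 * 60 * 60))  # prints 2.88
```

Under these assumptions a 3-hour interview comes to about $2.88, which the first-time credit covers easily.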

Create the credentials as a service account key file (JSON format).

API authentication with Google Cloud Shell

Launch Google Cloud Shell and upload the JSON file for authentication from the upper right corner.

After uploading, authenticate with the JSON file.

$ export GOOGLE_APPLICATION_CREDENTIALS=hogehoge.json
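Before running anything against the API, it can help to confirm that the variable really points at a usable key file. The following is a minimal sketch in Python 3 using only the standard library; `check_credentials` is a hypothetical helper of mine, not part of any Google library. It relies on the `type` and `client_email` fields that service account key files contain.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return the service account e-mail if the key file looks usable."""
    path = env.get('GOOGLE_APPLICATION_CREDENTIALS')
    if not path:
        raise RuntimeError('GOOGLE_APPLICATION_CREDENTIALS is not set')
    if not os.path.isfile(path):
        raise RuntimeError('Key file not found: {}'.format(path))
    with open(path) as f:
        key = json.load(f)
    if key.get('type') != 'service_account':
        raise RuntimeError('Not a service account key file')
    return key.get('client_email')


if __name__ == '__main__':
    try:
        print('Key file belongs to', check_credentials())
    except (RuntimeError, ValueError) as err:
        print('Credential problem:', err)
```

Note that `export` only applies to the current shell session, so the variable has to be set again after reopening Cloud Shell.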

Create audio file

MP3, AAC, and similar formats cannot be used as they are, so you need to convert them to a supported format first. I tried various options, and the following settings are the ones I recommend.

(Reference: Online conversion service) https://audio.online-convert.com/convert-to-flac

[Image: conversion]
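If you would rather convert locally than use the online service, something like the sketch below may work. It is not what I used for this article: it assumes `ffmpeg` is installed, and the choice of mono audio at 16000 Hz is my assumption (the rate matches the `sample_rate_hertz=16000` value that appears in the Caution section). The function names and file names are hypothetical.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that converts src to mono FLAC."""
    return [
        'ffmpeg',
        '-i', src,                 # input file (mp3, AAC, ...)
        '-ac', '1',                # downmix to a single channel (assumed)
        '-ar', str(sample_rate),   # resample to the target rate (assumed)
        dst,                       # the .flac extension selects FLAC output
    ]


def convert_to_flac(src, dst, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.check_call(build_ffmpeg_command(src, dst, sample_rate))


# Show the command without running it.
print(' '.join(build_ffmpeg_command('interview.mp3', 'interview.flac')))
```

Calling `convert_to_flac('interview.mp3', 'interview.flac')` would then run the conversion itself.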

Upload the FLAC file to Google Cloud Storage. See the following for how to set up Google Cloud Storage: https://cloud.google.com/storage/docs/quickstart-console?hl=ja
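The upload can also be scripted instead of done in the console. The sketch below assumes the `google-cloud-storage` package is installed and that the same service account has write access to the bucket; the helper names and the bucket name `my-bucket` are placeholders of mine.

```python
def gcs_uri(bucket_name, object_name):
    """Build the gs:// URI that is later passed to the transcription script."""
    return 'gs://{}/{}'.format(bucket_name, object_name)


def upload_flac(bucket_name, local_path, object_name):
    """Upload a local file to Cloud Storage and return its gs:// URI."""
    # Imported here so this module loads even without the library installed.
    from google.cloud import storage
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path)
    return gcs_uri(bucket_name, object_name)


print(gcs_uri('my-bucket', 'testmusic.flac'))  # prints gs://my-bucket/testmusic.flac
```

`upload_flac('my-bucket', 'testmusic.flac', 'testmusic.flac')` would perform the actual upload and return the URI.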

I uploaded the Python file directly to the shell. I'm not primarily an engineer, so I wrote it while following the tutorial ...

transcribe.py


#!/usr/bin/env python
# coding: utf-8
"""Transcribe a FLAC audio file stored in Google Cloud Storage."""
import argparse
import codecs
import datetime


def transcribe_gcs(gcs_uri):
    """Asynchronously transcribe the audio at gcs_uri and save the text."""
    # The enums and types modules exist in google-cloud-speech
    # releases before 2.0.
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        language_code='ja-JP')

    # Start the long-running (asynchronous) recognition job.
    operation = client.long_running_recognize(config=config, audio=audio)

    print('Waiting for operation to complete...')
    operation_result = operation.result()

    # Write every transcript to a timestamped Shift_JIS text file.
    today = datetime.datetime.today().strftime('%Y%m%d-%H%M%S')
    with codecs.open('output{}.txt'.format(today), 'a', 'shift_jis') as fout:
        for result in operation_result.results:
            for alternative in result.alternatives:
                fout.write(u'{}\n'.format(alternative.transcript))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'path', help='GCS path for audio file to be recognized')
    args = parser.parse_args()
    transcribe_gcs(args.path)

Finally, run the following and wait a while for the transcription to finish.

$ python transcribe.py gs://<bucket-name>/testmusic.flac

Caution

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,  # Add this line
    language_code='ja-JP')
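If you are not sure which value to put in `sample_rate_hertz`, the rate can be read from the FLAC file itself. The sketch below is a hypothetical helper written for Python 3, not part of the Speech API. It assumes the file starts with the `fLaC` marker followed directly by the STREAMINFO block, as the FLAC format lays it out, and it does not handle files with other data (such as ID3 tags) in front.

```python
import struct


def flac_sample_rate(data):
    """Return the sample rate in Hz stored in a FLAC STREAMINFO block."""
    if len(data) < 21 or data[:4] != b'fLaC':
        raise ValueError('not a FLAC stream')
    # Byte 4: last-block flag (top bit) and block type (0 = STREAMINFO).
    if data[4] & 0x7F != 0:
        raise ValueError('first metadata block is not STREAMINFO')
    # STREAMINFO starts at byte 8; the 20-bit sample rate starts 10 bytes in.
    (packed,) = struct.unpack('>I', b'\x00' + data[18:21])
    return packed >> 4


def read_flac_sample_rate(path):
    """Read just enough of the file at path to find the sample rate."""
    with open(path, 'rb') as f:
        return flac_sample_rate(f.read(42))


# Synthetic header for a 16000 Hz stream, used here only as a demonstration.
header = (b'fLaC' + b'\x80\x00\x00\x22' + b'\x10\x00\x10\x00' +
          b'\x00' * 6 + b'\x03\xe8\x00' + b'\xf0')
print(flac_sample_rate(header))  # prints 16000
```

The value returned by `read_flac_sample_rate('testmusic.flac')` is what would go into `sample_rate_hertz`.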

sudo pip install --upgrade google-cloud-speech

Accuracy (impression)

Things that do not affect accuracy

Things that affect accuracy

Surprisingly, room echo affects accuracy considerably. Background noise such as air conditioning did not affect accuracy even when it was quite loud, perhaps because it is easy to separate from speech.
