This is an article on transcribing audio with the Google Cloud Speech API, as one way to store recordings of English lectures and conferences as text data. The article (items/659bde4cdc8ce5c78e29) was helpful, so I have reorganized the procedure below as a memo.
Since this procedure uses Google Cloud Platform, it is assumed that you have already created a project by completing the common services section (P9-P20) of the Google Cloud Platform Easy Startup Guide.
It is also assumed that a sound source converted to the following format has already been prepared (Reference: audio conversion site; sound source actually used: PyCon JP 2017 English keynote speech):
FLAC encoding
Monaural
16,000 Hz sampling rate
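By the way, if you want to do this format conversion locally instead of on a conversion site, the following is a minimal sketch using the third-party pydub library (my own assumption, not the tool used in this memo; it requires ffmpeg, and the file names are placeholders).
# Minimal sketch: convert an audio file to mono, 16,000 Hz FLAC with pydub.
# Assumes `pip install pydub` and an ffmpeg binary on PATH; file names are placeholders.
from pydub import AudioSegment

audio = AudioSegment.from_file('input.mp3')   # source recording (placeholder name)
audio = audio.set_channels(1)                 # monaural
audio = audio.set_frame_rate(16000)           # 16,000 Hz sampling rate
audio.export('voice.flac', format='flac')     # FLAC file for the Speech API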
Go to the Google Cloud Platform URL (https://cloud.google.com/?hl=ja) and press Open Console to enter the console screen.
Google Cloud Platform console login screen:
Console screen:
Select [Tools & Services] > [APIs & Services] > [Library] at the top left of the console screen, select Speech API from the list of APIs, and press [Enable] to enable the Google Speech API.
API list screen
Enable API ([Disable] is displayed because it is already enabled)
You can check that the Google Speech API is enabled under [APIs & Services] > [Dashboard]:
On the left, select [APIs & Services] > [Credentials] > [Create Credentials] > [Service Account Key], enter an appropriate [Service Account Name] (assumed to be arkbbb here), and press the [Create] button to download the JSON key file.
Service account key creation screen:
Start Google Cloud Shell with the Google Cloud Shell button at the top right of the Google Cloud Platform console screen, upload the JSON file obtained in step 3, and set it as an environment variable.
Google Cloud Shell Button:
JSON upload:
Environment variable setting command
$ export GOOGLE_APPLICATION_CREDENTIALS=[name of the JSON file obtained in step 3].json
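To make sure the credentials are actually picked up, one quick check (my own sketch, not part of the original procedure) is to start Python on Cloud Shell and create a SpeechClient; if the environment variable points at a wrong or unreadable file, this should fail with an authentication error.
# Minimal sketch: confirm GOOGLE_APPLICATION_CREDENTIALS is set and usable.
import os
from google.cloud import speech

print(os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'))  # path to the JSON key file
client = speech.SpeechClient()  # raises an error here if the key cannot be loaded
print('Credentials loaded successfully')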
Upload the prepared voice data to Google Cloud Storage. First, select [Tools and Services] > [Storage] > [Browser] at the top left of the screen, create a bucket with [Create Bucket], double-click the created bucket, and upload the audio data with [Upload File].
Go to Google Cloud Storage screen:
Creating a bucket (bucket name and other settings are in text):
Uploading files into your bucket:
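As an alternative to the console UI, the bucket creation and file upload can also be done from Google Cloud Shell with the google-cloud-storage Python library. This is only a sketch under assumed names (my-transcribe-bucket and voice.flac are placeholders), not the procedure I actually followed.
# Minimal sketch: create a bucket and upload the audio file with google-cloud-storage.
# The bucket and file names are placeholders; adjust them to your own.
from google.cloud import storage

client = storage.Client()
bucket = client.create_bucket('my-transcribe-bucket')  # bucket names must be globally unique
blob = bucket.blob('voice.flac')                       # object name inside the bucket
blob.upload_from_filename('voice.flac')                # local file on Cloud Shell
print('Uploaded to gs://{}/{}'.format(bucket.name, blob.name))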
Create a Python script for transcription execution on Google Cloud Shell.
Python file editing command (use any editor you like)
$ nano transcribe.py
Python script for transcription (for English voice):
transcribe.py
#!/usr/bin/env python
# coding: utf-8
import argparse
import io
import sys
import codecs
import datetime
import locale


def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        sample_rate_hertz=16000,
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        language_code='en-US')

    # Use long-running (asynchronous) recognition for audio longer than one minute.
    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    operationResult = operation.result()

    # Write each transcript line to output<timestamp>.txt.
    d = datetime.datetime.today()
    today = d.strftime("%Y%m%d-%H%M%S")
    fout = codecs.open('output{}.txt'.format(today), 'a', 'shift_jis')

    for result in operationResult.results:
        for alternative in result.alternatives:
            fout.write(u'{}\n'.format(alternative.transcript))
    fout.close()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'path', help='GCS path for audio file to be recognized')
    args = parser.parse_args()
    transcribe_gcs(args.path)
If you want to transcribe Japanese, modify the following line:
language_code='en-US')
↓
language_code='ja-JP')
Execute the transcription with the following command on Google Cloud Shell.
$ python transcribe.py gs://[bucket name]/[voice data name].flac
After execution, if you list the files with the ls command on Google Cloud Shell, you will find a text file named output*.txt; open it to check the result. The result for the first 1-2 minutes is shown below. If you listen to it together with the sound source, there are some mistakes, but you can see that it is mostly transcribed correctly.
and not.
We have just attended this big Tatum Outlet
and we held a pydata event it was actually the first I did it
and some of these slides are actually problem, says talk to and so at strata we saw many people talking about the Duke talking about Big Data there were looking at using Java in a management
and there was a whole lot of our versus Python language rewards on Facebook
the Travis and I were not content with the state of things we saw that python to play a very significant role Travis made the slide that's from The Little Prince that shows a snake swallowing the open
he was also talking about using compilers make python faster
it was also not that pilot event that we were very fortunate to have weido been awesome stopping by and we talked to him about things like the matrix multiplication operator we talked about coding expressions and things like that
and so this actually his picture show does Travis and West McKinney who's the greater pandas and Guido van Rossum
add
and we ask we don't fix the packaging problem he told us that we should do it ourselves
and so we did and that's how it came up with Honda and Anaconda which I think quite elegantly solves the difficult packaging problems for the Scientific Games
so we accepted the challenge and so for those who don't know what Anaconda is very quickly I'll give you it is basically a very simple way and very reliable way to get final versions of many very popular typical to build packages in libraries in the python ecosystem
By the way, the actual result data is here
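If you want to post-process the result in Python, note that transcribe.py writes the file in Shift_JIS. A minimal reading sketch (only the output*.txt naming pattern comes from the script; the rest is my own example):
# Minimal sketch: read the Shift_JIS-encoded transcript written by transcribe.py.
import codecs
import glob

latest = sorted(glob.glob('output*.txt'))[-1]  # newest file; the timestamp in the name sorts chronologically
with codecs.open(latest, 'r', 'shift_jis') as fin:
    print(fin.read())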