This article shows **how to transcribe the audio in a WAV file with the Google Cloud Speech-to-Text API**. I used the [article on how to transcribe FLAC files](https://qiita.com/knyrc/items/7aab521edfc9bfb06625) as a reference. With this method, you can transcribe **without converting to FLAC format**.
**The Cloud Speech-to-Text API obtains the information required for transcription from the header of the WAV file**, so you need to confirm in advance that the header of the WAV file to be transcribed is normal. The header fields to check are **whether the format is PCM (fmt_wave_format_type) and the sampling frequency (fmt_samples_per_sec)**.
If you want to check the specifications of the Cloud Speech-to-Text API, see [RecognitionConfig](https://cloud.google.com/speech-to-text/docs/reference/rpc/google.cloud.speech.v1#google.cloud.speech.v1.RecognitionConfig) or jump to the definition source in VS Code.
Check the header information by running the program from the article on reading WAVE file header information with Python. The values should be as follows:
- fmt_samples_per_sec: 8000-48000 (16000 is optimal)
- fmt_wave_format_type: 1 (indicates PCM)
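For reference, the two header checks can be sketched with only the standard library. This is a minimal sketch, not the program from the referenced article: the function names `read_wav_fmt` and `is_header_ok` are hypothetical, and it assumes a standard RIFF/WAVE layout in which the `fmt ` chunk begins with the format type, channel count, and sampling rate.

```python
import struct


def read_wav_fmt(path):
    """Return the format type and sampling rate stored in a WAV file's fmt chunk."""
    with open(path, "rb") as f:
        head = f.read(12)
        if len(head) < 12:
            raise ValueError("file is too short to be a WAV file")
        riff, _size, wave_id = struct.unpack("<4sI4s", head)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("fmt chunk not found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                fmt = f.read(chunk_size)
                if len(fmt) < 8:
                    raise ValueError("fmt chunk is truncated")
                format_type, _channels, sample_rate = struct.unpack("<HHI", fmt[:8])
                return {
                    "fmt_wave_format_type": format_type,
                    "fmt_samples_per_sec": sample_rate,
                }
            # Skip other chunks; chunk data is padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)


def is_header_ok(info):
    """Apply the two checks from this article: PCM (1) and 8000-48000 Hz."""
    return (
        info["fmt_wave_format_type"] == 1
        and 8000 <= info["fmt_samples_per_sec"] <= 48000
    )
```

Calling `is_header_ok(read_wav_fmt("sample.wav"))` (with `sample.wav` as a placeholder file name) returns `True` only when both values fall in the ranges listed above.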
Following the referenced guide, **exporting the WAV file with the Mac's default "Music" app** worked!
**[Caution] WAV files exported with iMovie and WAV files edited with QuickTime Player did not work because their headers were not normal!**
For the basic setup, refer to the article on how to transcribe FLAC files and create a **JSON key**.
**[Caution] This time, the WAV file to be transcribed is uploaded to Google Cloud Storage, so the service account must be granted access to Cloud Storage.**
Add the Storage Object Viewer role to the service account.
If you use a service account that does not have Cloud Storage access, you will get the following error:
PermissionDenied: 403 hogehoge does not have storage.objects.get access to the Google Cloud Storage object.
Set the path of the JSON key file you downloaded earlier in an environment variable.
export GOOGLE_APPLICATION_CREDENTIALS=./hoge.json
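Before calling the API, it can help to confirm that the environment variable really points at a service account key. The following is a minimal sketch using only the standard library; the function name `check_credentials` is hypothetical, and it relies on the `type` and `client_email` fields that service account key files contain.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return the service account email from the key file named in the environment."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    with open(path) as f:
        key = json.load(f)
    # A service account key file is marked with "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError("{} is not a service account key".format(path))
    return key.get("client_email")
```

The returned email address is the account that needs the Storage Object Viewer role.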
Refer to the article on how to transcribe FLAC files and upload the WAV file to Cloud Storage. On the object details screen, you can see the file path to the resource in Cloud Storage, which starts with gs://.
I wrote the following script by referring to the article on how to transcribe FLAC files.
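The upload can also be done from Python instead of the console. This is a minimal sketch, assuming the separate `google-cloud-storage` package is installed and that the account in use is allowed to write to the bucket (the Storage Object Viewer role alone only permits reading). The helper names `gcs_uri` and `upload_wav`, and the bucket and object names in the usage note, are hypothetical.

```python
def gcs_uri(bucket_name, object_name):
    """Build the gs:// path shown on the object details screen."""
    return "gs://{}/{}".format(bucket_name, object_name)


def upload_wav(local_path, bucket_name, object_name):
    """Upload a local WAV file and return its gs:// path."""
    # Imported here so that gcs_uri() works without the package installed.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path)
    return gcs_uri(bucket_name, object_name)
```

For example, `upload_wav("sample.wav", "my-bucket", "audio/sample.wav")` would return `gs://my-bucket/audio/sample.wav`, which is the form of path the transcription script expects.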
transcribe.py
#!/usr/bin/env python
# coding: utf-8
import argparse
import datetime


def transcribe(gcs_uri):
    from google.cloud import speech_v1 as speech
    from google.cloud.speech_v1 import types

    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    # The sampling frequency is written in the header of the audio file,
    # so it does not need to be specified here.
    config = types.RecognitionConfig(language_code='ja-JP')
    # Newer versions of the client library require keyword arguments.
    operation = client.long_running_recognize(config=config, audio=audio)

    # Print the message before blocking on the result.
    print('Waiting for operation to complete...')
    operationResult = operation.result()

    # Write each transcript to a timestamped text file and echo it to stdout.
    now = datetime.datetime.now()
    with open('./{}.txt'.format(now.strftime("%Y%m%d-%H%M%S")), mode='w') as f:
        for result in operationResult.results:
            print("Transcript: {}".format(result.alternatives[0].transcript))
            print("Confidence: {}".format(result.alternatives[0].confidence))
            f.write('{}\n'.format(result.alternatives[0].transcript))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'path', help='Cloud Storage path starting with gs://')
    args = parser.parse_args()
    transcribe(args.path)
Specify the file path to the resource in Cloud Storage, starting with gs://, as an argument and run the script.
python transcribe.py gs://hogehoge.wav
The result is written to standard output and to a text file.
Transcript: If you can register
Confidence: 0.8765763640403748
Transcript: I think it's better to be there
Confidence: 0.8419854640960693
20201022-010101.txt
If you can register
I think it's better to be there
- [Article on how to transcribe FLAC files](https://qiita.com/knyrc/items/7aab521edfc9bfb06625)
- Transcribing long audio files