[Introduction to AWS] I played with male and female voices with Polly and Transcribe ♪

In my previous posts I played a little with Transcribe and Polly separately. This time I wrapped each of them in a function so that they can be used freely and easily.

What I did

・ Sequence assuming conversation app
・ Code to be realized
・ Actual result

・ Sequence assuming conversation app

The assumption is a situation in which a man and a woman are having a conversation, with a conversation application in between that converts the text. Here the conversation application is a natural language processing application whose input is text.

Therefore, I carried out the following sequence.
① Enter an arbitrary sentence
② Convert the sentence from ① to a female voice with Polly
③ Get the female audio file (mp3)
④ Transcribe the female audio file to text; this would be the input to the conversation app, which would return a reply sentence
⑤ Convert the reply sentence (for now, simply the text from ④) to a male voice with Polly
⑥ Get the male audio file (mp3)
⑦ Finally, transcribe the male audio file to text; this again would be the input to the conversation app, which would return a reply sentence, and so on

text0 = "Hello, it's cloudy in Tokyo and Yokohama today. Mizuki,I announced"  # ①
vfile = function_polly(text0, 'Mizuki')  # ②
file0 = vfile  # 'Mizuki.mp3' ③
text0 = fun_tran(file0)  # ④
vfile = function_polly(text0, 'Takumi')  # ⑤
file0 = vfile  # 'Takumi.mp3' ⑥
text0 = fun_tran(file0)  # ⑦
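
The seven steps above are the same two-step round trip (text to speech, then speech back to text) run once per voice. Below is a minimal sketch of that loop, assuming the two conversion functions are passed in as parameters so the flow can be tried without AWS. The names `round_trip`, `fake_tts` and `fake_stt` are hypothetical and exist only for illustration.

```python
def round_trip(text, voices, tts, stt):
    """Pass text through tts -> stt once per voice and return every intermediate text."""
    history = [text]
    for voice in voices:
        audio_file = tts(history[-1], voice)  # e.g. function_polly
        history.append(stt(audio_file))       # e.g. fun_tran
    return history


# Stand-ins so the sketch runs without AWS (hypothetical, for illustration only).
_fake_store = {}


def fake_tts(text, voice):
    filename = "{}.mp3".format(voice)
    _fake_store[filename] = text  # pretend the "audio" is just the text
    return filename


def fake_stt(filename):
    return _fake_store[filename]


history = round_trip("hello", ["Mizuki", "Takumi"], fake_tts, fake_stt)
print(history)
```

With the real functions, the call would presumably look like round_trip(text0, ['Mizuki', 'Takumi'], function_polly, fun_tran).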

・ Code to be realized

As for the code, I simply wrapped what I wrote in the previous two posts into functions to make it easier to use. The libraries used are as follows.

from __future__ import print_function
import time
import boto3
import pandas as pd
from boto3 import Session

Define the session, the Polly and Transcribe clients, and the S3 bucket used by both functions.

session = Session(profile_name="default")
polly = session.client("polly")
transcribe = boto3.client('transcribe')
s3 = boto3.resource('s3') #Get S3 object
bucket = s3.Bucket('muauanpub') #bucket definition

The following is the function function_polly, which converts text to speech. The inputs are the text text0 and voice0, which selects the female or male voice. The processing in the function is:
① Convert the text to speech
② Save the mp3 file locally on EC2
③ Upload it to the S3 bucket (⇒ so it can be published in HTML etc.)
④ Return vfile, the name of the audio file in S3

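def function_polly(text0, voice0):
    # ① Convert the text to speech with the given voice, e.g. "Mizuki" or "Takumi"
    response = polly.synthesize_speech(Text=text0, OutputFormat="mp3", VoiceId=voice0)
    # ② Save the audio stream to a local mp3 file; "with" closes the file even on error
    with open('speech.mp3', 'wb') as file:
        file.write(response['AudioStream'].read())
    # ③ Upload to the S3 bucket under a per-voice name, publicly readable
    vfile = '{}.mp3'.format(voice0)
    bucket.upload_file('speech.mp3', vfile, ExtraArgs={'ACL': 'public-read'})
    return vfile  # ④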

Next is the fun_tran function, which converts an audio file to text. The important point here is that job_name must be unique, so to run conversions one after another you need to either generate a new job name each time, for example by including the time, or delete the job after each run. Deleting turned out to be possible with transcribe.delete_transcription_job(TranscriptionJobName='test_tran'), so I used that method. Also, as I wrote last time, it is important to specify the S3 bucket name of the output destination. The S3 object acquisition, s3 = boto3.resource('s3'), and so on are duplicated. Because this makes it clear where the files are stored, it is placed in each function.
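
The other option mentioned above, generating a new job name each time, could look like the following. This is a minimal sketch, not what this post actually uses. `make_job_name` is a hypothetical helper, and the name format (prefix, timestamp, random suffix) is my own assumption, chosen to stay within the letters, digits, '.', '_' and '-' that Transcribe job names accept.

```python
import time
import uuid


def make_job_name(prefix="test_tran"):
    """Build a job name that differs on every call: prefix, timestamp, random suffix."""
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    suffix = uuid.uuid4().hex[:8]  # random part, so two calls in the same second still differ
    return "{}-{}-{}".format(prefix, timestamp, suffix)


name_a = make_job_name()
name_b = make_job_name()
print(name_a)
print(name_b)
```

Note that the result file written to the output bucket is named after the job, so with unique names the download step would need to use the generated name instead of the fixed test_tran.json.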

def fun_tran(file0):
    job_name = "test_tran"
    job_uri = "https://muauanpub.s3.amazonaws.com/{}".format(file0)
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': job_uri},
        MediaFormat='mp3',
        LanguageCode='ja-JP',
        OutputBucketName='muauanpub'
    )
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        print("Not ready yet...")
        time.sleep(5)
    print(status)

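    if status['TranscriptionJob']['TranscriptionJobStatus'] == 'FAILED':
        # Delete the failed job so the fixed job name can be reused, then stop
        transcribe.delete_transcription_job(TranscriptionJobName=job_name)
        raise RuntimeError('Transcription job {} failed'.format(job_name))
    result_file = '{}.json'.format(job_name)  # The output file is named after the job
    bucket.download_file(result_file, result_file)  # Download to EC2: (key in S3, local file name)
    df = pd.read_json(result_file)  # Read the JSON file with pandas
    text1 = df['results']['transcripts'][0]['transcript']  # Look up by label, not by row position
    print(text1)  # Transcribed text extracted from the JSON file
    transcribe.delete_transcription_job(TranscriptionJobName=job_name)
    return text1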
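
For reference, the part of the result file that fun_tran reads can also be pulled out with the standard json module. This is a minimal sketch, assuming the usual layout of a Transcribe result file, where the full text sits under results, transcripts. The sample document is abbreviated and its content is made up, and `extract_transcript` is a hypothetical helper.

```python
import json

# Abbreviated sample of a Transcribe result file (made-up content for illustration).
sample = """
{
  "jobName": "test_tran",
  "results": {
    "transcripts": [{"transcript": "hello from Tokyo"}],
    "items": []
  },
  "status": "COMPLETED"
}
"""


def extract_transcript(raw_json):
    """Return the full transcript text from a Transcribe result document."""
    doc = json.loads(raw_json)
    return doc["results"]["transcripts"][0]["transcript"]


text1 = extract_transcript(sample)
print(text1)
```

Looking the value up by key rather than by position keeps the code working even if the order of entries in the file changes.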

・ Actual result

I made the results public for about 12 hours, but stopped because there were many requests and the S3 request charges seemed likely to exceed the free tier m(_ _)m
Female / male voice playback

Summary

・ I created functions for Polly and Transcribe and ran them.
・ Both functions are easy to use, and I generated female and male voices one after another.
・ I passed the same sentence around the loop repeatedly, but the speech did not degrade and the output sentence stayed the same, so the conversion accuracy can be said to be high.

・ Next, I want to make a pseudo voice conversation app.
・ I also aim to make it run with Lambda when a file is placed on S3.
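
For the Lambda idea, the handler would receive an S3 event notification and read the bucket and object key from it. The following is a minimal sketch, assuming the standard S3 event notification layout. The handler name, the sample event and its values are hypothetical, and the actual call into fun_tran is left as a comment.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect (bucket, key) pairs for every object named in an S3 event."""
    uploaded = []
    for record in event.get("Records", []):
        bucket_name = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append((bucket_name, key))
        # A real handler would start the conversion here, e.g. fun_tran(key).
    return uploaded


# Hand-written sample event (hypothetical values) to exercise the handler locally.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "Mizuki+voice.mp3"}}}
    ]
}
result = handler(sample_event, None)
print(result)
```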