[Introduction to AWS] I played with male and female voices with Polly and Transcribe ♪

Last time, I played with Transcribe and Polly a little, each on its own. This time, I wrapped each of them in a function so that they can be used easily and freely.

What I did

・ Sequence assuming conversation app
・ Code to be realized
・ Actual result

・ Sequence assuming conversation app

The idea is to create a situation in which a man and a woman hold a conversation, with a conversation app converting the text in between. Here, the conversation app is a natural language processing component that takes text as input and returns a reply text.

Therefore, I ran the following sequence:
① Enter an appropriate sentence.
② Convert the sentence from ① to a female voice with Polly.
③ Get the female audio file (mp3).
④ Transcribe the female audio file and convert it to text (→ input to the conversation app → reply text).
⑤ Convert the reply text (for now, simply the text obtained in ④) to a male voice with Polly.
⑥ Get the male audio file (mp3).
⑦ Finally, transcribe the male audio file and convert it to text (→ input to the conversation app → reply text) ...

text0 = "Hello, it's cloudy in Tokyo and Yokohama today. This was Mizuki."  # ①
vfile = function_polly(text0, 'Mizuki')  # ② female voice
file0 = vfile  # ③ 'Mizuki.mp3'
text0 = fun_tran(file0)  # ④
vfile = function_polly(text0, 'Takumi')  # ⑤ male voice
file0 = vfile  # ⑥ 'Takumi.mp3'
text0 = fun_tran(file0)  # ⑦
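The seven steps can be generalized to any number of turns. The sketch below is not the author's code: run_conversation and its parameters are hypothetical names, and reply is a placeholder for the conversation app (by default it passes the recognized text straight through, as in ⑤). The speech functions are passed in as arguments, so function_polly and fun_tran could be plugged in directly.

```python
from typing import Callable, List, Tuple


def run_conversation(
    text: str,
    voices: List[str],
    synthesize: Callable[[str, str], str],
    transcribe_file: Callable[[str], str],
    reply: Callable[[str], str] = lambda s: s,
) -> List[Tuple[str, str]]:
    """Hypothetical helper: one turn per entry in `voices`, alternating speakers.

    Each turn converts text -> audio file (synthesize) -> recognized text
    (transcribe_file), then feeds the recognized text to the conversation-app
    placeholder (reply) to get the next speaker's line.
    Returns (voice, recognized_text) pairs in turn order.
    """
    history = []
    for voice in voices:
        audio_file = synthesize(text, voice)   # e.g. function_polly -> 'Mizuki.mp3'
        heard = transcribe_file(audio_file)    # e.g. fun_tran -> recognized text
        history.append((voice, heard))
        text = reply(heard)                    # placeholder conversation app
    return history
```

With the real functions, `run_conversation(text0, ['Mizuki', 'Takumi'], function_polly, fun_tran)` would walk through the same ①-⑦ sequence; passing the functions in also makes it easy to test the flow offline with stubs.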

・ Code to be realized

As for the code, I simply turned what I wrote in the previous two posts into functions so that it is easier to use. The libraries used are as follows.

from __future__ import print_function
import time
import boto3
import pandas as pd
from boto3 import Session

Define the session, the Polly and Transcribe clients, and the S3 bucket used by both functions.

session = Session(profile_name="default")
polly = session.client("polly")
transcribe = boto3.client('transcribe')
s3 = boto3.resource('s3') #Get S3 object
bucket = s3.Bucket('muauanpub') #bucket definition

Below is function_polly, which converts text to speech. Its inputs are the text and voice0, which selects a female or male voice. The function:
① converts the text to speech,
② saves the mp3 file locally on the EC2 instance,
③ uploads it to the S3 bucket (⇒ so it can be published in HTML etc.), and
④ returns vfile, the name of the audio file in S3.

def function_polly(text0, voice0):
    # Synthesize speech with the chosen voice, e.g. "Mizuki" (female) or "Takumi" (male)
    response = polly.synthesize_speech(Text=text0, OutputFormat="mp3", VoiceId=voice0)
    # Save the mp3 locally on EC2
    with open('speech.mp3', 'wb') as file:
        file.write(response['AudioStream'].read())
    # Upload to S3 as <voice>.mp3 and make it publicly readable
    vfile = '{}.mp3'.format(voice0)
    bucket.upload_file('speech.mp3', vfile, ExtraArgs={'ACL': 'public-read'})
    return vfile
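function_polly uploads the mp3 with the public-read ACL so that it can be embedded in HTML. A small sketch of that last step, assuming the bucket allows public ACLs and using the same virtual-hosted URL form as job_uri in fun_tran; public_url and audio_tag are hypothetical helpers, not part of boto3.

```python
from html import escape
from urllib.parse import quote


def public_url(bucket_name: str, key: str) -> str:
    # Virtual-hosted-style S3 URL, same form as job_uri in fun_tran
    return "https://{}.s3.amazonaws.com/{}".format(bucket_name, quote(key))


def audio_tag(bucket_name: str, key: str) -> str:
    # HTML5 <audio> element that plays the uploaded mp3 in a browser
    url = escape(public_url(bucket_name, key), quote=True)
    return '<audio controls src="{}"></audio>'.format(url)
```

For example, `audio_tag('muauanpub', function_polly(text0, 'Mizuki'))` would give a tag pointing at Mizuki.mp3 in the bucket.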

Next is fun_tran, which converts an audio file to text. The important point here is that a transcription job name must be unique, so to run conversions one after another you need either to generate a new job name each time (for example by including the time in it) or to delete the previous job. This time, I was able to delete the job with transcribe.delete_transcription_job(TranscriptionJobName='test_tran'), so I used that approach. Also, as I wrote last time, it is important to specify the name of the output S3 bucket (OutputBucketName). Setup such as getting the S3 object with s3 = boto3.resource('s3') is shared with function_polly, so it is not repeated here.
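If you would rather generate a new name each time than delete the job, a name built from a timestamp plus a random suffix should not collide. This is a sketch: unique_job_name is a hypothetical helper, and the allowed characters (letters, digits, '.', '_', '-') and the 200-character limit are my reading of the TranscriptionJobName constraints.

```python
import re
import time
import uuid


def unique_job_name(prefix: str = "test_tran") -> str:
    """Hypothetical helper: timestamp + random suffix, restricted to safe characters."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    name = "{}-{}-{}".format(prefix, stamp, uuid.uuid4().hex[:8])
    # Replace anything outside [0-9a-zA-Z._-] and cap the length at 200
    return re.sub(r"[^0-9a-zA-Z._-]", "_", name)[:200]
```

With OutputBucketName and no output key, the result file is named after the job, so a fun_tran using this name would download '{}.json'.format(job_name) instead of 'test_tran.json'.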

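def fun_tran(file0):
    job_name = "test_tran"  # fixed name; the job is deleted at the end so the name can be reused
    job_uri = "https://muauanpub.s3.amazonaws.com/{}".format(file0)
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': job_uri},
        MediaFormat='mp3',
        LanguageCode='ja-JP',
        OutputBucketName='muauanpub'  # result is written to s3://muauanpub/test_tran.json
    )
    # Poll every 5 seconds until the job finishes
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        print("Not ready yet...")
        time.sleep(5)
    print(status)

    if status['TranscriptionJob']['TranscriptionJobStatus'] == 'FAILED':
        # No output file exists; delete the job so the name can be reused, then report why
        transcribe.delete_transcription_job(TranscriptionJobName=job_name)
        raise RuntimeError(status['TranscriptionJob'].get('FailureReason', 'Transcription failed'))

    bucket.download_file('test_tran.json', 'test_tran.json')  # download to EC2 (S3 key, local file)
    df = pd.read_json('test_tran.json')  # read the json file with pandas
    text1 = df['results']['transcripts'][0]['transcript']  # extract the converted string by key
    print(text1)
    transcribe.delete_transcription_job(TranscriptionJobName=job_name)
    return text1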
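The transcript can also be read with the standard json module instead of pandas, which avoids depending on how pandas builds the DataFrame index. A sketch assuming the output document has the layout results → transcripts → transcript; extract_transcript and load_transcript are hypothetical names.

```python
import json


def extract_transcript(result: dict) -> str:
    # Transcribe output: {"results": {"transcripts": [{"transcript": "..."}], ...}, ...}
    return result["results"]["transcripts"][0]["transcript"]


def load_transcript(path: str) -> str:
    # Read the downloaded result file (e.g. 'test_tran.json') and return the text
    with open(path, encoding="utf-8") as f:
        return extract_transcript(json.load(f))
```

In fun_tran, the two pandas lines could then become `text1 = load_transcript('test_tran.json')`, and the pandas import would no longer be needed.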

・ Actual result

It was public for about 12 hours, but I stopped it because there were many requests and the S3 request fees looked likely to exceed the free tier m(_ _)m

Summary

・ I wrapped Polly and Transcribe in functions and ran them.
・ Both functions are easy to use, and I was able to generate female and male voices one after another.
・ I ran the same sentence through the cycle repeatedly, and it did not degrade: the output sentence stayed the same, so the conversion accuracy appears to be high (at least for this sentence).
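To put a number on "the output sentence stayed the same", each recognized text can be compared with the original. A rough sketch using difflib's SequenceMatcher, which is a swapped-in measure rather than anything the article used; normalize and similarity are hypothetical helpers, and the set of punctuation to ignore is an assumption.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Drop whitespace and common ASCII/Japanese punctuation so only the wording is compared
    return re.sub(r"[\s、。，．,.!?！？]", "", text)


def similarity(original: str, recognized: str) -> float:
    # Ratio in [0, 1]; 1.0 means identical after normalization
    return SequenceMatcher(None, normalize(original), normalize(recognized)).ratio()
```

Calling similarity on the first sentence and the text returned after each round would show whether the wording drifts as the cycle repeats.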

・ Let's make a pseudo voice conversation app
・ Let's aim to run it with Lambda, triggered when a file is placed on S3
