[Introduction to AWS] I tried playing with voice-text conversion ♪

At first glance this looks just like last night's text-to-speech conversion, but there is almost no material on it. I managed to get it working by following the references below. Last night's audio is posted under "Voice generation"; tonight I converted it to text.

【reference】
① Getting Started (AWS SDK for Python (Boto))
② Transcribe the voice with Amazon Transcribe
③ Create a transcription pipeline with S3 → Lambda → Transcribe → S3

The following code is based on Reference ①. It is almost identical to Reference ①, with one change taken from Reference ③: the output destination is specified with OutputBucketName='bucket name'. Without this, I could not tell where the result was written.

import time
import boto3

transcribe = boto3.client('transcribe')
job_name = "test_tran3"  # must be unique among your transcription jobs
job_uri = "https://Bucket name.s3.amazonaws.com/speech.mp3"

# Start the job; the result JSON is written to the bucket given in OutputBucketName
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    MediaFormat='mp3',  #wav, mp4, mp3
    LanguageCode='ja-JP', #'en-US'
    OutputBucketName='muauanmp3'
)

# Poll every 5 seconds until the job has either completed or failed
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)
print(status)

Running the code above produces the output below. "Not ready yet..." is printed once every 5 seconds, and it appeared six or more times, so the job seems to take about 30 seconds. The response is then printed in full, but it is hard to make sense of.

$ python3 boto_transcribe.py
Not ready yet...
...
Not ready yet...
{'TranscriptionJob': {'TranscriptionJobName':..., 'content-length': '506', 'connection': 'keep-alive'}, 'RetryAttempts': 0}}
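
The full response is hard to read by eye, so here is a minimal sketch that polls with an upper time limit and keeps only a few fields. The helper names wait_for_job and summarize_job, the timeout value, and the injectable sleep argument are my own assumptions and not part of the references; the dictionary keys are the ones the Transcribe response uses.

```python
import time


def wait_for_job(transcribe, job_name, interval=5, timeout=300, sleep=time.sleep):
    """Poll a transcription job until it finishes or `timeout` seconds pass.

    `transcribe` is expected to be a boto3 Transcribe client, or any object
    with a compatible get_transcription_job method (hypothetical helper).
    """
    waited = 0
    while True:
        response = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        job = response['TranscriptionJob']
        if job['TranscriptionJobStatus'] in ('COMPLETED', 'FAILED'):
            return job
        if waited >= timeout:
            raise TimeoutError(f"{job_name} did not finish within {timeout} seconds")
        sleep(interval)
        waited += interval


def summarize_job(job):
    """Keep only the fields that matter when reading the result by eye."""
    return {
        'name': job['TranscriptionJobName'],
        'status': job['TranscriptionJobStatus'],
        # Filled in when the job completed
        'transcript_uri': job.get('Transcript', {}).get('TranscriptFileUri'),
        # Filled in when the job failed
        'failure_reason': job.get('FailureReason'),
    }
```

With the client from the script above, this would be called as print(summarize_job(wait_for_job(transcribe, job_name))).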

So, following Reference ②, check the files in the S3 bucket. Output files such as test_tran3.json appear alongside the audio file speech.mp3.

$ aws s3 ls s3://Bucket name
2020-06-19 04:52:36          2 .write_access_check_file.temp
...
2020-06-18 23:44:17      35467 speech.mp3
...
2020-06-19 04:45:47       1472 test_tran2.json
2020-06-19 04:54:09       1663 test_tran3.json

Then copy s3://Bucket name/test_tran3.json to your EC2 server.

$ aws s3 cp s3://Bucket name/test_tran3.json ./
download: s3://Bucket name/test_tran3.json to ./test_tran3.json

Finally, print the contents of the JSON with the following command. When the language is specified correctly, the transcript comes out correctly, as in the first result below. Transcribing the same audio file with English specified gives the second result: it is written in the Latin alphabet, but it is nonsense. Still, this counts as a working voice-text conversion for now.

$ cat test_tran3.json |jq .results[][0].transcript
"Hello also Yokohama Tokyo cloudy little voice is Mizuki's"

$ cat test_tran2.json |jq .results[][0].transcript
"Tokyo, Yokohama, Moscow See Commodities, Cueva Mitic Sundays."

However, in actual use I want to do this from Python code rather than by hand. After some investigation, I found that the following references can be used.

【reference】
④ Upload and download files to S3 using boto3
⑤ Read JSON string / file with pandas (read_json)
⑥ Explanation of array nesting structure and value acquisition method in JSON using Python!

Applying these methods gives the code below. In other words, the procedure is: (1) download the JSON file, (2) read it with pandas, (3) print the required part.

import boto3
import pandas as pd

s3 = boto3.resource('s3')  # S3 resource object
bucket = s3.Bucket('Bucket name')  # bucket that holds the transcription result
bucket.download_file('test_tran3.json', 'test_tran3.json')  # download to EC2: (key in the bucket, local file name)
df = pd.read_json('test_tran3.json')  # read the JSON file with pandas

# The 'results' column is indexed by the keys of the nested object,
# so select 'transcripts' by label instead of by position
print(df['results']['transcripts'][0]['transcript'])  # extract the transcribed string

This series of steps successfully produced the following sentence.

Hello also Yokohama Tokyo cloudy little voice is Mizuki's
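
pandas is convenient here but not strictly required. As a minimal sketch, the same string can be read with the standard json module by following the path results → transcripts → first entry → transcript. The function names are hypothetical, and the code assumes the output file has the same layout that the pandas code above relies on.

```python
import json


def extract_transcript(result):
    """Return the transcript text from a parsed transcription result.

    Assumes the layout {"results": {"transcripts": [{"transcript": "..."}]}}.
    """
    return result['results']['transcripts'][0]['transcript']


def load_transcript(path):
    """Read a downloaded result file and return its transcript text."""
    with open(path, encoding='utf-8') as f:
        return extract_transcript(json.load(f))
```

After the download step, load_transcript('test_tran3.json') would return the transcribed sentence as a plain string.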

Variations

Application to apps

Even as a standalone feature, this seems usable as-is for things like taking minutes and translation. In addition, combined with last night's text-to-speech, the following sequence can be built.

text-voice-...-voice-text

Various kinds of processing are possible for the "..." part. For example, a reading of papers or materials and the question-and-answer exchange that follows could be recorded as text. In other words, the initial text and voice may differ from the processed voice and text. Other sequences are also possible. For a conversation app, the arrangement above is reversed:

voice-text-conversation app-text-voice

This is the same kind of sequence that Alexa uses. Since the processing in the middle is text-based, ordinary translation should also work here.
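
The voice-text-conversation app-text-voice sequence can be sketched as three interchangeable stages. This is only an outline under my own assumptions: run_voice_pipeline and its parameter names are hypothetical, and each stage is passed in as a plain function so that a Transcribe call, a translation step, or a text-to-speech call could be plugged in later.

```python
from typing import Callable


def run_voice_pipeline(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    conversation_app: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Run voice -> text -> conversation app -> text -> voice."""
    question = speech_to_text(audio)     # voice -> text
    answer = conversation_app(question)  # text -> text (QA, translation, ...)
    return text_to_speech(answer)        # text -> voice
```

Because the middle stage only ever sees text, swapping a QA function for a translation function does not affect the two voice stages.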

Voice QA

I think a voice QA app like Alexa can be built. If questions are accepted by voice from a device such as a smartphone and the application above runs behind it, real-time voice QA also seems feasible.

Twitter assistance

This is not limited to Twitter; the point is that both input and output can be done by voice. Building such apps will take real effort, though.

Summary

・ I played with voice-text conversion
・ I was able to build the whole sequence of operations in Python

・ If the JSON file already exists, the job cannot be run a second time, so running the same job every time requires deleting it as part of the sequence
・ Let's make some application
・ Let's do text translation
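
On the re-run limitation: Amazon Transcribe requires job names to be unique within an account, so a second start request with the same name is rejected while the earlier job still exists. Here is a minimal sketch of clearing the old job first. The helper name delete_job_if_exists is hypothetical, and it assumes that checking only the first page of the job list is enough.

```python
def delete_job_if_exists(transcribe, job_name):
    """Delete the transcription job called `job_name` if there is one.

    `transcribe` is expected to be a boto3 Transcribe client.
    Returns True when a job was deleted, False when none was found.
    Assumption: only the first page of list_transcription_jobs is inspected.
    """
    response = transcribe.list_transcription_jobs(JobNameContains=job_name)
    existing = {
        summary['TranscriptionJobName']
        for summary in response.get('TranscriptionJobSummaries', [])
    }
    if job_name not in existing:
        return False
    transcribe.delete_transcription_job(TranscriptionJobName=job_name)
    return True
```

Calling delete_job_if_exists(transcribe, job_name) just before start_transcription_job would let the same job name be reused. This sketch removes only the job itself; the JSON file in the bucket is left alone.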
