This is the day 18 article of the CPS Lab Advent Calendar 2016.
When you are reading a novel, you may be curious about what happens next and want to keep reading while driving or eating. You may also want to read on the train during your morning commute to work or school, but on a crowded train you can't hold a book, Kindle, or smartphone.
With Amazon Polly, you can very easily convert a novel to speech and listen to it instead.
However, Polly has the following limits.
The input text size is up to 1500 billable characters (3000 characters total). SSML tags are not counted as billable characters. You can specify up to 5 lexicons to apply to the input text. The output audio stream (synthesis) is limited to 5 minutes. After that time, the rest of the audio is cut off.
I originally planned to convert each episode of a novel to speech in one request, but that would likely hit these limits. I also wasn't sure how the 1500 billable characters are counted, so I wrote a program that splits the text into 1000-character chunks, converts each chunk, and finally joins the results and saves them as a single MP3.
AWS CLI
First, let's get ready to use Polly. This time I will use Python, but the AWS CLI still has to be set up first. It is easy, and the guide linked here explains it, but I will go over it roughly.
Install it first. Since we are using Python, pip is available.
pip install awscli
On Mac OS X El Capitan, use this instead:
pip install awscli --ignore-installed six
That's all. Once it is installed, set the access key and the other settings.
$ aws configure
AWS Access Key ID []: your access key
AWS Secret Access Key []: your secret key
Default region name [us-east-1]: us-east-1
Default output format [None]:
That's it. Any region where Polly is available will work, but if you are unsure, `us-east-1` is fine.
AWS SDK for Python (Boto3)
An SDK called Boto3 is available for Python, so we will use that.
pip install boto3
That's all.
The code below is the same as the sample in the linked documentation. Running it converts the text to speech, and the default MP3 player launches and starts speaking.
"""Getting Started Example for Python 2.7+/3.3+"""
from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
import subprocess
from tempfile import gettempdir

# Create a client using the credentials and region defined in the [adminuser]
# section of the AWS credentials file (~/.aws/credentials).
session = Session(profile_name="adminuser")
polly = session.client("polly")

try:
    # Request speech synthesis
    response = polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3",
                                        VoiceId="Joanna")
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)

# Access the audio stream from the response
if "AudioStream" in response:
    # Note: Closing the stream is important as the service throttles on the
    # number of parallel connections. Here we are using contextlib.closing to
    # ensure the close method of the stream object will be called automatically
    # at the end of the with statement's scope.
    with closing(response["AudioStream"]) as stream:
        output = os.path.join(gettempdir(), "speech.mp3")

        try:
            # Open a file for writing the output as a binary stream
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            # Could not write to file, exit gracefully
            print(error)
            sys.exit(-1)
else:
    # The response didn't contain audio data, exit gracefully
    print("Could not stream audio")
    sys.exit(-1)

# Play the audio using the platform's default player
if sys.platform == "win32":
    os.startfile(output)
else:
    # the following works on Mac and Linux. (Darwin = mac, xdg-open = linux).
    opener = "open" if sys.platform == "darwin" else "xdg-open"
    subprocess.call([opener, output])
VoiceId
There are quite a few voices to choose from, but Mizuki is the only Japanese one, so I will use it.
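To check which voices are available for a language, the Polly client's `describe_voices` call can be used. The following is only a rough sketch: the helper name `list_voice_ids` is made up for this illustration, and the client is passed in as an argument so the function itself does not depend on any particular credentials setup.

```python
def list_voice_ids(polly, language_code="ja-JP"):
    """Return the IDs of the Polly voices offered for one language.

    `polly` is assumed to be a boto3 Polly client, for example
    Session().client("polly"). The helper name is hypothetical.
    """
    response = polly.describe_voices(LanguageCode=language_code)
    return [voice["Id"] for voice in response["Voices"]]


# Example usage (needs boto3 and configured AWS credentials):
#   from boto3 import Session
#   print(list_voice_ids(Session().client("polly")))
```

Any ID returned this way can be passed as `VoiceId` to `synthesize_speech`.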
Once you have come this far, the rest is easy. First, save the Narou novel locally as a text file. Then read the saved file and split it into chunks of 1000 characters each.
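def split_str(s, n):
    """Split the string s into chunks of at most n characters."""
    length = len(s)
    return [s[i:i + n] for i in range(0, length, n)]

# filename is the path to the novel text file saved locally
with open(filename) as f:
    text = f.read()

texts = split_str(text, 1000)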
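Fixed-width splitting can cut a sentence in the middle, which may produce an audible break where two chunks meet. As an alternative that is not part of the original program, here is a minimal sketch that packs whole sentences into each chunk. The function name `split_by_sentence` is hypothetical, and it assumes sentences end with the Japanese full stop "。".

```python
def split_by_sentence(text, limit=1000, delimiter="。"):
    """Pack whole sentences into chunks of at most `limit` characters.

    A single sentence longer than `limit` is cut at fixed width as a
    fallback, so no chunk ever exceeds the limit.
    """
    parts = text.split(delimiter)
    # Re-attach the delimiter to every sentence it terminated; the last
    # part has no trailing delimiter and is kept only if non-empty.
    sentences = [p + delimiter for p in parts[:-1]]
    if parts[-1]:
        sentences.append(parts[-1])

    chunks = []
    current = ""
    for sentence in sentences:
        if len(sentence) > limit:
            # Flush what we have, then hard-split the oversized sentence
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(sentence[i:i + limit]
                          for i in range(0, len(sentence), limit))
        elif len(current) + len(sentence) > limit:
            # The sentence does not fit; start a new chunk with it
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the input text, so this can be swapped in for `split_str(text, 1000)` without losing any characters.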
Next, assume that the code for sending a request to Polly has been wrapped in a function, request_speech, which returns the result of stream.read().
Then all you have to do is write the results to an MP3 file.
# Append the MP3 data of each chunk to a single output file
with open('output.mp3', 'wb') as out:
    for item in texts:
        stream = request_speech(item)
        out.write(stream)
That's all there is to it. For more details, see the official reference, which covers most of this. It is available in Japanese.
I actually listened to the result, and the content does sink in. However, unless you listen quite attentively you lose track of the story completely, so I think it is probably impossible to follow while working. Also, because everything is read in the same voice and the same tone, the dialogue sounds unnatural. Well, that was to be expected. I think it would be quite different if you could choose a male or female voice for each line of dialogue. You can also customize pronunciation with a lexicon, but that does not seem to support Japanese. That's all!
The source code for the program described above is on GitHub: nshiba/TextToSpeechFromPolly