Last time, I transcribed an mp3 file placed in an S3 bucket with the following code; the resulting JSON file is written to the S3 bucket given by OutputBucketName. This time, I read that JSON file and extract the transcribed text. I am deliberately showing last time's code again because this time's code is similar.
s3 = boto3.client('s3')
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        transcribe.start_transcription_job(
            TranscriptionJobName=datetime.datetime.now().strftime('%Y%m%d%H%M%S') + '_Transcription',
            LanguageCode='ja-JP',
            Media={
                'MediaFileUri': 'https://s3.ap-northeast-1.amazonaws.com/' + bucket + '/' + key
            },
            OutputBucketName='lamoutput'
        )
    ...
        raise e
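As a side note on the two strings built in the excerpt above, here is a minimal sketch that isolates them so they can be checked without calling AWS. The helper names `make_job_name` and `make_media_uri` and the bucket and file names in the usage lines are hypothetical, not from the original code; the region is simply the one that appears in the URL above.

```python
from datetime import datetime

REGION = "ap-northeast-1"  # region taken from the URL in the excerpt above


def make_job_name(now: datetime) -> str:
    """Build a timestamp-based job name the same way the excerpt does."""
    return now.strftime("%Y%m%d%H%M%S") + "_Transcription"


def make_media_uri(bucket: str, key: str) -> str:
    """Build the path-style HTTPS URL that is passed as MediaFileUri."""
    return "https://s3." + REGION + ".amazonaws.com/" + bucket + "/" + key


if __name__ == "__main__":
    # Hypothetical bucket and file names, for illustration only.
    print(make_job_name(datetime(2020, 5, 1, 12, 34, 56)))
    print(make_media_uri("my-input-bucket", "voice.mp3"))
```

Transcription job names have to be unique within an account, so a name with one-second resolution quietly assumes that two uploads never arrive in the same second.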
I was able to implement this with the code below. For saving to the S3 bucket, I followed the references.
【Reference】
① [AWS Lambda basic code 2] Save file to S3
② Manipulate S3 objects with Boto3 (high level API and low level API)
I have kept the comments from reference ①, and almost the same code worked. The difference is that it incorporates what I covered the other day in "How to handle json files". First, the libraries are as follows.
#① Import of library
import boto3
import urllib.parse
from datetime import datetime
import json
Next, the client is defined, following reference ②.
print('Loading function') #② Output the function load to the log
s3 = boto3.resource('s3') #③ Get the S3 resource (high-level API)
client = s3.meta.client   # Low-level client that belongs to the resource
Getting the bucket and key in lambda_handler is exactly the same as in the transcription code above (of course...).
#④ Lambda's main function
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
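To see why the key goes through `unquote_plus`, here is a minimal sketch that runs the same two lines against a hand-written event. The function name `bucket_and_key` and the sample event are hypothetical; the sample only contains the fields the handler actually reads, and it assumes the key arrives URL-encoded, as the handler code implies.

```python
import urllib.parse


def bucket_and_key(event):
    """Pull the bucket name and the decoded object key out of an S3 event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # '+' becomes a space and %XX sequences are decoded as UTF-8.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"], encoding="utf-8")
    return bucket, key


# Hypothetical event: the key stands for a file named "my job" + one Japanese character + ".json".
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "lamoutput"},
                "object": {"key": "my+job%E3%81%82.json"}}}
    ]
}

if __name__ == "__main__":
    print(bucket_and_key(sample_event))
```

Without the decoding step, a key containing spaces or Japanese characters would not match the actual object name when it is passed to `get_object`.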
Next, response['Body'] is read from the JSON file with the same code as in reference ②. This is where I stumbled. I expected the Japanese sentences to appear as soon as I called body.decode('utf-8'), but what actually comes out is the whole file as JSON-looking text. At first I did not realize it was just a string and thought I already had a JSON object. Once I noticed it was a string, I found that json.loads turns it into a Python dictionary, and I finally arrived at the code below. In other words, body holds the raw JSON text (bytes), not a parsed object.
    response = client.get_object(Bucket=bucket, Key=key)
    body = response['Body'].read()
Parse the JSON text into a Python dictionary.
    dec = json.loads(body)
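The difference between the raw body, the decoded string, and the parsed result can be checked locally without S3. This is a minimal sketch; the sample JSON is a hypothetical, cut-down stand-in for the real Transcribe output and only keeps the fields used in this article.

```python
import json

# Stand-in for response['Body'].read(): raw UTF-8 bytes of a JSON document.
raw = '{"results": {"transcripts": [{"transcript": "こんにちは"}]}}'.encode("utf-8")

# Decoding only gives the JSON text as a str; nothing is parsed yet.
text = raw.decode("utf-8")

# json.loads turns the text into nested dicts and lists.
from_text = json.loads(text)

# json.loads also accepts UTF-8 bytes directly (Python 3.6 and later),
# which is why json.loads(body) works without an explicit decode.
from_bytes = json.loads(raw)

if __name__ == "__main__":
    print(type(raw).__name__, type(text).__name__, type(from_text).__name__)
    print(from_bytes["results"]["transcripts"][0]["transcript"])
```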
Because the result is now a dictionary, the Japanese sentences can be extracted easily as follows.
    con_el = dec["results"]["transcripts"][0]["transcript"]
    print('contents=', con_el)
contents= Hello Tokyo Yokohama also cloudy little voice is Mizuki's
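The direct indexing above raises an exception if the file has no transcript entry. As a minimal sketch of a more forgiving variant, the hypothetical helper `extract_transcript` below returns an empty string in that case and joins the entries if there is more than one. It assumes only the results → transcripts → transcript layout used in this article.

```python
def extract_transcript(doc):
    """Return the transcribed text from a parsed result, or '' if it is missing."""
    transcripts = doc.get("results", {}).get("transcripts", [])
    return " ".join(t.get("transcript", "") for t in transcripts)


if __name__ == "__main__":
    # Hypothetical parsed results, for illustration only.
    print(extract_transcript({"results": {"transcripts": [{"transcript": "hello"}]}}))
    print(repr(extract_transcript({})))
```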
Finally, the text is saved to the specified S3 bucket as a .txt file whose key includes a timestamp, as follows.
    bucket = 'muauanpub' #⑤ Specify the bucket name
    key = 'test_' + datetime.now().strftime('%Y-%m-%d-%H-%M-%S') + '.txt' #⑥ Specify the key information of the object
    file_contents = con_el # 'Lambda test' #⑦ File contents
    obj = s3.Object(bucket, key) #⑧ Specify the bucket name and path
    obj.put(Body=file_contents) #⑨ Output file to bucket
    return
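To try the read, parse, and save flow without deploying to Lambda, the steps can be put into one function that takes the client as an argument. This is a minimal sketch, not the code of this article: the function `save_transcript` and the class `FakeS3Client` are hypothetical, the fake keeps objects in memory, and the save goes through the low-level `put_object` call instead of `s3.Object(...).put(...)`.

```python
import io
import json
from datetime import datetime


def save_transcript(client, src_bucket, src_key, dst_bucket, now=None):
    """Read a transcription result JSON and store its text as a .txt object."""
    now = now or datetime.now()
    body = client.get_object(Bucket=src_bucket, Key=src_key)["Body"].read()
    text = json.loads(body)["results"]["transcripts"][0]["transcript"]
    dst_key = "test_" + now.strftime("%Y-%m-%d-%H-%M-%S") + ".txt"
    client.put_object(Bucket=dst_bucket, Key=dst_key, Body=text.encode("utf-8"))
    return dst_key


class FakeS3Client:
    """In-memory stand-in for the two S3 client calls used above (hypothetical)."""

    def __init__(self, objects):
        self.objects = dict(objects)

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body
        return {}


if __name__ == "__main__":
    doc = json.dumps({"results": {"transcripts": [{"transcript": "hello"}]}})
    fake = FakeS3Client({("lamoutput", "job.json"): doc.encode("utf-8")})
    print(save_transcript(fake, "lamoutput", "job.json", "muauanpub"))
```

In the Lambda itself, the `client` defined earlier (`s3.meta.client`) could be passed in place of the fake.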
・ I was able to extract the sentences from the JSON file produced from the audio file and store them in an S3 bucket.
・ It is a two-step process, but when you put an mp3 file in the S3 bucket, the transcribed text itself is automatically saved to an S3 bucket.
・ For the time being, I confirmed the flow Teraterm → EC2 → transfer to the S3 bucket, then download from the S3 bucket ⇒ display.
・ If I also build an application that uploads audio files to this S3 bucket and one that displays the text files in the S3 bucket, it should make an easier-to-use audio-to-text conversion application (as a web app).
・ Even if the conversion takes a long time, both Lambda functions are invoked asynchronously, so it looks like an app that is friendly to both money and time.