This is a memorandum of something I tried while following a reference article. The reference is neatly written and easy to follow for anyone who already knows AWS, but it was difficult for me as an AWS beginner, so I would like to explain those parts in plainer, step-by-step terms. In the end, what I did is exactly what the reference article describes.
【Reference】
・S3 → Lambda → Transcribe → S3: creating a transcription pipeline
1. Create a bucket for input in S3
2. Open Lambda
3. Definition and meaning of a Lambda function
4. How to read CloudWatch Logs
5. Create a bucket for output in S3
6. Modify the Lambda function
7. How to change the execution role
8. Edit the Lambda function
9. How to check the Transcribe results
If you click [Services] at the upper left of the AWS console, all services are displayed and you can select the various menus from there. Among them, select S3 under Storage to jump to the page where you can create an S3 bucket, and create one there. It seemed to work even with the security settings fully relaxed (objects made public). Create buckets for input and for output with appropriate bucket names.
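If you prefer to create the buckets from code rather than the console, a minimal boto3 sketch might look like the following. The bucket names laminput and lamoutput and the region ap-northeast-1 are assumptions for illustration (only lamoutput actually appears later in this article).

import boto3

s3 = boto3.client('s3', region_name='ap-northeast-1')

# Bucket names must be globally unique; these two are hypothetical examples
for name in ['laminput', 'lamoutput']:
    s3.create_bucket(
        Bucket=name,
        CreateBucketConfiguration={'LocationConstraint': 'ap-northeast-1'}
    )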
Display all services as above, and this time open Lambda under Compute. The function creation page opens; if it doesn't, click [Create function]. You will then jump to the page shown in the picture in Reference ①. Here you can go to the next screen by selecting [Use Blueprint] - [s3-get-object-python] - [Settings].
First, the main premise: **a Lambda function is a serverless definition of a function that picks up some trigger and runs, and it is a fully pay-as-you-go service that charges only when the function actually executes.** So we define a function with essentially a single responsibility.
Finally, let's define the behavior of the function.
Function name: anything unique should be fine.
Role name: this must also be unique. Even if you delete the function, the role does not disappear, so it needs to be deleted separately.
S3 trigger: the name of the input bucket.
Enable trigger: checked.
The following skeleton is generated, and it turns out this function is already working: a log is recorded whenever something is placed (uploaded) in the input bucket. Looking at the contents, first the libraries are as follows.
import json
import urllib.parse
import boto3
# Get s3 object
print('Loading function')
s3 = boto3.client('s3')
The lambda_handler function performs the operation described in the comments: it gets the object from the event and shows its content type. bucket holds the name of the S3 bucket defined above, and key holds the file name. In other words, it receives in event the name of the file placed (uploaded) in the bucket, fetches that object, and returns its content type. The except clause below is the error-handling routine.
def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        print("CONTENT TYPE: " + response['ContentType'])
        return response['ContentType']
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
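To see why event['Records'][0]['s3']['bucket']['name'] and the key lookup work, here is a trimmed sketch of the shape of the S3 put event that Lambda passes in (heavily simplified; the bucket name and key are hypothetical examples):

# Simplified shape of the S3 event received by lambda_handler
event = {
    'Records': [
        {
            's3': {
                'bucket': {'name': 'laminput'},   # hypothetical input bucket
                'object': {'key': 'test.mp3'}     # hypothetical uploaded file
            }
        }
    ]
}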
When you create a function, a function page is created for it. Opening it, you will see [Settings], [Access privileges], and [Monitoring], with the code shown below them. Select [Monitoring]. There are some graphs, and below them you can see CloudWatch Logs Insights. Now try placing (uploading) something into the input bucket above, from EC2 or elsewhere. The Lambda function above then runs, and CloudWatch Logs shows a record of its activity moment by moment. Errors are also output there, so at the very least you can check that it is working.
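As a minimal sketch of firing the trigger from code, you could upload a file with boto3 like this (the local file name test.mp3 and the bucket name laminput are assumptions):

import boto3

s3 = boto3.client('s3')

# Placing any object in the input bucket fires the Lambda trigger;
# 'test.mp3' and 'laminput' are hypothetical names
s3.upload_file('test.mp3', 'laminput', 'test.mp3')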
Finally, let's make the Lambda function do the real work. This time you can simply copy and paste the application code from Reference ① and change the input and output bucket names. Let's see what happens inside the try block when the trigger actually fires. The cleverest part here is that the TranscriptionJobName is generated from the current time each time the trigger fires, which keeps it unique; as a result, you can also tell the time from the name of the generated json. In the following code, bucket is the input bucket and key is the file name, as before. The output file is then written to the bucket given by OutputBucketName.
# In addition to the skeleton's imports, datetime and a Transcribe client are needed
import datetime

transcribe = boto3.client('transcribe')

transcribe.start_transcription_job(
    TranscriptionJobName=datetime.datetime.now().strftime('%Y%m%d%H%M%S') + '_Transcription',
    LanguageCode='ja-JP',
    Media={
        'MediaFileUri': 'https://s3.ap-northeast-1.amazonaws.com/' + bucket + '/' + key
    },
    OutputBucketName='lamoutput'
)
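Putting it together, here is a sketch of what the full modified handler might look like. This is my reconstruction along the lines of Reference ①, not a verbatim copy of it; the bucket name lamoutput is the one used above.

import datetime
import urllib.parse
import boto3

s3 = boto3.client('s3')
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):
    # Get the bucket and file name from the S3 put event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        # Start a transcription job; the timestamp keeps the job name unique
        transcribe.start_transcription_job(
            TranscriptionJobName=datetime.datetime.now().strftime('%Y%m%d%H%M%S') + '_Transcription',
            LanguageCode='ja-JP',
            Media={'MediaFileUri': 'https://s3.ap-northeast-1.amazonaws.com/' + bucket + '/' + key},
            OutputBucketName='lamoutput'
        )
    except Exception as e:
        print(e)
        raise e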
Select [Access privileges]. The execution role appears. Clicking it shows the execution role's permissions and its permissions policies. Here we add AmazonS3FullAccess and AmazonTranscribeFullAccess. To add them, click [Attach policies], enter the names above in the search box, check the boxes on the left, and attach them.
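The same attachment can also be done with boto3, as a sketch; the role name below is a placeholder for whatever your function's execution role is actually called:

import boto3

iam = boto3.client('iam')

# 'my-lambda-execution-role' is a placeholder; use your function's real role name
for arn in ['arn:aws:iam::aws:policy/AmazonS3FullAccess',
            'arn:aws:iam::aws:policy/AmazonTranscribeFullAccess']:
    iam.attach_role_policy(RoleName='my-lambda-execution-role', PolicyArn=arn)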
This is the same as in 6: modify the code shown below directly on the function page and save it.
In this state, if you place an mp3 file in the input bucket, the Transcribe result should appear as a json file in the output bucket. It takes some time, but it was faster than my own program. You can follow the progress in the CloudWatch Logs described above. Note that in the output bucket you may not see the file until you reload the page. If you make the json file public, you can easily download it. When I simply opened it, the characters were garbled, but opening it in Notepad showed the contents cleanly. Reading the downloaded json file with pandas to check the transcribed text also worked nicely. I thought about going further and chaining the functions above, but gave up for tonight because it looked difficult.
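As a sketch of checking the transcribed text, plain json also works (the author used pandas): Amazon Transcribe stores the full text under results.transcripts in the output json. The file name below is a hypothetical example.

import json

# The json downloaded from the output bucket; the file name is hypothetical
with open('20200101000000_Transcription.json', encoding='utf-8') as f:
    result = json.load(f)

# Amazon Transcribe puts the full transcript under results.transcripts
for t in result['results']['transcripts']:
    print(t['transcript'])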
・ Lambda function debut
・Since the conversion is fast, I would like to add Polly and apply this to a conversation app.
・I will try building the Polly version the same way.