I bought a DVD box set of "Mayday !: The Truth and Truth of Aircraft Accidents" (English title: Air Crash Investigation). It's the English-language edition, so I assumed it would at least have subtitles, but it didn't even have English ones ...
Fortunately, the discs were not protected by CSS (Content Scramble System), so I decided to generate the subtitles myself.
On macOS, you can install the required tools with Homebrew. Python 3 and pip are also assumed to be available.
brew install ffmpeg mkvtoolnix
pip3 install boto3
With ffmpeg, you can easily copy just the audio track into a separate file, without re-encoding.
ffmpeg -i original.m4v -acodec copy -vn output.m4a
To use Amazon Transcribe, the audio data has to be uploaded to S3 first. To keep things simple, I wrote a short Python script that just uploads the file to S3 and submits a job to Amazon Transcribe.
The code from here on was written in a little over 10 minutes, so it's pretty rough overall ...
01-transcribe.py
from boto3 import client, resource
import os
import sys

# Placeholder credentials and bucket name; replace with your own.
AWS_ACCESS_KEY = "hogehoge"
AWS_SECRET_ACCESS_KEY = "fugafuga"
BUCKET = "somebucket"


def upload(filepath):
    # Upload the file to the bucket, using its base name as the object key.
    basename = os.path.basename(filepath)
    s3 = resource(
        "s3",
        region_name="ap-northeast-1",
        aws_access_key_id=AWS_ACCESS_KEY,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    s3.Bucket(BUCKET).upload_file(filepath, basename)


def transcribe(filename):
    # Submit a transcription job for the uploaded object.
    url = "s3://{}/{}".format(BUCKET, filename)
    transcribe_client = client(
        "transcribe",
        region_name="ap-northeast-1",
        aws_access_key_id=AWS_ACCESS_KEY,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    transcribe_client.start_transcription_job(
        TranscriptionJobName=filename,  # the file name doubles as the job name
        LanguageCode="en-US",
        MediaFormat="mp4",
        Media={
            "MediaFileUri": url
        },
        OutputBucketName=BUCKET,  # the result JSON is written to the same bucket
    )


def main():
    filepath = sys.argv[1]
    upload(filepath)
    transcribe(os.path.basename(filepath))


if __name__ == "__main__":
    main()
Run the script, passing the audio file extracted above as an argument.
python 01-transcribe.py output.m4a
Once the job has finished, download the resulting JSON file from the web console. (Details omitted.)
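If you would rather not click through the console, the download can be scripted as well. The following is only a minimal sketch and not part of my original scripts. It assumes that credentials are configured in the environment instead of being hardcoded, that the job name is the uploaded file name (as in 01-transcribe.py), and that the transcript ends up in the bucket as "<job name>.json" because OutputBucketName was set. The helper names wait_for_job and download_result are made up for this example.

```python
import time

BUCKET = "somebucket"  # same placeholder bucket as in 01-transcribe.py
REGION = "ap-northeast-1"


def wait_for_job(transcribe_client, job_name, poll_seconds=10, sleep=time.sleep):
    """Poll until the job is COMPLETED or FAILED and return that status."""
    while True:
        job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
        status = job["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            return status
        sleep(poll_seconds)


def download_result(job_name, dest="result.json"):
    # boto3 is imported here so that wait_for_job can be tried without it.
    import boto3

    transcribe_client = boto3.client("transcribe", region_name=REGION)
    if wait_for_job(transcribe_client, job_name) != "COMPLETED":
        raise RuntimeError("transcription job did not complete: " + job_name)
    # Assumption: with OutputBucketName set and no OutputKey given, the
    # transcript is stored at the bucket root as "<job name>.json".
    s3 = boto3.client("s3", region_name=REGION)
    s3.download_file(BUCKET, job_name + ".json", dest)
```

With this, calling download_result("output.m4a") should wait for the job submitted above and save the transcript as result.json.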
The resulting JSON contains an array of recognized words, each with its start and end time. Displaying every word as its own subtitle would be far too hard to read.
So I decided to group the words into display ranges according to a few rules and generate subtitle data (srt) from that.
The rules were picked more or less arbitrarily, so feel free to tweak them as you like.
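To make the rules easier to follow, here is a rough sketch of the part of the Transcribe output that the script below relies on. The sample data is hand-written for illustration and is not taken from my actual result; the field names are the ones the script reads. Note that the times are strings, and that punctuation items have no start_time or end_time. The helper name iter_words is made up for this example.

```python
import json

# Hand-written sample in the shape of a Transcribe result (illustrative only).
SAMPLE = """
{
  "results": {
    "items": [
      {"start_time": "1.04", "end_time": "1.30", "type": "pronunciation",
       "alternatives": [{"confidence": "0.99", "content": "We're"}]},
      {"start_time": "1.30", "end_time": "1.92", "type": "pronunciation",
       "alternatives": [{"confidence": "0.98", "content": "descending"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": "."}]}
    ]
  }
}
"""


def iter_words(data):
    """Yield (content, start, end); start and end are None for punctuation."""
    for item in data["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if "start_time" in item:
            yield content, float(item["start_time"]), float(item["end_time"])
        else:
            yield content, None, None


for content, start, end in iter_words(json.loads(SAMPLE)):
    print(content, start, end)
```

This is why the script checks for "start_time" in each item before doing anything with the timing.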
02-makesrt.py
import json
import sys


def sec2time(sec):
    # Convert seconds to the SRT timestamp format HH:MM:SS,mmm.
    h = int(sec/3600)
    m = int((sec%3600) / 60)
    s = int(sec % 60)
    mils = int((sec%1)*1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(h, m, s, mils)


def convert2srt(filepath):
    with open(filepath, "r") as f:
        data = json.load(f)
    start_time = 0
    end_time = 0
    s = ""
    index = 0
    for item in data["results"]["items"]:
        is_output = False
        # Punctuation items have no start_time, so only words trigger output.
        if "start_time" in item:
            item["start_time"] = float(item["start_time"])
            item["end_time"] = float(item["end_time"])
            if item["start_time"] - end_time > 3:
                # There was a pause of more than 3 seconds
                is_output = True
            elif len(s) >= 110:
                # The line is getting too long
                is_output = True
            if s != "":
                if len(s)>1 and s[-2].isupper():
                    # Probably an abbreviation such as "U.", so don't break here
                    pass
                else:
                    last = s[-1]
                    if last in (".", "?", "!"):
                        is_output = True
                    if last == "," and len(s) > 80:
                        is_output = True
        if is_output:
            # Keep the subtitle on screen for up to 2 extra seconds
            end_time = min(item["start_time"], end_time+2.0)
            if s != "":
                print(index)
                index += 1
                print("{0} --> {1}".format(sec2time(start_time), sec2time(end_time)))
                print(s)
                print("")
            start_time = 0
            end_time = 0
            s = ""
        if "start_time" in item:
            if start_time == 0:
                start_time = item["start_time"]
            end_time = item["end_time"]
            if s and (len(item["alternatives"][0]["content"])>1 or s[-1] != "."):
                s += " " + item["alternatives"][0]["content"]
            else:
                s += item["alternatives"][0]["content"]
        else:
            # Punctuation is appended without a leading space
            s += item["alternatives"][0]["content"]
    # Flush whatever is left after the last item
    if s != "":
        print(index)
        index += 1
        print("{0} --> {1}".format(sec2time(start_time), sec2time(end_time+2.0)))
        print(s)
        print("")


def main():
    filepath = sys.argv[1]
    convert2srt(filepath)


if __name__ == "__main__":
    main()
Pass the JSON file as an argument and redirect the output to a file.
python 02-makesrt.py result.json > result.srt
The output should look like this:
output
64
00:06:00,139 --> 00:06:16,839
Then the next Nano second, it was pure, unadulterated pandemonium Way number three going down.
65
00:06:16,839 --> 00:06:18,720
It looks like we lost number three engine.
66
00:06:18,720 --> 00:06:23,149
We're descending rapidly coming back.
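The timestamps use the SRT form HH:MM:SS,mmm, with a comma before the milliseconds. As a small variation on sec2time, and not what the script above does, the conversion can also go through integer milliseconds so that floating-point truncation cannot drop a millisecond. The name to_srt_time is just for this sketch.

```python
def to_srt_time(sec):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(sec * 1000)  # round once, then use integer arithmetic
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(h, m, s, ms)


print(to_srt_time(376.839))  # 00:06:16,839
print(to_srt_time(3661.5))   # 01:01:01,500
```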
With mkvmerge, you can easily mux the subtitle data into an mkv file together with the original video.
mkvmerge -o output.mkv original.m4v --language 0:eng --track-name 0:English result.srt
The embedded subtitles can be turned on when playing the file in VLC.
As far as I can tell, they are displayed with hardly anything feeling off.
Being able to quickly knock together a tool at a time like this is the real thrill of programming.