Motivation

I bought the DVD box set of "Mayday!: The Truth and Facts of Aircraft Accidents" (English title: Air Crash Investigation). It's the English-language version, and I had assumed it would come with subtitles, but it didn't even have English ones...

Fortunately, the discs were not protected by CSS (Content Scramble System), so I decided to try generating subtitles myself.

What I did

Extract only the audio from the video (ffmpeg)
Transcribe the audio (Amazon Transcribe)
Convert the transcription result to SubRip subtitle data (Python)
Embed the subtitle data in the video (mkvmerge)

Installing the necessary tools

On macOS, you can install them with Homebrew. This also assumes that Python 3 and pip are available.

brew install ffmpeg mkvtoolnix
pip3 install boto3

1. Extract only the audio from the video

With ffmpeg, you can copy just the audio stream into its own file without re-encoding it (-acodec copy keeps the audio as-is, -vn drops the video).

ffmpeg -i original.m4v -acodec copy -vn output.m4a
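If you want to do this for every episode in the box set, a small Python wrapper around the same ffmpeg command can help. This is only a sketch: `extract_audio_cmd` and `extract_all` are hypothetical helper names, and it assumes the ripped episodes are `.m4v` files in one directory and that `ffmpeg` is on your `PATH`.

```python
import subprocess
from pathlib import Path


def extract_audio_cmd(video, audio):
    # Same as: ffmpeg -i original.m4v -acodec copy -vn output.m4a
    # -acodec copy keeps the audio stream as-is (no re-encoding),
    # -vn drops the video stream.
    return ["ffmpeg", "-i", str(video), "-acodec", "copy", "-vn", str(audio)]


def extract_all(directory):
    # Hypothetical batch helper: writes episode.m4a next to each episode.m4v.
    for video in sorted(Path(directory).glob("*.m4v")):
        audio = video.with_suffix(".m4a")
        # check=True stops at the first episode ffmpeg fails on
        subprocess.run(extract_audio_cmd(video, audio), check=True)
```

Calling `extract_all(".")` would run ffmpeg once for each `.m4v` file in the current directory.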

2. Transcribe the audio

Amazon Transcribe reads its input from S3, so the audio has to be uploaded there first. To keep things simple, I wrote a small Python script that just uploads the file to S3 and submits a job to Amazon Transcribe.

The code in the rest of this post was written in a little over 10 minutes, so it's pretty rough overall...

`01-transcribe.py`


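import os
import sys

from boto3 import client, resource

# Placeholders: replace with your own values. Hard-coding keys is only
# acceptable for a throwaway script; normally let boto3 read them from
# ~/.aws/credentials or environment variables instead.
AWS_ACCESS_KEY = "hogehoge"
AWS_SECRET_ACCESS_KEY = "fugafuga"
REGION = "ap-northeast-1"
BUCKET = "somebucket"


def upload(filepath):
    # Upload the audio file to the bucket root, using its file name as the key.
    basename = os.path.basename(filepath)

    s3 = resource(
        "s3",
        region_name=REGION,
        aws_access_key_id=AWS_ACCESS_KEY,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    s3.Bucket(BUCKET).upload_file(filepath, basename)


def transcribe(filename):
    # Start an asynchronous transcription job for s3://BUCKET/filename.
    url = "s3://{}/{}".format(BUCKET, filename)

    transcribe_client = client(
        "transcribe",
        region_name=REGION,
        aws_access_key_id=AWS_ACCESS_KEY,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    # The job name must be unique within the account, so re-running with
    # the same file name fails until the old job is deleted.
    response = transcribe_client.start_transcription_job(
        TranscriptionJobName=filename,
        LanguageCode="en-US",
        MediaFormat="mp4",  # .m4a is AAC audio in an MP4 container
        Media={"MediaFileUri": url},
        OutputBucketName=BUCKET,  # the result JSON is written to this bucket
    )
    print(response["TranscriptionJob"]["TranscriptionJobStatus"])


def main():
    filepath = sys.argv[1]
    upload(filepath)
    transcribe(os.path.basename(filepath))


if __name__ == "__main__":
    main()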

Run the script with the audio file as its argument.

python 01-transcribe.py output.m4a

Once the job has finished, download the resulting JSON file from the web console (details omitted).
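If you would rather skip the web console, the job can also be polled and the result fetched with boto3. This is a minimal sketch that is not part of the original script: `wait_for_job` and `download_transcript` are hypothetical helpers that take already-created boto3 clients, and the download assumes the job was started with `OutputBucketName` and no `OutputKey`, in which case Amazon Transcribe stores the transcript as `<job name>.json` at the root of the bucket.

```python
import time


def wait_for_job(transcribe_client, job_name, poll_seconds=30):
    # Poll GetTranscriptionJob until the job reaches a final state.
    while True:
        response = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
        status = response["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(poll_seconds)  # still QUEUED or IN_PROGRESS


def download_transcript(s3_client, bucket, job_name, dest):
    # Assumption: OutputBucketName was set and OutputKey was not, so the
    # transcript is stored as "<job name>.json" at the bucket root.
    key = "{}.json".format(job_name)
    s3_client.download_file(bucket, key, dest)
    return dest
```

With `transcribe_client = boto3.client("transcribe", region_name="ap-northeast-1")` and `s3_client = boto3.client("s3")`, the job above could be fetched with `wait_for_job(transcribe_client, "output.m4a")` followed by `download_transcript(s3_client, "somebucket", "output.m4a", "result.json")`.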

3. Convert the transcription result to SubRip subtitle data

The resulting JSON contains an array of recognized words with their start and end times; punctuation marks appear as separate items without timestamps. Showing each word as its own subtitle would be far too hard to read.
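For reference, each entry in `results.items` is either a `pronunciation` item (a word, with `start_time` and `end_time` stored as strings) or a `punctuation` item with no timestamps. The sketch below flattens them into `(text, start, end)` tuples, attaching punctuation to the preceding word. `words_with_times` is a hypothetical helper, and the sample dict is made-up data in that shape, not real Transcribe output.

```python
def words_with_times(data):
    # Flatten Amazon Transcribe items into (text, start, end) tuples.
    words = []
    for item in data["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # No timestamps: glue the mark onto the previous word, if any.
            if words:
                text, start, end = words[-1]
                words[-1] = (text + content, start, end)
        else:
            words.append((content, float(item["start_time"]), float(item["end_time"])))
    return words


# Hypothetical sample in the shape of a Transcribe result file.
sample = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "1.02", "end_time": "1.35",
             "alternatives": [{"confidence": "0.99", "content": "Engine"}]},
            {"type": "pronunciation", "start_time": "1.35", "end_time": "1.80",
             "alternatives": [{"confidence": "0.98", "content": "failure"}]},
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": "."}]},
        ]
    }
}

print(words_with_times(sample))  # [('Engine', 1.02, 1.35), ('failure.', 1.35, 1.8)]
```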

So I grouped the words into subtitles using a few simple rules and generated SubRip (.srt) data from them.

Break when more than 3 seconds have passed since the previous word
Break when a period, "!" or "?" appears, treating it as the end of a sentence
Break when the subtitle gets too long (110 characters or more)
Break at a comma if the subtitle is already longer than 80 characters
End each subtitle at the start of the next subtitle or 2 seconds after its last word, whichever comes first

These rules were chosen fairly arbitrarily, so feel free to tweak them.
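To make tweaking easier, the break rules can be pulled out into a single function with the thresholds as parameters. This is a hypothetical refactoring sketch (`should_break` and its parameter names are made up here). With the default values it mirrors the first four rules as implemented in `02-makesrt.py`, including the check that ignores a period or comma right after an uppercase letter, as in "U.S.".

```python
def should_break(text, gap, max_gap=3.0, max_len=110, comma_len=80):
    """Return True if the subtitle `text` should end before the next word.

    gap: seconds between the end of the previous word and the start of the next one.
    """
    if gap > max_gap:           # long pause
        return True
    if len(text) >= max_len:    # subtitle is getting too long
        return True
    if text == "":
        return False
    if len(text) > 1 and text[-2].isupper():
        return False            # probably an abbreviation such as "U.S."
    if text[-1] in ".?!":       # end of a sentence
        return True
    # A comma only ends the subtitle once it is already fairly long.
    return text[-1] == "," and len(text) > comma_len
```

For example, calling it with `max_len=60` would produce shorter subtitles, and a smaller `max_gap` would split more eagerly on pauses.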

`02-makesrt.py`


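import json
import sys


def sec2time(sec):
    # Convert seconds (float) to an SRT timestamp "HH:MM:SS,mmm".
    # Round to whole milliseconds first to avoid float truncation errors.
    total_ms = int(round(sec * 1000))
    h, rest = divmod(total_ms, 3600 * 1000)
    m, rest = divmod(rest, 60 * 1000)
    s, mils = divmod(rest, 1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(h, m, s, mils)


def print_cue(index, start_time, end_time, text):
    # One SubRip cue: counter, time range, text, blank line.
    print(index)
    print("{0} --> {1}".format(sec2time(start_time), sec2time(end_time)))
    print(text)
    print("")


def convert2srt(filepath):
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)

    start_time = None  # start of the current cue (None = no word yet)
    end_time = 0       # end of the last word added
    s = ""             # text of the current cue
    index = 1          # SubRip cue numbers start at 1
    for item in data["results"]["items"]:
        is_output = False

        # Only words carry timestamps; punctuation items do not.
        if "start_time" in item:
            item["start_time"] = float(item["start_time"])
            item["end_time"] = float(item["end_time"])

            if item["start_time"] - end_time > 3:
                # Pause of more than 3 seconds since the previous word
                is_output = True
            elif len(s) >= 110:
                # The cue is getting too long
                is_output = True

            if s != "":
                if len(s) > 1 and s[-2].isupper():
                    # Probably an abbreviation such as "U.S.", not a sentence end
                    pass
                else:
                    last = s[-1]
                    if last in (".", "?", "!"):
                        # End of a sentence
                        is_output = True
                    if last == "," and len(s) > 80:
                        # Break long cues at a comma
                        is_output = True

        if is_output:
            # End at the next word's start or 2 seconds after the last word,
            # whichever is earlier
            end_time = min(item["start_time"], end_time + 2.0)
            if s != "":
                print_cue(index, start_time, end_time, s)
                index += 1
            start_time = None
            end_time = 0
            s = ""

        content = item["alternatives"][0]["content"]
        if "start_time" in item:
            if start_time is None:
                start_time = item["start_time"]
            end_time = item["end_time"]
            # Separate words with a space, but join single letters that follow
            # a period (e.g. "U" "." "S" "." -> "U.S.")
            if s and (len(content) > 1 or s[-1] != "."):
                s += " " + content
            else:
                s += content
        elif s:
            # Punctuation attaches directly to the previous word
            s += content

    if s != "":
        print_cue(index, start_time, end_time + 2.0, s)


def main():
    convert2srt(sys.argv[1])


if __name__ == "__main__":
    main()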

Pass the JSON file as an argument and redirect the output to a file.

python 02-makesrt.py result.json > result.srt

The output should look like this:

`output`


64
00:06:00,139 --> 00:06:16,839
Then the next Nano second, it was pure, unadulterated pandemonium Way number three going down.

65
00:06:16,839 --> 00:06:18,720
It looks like we lost number three engine.

66
00:06:18,720 --> 00:06:23,149
We're descending rapidly coming back.
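Before muxing, it can be worth sanity-checking the generated file. The sketch below is a hypothetical checker (`parse_srt` and `check_cues` are not part of the original scripts) that parses the cues and verifies that each one ends after it starts and does not overlap the next one. It assumes cues are separated by a single blank line, and it is run here on the sample output above.

```python
import re

TIME = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
RANGE = re.compile(TIME + r" --> " + TIME)


def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text):
    # Return a list of (start, end, subtitle_text) tuples, one per cue.
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        groups = RANGE.match(lines[1]).groups()  # line 0 is the cue number
        cues.append((to_seconds(*groups[:4]), to_seconds(*groups[4:]), "\n".join(lines[2:])))
    return cues


def check_cues(cues):
    # Return a list of problems; an empty list means the cues look sane.
    problems = []
    for i, (start, end, _) in enumerate(cues):
        if end <= start:
            problems.append("cue at position {}: ends before it starts".format(i))
        if i + 1 < len(cues) and end > cues[i + 1][0]:
            problems.append("cue at position {}: overlaps the next cue".format(i))
    return problems


SAMPLE = """64
00:06:00,139 --> 00:06:16,839
Then the next Nano second, it was pure, unadulterated pandemonium Way number three going down.

65
00:06:16,839 --> 00:06:18,720
It looks like we lost number three engine.

66
00:06:18,720 --> 00:06:23,149
We're descending rapidly coming back.
"""

print(check_cues(parse_srt(SAMPLE)))  # []
```

For a real file, read `result.srt` with `open("result.srt", encoding="utf-8").read()` and pass the text to `parse_srt`.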

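4. Embed the subtitle data in the video

mkvmerge makes it easy to combine the original video and the subtitle file into a single MKV file. The --language and --track-name options apply to track 0 of the file that follows them (result.srt).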

mkvmerge -o output.mkv original.m4v --language 0:eng --track-name 0:English result.srt
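For the whole box set, the same command can be generated per episode. A minimal sketch, assuming each `episode.m4v` has a matching `episode.srt` in the same directory and that `mkvmerge` is on your `PATH`; `mkvmerge_cmd` and `mux_all` are hypothetical names. Because mkvmerge exits with 1 when it only emitted warnings and with 2 on errors, the sketch treats only exit codes of 2 or higher as failures.

```python
import subprocess
from pathlib import Path


def mkvmerge_cmd(video, srt, output):
    # Same as: mkvmerge -o output.mkv original.m4v --language 0:eng --track-name 0:English result.srt
    # The track options come right before the subtitle file they apply to.
    return [
        "mkvmerge", "-o", str(output), str(video),
        "--language", "0:eng", "--track-name", "0:English", str(srt),
    ]


def mux_all(directory):
    # Hypothetical batch helper: episode.m4v + episode.srt -> episode.mkv
    for video in sorted(Path(directory).glob("*.m4v")):
        srt = video.with_suffix(".srt")
        if not srt.exists():
            continue  # no subtitles generated for this episode yet
        result = subprocess.run(mkvmerge_cmd(video, srt, video.with_suffix(".mkv")))
        if result.returncode >= 2:  # 1 only means warnings
            raise RuntimeError("mkvmerge failed for {}".format(video))
```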

When you play the file in VLC, you can turn on the embedded subtitles.

Result

All in all, the subtitles look pretty natural.

This is the real joy of programming: being able to whip up a tool quickly when you need one.

Subtitle data created with Amazon Transcribe