I made a subtitle file (SRT) from AmiVoice JSON data

Introduction

This is my first post. I'm Denki Sheep from Golden Bridge. My main job is Chinese translation, and programming is my hobby.

I've dabbled in a number of languages, but I'm still a Sunday programmer. I'd love for the professionals to be as hard on me as they like!

Background

Recently, because of COVID-19, demand for online interpreting has surged. Since a live connection can become unstable, the content is often recorded in advance as video. As a result, I'm increasingly handling not only translation but also **subtitles**.

Until now the volume was small, so I did it by hand, but it was a real chore, so I decided to generate SRT files to save labor.

About SRT files

SRT is the de facto standard format for adding subtitles to videos. The contents are very simple.

1
00:00:01,050 --> 00:00:05,778
First subtitle

2
00:00:07,850 --> 00:00:11,123
Next subtitle

3
00:01:02,566 --> 00:01:12,456
Third subtitle

As shown above, each block is made up of four elements in order:

- Index
- Start time --> End time
- Text
- (Blank line)

With this file, you can import subtitles into a video file at once.
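To make the layout concrete, here is a minimal sketch that reads SRT text back into (index, start, end, text) entries. The names `parse_timestamp` and `parse_srt` are illustrative and not from any library, and the sketch assumes well-formed blocks separated by blank lines, so it is not a full parser.

```python
import re

# HH:MM:SS,mmm as used on the timing line of each SRT block
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def parse_timestamp(stamp):
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(srt_text):
    """Split SRT text into (index, start_ms, end_ms, text) tuples."""
    entries = []
    # Blocks are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # sketch only: ignore incomplete blocks
        start, end = lines[1].split("-->")
        text = "\n".join(lines[2:])
        entries.append((int(lines[0]), parse_timestamp(start), parse_timestamp(end), text))
    return entries


sample = """1
00:00:01,050 --> 00:00:05,778
First subtitle

2
00:00:07,850 --> 00:00:11,123
Next subtitle
"""

for entry in parse_srt(sample):
    print(entry)
```

Keeping times as plain milliseconds makes it easy to compare or shift subtitles before writing them back out.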

Transcription

I use AmiVoice Cloud Platform to produce the draft Japanese transcription.

Since it is made in Japan, I find it more accurate for Japanese than Google and similar services.

Writing everything from scratch is a lot of work, so I just tweaked the [Sample Program](https://acp.amivoice.com/main/manual/%e3%82%b5%e3%83%b3%e3%83%97%e3%83%ab%e3%83%97%e3%83%ad%e3%82%b0%e3%83%a9%e3%83%a0/).

Running this sample program returns the speech recognition result as JSON. For details, see:

[Returned JSON (AmiVoice)](https://acp.amivoice.com/main/manual/if%E4%BB%95%E6%A7%98-http%E9%9F%B3%E5%A3%B0%E8%AA%8D%E8%AD%98api%E8%A9%B3%E7%B4%B0/#response)

| Key | Key | Key | Description |
| --- | --- | --- | --- |
| results | | | Array of "recognition results of utterance sections" |
| | confidence | | Confidence (a value between 0 and 1; 0: low confidence, 1: high confidence) |
| | starttime | | Utterance start time (the start of the audio data is 0) |
| | endtime | | Utterance end time (the start of the audio data is 0) |
| | tags | | Unused (empty array) |
| | rulename | | Unused (empty string) |
| | text | | Recognition result text |
| | tokens | | Array of morphemes of the recognition result text |
| | | written | Written form of the morpheme (word) |
| | | confidence | Morpheme confidence (likelihood of the recognition result) |
| | | starttime | Morpheme start time (the start of the audio data is 0) |
| | | endtime | Morpheme end time (the start of the audio data is 0) |
| | | spoken | Reading of the morpheme |
| utteranceid | | | Recognition result information ID *1 |
| text | | | Overall recognition result text, combining all of the "recognition results of utterance sections" |
| code | | | One-character code representing the result (*2: list of code and message values contained in the JSON) |
| message | | | String describing the error (*2: list of code and message values contained in the JSON) |

From this JSON data, I take starttime, endtime, and written from each entry in tokens and arrange them into SRT format.
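As a rough illustration of the fields involved, the sketch below walks the tokens of a hand-written response. The sample values are invented for this example and are not real AmiVoice output; only the key names (results, tokens, written, starttime, endtime) come from the table above, and the times are treated as milliseconds, as the script in this post does.

```python
import json

# Hand-written sample shaped like the table above (values are made up)
sample_json = """
{
  "results": [
    {
      "text": "こんにちは。",
      "tokens": [
        {"written": "こんにちは", "starttime": 500, "endtime": 1200},
        {"written": "。", "starttime": 1200, "endtime": 1400}
      ]
    }
  ],
  "text": "こんにちは。"
}
"""

data = json.loads(sample_json)

# Collect (start, end, written) for every token in the first result
rows = [
    (token["starttime"], token["endtime"], token["written"])
    for token in data["results"][0]["tokens"]
]

for start, end, written in rows:
    print(f"{start:>6} - {end:>6} ms  {written}")
```

Note that punctuation such as 。 arrives as its own token, which is what makes it usable as a subtitle break.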

Reading the JSON

Once we have the JSON, we can start converting right away. To decide where to split subtitle blocks, I use the following conditions:

- Punctuation (。 and 、)
- Time (milliseconds)
- Character count

Also, subtitles normally don't include punctuation, so those tokens are skipped.

These settings are exposed as command-line arguments so they can be adjusted flexibly.

import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("file", help="JSON file to read")
parser.add_argument("-d", "--delimiters", help="Comma-separated delimiters that end a subtitle. Default is '。,、'", default="。,、")
parser.add_argument("-s", "--skip", help="Comma-separated words to leave out of subtitles. Default is '。,、'", default="。,、")
parser.add_argument("-t", "--time", help="Maximum duration of a single subtitle in milliseconds. Default is 5000", default=5000, type=int)
parser.add_argument("-c", "--charas", help="Maximum number of characters in a single subtitle. Default is 25", default=25, type=int)

class SRTFomart():
	def __init__(self, args): 
		self.text = ""
		self.blocks = []
		self.delimiters = args.delimiters.split(",")
		self.skipWords = args.skip.split(",")
		self.time = args.time
		self.charas = args.charas

	def readFile(self, file):
		# Read the JSON file; the context manager closes it automatically
		with open(file, "r", encoding="utf-8") as f:
			data = json.load(f)["results"][0]
		self.text = data["text"]
		self.readTokens(data["tokens"])

	def readTokens(self, tokens):
		sub = ""
		startTime = 0
		index = 1

		for token in tokens:
			written = token["written"]

			# A delimiter closes the current subtitle (if any) and is never shown itself
			if written in self.delimiters:
				if sub != "":
					self.blocks.append(self.createSRTBlock(index, startTime, token["endtime"], sub))
					sub = ""
					index += 1
				continue

			# Skip words (punctuation etc.) are left out of the subtitle text
			if written in self.skipWords:
				continue

			# The first token of a new subtitle sets its start time
			if sub == "":
				startTime = token["starttime"]

			# Append the token first so that it is not lost when a break follows
			sub += written

			# Also break when the subtitle gets too long or lasts too long
			if len(sub) > self.charas or token["endtime"] - startTime > self.time:
				self.blocks.append(self.createSRTBlock(index, startTime, token["endtime"], sub))
				sub = ""
				index += 1

		# After the loop, store the last block only if some text remains
		if sub != "":
			self.blocks.append(self.createSRTBlock(index, startTime, tokens[-1]["endtime"], sub))



	def createSRTBlock(self, index, startTime, endTime, sub):
		stime = self.timeFormat(startTime)
		etime = self.timeFormat(endTime)
		return f"{index}\n{stime} --> {etime}\n{sub}\n"

	def timeFormat(self, time):
		# Split milliseconds into hours, minutes, seconds, and milliseconds
		sec_, ms_ = divmod(int(time), 1000)
		mn_, sec_ = divmod(sec_, 60)
		hr_, mn_ = divmod(mn_, 60)
		# Zero-pad to the HH:MM:SS,mmm layout used by SRT
		return f"{hr_:02d}:{mn_:02d}:{sec_:02d},{ms_:03d}"

	def exportSRTText(self):
		return "\n".join(self.blocks)


if __name__ == "__main__":
	args = parser.parse_args()
	if not args.file.endswith(".json"):
		print("Please specify a .json file")

	else:
		srt = SRTFomart(args)
		srt.readFile(args.file)
		text = srt.exportSRTText()
		# Swap only the trailing extension so other ".json" substrings in the path are untouched
		srtName = args.file[:-len(".json")] + ".srt"
		with open(srtName, "w", encoding="utf-8") as f:
			f.write(text)
		print("done")

With that, the JSON has been converted to SRT format.
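For reference, here is a small sketch of how the options combine. It rebuilds the same four options and parses an example argument list instead of the real command line; the file name `result.json` and the chosen values are placeholders.

```python
import argparse

# Same options as the script above, rebuilt so this snippet runs on its own
parser = argparse.ArgumentParser()
parser.add_argument("file")
parser.add_argument("-d", "--delimiters", default="。,、")
parser.add_argument("-s", "--skip", default="。,、")
parser.add_argument("-t", "--time", default=5000, type=int)
parser.add_argument("-c", "--charas", default=25, type=int)

# Equivalent to passing these arguments on the command line
args = parser.parse_args(["result.json", "-t", "4000", "-c", "20", "-d", "。,、,？"])

print(args.delimiters.split(","))  # the comma-separated string becomes a list
print(args.skip.split(","))        # options that were not passed keep their defaults
print(args.time, args.charas)
```

Because the value is split on `,`, an ASCII comma itself cannot be passed as a delimiter this way.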

Import

Finally, all that remains is to adjust or translate the generated SRT file and import it into your video editing software. With DaVinci Resolve, I could simply drop it from the media pool onto the video track.

Compared with the manual work I did until now, this should be a considerable efficiency gain!

Next steps

- I want to connect this to machine translation!
- Proofreading the Japanese in Word or similar → I'm looking for a way to feed the corrected text back in.

Good luck with creating subtitles for the New Normal era!