I made a subtitle file (SRT) from JSON data of AmiVoice

Introduction

This is my first post. My name is Denki Sheep from Golden Bridge. My main business is Chinese translation, and programming is my hobby.

I've dabbled in various languages, but I'm still a Sunday programmer. I'd welcome as much critique from the pros as they care to give!

Background

Recently, because of the coronavirus, demand for online interpreting has surged. Since the connection can become unstable, sessions are often recorded in advance as video. As a result, I have been handling not only translation but also **subtitles** more and more.

Until now the volume was small, so I did the work by hand, but since this was a good opportunity, I decided to generate SRT files to save labor.

About SRT files

SRT is a de facto standard format for adding subtitles to videos. The contents are very simple:

1
00:00:01,050 --> 00:00:05,778
First subtitle

2
00:00:07,850 --> 00:00:11,123
Next subtitle

3
00:01:02,566 --> 00:01:12,456
Third subtitle

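As shown, each block consists of an index, a "start time --> end time" line, and the subtitle text, with a blank line between blocks.

With this file, you can import all the subtitles into a video in one go.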
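To make the timing line concrete, here is a minimal sketch that turns a line such as `00:00:01,050 --> 00:00:05,778` into start and end times in milliseconds. The name `parse_time_range` is my own, chosen for illustration; it is not part of any library.

```python
import re

# Layout of one timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIME_RANGE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def parse_time_range(line):
    """Return (start_ms, end_ms) for one SRT timing line."""
    match = TIME_RANGE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not an SRT timing line: {line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
    end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
    return start, end


print(parse_time_range("00:00:01,050 --> 00:00:05,778"))  # (1050, 5778)
```

Note that the milliseconds are separated by a comma, not a period.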

Transcription

I use AmiVoice Cloud Platform to produce the Japanese transcription drafts.

Since it is made in Japan, I find it more accurate for Japanese than Google and others.

Writing everything from scratch is a hassle, so I just tweaked the [Sample Program](https://acp.amivoice.com/main/manual/%e3%82%b5%e3%83%b3%e3%83%97%e3%83%ab%e3%83%97%e3%83%ad%e3%82%b0%e3%83%a9%e3%83%a0/).

Running this sample program returns the speech recognition result as JSON. For details, see [Returned JSON (AmiVoice)](https://acp.amivoice.com/main/manual/if%E4%BB%95%E6%A7%98-http%E9%9F%B3%E5%A3%B0%E8%AA%8D%E8%AD%98api%E8%A9%B3%E7%B4%B0/#response).

| Key | | | Description |
|---|---|---|---|
| results | | | Array of "recognition results per utterance section" |
| | confidence | | Confidence (a value between 0 and 1; 0: low confidence, 1: high confidence) |
| | starttime | | Utterance start time (the start of the audio data is 0) |
| | endtime | | Utterance end time (the start of the audio data is 0) |
| | tags | | Unused (empty array) |
| | rulename | | Unused (empty string) |
| | text | | Recognition result text |
| | tokens | | Array of the morphemes in the recognition result text |
| | | written | Written form of the morpheme (word) |
| | | confidence | Morpheme confidence (likelihood of the recognition result) |
| | | starttime | Morpheme start time (the start of the audio data is 0) |
| | | endtime | Morpheme end time (the start of the audio data is 0) |
| | | spoken | Reading of the morpheme |
| utteranceid | | | Recognition result information ID |
| text | | | Overall recognition result text combining all of the "recognition results per utterance section" |
| code | | | One-letter code representing the result (see the list of code and message values contained in the JSON) |
| message | | | String describing the error (see the list of code and message values contained in the JSON) |

From this JSON, the starttime, endtime, and written values of each entry in tokens are used to build the SRT output.
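As a rough illustration of which fields matter, the sketch below pulls those three values out of a response. The sample JSON is hypothetical and heavily shortened (a real response has more keys), the time values are assumed to be milliseconds, and `extract_tokens` is a name I made up.

```python
import json

# Hypothetical, shortened response; the values are made up for illustration
sample = """
{
  "results": [
    {
      "text": "こんにちは。",
      "tokens": [
        {"written": "こんにちは", "starttime": 500, "endtime": 1200},
        {"written": "。", "starttime": 1200, "endtime": 1300}
      ]
    }
  ],
  "text": "こんにちは。"
}
"""


def extract_tokens(raw):
    """Return (written, starttime, endtime) for every token of the first result."""
    data = json.loads(raw)
    tokens = data["results"][0]["tokens"]
    return [(t["written"], t["starttime"], t["endtime"]) for t in tokens]


for written, start, end in extract_tokens(sample):
    print(written, start, end)
```

Punctuation arrives as its own token, which is what makes it usable as a split point.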

Read JSON

Once you have the JSON, the conversion can start right away. As conditions for splitting subtitle blocks, I use roughly the following:

- Punctuation (、 and 。)
- Time (milliseconds)

Subtitles also don't carry punctuation, so punctuation marks are skipped.

These settings are exposed as command-line arguments so that they can be adjusted flexibly.
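Before the full script, the splitting rule on its own can be sketched as a single predicate. This is a simplified illustration, not the script itself: `should_break` and its parameter names are hypothetical, and the defaults mirror the script's command-line defaults (25 characters, 5000 ms).

```python
def should_break(written, sub, start_ms, end_ms,
                 delimiters=("。", "、"), max_chars=25, max_ms=5000):
    """Return True when the current subtitle block should be closed."""
    if written in delimiters:
        return True  # punctuation ends the block
    if len(sub) > max_chars:
        return True  # the subtitle text has grown too long
    return end_ms - start_ms > max_ms  # the block has lasted too long


print(should_break("。", "こんにちは", 0, 1200))    # True: punctuation
print(should_break("今日", "こんにちは", 0, 6000))  # True: longer than 5000 ms
print(should_break("今日", "こんにちは", 0, 1200))  # False: keep accumulating
```

The character limit is a third condition alongside punctuation and time; it keeps a long unpunctuated passage from turning into one oversized subtitle.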

import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("file", help="Designate JSON file name to read")
parser.add_argument("-d", "--delimiters", help="Comma-separated delimiters that split subtitles. Default is '。,、'", default="。,、")
parser.add_argument("-s", "--skip", help="Comma-separated words to leave out of subtitles. Default is '。,、'", default="。,、")
parser.add_argument("-t", "--time", help="Maximum duration of a single subtitle in milliseconds. Default is 5000", default=5000, type=int)
parser.add_argument("-c", "--charas", help="Maximum number of characters in a single subtitle. Default is 25", default=25, type=int)

class SRTFomart():
	def __init__(self, args): 
		self.text = ""
		self.blocks = []
		self.delimiters = args.delimiters.split(",")
		self.skipWords = args.skip.split(",")
		self.time = args.time
		self.charas = args.charas

	def readFile(self, file):
		#Read the file and take the first recognition result
		with open(file, "r", encoding="utf-8") as f:
			data = json.load(f)["results"][0]
		self.text = data["text"]
		self.readTokens(data["tokens"])

	def readTokens(self, tokens):
		sub = ""
		startTime = 0
		index = 1
		# subTitles = []
		
		for token in tokens:
			written = token["written"]
			#Set startTime if subtitles are empty
			if sub == "":
				#Even if the subtitles are empty, skip if the contents of the Token are punctuation marks, etc.
				if written in self.delimiters or written in self.skipWords:
					continue

				else:
					startTime = token["starttime"]

			#Add the token to the subtitle first, unless it is a delimiter or a skip word,
			#so that a token that triggers a break by length or time is not lost
			if written not in self.delimiters and written not in self.skipWords:
				sub += written

			#Close the block on a delimiter, or when the character or time limit is exceeded
			if written in self.delimiters or len(sub) > self.charas or token["endtime"] - startTime > self.time:
				self.blocks.append(self.createSRTBlock(index, startTime, token["endtime"], sub))
				sub = ""
				startTime = 0
				index += 1
		
		#After the loop, store the last block only if some text remains
		if sub != "":
			self.blocks.append(self.createSRTBlock(index, startTime, tokens[-1]["endtime"], sub))


	def createSRTBlock(self, index, startTime, endTime, sub):
		stime = self.timeFormat(startTime)
		etime = self.timeFormat(endTime)
		return f"{index}\n{stime} --> {etime}\n{sub}\n"

	def timeFormat(self, time):
		#Split the milliseconds into hours, minutes, seconds and milliseconds
		sec_, ms_ = divmod(int(time), 1000)
		mn_, sec_ = divmod(sec_, 60)
		hr_, mn_ = divmod(mn_, 60)
		#Zero-pad each field to the SRT layout HH:MM:SS,mmm
		return f"{hr_:02d}:{mn_:02d}:{sec_:02d},{ms_:03d}"

	def exportSRTText(self):
		return "\n".join(self.blocks)


if __name__ == "__main__":
	args = parser.parse_args()
	if not args.file.endswith(".json"):
		print("Please specify a JSON file")
	
	else:
		srt = SRTFomart(args)
		srt.readFile(args.file)
		text = srt.exportSRTText()
		#Swap only the trailing extension, then write the SRT text
		srtName = args.file[:-len(".json")] + ".srt"
		with open(srtName, "w", encoding="utf-8") as f:
			f.write(text)
		print("done")


The data has now been converted to SRT format.
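A quick sanity check on the generated text can catch problems before importing it. This is a minimal sketch, assuming blocks are separated by one blank line and numbered from 1; `check_srt` is a hypothetical helper, not part of the script above.

```python
def check_srt(text):
    """Return the number of blocks, raising ValueError on a malformed block."""
    blocks = [b for b in text.strip().split("\n\n") if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if len(lines) < 3:
            raise ValueError(f"block {expected} is missing a line")
        if int(lines[0]) != expected:
            raise ValueError(f"expected index {expected}, found {lines[0]}")
        if " --> " not in lines[1]:
            raise ValueError(f"block {expected} has no timing line")
    return len(blocks)


sample = (
    "1\n00:00:01,050 --> 00:00:05,778\nFirst subtitle\n\n"
    "2\n00:00:07,850 --> 00:00:11,123\nNext subtitle\n"
)
print(check_srt(sample))  # 2
```

A block with no subtitle text fails the three-line check, so empty subtitles show up here rather than in the video editor.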

Import

Finally, all that remains is to adjust or translate the generated SRT file and import it into your video editing software. In DaVinci Resolve, I could simply drop it from the media pool onto the video track.

Compared with the manual work so far, this should improve efficiency considerably!

Future plans

- I want to hook this up to machine translation!
- Proofreading the Japanese in Word etc. → I'm looking for a way to import the result back in.

Good luck with creating subtitles for the New Normal era!
