I tried to speed up video creation by parallel processing

Slow!!! That was what I honestly thought.

In I tried to extract characters from subtitles (OpenCV: tesseract-ocr edition), the image processing of the video took 810 seconds (about 13.5 minutes). In this post, I tried to improve the speed of video creation by refactoring.

Method 1 PIL <-> OpenCV

To hand the same image data back and forth between PIL and OpenCV, I used to save it to a temporary file and read it back in, but on closer inspection it turns out the conversion can be done simply by converting the numpy array type. With this change alone, the processing time went from 810 seconds (about 13.5 minutes) to 450 seconds (about 7.5 minutes).

Before correction

def createTextImage(src, sentence, px, py, color=(8,8,8), fsize=28):

    #Save image
    tmp_path = "src_temp.png"
    cv2.imwrite(tmp_path, src)

    #Stored in PIL object
    img = Image.open(tmp_path)
    draw = ImageDraw.Draw(img)

    #Write text on the image with PIL
    font = ImageFont.truetype("./IPAfont00303/ipag.ttf", fsize)
    draw.text((px, py), sentence, fill=color, font=font)
    img.save(tmp_path)

    #Store in openCV
    return cv2.imread(tmp_path)

Revised

opencv(BGR) -> PIL(RGB)

cvimg = cv2.imread("sample.png")
rgbImg = cv2.cvtColor(cvimg, cv2.COLOR_BGR2RGB)
pilImg = Image.fromarray(rgbImg)

------------------

PIL(RGB) -> opencv(BGR)
cvImg = np.array(pilImg, dtype=np.uint8)
dst = cv2.cvtColor(cvImg, cv2.COLOR_RGB2BGR)

Method 2 Parallel processing

Previously I read, processed, and wrote the video one frame at a time. I rewrote the script with the following flow so that only the processing step runs in parallel.

--Temporarily store the frames in an array
--Process (subtitle extraction) each frame in parallel and aggregate the results
--Sort by id
--Save to video

For the parallel processing I use joblib. On the caller side, the call can be written in almost one line by using a list comprehension. n_jobs=16 is the number of processes.


from joblib import Parallel, delayed

def main_image_process(src, tool):

    #Do the image processing here
    #Preprocessing
    gray_frame = pre_process(src.content)
    #Character extraction
    #Subtitle creation

...
Parallel(n_jobs=16)( [delayed(main_image_process)(f, tool) for f in frames] )
...
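
As a side note, the number of workers does not have to be hard-coded: joblib also accepts n_jobs=-1 to use all available CPU cores. A minimal sketch, assuming main_image_process, tool and frames are defined as above:

from joblib import Parallel, delayed

#n_jobs=-1 lets joblib use all available CPU cores instead of a fixed 16
results = Parallel(n_jobs=-1)(
    [delayed(main_image_process)(f, tool) for f in frames]
)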

Full source

import sys

import cv2
import io
import os
import numpy as np

import pyocr
import pyocr.builders

from PIL import Image, ImageDraw, ImageFont

from collections import namedtuple
from joblib import Parallel, delayed

import time

MovieFrame = namedtuple("MovieFrame", ["id", "content", "timestamp"])

telop_height = 50
cap_width = 1   #Placeholder, overwritten from the video capture in __main__
cap_height = 1  #Placeholder, overwritten from the video capture in __main__


def pre_process(src):
	kernel = np.ones((3,3),np.uint8)
	gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
        
    #Binarization
	o_ret, o_dst = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
    #Opening (erosion -> dilation)
	dst = cv2.morphologyEx(o_dst, cv2.MORPH_OPEN, kernel)
    #Invert
	dst = cv2.bitwise_not(dst)
    #Convert from 1 channel to 3 channels
	dst = cv2.cvtColor(dst, cv2.COLOR_GRAY2BGR)
	return dst

#Character extraction
def extractTelopText(src, tool):

	rgbImg = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)
	dst = tool.image_to_string(
		Image.fromarray(rgbImg),
		lang='jpn',
		builder=pyocr.builders.WordBoxBuilder(tesseract_layout=6)
	)

	sentence = []
	for item in dst:
		sentence.append(item.content)

	return "".join(sentence)

#Create empty subtitles
def createFooterTelop(src):

	telop = np.zeros((telop_height, cap_width, 3), np.uint8)
	telop[:] = tuple((128,128,128))

	images = [src, telop]
	dst = np.concatenate(images, axis=0)
	return dst

#Parallel processing
def main_image_process(src, tool):

	#Processed to make character recognition easier
	gray_frame = pre_process(src.content)

	#Trim only where telop is likely to appear
	roi = gray_frame[435:600, :]

	#Extract text
	text = extractTelopText(roi, tool)
        
	#Subtitle creation
	dst = createFooterTelop(src.content)

	#Add text to the image
	dst = addJapaneseTelop(dst, text, 20, cap_height + telop_height - 30)
	dst = addASCIITelop(dst, str(src.timestamp) + "[sec]", cap_width - 250, cap_height + telop_height - 10, color=(0,255,0))

	#Store in a namedtuple
	return MovieFrame(src.id, dst, src.timestamp)

#Add characters(Alphanumeric characters only)
def addASCIITelop(src, sentence, px, py, color=(8,8,8), fsize=28):

	cv2.putText(src, sentence, 
						(px, py), 
						cv2.FONT_HERSHEY_SIMPLEX, 
						1, 
						color, 
						2, 
						cv2.LINE_AA)	
	return src

#Add characters(Japanese)
def addJapaneseTelop(src, sentence, px, py, color=(8,8,8), fsize=28):

	rgbImg = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)

	#openCV -> PIL
	canvas = Image.fromarray(rgbImg).copy()

	draw = ImageDraw.Draw(canvas)
	font = ImageFont.truetype("./IPAfont00303/ipag.ttf", fsize)

	#Add characters
	draw.text((px, py), sentence, fill=color, font=font)

	#PIL -> openCV	
	dst = cv2.cvtColor(np.array(canvas, dtype=np.uint8), cv2.COLOR_RGB2BGR)

	return dst

if __name__ == '__main__':

	tools = pyocr.get_available_tools()
	if len(tools) == 0:
		print("No OCR tool found")
		sys.exit(1)

	tool = tools[0]

	cap = cv2.VideoCapture('one_minutes.mp4')

	cap_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
	cap_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
	fps = cap.get(cv2.CAP_PROP_FPS)

	telop_height = 50

	fourcc = cv2.VideoWriter_fourcc('m','p','4','v')
	writer = cv2.VideoWriter('extract_telop_async.mp4',fourcc, fps, (cap_width, cap_height + telop_height))

	frames = []

	start = time.time()
	idx = 0

	#Video loading
	try :
		while True:
			if not cap.isOpened():
				break

			if cv2.waitKey(1) & 0xFF == ord('q'):
				break

			ret, frame = cap.read()

			if frame is None:
				break

			frames.append(MovieFrame(idx,frame, round(idx/fps, 4)) )
			idx += 1

	except cv2.error as e:
		print(e)

	cap.release()
	print("read movie file")
	
	#Parallel processing(Caller)
	r = Parallel(n_jobs=16)( [delayed(main_image_process)(f, tool) for f in frames] )

	#sort
	sorted_out = sorted(r, key=lambda x: x.id)

	#Video writing
	try :
		for item in sorted_out:
			writer.write(item.content)

	except cv2.error as e:
		print(e)


	writer.release()
	
	print("write movie file")
	print("Done!!! {}[sec]".format(round(time.time() - start,4)))

Other points

namedtuples

I wanted to handle the frame order (id), the frame image, and the timestamp together as a single object, so I created a simple one with a namedtuple.

MovieFrame = namedtuple("MovieFrame", ["id", "content", "timestamp"])

With this, the fields can be read as properties (getters) such as src.id and src.content, which keeps the code short.
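
For illustration only, here is a tiny sketch of that attribute access (the frame image is replaced by None, a placeholder just for this example):

from collections import namedtuple

MovieFrame = namedtuple("MovieFrame", ["id", "content", "timestamp"])

#The fields behave like read-only properties (getters)
frame = MovieFrame(0, None, 0.0)   #id, content (frame image), timestamp
print(frame.id, frame.timestamp)   # -> 0 0.0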

sort

The results are collected after the parallel processing, but their order can come back slightly shuffled, so they are sorted back into display order by id.

sorted_out = sorted(r, key=lambda x: x.id)

result

And the processing time? ... It came down to 267.7924 seconds (about 4.5 minutes). :tada:

It's a big difference compared to the first 810 seconds.

| |processing time [sec]|processing time [min]|
|---|---|---|
|Before correction|810|13.5|
|Method 1 (PIL <-> OpenCV)|450|7.5|
|Method 1 (PIL <-> OpenCV) + Method 2 (parallel processing)|268|4.46|

in conclusion

Error messages are hard to see during parallel execution, so rather than starting with parallel processing from the beginning, it is easier to first reduce the number of processes to one and remove the bugs. In the step that draws Japanese text, sequential processing works even without duplicating the image data with canvas = Image.fromarray(rgbImg).copy(), but parallel processing raises an error without it. Parallel processing also needs a bit of setup, so while the processing itself is fast, it is inevitably a bit of a hassle. :tired_face:
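
For reference, this is the difference I mean; a rough sketch assuming the same imports as the script above and that src is a BGR frame (the error itself is simply what I observed in my environment):

rgbImg = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)

#Worked for me when processing sequentially, but raised an error inside the joblib workers
canvas = Image.fromarray(rgbImg)

#Duplicating the image data avoided the error in parallel processing
canvas = Image.fromarray(rgbImg).copy()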
