Making capture software in Python with the highest possible accuracy (1): https://qiita.com/akaiteto/items/b2119260d732bb189c87
Roughly speaking, the goal is to build something like Amarekoko in Python: a highly customizable screen recorder. Last time, we went over the basics of capturing the screen in Python and capturing system audio.
This time, we will consider improving the accuracy of screen capture.
Last time we found that the screen capture process is slow: overall throughput was only about 18 fps. Played back as video, that looks a little choppy; personally I want at least 28 fps.
The overall process has two steps:
1. Capture the image
2. Convert the image's color format to RGB
Of these, the previous article improved step 1. Specifically, measured at a resolution of 1920 x 1080, the original ImageGrab.grab approach ran at 26 fps, and switching to the win32 API raised that to 42 fps.
Next, let's improve the conversion in step 2. Before that, though: is it even necessary to pursue speed here? If real-time performance isn't required, we could simply keep the raw data as arrays or JPEG files and run the conversion after recording. Here, however, I'll deliberately aim for real-time processing: for long recordings, holding raw data indefinitely is bound to put pressure on memory, so converting on the fly should pay off.
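To put a rough number on that memory concern, here is a quick back-of-the-envelope calculation (my own figures, assuming raw 1920 x 1080 frames at 4 bytes per pixel):

width, height, bytes_per_pixel = 1920, 1080, 4   # raw RGBA frame
fps = 30
frame_mb = width * height * bytes_per_pixel / 1024 ** 2
print("one frame: %.1f MB" % frame_mb)                       # about 7.9 MB
print("per second: %.0f MB" % (frame_mb * fps))              # about 237 MB
print("one minute: %.1f GB" % (frame_mb * fps * 60 / 1024))  # about 13.9 GB

At that rate, even one minute of unconverted frames is a double-digit-gigabyte problem, so real-time handling is worth pursuing.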
So let's dig into the processing speed of the conversion step.
Pillow's ImageGrab.grab, used last time, outputs images in BGR order. Saving video requires RGB, so an OpenCV conversion was unavoidable, and that dragged down the processing speed. This time, the image produced by the win32 API is RGBA, and a conversion to RGB is still needed before writing the video. Last time we only measured the capture itself; now let's compare the speed of capture plus conversion.
# Traditional: ImageGrab.grab
parentTime = time.time()
for i in range(40):
    img_cv = np.asarray(ImageGrab.grab())
    img = cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB)
current = time.time()
diff = current - parentTime
print("fps:" + str(float(40) / diff))
# Improved: win32 + OpenCV
parentTime = time.time()
for i in range(40):
    memdc.BitBlt((0, 0), (width, height), srcdc, (0, 0), win32con.SRCCOPY)
    # np.frombuffer replaces the deprecated np.fromstring
    img = np.frombuffer(bmp.GetBitmapBits(True), np.uint8).reshape(height, width, 4)
    img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)
current = time.time()
diff = current - parentTime
print("fps:" + str(float(40) / diff))
fps:17.665207097327016
fps:29.761997556556736
Adding the conversion drops ImageGrab.grab from 26 fps to about 17.7, while win32 only falls from 42 to about 29.8, still above the ~28 fps target. Good: even with the OpenCV conversion included, win32 is fast enough. Let's go with this. We haven't made much progress yet, but here is the code so far. (It's completely unorganized and has known problems, so don't use it as-is.)
import cv2
import numpy as np
from PIL import ImageGrab
import ctypes
import time
import pyaudio
import wave
import win32gui, win32ui, win32con, win32api
# Suppress DeprecationWarnings some versions raise around the
# win32-bitmap-to-numpy conversion
import warnings
warnings.simplefilter("ignore", DeprecationWarning)

# win32 screen-capture setup
hnd = win32gui.GetDesktopWindow()
width = 1920
height = 1080
windc = win32gui.GetWindowDC(hnd)
srcdc = win32ui.CreateDCFromHandle(windc)
memdc = srcdc.CreateCompatibleDC()
bmp = win32ui.CreateBitmap()
bmp.CreateCompatibleBitmap(srcdc, width, height)
memdc.SelectObject(bmp)

user32 = ctypes.windll.user32
capSize = (user32.GetSystemMetrics(0), user32.GetSystemMetrics(1))
print(capSize)

fourcc = cv2.VideoWriter_fourcc(*"DIVX")
writer = cv2.VideoWriter("test.mov", fourcc, 20, capSize)
count = 0
FirstFlag = True

# Audio settings
WAVE_OUTPUT_FILENAME = "test.wav"
RECORD_SECONDS = 5
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 2 ** 11

audio = pyaudio.PyAudio()
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    input_device_index=0,
                    frames_per_buffer=CHUNK)
frames = []

sTime = time.time()
count = 0
arrScreenShot = []
print("start")
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    count += 1
    if count == 30:
        current = time.time()
        diff = current - sTime
        print("fps:" + str(float(count) / diff))
        sTime = time.time()
        count = 0

    # Image capture
    # fps 18 version:
    # img_cv = np.asarray(ImageGrab.grab())
    # img = cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB)
    # writer.write(img)

    # fps 29 version:
    memdc.BitBlt((0, 0), (width, height), srcdc, (0, 0), win32con.SRCCOPY)
    img = np.frombuffer(bmp.GetBitmapBits(True), np.uint8).reshape(height, width, 4)
    img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)

    # Video export
    writer.write(img)

    # # Audio capture
    # data = stream.read(CHUNK)
    # frames.append(data)

writer.release()
stream.stop_stream()
stream.close()
audio.terminate()

waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
print("finish")
The audio capture is commented out: capturing the screen and audio in the same loop slows everything down, so I plan to rewrite it with parallel processing (the final source at the bottom of this article does this). For now, let's finish the screen capture.
fps:20.089757121871102
Running the source above and measuring the speed... it's slow. The culprit is the video-export step.
writer.write(img)
Having this call inside the loop lowers the frame rate by about 10 fps. It's a method of OpenCV's cv2.VideoWriter class; if there were a faster video-export library, I'd rather use that. As far as I know its specification, the problems with this function are:
1. Writing is slow.
2. It can only save video at a fixed frame rate.
If a video-export library other than OpenCV exists, I'd like to check it against these points too.
To elaborate on the second point: when I captured the screen last time, the frame rate fluctuated between 14 and 19 fps, and as a result the duration of the output video was off. You could write at a fixed frame rate and shrug off small fps deviations, but since the video will eventually be merged with audio, I'd like the timing saved as exactly as possible.
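To see concretely why a fixed-rate writer distorts the duration, here is the arithmetic with illustrative numbers (a worst-case measured 14 fps, and the 20 fps this article's code declares to the writer):

wall_seconds = 40       # how long we actually recorded
measured_fps = 14       # capture rate actually achieved (worst case)
writer_fps = 20         # fixed rate declared to cv2.VideoWriter
frames_written = wall_seconds * measured_fps
print(frames_written / writer_fps)   # 28.0 -> a 40 s recording plays back as 28 s

This matches the 40-second recording that came back as a 28-second video, mentioned further below.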
With the standard iOS API, as I recall, you pass a timestamp along with each frame image when writing a video. It would be easiest if we could save at a variable frame rate (VFR) like that, and better still if the writing were fast.
Let's find out if there is another such library.
・・・・・・
...I searched, but couldn't find one. There seems to be no choice but to compromise.
1. Writing is slow.
Ideally I wanted to write each frame immediately after capture, but let's give that up: keep the captured images in an array and process them after recording finishes. If the retained data is never flushed in some form, a long recording will blow through memory in no time, but for now let's focus on getting the features working.
2. Only a fixed frame rate is supported.
Before tackling this one, let's sort out the current behavior. As a trial, let's graph the processing time of each captured frame.
... (omitted) ...
arrScreenShot = []
imgList = []
graph = []
print("start")
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    count += 1
    current = time.time()
    diff = current - sTime
    graph.append(diff)
    # print(str(i) + " frame: " + str(diff))
    sTime = time.time()
    count = 0

    # Image capture
    memdc.BitBlt((0, 0), (width, height), srcdc, (0, 0), win32con.SRCCOPY)
    img = np.frombuffer(bmp.GetBitmapBits(True), np.uint8).reshape(height, width, 4)
    img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)
    imgList.append(img)

# Plot per-frame processing time against its median
import statistics
import matplotlib.pyplot as plt
median = statistics.median(graph)
print("Median: " + str(median))
x = list(range(len(graph)))
x3 = [median] * len(graph)
plt.plot(x, graph, color="red")
plt.plot(x, x3, color="blue")
plt.show()
... (omitted) ...
The horizontal axis is the index of the captured frame (107 frames in total); the vertical axis is the processing time per frame in seconds. At 30 fps, for example, the per-frame time would be roughly 0.03 s. The red line plots the measured processing times; the blue line is their median, 0.03341841697692871 s, or about 29.9 fps. Looking at the graph, you can see that the processing time in the start-up portion is extremely slow. This seems to be the biggest cause of the fast-forward and slow-motion artifacts.
I considered the following countermeasures:
1. Write the video at a variable frame rate.
2. Fix the capture so screenshots are taken at regular time intervals, and write at a fixed frame rate.
3. Drop frames that miss the specified time slots and reuse the previous frame where one is missing, writing at a fixed frame rate (see the sketch after this list).
4. Don't record the overly slow start-up portion; keep only frame 20 onward.
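For comparison, option 3 would look something like this sketch (a hypothetical helper: stamps would be per-frame capture timestamps, which the current code does not record yet):

def conform_to_fixed_rate(stamps, frames, fps):
    # Resample (timestamp, frame) pairs onto a fixed-rate grid,
    # repeating the most recent frame whenever a slot was missed.
    out = []
    interval = 1.0 / fps
    src = 0
    t = stamps[0]
    while t <= stamps[-1]:
        # advance to the last captured frame not newer than t
        while src + 1 < len(stamps) and stamps[src + 1] <= t:
            src += 1
        out.append(frames[src])
        t += interval
    return out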
For option 1, I searched for a library that can write VFR, but gave up when I couldn't find one. Option 4 looks easiest, but it would leave a gap between the video and audio playback times; we could measure how far off it is, but that's unpleasant when the two eventually have to be merged. Option 2 seems to make the most sense, so let's go with it.
Currently, the screen capture simply runs once per iteration of the for loop; nothing guarantees captures at regular time intervals. Referring to https://qiita.com/montblanc18/items/05715730d99d450fd0d3, let's try firing the capture at fixed intervals. For now I'll run it as-is, without overthinking anything.
~ (Omitted) ~
import time
import threading

def worker():
    print(time.time())
    memdc.BitBlt((0, 0), (width, height), srcdc, (0, 0), win32con.SRCCOPY)
    img = np.frombuffer(bmp.GetBitmapBits(True), np.uint8).reshape(height, width, 4)
    img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)
    imgList.append(img)

def scheduler(interval, f, wait=True):
    base_time = time.time()
    next_time = 0
    while True:
        t = threading.Thread(target=f)
        t.start()
        if wait:
            t.join()
        # Sleep until the next interval boundary measured from base_time;
        # Python's % on a negative operand yields a non-negative result
        next_time = ((base_time - time.time()) % interval) or interval
        time.sleep(next_time)

scheduler(0.035, worker, False)
exit()
~ (Omitted) ~
The result: many frames did make it into the video, but captures frequently failed. The cause seems to be one object being referenced from multiple threads: so far, all screen captures have gone through the single memdc instance, and with threading that one instance gets accessed concurrently and its state gets corrupted. Let's rewrite.
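For reference, the textbook fix would be to serialize access to the shared DC with a lock. A minimal sketch, reusing the globals from the code above:

import threading

dc_lock = threading.Lock()

def worker():
    with dc_lock:  # only one thread may touch memdc/bmp at a time
        memdc.BitBlt((0, 0), (width, height), srcdc, (0, 0), win32con.SRCCOPY)
        raw = bmp.GetBitmapBits(True)
    img = np.frombuffer(raw, np.uint8).reshape(height, width, 4)
    imgList.append(cv2.cvtColor(img, cv2.COLOR_RGBA2RGB))

But since each BitBlt-plus-conversion takes around 30 ms, a lock serializes exactly the work the scheduler was trying to overlap. The rewrite below takes a different route: the worker only records that a capture was requested, and the actual BitBlt is deferred until after recording.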
~ (Omitted) ~
frames = []
sTime = time.time()
count = 0
arrScreenShot = []
imgList = []
graph = []
print("start")

import time
import threading

def worker(imgList):
    # Record only that a capture was requested at this moment;
    # the actual BitBlt is deferred until after the loop
    print(time.time())
    imgList.append(win32con.SRCCOPY)

def scheduler(interval, MAX_SECOND, f, wait=False):
    base_time = time.time()
    next_time = 0
    while (time.time() - base_time) < MAX_SECOND:
        t = threading.Thread(target=f, args=(imgList,))
        t.start()
        if wait:
            t.join()
        # Sleep until the next interval boundary from base_time
        next_time = ((base_time - time.time()) % interval) or interval
        time.sleep(next_time)

scheduler(1 / fps, 40, worker, False)

# After recording: perform the deferred captures and write the video
for tmpSRCCOPY in imgList:
    memdc.BitBlt((0, 0), (width, height), srcdc, (0, 0), tmpSRCCOPY)
    img = np.frombuffer(bmp.GetBitmapBits(True), np.uint8).reshape(height, width, 4)
    img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)
    writer.write(img)
Wow, that's painfully slow... but setting that aside: last time, a 40-second recording came back as a 28-second video. What happens this time? 39 seconds. It's working! The remaining gap is under a second, a rounding-level error, so call it a success. Screen capture is in decent shape now; we'll verify the accuracy more closely in the second half. With the implementation so far, the remaining concerns and open problems are:
● The captured data just piles up in an array, so a long recording will exhaust memory (one possible mitigation is sketched below).
● We introduced the fast win32 capture precisely to record in real time, yet the real-time premise has now disappeared: in effect, the mechanism is no longer much different from using ImageGrab.grab().
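Here is the mitigation sketch for the first point (my own idea, not implemented in this article): spill each frame to disk as it is captured instead of holding it in RAM.

import os
import cv2

def spill_frame(img, idx, outdir="frames"):
    # PNG is lossless but large; JPEG would trade quality for space
    os.makedirs(outdir, exist_ok=True)
    cv2.imwrite(os.path.join(outdir, "%06d.png" % idx), img)

Encoding still costs time per frame, so whether this keeps up at 30 fps would need measuring.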
Let's organize the source. Install the following library before running:
pip install moviepy
This library is used to merge the video and audio. Below is the source that records the screen and audio at the same time. As a countermeasure against delays, audio capture and screen capture run in separate threads... (slightly suspect).
capture.py
import cv2
import numpy as np
import pyaudio
import wave
import win32gui, win32ui, win32con, win32api
import time
import threading
# Suppress DeprecationWarnings some versions raise around the
# win32-bitmap-to-numpy conversion
import warnings
warnings.simplefilter("ignore", DeprecationWarning)

class VideoCap:
    FrameList = []  # shared across instances; fine for this single-use script

    def __init__(self, width, height, fps, FileName):
        capSize = (width, height)
        fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
        self.writer = cv2.VideoWriter(FileName, fourcc, fps, capSize)
        hnd = win32gui.GetDesktopWindow()
        windc = win32gui.GetWindowDC(hnd)
        self.srcdc = win32ui.CreateDCFromHandle(windc)

    def RecordStart(self, fps, rec_time):
        def StoreFrameCap(FrameList):
            # Only records that a capture was requested;
            # the actual BitBlt is deferred to RecordFinish
            FrameList.append(win32con.SRCCOPY)

        def scheduler(interval, MAX_SECOND, f, wait=False):
            base_time = time.time()
            next_time = 0
            while (time.time() - base_time) < MAX_SECOND:
                t = threading.Thread(target=f, args=(self.FrameList,))
                t.start()
                if wait:
                    t.join()
                # Sleep until the next interval boundary from base_time
                next_time = ((base_time - time.time()) % interval) or interval
                time.sleep(next_time)

        scheduler(1 / fps, rec_time, StoreFrameCap, False)

    def RecordFinish(self):
        for tmpSRCCOPY in self.FrameList:
            memdc = self.srcdc.CreateCompatibleDC()
            bmp = win32ui.CreateBitmap()
            bmp.CreateCompatibleBitmap(self.srcdc, width, height)
            memdc.SelectObject(bmp)
            memdc.BitBlt((0, 0), (width, height), self.srcdc, (0, 0), tmpSRCCOPY)
            # np.frombuffer replaces the deprecated np.fromstring
            img = np.frombuffer(bmp.GetBitmapBits(True), np.uint8).reshape(height, width, 4)
            img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)
            self.writer.write(img)
            memdc.DeleteDC()
            win32gui.DeleteObject(bmp.GetHandle())
        self.writer.release()

class AudioCap:
    class default:
        FORMAT = pyaudio.paInt16
        CHANNELS = 1
        RATE = 44100
        CHUNK = 2 ** 11

    frames = []
    audio = pyaudio.PyAudio()

    def __init__(self, FORMAT=default.FORMAT, CHANNELS=default.CHANNELS,
                 RATE=default.RATE, CHUNK=default.CHUNK):
        self.FORMAT = FORMAT
        self.CHANNELS = CHANNELS
        self.RATE = RATE
        self.CHUNK = CHUNK

    def RecordStart(self, rec_time):
        self.stream = self.audio.open(format=self.FORMAT,
                                      channels=self.CHANNELS,
                                      rate=self.RATE,
                                      input=True,
                                      input_device_index=0,
                                      frames_per_buffer=self.CHUNK)
        for i in range(0, int(self.RATE / self.CHUNK * rec_time)):
            data = self.stream.read(self.CHUNK)
            self.frames.append(data)

    def RecordFinish(self):
        self.stream.stop_stream()
        self.stream.close()
        self.audio.terminate()

    def writeWAV(self, FileName):
        waveFile = wave.open(FileName, 'wb')
        waveFile.setnchannels(self.CHANNELS)
        waveFile.setsampwidth(self.audio.get_sample_size(self.FORMAT))
        waveFile.setframerate(self.RATE)
        waveFile.writeframes(b''.join(self.frames))
        waveFile.close()

# Basic configuration
width = 1920                        # Horizontal resolution
height = 1080                       # Vertical resolution
fps = 30                            # FPS
RECORD_SECONDS = 60                 # Recording time (seconds)
VIDEO_OUTPUT_FILENAME = "test.mp4"  # Video file
AUDIO_OUTPUT_FILENAME = "test.wav"  # Audio file
FINAL_VIDEO = "final_video.mp4"     # Video + audio file

# Instances
CapAudio = AudioCap()
CapVideo = VideoCap(width, height, fps, VIDEO_OUTPUT_FILENAME)

# Audio-capture thread body
def threadFuncAudio(obj):
    obj.RecordStart(RECORD_SECONDS)
    obj.RecordFinish()
    obj.writeWAV(AUDIO_OUTPUT_FILENAME)

# Pass the function and its argument separately; calling
# threadFuncAudio(CapAudio) here would run it synchronously
thrAudio = threading.Thread(target=threadFuncAudio, args=(CapAudio,))

# Start both captures
thrAudio.start()
CapVideo.RecordStart(fps, RECORD_SECONDS)
CapVideo.RecordFinish()

# Verification: how far apart are the playback times?
from pydub import AudioSegment
sound = AudioSegment.from_file(AUDIO_OUTPUT_FILENAME, "wav")
audio_time = sound.duration_seconds  # Playback time (seconds)
print('Audio: Playback time:', audio_time)
cap = cv2.VideoCapture(VIDEO_OUTPUT_FILENAME)
print('Video: Playback time:', cap.get(cv2.CAP_PROP_FRAME_COUNT) / cap.get(cv2.CAP_PROP_FPS))

# Verification: merge video and audio
from moviepy.editor import VideoFileClip
from moviepy.editor import AudioFileClip
my_clip = VideoFileClip(VIDEO_OUTPUT_FILENAME)
audio_background = AudioFileClip(AUDIO_OUTPUT_FILENAME)
final_clip = my_clip.set_audio(audio_background)
final_clip.write_videofile(FINAL_VIDEO, fps=fps)
The results are as follows.
Audio: Playback time: 4.96907029478458
Video: Playback time: 4.933333333333334
Moviepy - Building video final_video.mp4.
MoviePy - Writing audio in final_videoTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video final_video.mp4
Moviepy - Done !
Moviepy - video ready final_video.mp4
Process finished with exit code 0
There's a gap of about 0.03 seconds in a 5-second recording. Does the deviation grow with longer recordings?
Audio: Playback time: 59.953922902494334
Video: Playback time: 59.06666666666667
A gap of about one second over 60 seconds... when exporting video at a fixed frame rate, some drift seems unavoidable no matter what. From next time onward we'll consider: ● exactly how large the gap between video and audio gets, and ● what happens with long recordings.
Continued next time.