Introduction

OS: Windows 10 (64-bit), Python: 3.7

There is plenty of screen-recording software out there, but I couldn't find any that met my very specific need: recording the screen of a particular website at a specified time. So I will write it myself in Python, treating it as a playful brain-training exercise.

The topic isn't difficult enough to need separate articles, but because of time constraints I will split it into several parts.

Also, please note that this is quite verbose, because I describe the various problems I ran into along the way and how I dealt with them.

Policy

Overall, I assume the following flow.

1. Command execution at the specified time
2. Start the browser with the preset URL
3. Capture browser screenshots and audio
4. Video output by merging browser screenshots and audio

This page covers step 3. After a quick look through Python's well-known libraries, I couldn't find one that captures both image and audio at the same time. (There may be something on GitHub. I'd appreciate it if you could tell me.)

I suspect it would be easy to do everything with ffmpeg, but this time I will impose the pointless constraint of doing it all from Python code.

Audio capture

I will try using the famous pyaudio.

Install pyaudio

Some readers may get stuck here, so I'll leave a note. When I installed pyaudio, the following error occurred in my environment.

src/_portaudiomodule.c(29): fatal error C1083: Cannot open include file: 'portaudio.h': No such file or directory

The build step appears to be failing. Following the article below, install from the whl file that matches your PC.

https://minato86.hatenablog.com/entry/2019/04/04/005929#portaudiohNo-such-file-or-directory

pip install PyAudio-0.2.11-cp37-cp37m-win_amd64.whl

My development environment is PyCharm, and I run code in a Python virtual environment. Running the above command as-is only installs the package into the Python installed on the C drive, not into the virtual environment.

D:\～(Omission)～\venv\Scripts\activate.bat

So I first activated the virtual environment by running activate.bat from the command prompt, and then ran the above pip command in that state.
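To confirm that the package landed in the virtual environment rather than in the system Python, a quick check like the following may help. This is only a minimal sketch; the helper name describe_install is my own and not part of any library.

```python
import importlib.util
import sys


def describe_install(module_name):
    """Return (interpreter path, whether module_name is importable from it)."""
    found = importlib.util.find_spec(module_name) is not None
    return sys.executable, found


if __name__ == "__main__":
    exe, found = describe_install("pyaudio")
    print("interpreter:", exe)          # should point inside venv\Scripts
    print("pyaudio importable:", found)
```

If the interpreter path printed is not the one inside the venv folder, the virtual environment was probably not active when the script was run.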

Implemented with pyaudio

Now that the installation is complete, let's record something.

A program that records with Python 3 and writes it to a wav file
https://ai-trend.jp/programming/python/voice-record/

I don't know yet how the library behaves, so for now I ran the recording sample from this site as-is, without plugging in a microphone.

OSError: [Errno -9998] Invalid number of channels

It stopped with an error, because there is no recording input device. It is scolding me: "What are you trying to record without a microphone?" The input_device_index passed to audio.open appears to be wrong, so let's check the audio devices on the computer.

import pyaudio
audio = pyaudio.PyAudio()
for i in range(audio.get_device_count()):
    dev = audio.get_device_info_by_index(i)
    print('name', dev['name'])
    print('index', dev['index'])

0 Microsoft Sound Mapper- Output, MME (0 in, 2 out)
<  1 BenQ GL2460 (NVIDIA High Defini, MME (0 in, 2 out)
   2 Realtek Digital Output (Realtek, MME (0 in, 2 out)
   5 Realtek Digital Output (Realtek High Definition Audio), Windows DirectSound (0 in, 2 out)
   6 Realtek Digital Output (Realtek High Definition Audio), Windows WASAPI (0 in, 2 out)
   ...
   ...
   ...
10 stereo mixer(Realtek HD Audio Stereo input), Windows WDM-KS (2 in, 0 out)
12 microphone(Realtek HD Audio Mic input), Windows WDM-KS (2 in, 0 out)

Quite a few devices showed up. BenQ and Realtek Digital Output are output devices (speakers), so they are not what we want. There is a microphone among the input devices, but that would record sound from outside the computer.

Which one is likely to input the sound inside the computer?

10 stereo mixer(Realtek HD Audio Stereo input), Windows WDM-KS (2 in, 0 out)

The stereo mixer is a feature that presents the sound playing inside the computer as an input. Let's select the stereo mixer here. (It seems this can also be done with WASAPI, but I'll ignore that for now.) I set the stereo mixer's number as the device index and ran the code.
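Picking out the recordable devices by eye gets tedious, so the selection could be sketched as a small filter. The function name find_input_devices and the sample data are my own illustration; the only assumption about PyAudio is that each device info dict has the 'index', 'name' and 'maxInputChannels' keys.

```python
def find_input_devices(devices, keyword=""):
    """Return (index, name) pairs for devices that can record.

    `devices` is a list of dicts shaped like the ones returned by
    PyAudio.get_device_info_by_index(); only the 'index', 'name' and
    'maxInputChannels' keys are used.
    """
    keyword = keyword.lower()
    return [
        (dev["index"], dev["name"])
        for dev in devices
        if dev["maxInputChannels"] > 0 and keyword in dev["name"].lower()
    ]


# Sample data modelled on the listing above (illustrative values only).
sample = [
    {"index": 1, "name": "BenQ GL2460 (NVIDIA High Defini", "maxInputChannels": 0},
    {"index": 10, "name": "stereo mixer(Realtek HD Audio Stereo input)", "maxInputChannels": 2},
    {"index": 12, "name": "microphone(Realtek HD Audio Mic input)", "maxInputChannels": 2},
]
print(find_input_devices(sample, "stereo"))
```

With a real PyAudio instance, the list would be built from get_device_info_by_index(i) for each i in range(get_device_count()). The device name depends on the driver and the OS language, so the keyword may need adjusting.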

[Errno -9999] Unanticipated host error

Another error. I suspect it also depends on settings on the OS side, so let's check those.

https://ahiru8usagi.hatenablog.com/entry/Windows10_Recording

In the Sound control panel, open the Recording tab and enable Stereo Mixer, then select the stereo mixer under "Sound" -> "Input" in the Windows settings.

With that, I was able to record successfully. (The code is in a later section.)

Screen capture

Let's take a screenshot with ImageGrab.

pip install Pillow

Simultaneous screen recording and audio capture

Finally, the code (a prototype). It has some problems, but I'm posting it as a progress report.

import cv2
import numpy as np
from PIL import ImageGrab
import ctypes
import time
import pyaudio
import wave

#Save start time

# parentTime = time.time()
# for i in range(10):
#     img_cv = np.asarray(ImageGrab.grab())
# current = time.time()
# diff = (current - parentTime)
# print("fps:" + str(float(10)/diff))


user32 = ctypes.windll.user32
capSize = (user32.GetSystemMetrics(0), user32.GetSystemMetrics(1))

fourcc = cv2.VideoWriter_fourcc(*"DIVX")
writer = cv2.VideoWriter("test.mov", fourcc, 30, capSize)
count = 0
FirstFlag = True

WAVE_OUTPUT_FILENAME = "test.wav"
RECORD_SECONDS = 40

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 2 ** 11
audio = pyaudio.PyAudio()
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    input_device_index=0,  # set this to your stereo mixer's index
                    frames_per_buffer=CHUNK)

frames = []

# #Save start time
# sTime = time.time()
# count = 0

print ("start")
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    # count+=1
    # if count == 30 :
    #     current = time.time()
    #     diff = (current - sTime)
    #     print("fps:" +  str(float(count)/diff))
    #     sTime = time.time()
    #     count = 0

    # Image capture (ImageGrab returns RGB; OpenCV's writer expects BGR)
    img_cv = np.asarray(ImageGrab.grab())
    img = cv2.cvtColor(img_cv, cv2.COLOR_RGB2BGR)
    writer.write(img)

    #Audio capture
    data = stream.read(CHUNK)
    frames.append(data)
print ("finish")


writer.release()
stream.stop_stream()
stream.close()
audio.terminate()

waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()

This code saves the video (mov) and the audio (wav) separately. The recording time should be 40 seconds, the value set in RECORD_SECONDS. So, what happened when I ran it...

Audio file is 39 seconds
Video file is 28 seconds
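The audio length can also be checked from Python using the standard wave module. A minimal sketch follows; the helper name wav_duration_seconds is hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Return the playback length of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        # number of sample frames divided by frames per second
        return wf.getnframes() / wf.getframerate()
```

For example, wav_duration_seconds("test.wav") would report the length of the file written by the prototype.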

Leaving the audio file aside, the video file gave a very suspicious result.

writer = cv2.VideoWriter("test.mov", fourcc, 30, capSize)

I had set the frame rate for saving the video to 30 without much thought, but that value seems to be wrong. Let's roughly measure the actual fps.

    if count == 30 :
        current = time.time()
        diff = (current - sTime)
        print("fps:" +  str(float(count)/diff))
        sTime = time.time()
        count = 0

The result is about 14 to 19 fps, meaning 14 to 19 images are output per second. My guess at what is happening: frames that actually arrive at roughly 0.06-second intervals were written as if they arrived at 0.03-second intervals, so the video plays back fast-forwarded and comes out shorter.
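As I read the prototype, the loop writes exactly one video frame per audio chunk, so the number of frames is fixed by the loop count regardless of how fast capture runs. A rough calculation with the constants from the code (WRITER_FPS is my own name for the 30 passed to VideoWriter):

```python
RATE = 44100
CHUNK = 2 ** 11
RECORD_SECONDS = 40
WRITER_FPS = 30  # frame rate passed to cv2.VideoWriter in the prototype

# Each loop iteration writes one frame and reads one audio chunk.
iterations = int(RATE / CHUNK * RECORD_SECONDS)
audio_seconds = iterations * CHUNK / RATE
video_seconds = iterations / WRITER_FPS

print(iterations)               # 861
print(round(audio_seconds, 2))  # 39.98
print(round(video_seconds, 2))  # 28.7
```

These figures line up with the 39-second audio file and the 28-second video file observed above.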

writer = cv2.VideoWriter("test.mov", fourcc, 15, capSize)

Changing the writer to 15 fps, and skipping or duplicating frames that arrive too early or too late, might solve the duration problem.

But there is a more basic problem: it's slow. Even if recording works, the video will probably look choppy.

Which step is the bottleneck? Let's measure each step with a view to speeding it up.
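The idea of dropping frames that arrive early and repeating frames that arrive late could be sketched as follows. This is not what the prototype does; copies_to_write is a hypothetical helper that only decides how many times the current frame should be written.

```python
def copies_to_write(elapsed_seconds, frames_written, fps):
    """How many copies of the current frame to write so that the video
    timeline catches up with the wall clock.

    Returns 0 when the frame arrived too early (skip it), 1 when it is on
    schedule, and more than 1 when capture fell behind (duplicate it).
    """
    frames_due = int(elapsed_seconds * fps) + 1  # frame 0 is due at t = 0
    return max(0, frames_due - frames_written)


# Inside the capture loop it might be used like this (sketch only):
#   n = copies_to_write(time.time() - start, written, 15)
#   for _ in range(n):
#       writer.write(img)
#   written += n
```

With this scheme the number of frames written always tracks elapsed time multiplied by fps, so the video length should match the recording time even when capture speed fluctuates.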

#ImageGrab alone
parentTime = time.time()
for i in range(40):
    img = ImageGrab.grab()
current = time.time()
diff = (current - parentTime)
print("fps:" + str(float(40)/diff))

#ImageGrab+numpy
parentTime = time.time()
for i in range(40):
    img_cv = np.asarray(ImageGrab.grab())
current = time.time()
diff = (current - parentTime)
print("fps:" + str(float(40)/diff))

Results:

ImageGrab.grab() on its own: 27 fps
ImageGrab.grab() plus conversion to numpy: 20 fps
ImageGrab.grab() plus conversion to numpy and RGB conversion: 18 fps

I wouldn't say ImageGrab.grab() itself is slow, but it becomes a problem once post-processing is taken into account.

Each conversion step could be improved, but first let's look for a faster way to capture the image than ImageGrab.grab().
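The timing loops above all follow the same pattern, so they could be folded into one helper. A minimal sketch; measure_fps is a name I made up, and it uses time.perf_counter instead of time.time for better resolution.

```python
import time


def measure_fps(func, repeat=40):
    """Call func() `repeat` times and return the number of calls per second."""
    start = time.perf_counter()
    for _ in range(repeat):
        func()
    elapsed = time.perf_counter() - start
    return repeat / elapsed
```

Usage would look like measure_fps(ImageGrab.grab) or measure_fps(lambda: np.asarray(ImageGrab.grab())), which keeps each measurement to one line.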

https://stackoverrun.com/ja/q/3379877

Following the page above, I tried capturing with the Windows API.

# windows_api
import time
import win32gui, win32ui, win32con, win32api

hwin = win32gui.GetDesktopWindow()
width = 1920   # capture size, hard-coded for a 1920x1080 screen
height = 1080
left = win32api.GetSystemMetrics(win32con.SM_XVIRTUALSCREEN)
top = win32api.GetSystemMetrics(win32con.SM_YVIRTUALSCREEN)
hwindc = win32gui.GetWindowDC(hwin)
srcdc = win32ui.CreateDCFromHandle(hwindc)
memdc = srcdc.CreateCompatibleDC()
bmp = win32ui.CreateBitmap()
bmp.CreateCompatibleBitmap(srcdc, width, height)
memdc.SelectObject(bmp)

parentTime = time.time()
arr = []
for i in range(30):
    # Copy the screen into the memory DC
    memdc.BitBlt((0, 0), (width, height), srcdc, (left, top), win32con.SRCCOPY)
    arr.append(memdc)  # note: stores the same DC object each time, not the pixels
current = time.time()
diff = (current - parentTime)
print("fps:" + str(float(30)/diff))

fps:48.752326144998015

48 fps. That is blazing fast. This looks promising, so I will rework the code around this API.

I've run out of time, so the rest will have to wait until next time.

Try to make capture software with as high accuracy as possible with python (1)