Last time, I introduced how to collect candidate images using the Bing Image Search API. This time, I will introduce how to collect candidate images from videos by frame analysis.
A video works on essentially the same principle as a flip book: motion is expressed by switching still images in rapid succession. The still images that make up the video are called frame images, and the number of frame images per unit of time is called the frame rate, expressed in fps (how many frame images are shown per second).
In other words, if you extract the frame images of scenes in which a face appears, you can secure candidate images even from a video with only short usable scenes. In addition, many videos such as DVDs are shot with the subject's face toward the camera, and most have appropriate corrections, such as to the amount of light. One merit of this approach is that candidate images extracted from a video are therefore likely to include face images well suited for learning.
This time, I would like to extract frame images to use as candidate images from a video, using OpenCV as in the previous article.
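Before extracting anything, it can be handy to check the frame rate and length of a video. The following is a minimal sketch of my own (not part of the original procedure); the path is a placeholder, and the values come from OpenCV's cv2.CAP_PROP_FPS and cv2.CAP_PROP_FRAME_COUNT properties.
import cv2
capture = cv2.VideoCapture('/hoge/hoge.mov')    # placeholder path
fps = capture.get(cv2.CAP_PROP_FPS)             # frame rate (fps)
frames = capture.get(cv2.CAP_PROP_FRAME_COUNT)  # total number of frames
print('fps: %.2f, frames: %d, length: %.1f sec' % (fps, frames, frames / fps))
capture.release()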
I will capture the video using the free tool QuickTime Player.
Capturing the video may seem like an inconvenient extra step, but a .mov file captured and output by QuickTime Player has the advantage that OpenCV can read it smoothly. (If you already have a video that OpenCV can read, skip this step.)
Also, when extracting every frame image from a long video such as a DVD, the number becomes enormous, and a face will not necessarily appear in all of them. By partially capturing only the scenes in which a face appears, the aim is to extract face-containing frame images efficiently. Furthermore, with this method, candidate images can also be extracted from videos published on the web.
※※※ Caution ※※※ The video captured this time is intended for use in machine learning. Please refrain from redistributing the captured video. Also, when capturing videos posted on the web, please do not violate the rules of the video site that publishes them!
As for the capture method, a detailed explanation is posted under "Recording the screen" on the official site, so I will omit it here. If you need to download and install QuickTime Player, you can do so here. The captured video has no audio, but that is no problem this time, since audio is not required.
After generating the captured video, extract frame images from it to use as candidate images. The following is a code example that extracts frame images and saves them as candidate images.
# -*- coding: utf-8 -*-
import cv2

def movie_to_image(num_cut):
    video_path = '/hoge/hoge.mov'  # Captured video path (including file name)
    output_path = '/hoge/save/'    # Folder path to output to

    # Read the captured video (generate the capture structure)
    capture = cv2.VideoCapture(video_path)

    img_count = 0    # Number of saved candidate images
    frame_count = 0  # Number of frame images read

    # Loop as long as there are frame images
    while capture.isOpened():
        # Get one frame image
        ret, frame = capture.read()
        if ret == False:
            break
        # Save frame images, thinning them out by the specified number
        if frame_count % num_cut == 0:
            img_file_name = output_path + str(img_count) + ".jpg"
            cv2.imwrite(img_file_name, frame)
            img_count += 1
        frame_count += 1

    # Release the capture structure
    capture.release()

if __name__ == '__main__':
    # Extract frame images with the thinning number set to 10
    movie_to_image(10)
The argument given to the movie_to_image method is the thinning number for the frame images. As explained above, in the case of a long video, saving every frame image as a candidate image results in a huge number of images. Also, if the motion in the video is extremely slow, many face frame images with almost the same composition will be generated, which can be undesirable. Thinning out the frame images to be saved solves both problems. For example, for a video with a frame rate of 29.97 fps, setting the thinning number to 30 gives you roughly one frame image per second of video.
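As a supplementary sketch of my own (interval_to_num_cut is a hypothetical helper, not part of the original code), the thinning number for a desired save interval can be computed from the frame rate that OpenCV reports:
import cv2

def interval_to_num_cut(video_path, interval_sec):
    # Convert a desired save interval (seconds) into a thinning number,
    # using the frame rate reported by OpenCV
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS)
    capture.release()
    return max(1, round(fps * interval_sec))

# For a 29.97 fps video, interval_to_num_cut(path, 1.0) returns 30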
In the above code, candidate images were generated from a captured video file, but with a slight change you can also generate candidate images in real time while shooting video with a webcam.
For example, to use the built-in camera that comes standard with the MacBook Air, all you have to do is change the above code as follows.
#capture = cv2.VideoCapture(video_path)
# Give the camera device number instead of the video path
capture = cv2.VideoCapture(0)
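For reference, here is a minimal sketch of my own of the whole real-time loop (not the original author's code): it shows a preview window and stops when the q key is pressed. Note that cv2.waitKey only picks up key presses while the preview window has focus, and the output path reuses the placeholder from the code above.
import cv2

def camera_to_image(num_cut):
    capture = cv2.VideoCapture(0)  # built-in camera device number
    img_count = 0
    frame_count = 0
    while capture.isOpened():
        ret, frame = capture.read()
        if ret == False:
            break
        # Save every num_cut-th frame, as in movie_to_image
        if frame_count % num_cut == 0:
            cv2.imwrite('/hoge/save/' + str(img_count) + '.jpg', frame)
            img_count += 1
        frame_count += 1
        cv2.imshow('preview', frame)           # preview window
        if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to stop
            break
    capture.release()
    cv2.destroyAllWindows()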
You can also use a USB-connected webcam, although depending on the camera the device may not be recognized and may not be usable. In my environment, I confirmed that the built-in camera was assigned device number 0 and the USB-connected camera (made by Buffalo) was assigned device number 1.
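If you are unsure which device numbers are available in your environment, a quick probe like the following (my own sketch, not from the original article) will tell you which indices OpenCV can actually open:
import cv2

# Try the first few device numbers and report which ones open
for device_number in range(4):
    capture = cv2.VideoCapture(device_number)
    if capture.isOpened():
        print('device %d: available' % device_number)
    capture.release()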
Also, without a certain level of machine specs, real-time processing while shooting may not be able to keep up; the problem tends to be more pronounced especially when saving every frame. In that case, solve it by adjusting the thinning number. That concludes this simple explanation of an example of collecting candidate images by frame analysis of a video.