How to collect face images relatively easily

Background

While collecting face images for machine learning, I thought, "Couldn't this trick be an article on its own?"

Collect facial images of specific people

One of the first things people tend to try with machine learning is having a model judge whether a face belongs to a specific person. To do this, you need a lot of images of that person as training data. Most of the approaches you find by searching seem to rely on a crawler, but you can also collect face images with the steps below. As an example, suppose you want to collect a lot of images of Manatsu Akimoto.

  1. Open a web browser and maximize the window.
  2. Search for "Akimoto Manatsu" in Google image search.
  3. Capture the screen and save it as an image.
  4. Scroll down to show the images that were not yet displayed.
  5. Repeat steps 3 and 4 until there are no more search results.
  6. Cut out only the face regions from the images saved in step 3, using a short OpenCV script (described later).
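Steps 3 to 5 can also be automated. The following is only a minimal sketch under stated assumptions: `capture` and `scroll` are hypothetical callables that you supply yourself (for example, thin wrappers around whatever screenshot or browser-automation tool you use), and the loop assumes that two byte-identical screenshots in a row mean the end of the results has been reached. Pages with animations or a "show more" button may need a different stop condition.

```python
import os


def capture_all(capture, scroll, outdir, max_pages=100):
    """Repeat capture -> scroll until the screen stops changing.

    capture() must return the current screenshot as PNG bytes and
    scroll() must scroll the page by one screen. Both are hypothetical
    callables supplied by the caller, not part of any library.
    """
    os.makedirs(outdir, exist_ok=True)
    saved = []
    previous = None
    for page in range(max_pages):
        shot = capture()
        # Identical consecutive screenshots: assume scrolling reached the end.
        if shot == previous:
            break
        path = os.path.join(outdir, "screen_{:04d}.png".format(page))
        with open(path, "wb") as f:
            f.write(shot)
        saved.append(path)
        previous = shot
        scroll()
    return saved
```

The `max_pages` limit is there so the loop still ends if the screen never becomes stable.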

Collect facial images of arbitrary people

When training a model to judge whether a face belongs to a specific person, you may also want a lot of face images of other people to use as negative examples. Most of the approaches you find by searching seem to rely on a crawler, but you can also collect face images with the steps below.

  1. Open a web browser and maximize the window.
  2. Search for "group photo" in Google image search.
  3. Manually download images that contain many people until you are satisfied.
  4. Cut out only the face regions from the images saved in step 3, using a short OpenCV script (described later).
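Once faces have been cut out for both the target person and other people, they need to be paired with labels before training. The sketch below shows one possible way to do that, assuming the two sets of face images were written to separate directories. The function name, the directory layout, and the 1/0 labels are all assumptions made for illustration.

```python
import os

IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def list_labeled_faces(positive_dir, negative_dir):
    """Return (path, label) pairs: 1 for the target person, 0 for anyone else."""
    samples = []
    for directory, label in ((positive_dir, 1), (negative_dir, 0)):
        for name in sorted(os.listdir(directory)):
            # Skip anything that does not look like an image file.
            if name.lower().endswith(IMAGE_EXTS):
                samples.append((os.path.join(directory, name), label))
    return samples
```

Keeping the two kinds of images in separate directories means the label comes from where a file is stored, so no separate annotation file is needed.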

A script that cuts out only the face part from the image

The path in the variable cascade_path will probably differ depending on your environment, so search for the file on your system. You need to have OpenCV and Python installed in advance.

import datetime
import glob
import os
import sys

import cv2

def main(srcdir, destdir, cascade_path='/home/pi/opencv-3.1.0/data/haarcascades/haarcascade_frontalface_alt.xml'):

  # Fail early with a clear message if the cascade file cannot be loaded
  cascade = cv2.CascadeClassifier(cascade_path)
  if cascade.empty():
    sys.exit("cannot load cascade file: " + cascade_path)

  winname = 'searching..'
  cv2.namedWindow(winname, cv2.WINDOW_AUTOSIZE)

  # Create the output directory if it does not exist yet
  os.makedirs(destdir, exist_ok=True)

  # Output files are named <start time>_<serial number>.jpg
  prefix = datetime.datetime.now().strftime('%Y%m%d-%H%M%S_')
  counter = 0

  for filename in glob.glob(os.path.join(srcdir, '*')):

    if os.path.isdir(filename):
      continue

    # cv2.imread returns None for files it cannot decode as an image
    img = cv2.imread(filename)
    if img is None:
      continue
    print("load " + filename)

    # Each detected face is a rectangle (x, y, width, height)
    faces = cascade.detectMultiScale(img, minSize=(64, 64))
    for (x, y, w, h) in faces:
      face = img[y:y+h, x:x+w]
      if face.size == 0:
        continue
      outname = os.path.join(destdir, prefix + str(counter) + ".jpg")
      cv2.imwrite(outname, face)
      print("save " + outname)
      counter += 1

    # Draw the rectangles only after cropping so they do not appear in the saved faces
    for (x, y, w, h) in faces:
      cv2.rectangle(img, (x, y), (x+w, y+h), (0, 0, 255), 8)
    if len(faces) > 0:
      cv2.imshow(winname, img)
      cv2.waitKey(1)

  cv2.destroyWindow(winname)

if __name__ == '__main__':
  main(sys.argv[1], sys.argv[2])

If you save this as img2face.py, for example, you can run it like this:

python ./img2face.py ./imgs ./face

If you put the image files collected earlier under ./imgs and then run it, the cropped face images are written under ./face.
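Because the cascade file location varies by environment, here is one way the search for it could be scripted. This is a minimal sketch using only the standard library. The function name `find_cascade` is hypothetical, and the directories you pass in are your own guesses about where OpenCV data might live.

```python
import os


def find_cascade(roots, name="haarcascade_frontalface_alt.xml"):
    """Return the first path under any of roots whose file name matches, else None."""
    for root in roots:
        # os.walk silently skips roots that do not exist or cannot be read.
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                return os.path.join(dirpath, name)
    return None
```

For example, `find_cascade(["/usr/share", "/usr/local/share"])` searches two common locations, though these are only guesses. If OpenCV was installed from the opencv-python pip package, the cascade directory should also be available as `cv2.data.haarcascades`.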

In conclusion

Preparing data in advance is the tedious part of machine learning. I hope everyone will publish more and more methods that make it easier.
