Last time, I implemented a multi-layer perceptron using "Chainer" and tried to recognize CAPTCHA images. This time, I will try the same thing using Google's image analysis API, the "Cloud Vision API".

Agenda

0. Implementation code
1. Try it
2. Summary
3. References

0. Implementation code

This time, I created a simple image analysis class for verification. As you can see, it just POSTs a JSON-formatted request defined by the Cloud Vision API.

`MyRecognitionImage.py`


#!/usr/bin/python
#coding:utf-8
import base64
import json
from requests import Request, Session


#Analyze images with Cloud Vision API
class RecognizeImage():

    def __init__(self):
        return

    #CAPTCHA analysis
    def recognize_captcha(self, str_image_path):
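        #Load the CAPTCHA image as binary data
        with open(str_image_path, 'rb') as obj_file:
            bin_captcha = obj_file.read()

        #Base64-encode the image and decode the result to text so that json.dumps can serialize it
        str_encode_file = base64.b64encode(bin_captcha).decode('ascii')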

        #Specify API URL
        str_url = "https://vision.googleapis.com/v1/images:annotate?key="

        #API key obtained in advance
        str_api_key = "XXXXXXXXX"

        #Set Content-Type to JSON
        str_headers = {'Content-Type': 'application/json'}

        #Define the JSON payload according to the Cloud Vision API specification.
        #To extract the text from the CAPTCHA image, set the type to "TEXT_DETECTION".
        str_json_data = {
            'requests': [
                {
                    'image': {
                        'content': str_encode_file
                    },
                    'features': [
                        {
                            'type': "TEXT_DETECTION",
                            'maxResults': 10
                        }
                    ]
                }
            ]
        }

        #Send request
        obj_session = Session()
        obj_request = Request("POST",
                              str_url + str_api_key,
                              data=json.dumps(str_json_data),
                              headers=str_headers
                              )
        obj_prepped = obj_session.prepare_request(obj_request)
        obj_response = obj_session.send(obj_prepped,
                                        verify=True,
                                        timeout=60
                                        )

        #Return the analysis result on success
        if obj_response.status_code == 200:
            print(obj_response.text)
            return obj_response.text
        else:
            return "error"

The following three points should be noted when using the API.

- Be sure to Base64-encode the image to be analyzed.
- Obtain an API key in advance to use the API.
- Specify an appropriate "type" according to the purpose of the analysis.
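The first and third points can be sketched as a small helper that builds the request body. This is only a minimal sketch, assuming Python 3; the function name `build_annotate_payload` and its parameters are hypothetical and are not part of the class above.

```python
import base64
import json


def build_annotate_payload(image_bytes, feature_type="TEXT_DETECTION", max_results=10):
    """Build the JSON body for an images:annotate request (hypothetical helper)."""
    # b64encode returns bytes; decode to str so that json.dumps can serialize it
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": feature_type, "maxResults": max_results}],
            }
        ]
    }


if __name__ == "__main__":
    # Dummy bytes stand in for a real CAPTCHA image file
    payload = build_annotate_payload(b"dummy image bytes")
    print(json.dumps(payload, indent=1))
```

Keeping the payload construction separate from the HTTP call makes it possible to check the request body without an API key or network access.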

When the above code is executed, the following request is POSTed.

POST /v1/images:annotate?key=XXXXXXXXX HTTP/1.1
User-Agent: python-requests/2.8.1
Host: vision.googleapis.com
Accept: */*
Content-Type: application/json
Content-Length: 939

{
 "requests":[
  {
   "image":{
    "content": "iVBORw0KGgoAAAANSUhEUgA...(abridged).../EV4ihonpXVAAAAAElFTkSuQmCC"
   },
   "features":[
    {
     "type":"TEXT_DETECTION",
     "maxResults":10
    }
   ]
  }
 ]
}

Put the Base64-encoded image data in "content", and specify the kind of analysis you want to run in "type". Since the goal this time is to recognize a CAPTCHA, I specify "TEXT_DETECTION", which extracts text. In addition to text extraction, the following kinds of analysis are available.

- Understanding what is shown in the image
- Detecting inappropriate content
- Analyzing the meaning of the image

For example, if you POST an image of Tokyo Station, it is recognized as "Tokyo Station", and if you POST an image of someone who looks happy, it is recognized as "happy". I would like to try these in the future.
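As a rough illustration, several analyses can be requested for one image by listing more than one entry in "features". The mapping below from the examples in this article to the feature types `LABEL_DETECTION`, `SAFE_SEARCH_DETECTION`, `LANDMARK_DETECTION`, and `FACE_DETECTION` is an assumption made for this sketch and was not verified in this article; `build_features` and the dictionary keys are hypothetical names.

```python
import json

# Assumed mapping from the analyses mentioned above to Cloud Vision feature types
FEATURE_TYPES = {
    "text": "TEXT_DETECTION",
    "labels": "LABEL_DETECTION",
    "inappropriate": "SAFE_SEARCH_DETECTION",
    "landmark": "LANDMARK_DETECTION",
    "face": "FACE_DETECTION",
}


def build_features(purposes, max_results=10):
    """Return the "features" list for the given purposes (hypothetical helper)."""
    return [{"type": FEATURE_TYPES[p], "maxResults": max_results} for p in purposes]


if __name__ == "__main__":
    # Request landmark and face analysis for the same image
    print(json.dumps(build_features(["landmark", "face"]), indent=1))
```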

Image analysis requires considerable computing power, so it used to be a high hurdle for a hobby project. With this API, however, anyone can perform image analysis easily. What a wonderful API!

1. Try it

Let's use this to recognize CAPTCHAs.

Let's start with the first image.

The extracted text is output to "description".

`First analysis result`


{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "O l 4.67 9\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 6,
                "y": 1
              },
              {
                "x": 165,
                "y": 1
              },
              {
                "x": 165,
                "y": 35
              },
              {
                "x": 6,
                "y": 35
              }
            ]
          }
        }
      ]
    }
  ]
}
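Pulling the text out of a response like the one above could look like the following. This is a minimal sketch, assuming Python 3 and the response layout shown above; `extract_text` is a hypothetical helper, not part of the class in this article.

```python
import json


def extract_text(response_text):
    """Return the first "description" of the first response, or None if absent."""
    data = json.loads(response_text)
    responses = data.get("responses", [])
    if not responses:
        return None
    annotations = responses[0].get("textAnnotations", [])
    if not annotations:
        # No text was detected in the image
        return None
    return annotations[0]["description"]


if __name__ == "__main__":
    # Trimmed copy of the response shown above (boundingPoly omitted)
    sample = '{"responses": [{"textAnnotations": [{"locale": "en", "description": "O l 4.67 9\\n"}]}]}'
    print(repr(extract_text(sample)))
```

Returning None instead of raising makes it easy to tell "no text found" apart from a transport error handled elsewhere.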

The result is "O l 4.67 9". The digit 0 (zero) was read as the uppercase letter "O", the digit 1 (one) as the lowercase letter "l", and a stray dot appears, but the characters are otherwise recognized correctly. Allowing for these look-alike substitutions, the correct answer rate can be considered 100%.
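The look-alike substitutions seen in this result could be cleaned up in a post-processing step. The following is a minimal sketch, assuming the CAPTCHA contains digits only; the substitution table and the name `normalize_digits` are hypothetical.

```python
# Assumed look-alike substitutions for a digits-only CAPTCHA
LOOKALIKES = {"O": "0", "o": "0", "l": "1", "I": "1"}


def normalize_digits(description):
    """Map look-alike letters to digits, then drop everything that is not a digit."""
    mapped = "".join(LOOKALIKES.get(ch, ch) for ch in description)
    return "".join(ch for ch in mapped if ch.isdigit())


if __name__ == "__main__":
    print(normalize_digits("O l 4.67 9\n"))  # prints 014679
```

This only works when the character set is known in advance; for a CAPTCHA that mixes letters and digits, mapping "O" to "0" would introduce errors of its own.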

Next is the second one.

This image gave disappointing results in my previous verification, but how will the Cloud Vision API do?

`Second analysis result`


{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "496'0,\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 6,
                "y": 10
              },
              {
                "x": 148,
                "y": 10
              },
              {
                "x": 148,
                "y": 70
              },
              {
                "x": 6,
                "y": 70
              }
            ]
          }
        }
      ]
    }
  ]
}

The characters are output in a slightly different order, but "4", "0", "9", and "6" are recognized. The recognition rate was 50% last time, so this is an improvement.
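Because the characters came back in a different order here, a position-by-position comparison would undercount the hits. One way to score such a result is an order-insensitive match rate. This is a minimal sketch, assuming Python 3; `match_rate` and the sample strings are hypothetical, and this is not how the percentages in this article were calculated.

```python
from collections import Counter


def match_rate(expected, recognized):
    """Fraction of expected characters that also appear in the recognized text, ignoring order."""
    if not expected:
        return 0.0
    # Counter intersection keeps the minimum count of each shared character
    common = Counter(expected) & Counter(recognized)
    return sum(common.values()) / len(expected)


if __name__ == "__main__":
    # Hypothetical strings, not the actual CAPTCHA from this article
    print(match_rate("1234", "42x1"))  # prints 0.75
```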

Finally, the third image.

`Third analysis result`


{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "425970\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 5,
                "y": 7
              },
              {
                "x": 97,
                "y": 7
              },
              {
                "x": 97,
                "y": 33
              },
              {
                "x": 5,
                "y": 33
              }
            ]
          }
        }
      ]
    }
  ]
}

Well done! Every character is recognized accurately.

The API extracts text with fairly high accuracy, probably because its model has been trained on the huge amount of image data held by Google. It looks usable for CAPTCHA recognition.

By the way, in "reCAPTCHA", the new CAPTCHA developed by Google, the user selects the images that show the same animal or object from a set of images, as shown below. This appears to be how it distinguishes humans from bots.

Source: Gigazine (partially omitted)

In this example, you must select every image that shows the same thing as the top image (a cat), so the correct answer is to select the first from the top left and the second and third from the bottom left.

By the way, I confirmed that the Cloud Vision API can be used to accurately distinguish between cats and dogs.

It can distinguish not only "cat" and "dog" but also the breed (American Shorthair, German Shepherd Dog, etc.) almost accurately. If you are interested, I recommend trying it yourself; set "type" to "LABEL_DETECTION".
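Reading the labels out of a "LABEL_DETECTION" response might look like this. The response layout used here (a "labelAnnotations" list whose entries have "description" and "score") is assumed from the API documentation and is not shown in this article; the sample labels and scores are made up, and `labels_above` is a hypothetical helper.

```python
import json


def labels_above(response_text, min_score=0.8):
    """Return (description, score) pairs from labelAnnotations with score >= min_score."""
    data = json.loads(response_text)
    responses = data.get("responses", [])
    annotations = responses[0].get("labelAnnotations", []) if responses else []
    return [
        (a["description"], a["score"])
        for a in annotations
        if a.get("score", 0.0) >= min_score
    ]


if __name__ == "__main__":
    # Hypothetical response; the labels and scores are made up for illustration
    sample = json.dumps({
        "responses": [{
            "labelAnnotations": [
                {"description": "cat", "score": 0.98},
                {"description": "pet", "score": 0.91},
                {"description": "whiskers", "score": 0.55},
            ]
        }]
    })
    print(labels_above(sample))
```

The threshold of 0.8 is arbitrary; a suitable value would have to be found by experiment.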

2. Summary

I tried to recognize CAPTCHAs using the Cloud Vision API. There is still room for improvement, but the results are good enough to be used for CAPTCHA recognition.

I also found that it appears possible to get past not only simple CAPTCHAs (images of digits, etc.) but also more advanced ones such as reCAPTCHA. The API is billed once the trial period ends, but compared with writing the code yourself and preparing a high-spec machine, I think it is cost-effective.

In the future, after further verification, I would like to use it as the CAPTCHA recognition engine in an automatic crawler for web applications.

3. References

Google Cloud Vision API

That's all.

Machine Learning x Web App Diagnosis: Recognize CAPTCHA with Cloud Vision API
