[Kotlin] Create a real-time image recognition app on Android

What to do this time

We will create an app that recognizes, in real time, the images captured by the camera on Android. A trained model runs on the device using PyTorch Mobile.


The sample app I made is listed at the bottom, so please take a look if you like.

Dependencies

First, add the Camera X and PyTorch Mobile dependencies (versions as of February 2020).

build.gradle


    def camerax_version = '1.0.0-alpha06'
    implementation "androidx.camera:camera-core:${camerax_version}"
    implementation "androidx.camera:camera-camera2:${camerax_version}"
    implementation 'org.pytorch:pytorch_android:1.4.0'
    implementation 'org.pytorch:pytorch_android_torchvision:1.4.0'

Also add the following at the end of the **android {}** block, which sits above the dependencies in the same file.

build.gradle


    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }

Camera X implementation

After adding the dependencies, we will implement the camera capture using **Camera X**, a library that makes it easy to handle the camera on Android.

Below, we follow the official Camera X tutorial. The details are covered in other articles, so I will skip the explanation and show only the code.

Manifest

Camera permission

<uses-permission android:name="android.permission.CAMERA" />

Layout

Place a button to start the camera and a TextureView to display the preview.

activity_main.xml


<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <TextureView
        android:id="@+id/view_finder"
        android:layout_width="0dp"
        android:layout_height="0dp"
        android:layout_marginBottom="16dp"
        app:layout_constraintBottom_toTopOf="@+id/activateCameraBtn"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <androidx.constraintlayout.widget.ConstraintLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:alpha="0.7"
        android:animateLayoutChanges="true"
        android:background="@android:color/white"
        app:layout_constraintEnd_toEndOf="@+id/view_finder"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="@+id/view_finder">

        <TextView
            android:id="@+id/inferredCategoryText"
            android:layout_width="0dp"
            android:layout_height="wrap_content"
            android:layout_marginStart="8dp"
            android:layout_marginTop="16dp"
            android:layout_marginEnd="8dp"
            android:text="Inference result"
            android:textSize="18sp"
            app:layout_constraintEnd_toEndOf="parent"
            app:layout_constraintStart_toStartOf="parent"
            app:layout_constraintTop_toTopOf="parent" />

        <TextView
            android:id="@+id/inferredScoreText"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_marginStart="24dp"
            android:layout_marginTop="16dp"
            android:text="Score"
            android:textSize="18sp"
            app:layout_constraintStart_toStartOf="parent"
            app:layout_constraintTop_toBottomOf="@+id/inferredCategoryText" />
    </androidx.constraintlayout.widget.ConstraintLayout>

    <Button
        android:id="@+id/activateCameraBtn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginBottom="16dp"
        android:text="Camera activation"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent" />

</androidx.constraintlayout.widget.ConstraintLayout>

Use cases

Camera X offers three use cases: **preview, image capture, and image analysis**. This time we will use preview and image analysis. Organizing the code by use case makes it easier to follow. The official documentation also lists which combinations of use cases are supported.

[Figure: table of supported use case combinations, from the official documentation]

Implementing the preview use case

First we implement Camera X up to the preview use case. The content is almost the same as the tutorial.

MainActivity.kt



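import android.Manifest
import android.content.pm.PackageManager
import android.graphics.Matrix
import android.os.Bundle
import android.util.Size
import android.view.Surface
import android.view.TextureView
import android.view.ViewGroup
import android.widget.Toast
import androidx.appcompat.app.AppCompatActivity
import androidx.camera.core.CameraX
import androidx.camera.core.Preview
import androidx.camera.core.PreviewConfig
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat
import androidx.lifecycle.LifecycleOwner
import java.util.concurrent.Executors
// Views such as activateCameraBtn are accessed through Kotlin Android Extensions (synthetic properties)
import kotlinx.android.synthetic.main.activity_main.*

private const val REQUEST_CODE_PERMISSIONS = 10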
private val REQUIRED_PERMISSIONS = arrayOf(Manifest.permission.CAMERA)

class MainActivity : AppCompatActivity(), LifecycleOwner {
    private val executor = Executors.newSingleThreadExecutor()
    private lateinit var viewFinder: TextureView

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        viewFinder = findViewById(R.id.view_finder)

        // Start the camera when the button is tapped
        activateCameraBtn.setOnClickListener {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                ActivityCompat.requestPermissions(
                    this, REQUIRED_PERMISSIONS, REQUEST_CODE_PERMISSIONS
                )
            }
        }

        viewFinder.addOnLayoutChangeListener { _, _, _, _, _, _, _, _, _ ->
            updateTransform()
        }
    }

    private fun startCamera() {

        //Implementation of preview useCase
        val previewConfig = PreviewConfig.Builder().apply {
            setTargetResolution(Size(viewFinder.width, viewFinder.height))
        }.build()

        val preview = Preview(previewConfig)

        preview.setOnPreviewOutputUpdateListener {
            val parent = viewFinder.parent as ViewGroup
            parent.removeView(viewFinder)
            parent.addView(viewFinder, 0)
            viewFinder.surfaceTexture = it.surfaceTexture
            updateTransform()
        }

        /**We will implement the image analysis useCase here later.**/ 

        CameraX.bindToLifecycle(this, preview)
    }

    private fun updateTransform() {
        val matrix = Matrix()
        val centerX = viewFinder.width / 2f
        val centerY = viewFinder.height / 2f

        val rotationDegrees = when (viewFinder.display.rotation) {
            Surface.ROTATION_0 -> 0
            Surface.ROTATION_90 -> 90
            Surface.ROTATION_180 -> 180
            Surface.ROTATION_270 -> 270
            else -> return
        }
        matrix.postRotate(-rotationDegrees.toFloat(), centerX, centerY)

        // Apply the transform to the TextureView
        viewFinder.setTransform(matrix)
    }

    override fun onRequestPermissionsResult(
        requestCode: Int, permissions: Array<String>, grantResults: IntArray
    ) {
        if (requestCode == REQUEST_CODE_PERMISSIONS) {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                Toast.makeText(
                    this,
                    "Permissions not granted by the user.",
                    Toast.LENGTH_SHORT
                ).show()
                finish()
            }
        }
    }

    private fun allPermissionsGranted() = REQUIRED_PERMISSIONS.all {
        ContextCompat.checkSelfPermission(
            baseContext, it
        ) == PackageManager.PERMISSION_GRANTED
    }
}

Preparing the model and the class list

This time we will use a pretrained resnet18. Running the following Python script exports it in a form that PyTorch Mobile can load.

import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("resnet.pt")

If the script runs successfully, a file called resnet.pt is generated in the same directory. Image recognition is performed using this trained resnet18.

Put the generated model in the **assets folder** of your Android Studio project. (It does not exist by default; you can create it by right-clicking the res folder -> New -> Folder -> Assets Folder.)

To convert an inferred index into a class name, write the ImageNet classes to a file. Create a new file, **ImageNetClasses.kt**, and put the 1000 ImageNet class names in it. The list is too long to show here, so please copy it from GitHub.

ImageNetClasses.kt


class ImageNetClasses {
    val IMAGENET_CLASSES = arrayOf(
        "tench, Tinca tinca",
        "goldfish, Carassius auratus",
        // ... (omitted; please copy the full list from GitHub)
        "ear, spike, capitulum",
        "toilet tissue, toilet paper, bathroom tissue"
    )
}
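
Because the label list never changes, one possible variation is to keep it in a Kotlin object, so that no new instance is created for every lookup. The following is only a minimal sketch: the names ImageNetLabels and nameOf are hypothetical and not part of the original project, and the list is shortened to two entries for illustration.

```kotlin
// Hypothetical alternative to ImageNetClasses (not from the original project).
object ImageNetLabels {
    // Shortened for illustration; the real list has 1000 entries.
    private val classes = arrayOf(
        "tench, Tinca tinca",
        "goldfish, Carassius auratus"
    )

    // Returns the class name for a model output index, or "unknown" if the index is out of range.
    fun nameOf(index: Int): String = classes.getOrElse(index) { "unknown" }
}
```

The out-of-range fallback is a design choice of this sketch: it avoids a crash if the model and the list ever disagree on the number of classes.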

Creating an image analysis use case

Next, we will implement the image analysis use case of Camera X. Create a new file called ImageAnalyze.kt that performs the image recognition.

The flow is roughly as follows: load the model from the assets folder, convert each frame delivered to the image analysis use case into a tensor that PyTorch Mobile can consume, pass it through the model, and read the result.

I also wrote an interface and a custom listener to reflect the inference result in the view. (I'm not sure of the proper way to write this part, so please let me know if there is a smarter way.)
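
The last step of this flow, turning the model's raw output into a result, can be sketched in plain Kotlin without any Android or PyTorch dependency. This is a minimal sketch under my own assumptions: argmax and softmax are hypothetical helper names, and the softmax step (which converts raw scores into probabilities that sum to 1) is an optional addition that the code in this article does not perform.

```kotlin
import kotlin.math.exp

// Index of the largest value; returns -1 for an empty array.
fun argmax(scores: FloatArray): Int {
    var best = -1
    for (i in scores.indices) {
        if (best == -1 || scores[i] > scores[best]) best = i
    }
    return best
}

// Converts raw scores to probabilities.
// Subtracting the maximum first keeps exp() from overflowing.
fun softmax(scores: FloatArray): FloatArray {
    if (scores.isEmpty()) return FloatArray(0)
    val max = scores[argmax(scores)]
    val exps = FloatArray(scores.size) { exp(scores[it] - max) }
    val sum = exps.sum()
    return FloatArray(scores.size) { exps[it] / sum }
}
```

Applying softmax does not change which index wins; it only makes the displayed score easier to read as a confidence value.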

ImageAnalyze.kt



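import android.content.Context
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import org.pytorch.IValue
import org.pytorch.Module
import org.pytorch.torchvision.TensorImageUtils
import java.io.File
import java.io.FileOutputStream

class ImageAnalyze(context: Context) : ImageAnalysis.Analyzer {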

    private lateinit var listener: OnAnalyzeListener    // Custom listener for updating the View
    private var lastAnalyzedTimestamp = 0L
    // Load the network model
    private val resnet = Module.load(getAssetFilePath(context, "resnet.pt"))

    interface OnAnalyzeListener {
        fun getAnalyzeResult(inferredCategory: String, score: Float)
    }

    override fun analyze(image: ImageProxy, rotationDegrees: Int) {
        val currentTimestamp = System.currentTimeMillis()

        if (currentTimestamp - lastAnalyzedTimestamp >= 500) {  // Infer every 0.5 seconds (timestamps are in milliseconds)
            lastAnalyzedTimestamp = currentTimestamp

            // Convert to a tensor (the image format turned out to be YUV_420_888)
            val inputTensor = TensorImageUtils.imageYUV420CenterCropToFloat32Tensor(
                image.image,
                rotationDegrees,
                224,
                224,
                TensorImageUtils.TORCHVISION_NORM_MEAN_RGB,
                TensorImageUtils.TORCHVISION_NORM_STD_RGB
            )
            //Infer with a trained model
            val outputTensor = resnet.forward(IValue.from(inputTensor)).toTensor()
            val scores = outputTensor.dataAsFloatArray

            var maxScore = -Float.MAX_VALUE  // Start below any possible score, since raw outputs can be negative
            var maxScoreIdx = 0
            for (i in scores.indices) { //Get the index with the highest score
                if (scores[i] > maxScore) {
                    maxScore = scores[i]
                    maxScoreIdx = i
                }
            }

            // Look up the category name for the index with the highest score
            val inferredCategory = ImageNetClasses().IMAGENET_CLASSES[maxScoreIdx]
            listener.getAnalyzeResult(inferredCategory, maxScore)  // Update the View
        }
    }

    // Copies the asset to internal storage (if needed) and returns its file path
    private fun getAssetFilePath(context: Context, assetName: String): String {
        val file = File(context.filesDir, assetName)
        if (file.exists() && file.length() > 0) {
            return file.absolutePath
        }
        context.assets.open(assetName).use { inputStream ->
            FileOutputStream(file).use { outputStream ->
                val buffer = ByteArray(4 * 1024)
                var read: Int
                while (inputStream.read(buffer).also { read = it } != -1) {
                    outputStream.write(buffer, 0, read)
                }
                outputStream.flush()
            }
            return file.absolutePath
        }
    }

    fun setOnAnalyzeListener(listener: OnAnalyzeListener){
        this.listener = listener
    }
}
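
As one possible answer to the question above about a smarter way to write the listener: in Kotlin, a function type can replace the single-method interface and the setter. The following is a minimal sketch with no Android dependencies; ResultNotifier is a hypothetical stand-in for ImageAnalyze, not the actual class.

```kotlin
// Hypothetical stand-in for ImageAnalyze that receives the callback as a constructor argument.
class ResultNotifier(private val onResult: (category: String, score: Float) -> Unit) {
    // In the real analyzer this would be called at the end of analyze().
    fun publish(category: String, score: Float) = onResult(category, score)
}
```

With this shape the caller passes a lambda, for example `ResultNotifier { category, score -> ... }`, and there is no lateinit listener that could be left unset.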

At first I was confused because the image arrives as an unfamiliar type called ImageProxy. When I checked the format it was YUV_420_888, and I assumed I would have to convert it to a Bitmap first. However, PyTorch Mobile provides a method that converts YUV_420_888 directly to a tensor, so inference was as easy as passing the image in.
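
To give a rough idea of what the normalization arguments do during that conversion, the sketch below applies the usual torchvision-style normalization to a single RGB pixel: scale each channel to the range 0 to 1, subtract the per-channel mean, and divide by the per-channel standard deviation. The constants are the standard ImageNet values, which I am assuming match TORCHVISION_NORM_MEAN_RGB and TORCHVISION_NORM_STD_RGB. normalizePixel is a hypothetical helper; the real method additionally handles the YUV-to-RGB conversion, cropping, and rotation.

```kotlin
// Standard ImageNet channel statistics (assumed to match the TensorImageUtils constants).
val MEAN_RGB = floatArrayOf(0.485f, 0.456f, 0.406f)
val STD_RGB = floatArrayOf(0.229f, 0.224f, 0.225f)

// Normalizes one pixel whose channels are given as values in 0..255.
fun normalizePixel(r: Int, g: Int, b: Int): FloatArray {
    val rgb = intArrayOf(r, g, b)
    return FloatArray(3) { c -> (rgb[c] / 255f - MEAN_RGB[c]) / STD_RGB[c] }
}
```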

By the way, as you may have noticed from the code, I call it real-time, but inference actually runs only once every 0.5 seconds.
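
The throttling itself boils down to comparing millisecond timestamps. Below is a minimal sketch in plain Kotlin; the class name Throttle and its parameters are hypothetical. Note that System.currentTimeMillis() returns milliseconds, so an interval of 0.5 seconds corresponds to 500.

```kotlin
// Lets an action through at most once per intervalMillis.
class Throttle(private val intervalMillis: Long) {
    private var lastRunAt: Long? = null

    // Returns true (and records the time) if enough time has passed since the last accepted call.
    fun shouldRun(nowMillis: Long): Boolean {
        val last = lastRunAt
        if (last != null && nowMillis - last < intervalMillis) return false
        lastRunAt = nowMillis
        return true
    }
}
```

Passing the current time in as an argument, instead of reading the clock inside the class, is a choice made here so that the logic can be checked without waiting for real time to pass.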

Incorporating the image analysis use case

We register the ImageAnalyze class created earlier with Camera X as a use case. Finally, MainActivity implements the ImageAnalyze listener interface with an anonymous object so that the view can be updated, which completes the app.

Add the following code to startCamera (not onCreate, since it uses the preview object created there), at the spot marked earlier with the comment /**We will implement the image analysis useCase here later.**/. It replaces the existing CameraX.bindToLifecycle(this, preview) call.

MainActivity.kt



        //Implementation of image analysis useCase
        val analyzerConfig = ImageAnalysisConfig.Builder().apply {
            setImageReaderMode(
                ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE
            )
        }.build()

        // Create the analyzer
        val imageAnalyzer = ImageAnalyze(applicationContext)
        // Display the inference results
        imageAnalyzer.setOnAnalyzeListener(object : ImageAnalyze.OnAnalyzeListener {
            override fun getAnalyzeResult(inferredCategory: String, score: Float) {
                // The analyzer runs off the main thread, so post the view update to the main thread
                viewFinder.post {
                    inferredCategoryText.text = "Inference result: $inferredCategory"
                    inferredScoreText.text = "Score: $score"
                }
            }
        })
        val analyzerUseCase = ImageAnalysis(analyzerConfig).apply {
            setAnalyzer(executor, imageAnalyzer)
        }

        // Bind both the preview and the image analysis use cases
        CameraX.bindToLifecycle(this, preview, analyzerUseCase)

Done! If you have implemented everything so far, the app described at the beginning should be complete. Please play around with it.

Conclusion

This code is available on GitHub, so please refer to it as needed.

Camera X is really convenient! Combined with PyTorch Mobile, image analysis is easy to implement. The processing does make the app heavier, but that can't be helped. As long as you can prepare a model, you can easily build all kinds of camera-based image recognition apps. In the end, the quickest route is probably to prepare a model, for example with transfer learning, and build an app around it.

I want to make and release a machine learning app ... ~~I plan to make a sample app in the near future. (Currently under review)~~

I made a sample app

The app passed review, so I am adding it here. It incorporates what is described in this article and is published on the Play Store.

If you would like to try it out quickly, I would appreciate it if you downloaded it.

Object analyzer

Play Store: Object Analyzer (supports English and Japanese)

To be honest, there is a big gap between the things it can recognize and the things it cannot ...
