[Kotlin] Create an app that recognizes photos taken with the camera on Android

What to do this time

We will build a simple image recognition app for Android: take a picture, save it, display it, classify the image, and show the classification result.

The finished app works like this: start the camera, take a picture, display the picture on the screen, and recognize what is in it.

Libraries and keywords used this time

- Python
- PyTorch Mobile
- Android CameraX
- ResNet18
- Kotlin

Almost all of these only came out last year...

Dependencies

First, add the CameraX and PyTorch Mobile dependencies (versions as of February 2020).

build.gradle


    def camerax_version = '1.0.0-alpha06'
    implementation "androidx.camera:camera-core:${camerax_version}"
    implementation "androidx.camera:camera-camera2:${camerax_version}"
    implementation 'org.pytorch:pytorch_android:1.4.0'
    implementation 'org.pytorch:pytorch_android_torchvision:1.4.0'

Then add the following at the end of the android {} block, higher up in the same file.

build.gradle


    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }

CameraX implementation

With the dependencies added, we implement photo capture using CameraX, a library that makes the camera easy to handle on Android.

The code below follows the official CameraX tutorial. The details are covered in other articles, so I omit the explanation and show only the code.

Manifest

Declare the camera permission in AndroidManifest.xml.

<uses-permission android:name="android.permission.CAMERA" />

Implementing photo capture

Add the function that takes a picture with the camera and saves it. Following the tutorial, we implement the camera preview and the capture. Since it is almost the same as the tutorial, I show only the code.

Layout

Lay out an area that displays the captured picture, an area for the camera preview, the camera start button, the capture button, and the inference button.

activity_main.xml


<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <Button
        android:id="@+id/capture_button"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="2dp"
        android:text="photograph"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.25"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/frameLayout" />

    <Button
        android:id="@+id/activateCamera"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Camera activation"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.25"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/capture_button" />

    <ImageView
        android:id="@+id/capturedImg"
        android:layout_width="500px"
        android:layout_height="500px"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent"
        app:srcCompat="@mipmap/ic_launcher_round" />

    <FrameLayout
        android:id="@+id/frameLayout"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="8dp"
        android:background="@android:color/holo_blue_bright"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/capturedImg">

        <TextureView
            android:id="@+id/view_finder"
            android:layout_width="500px"
            android:layout_height="500px" />
    </FrameLayout>

    <Button
        android:id="@+id/inferBtn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginStart="32dp"
        android:text="inference"
        app:layout_constraintBottom_toBottomOf="@+id/capture_button"
        app:layout_constraintStart_toEndOf="@+id/capture_button"
        app:layout_constraintTop_toTopOf="@+id/capture_button" />

    <TextView
        android:id="@+id/resultText"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="4dp"
        android:text="Inference result"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.31"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/activateCamera" />

</androidx.constraintlayout.widget.ConstraintLayout>

MainActivity

MainActivity.kt



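import android.Manifest
import android.content.Context
import android.content.pm.PackageManager
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.graphics.Matrix
import android.os.Bundle
import android.util.Log
import android.util.Size
import android.view.Surface
import android.view.TextureView
import android.view.ViewGroup
import android.widget.Toast
import androidx.appcompat.app.AppCompatActivity
import androidx.camera.core.CameraX
import androidx.camera.core.ImageCapture
import androidx.camera.core.ImageCaptureConfig
import androidx.camera.core.Preview
import androidx.camera.core.PreviewConfig
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat
import androidx.lifecycle.LifecycleOwner
//Assumption: views are referenced by ID through Kotlin Android Extensions synthetics
import kotlinx.android.synthetic.main.activity_main.*
//The org.pytorch, Context and FileOutputStream imports are used by the inference code added later
import org.pytorch.IValue
import org.pytorch.Module
import org.pytorch.torchvision.TensorImageUtils
import java.io.File
import java.io.FileInputStream
import java.io.FileOutputStream
import java.util.concurrent.Executors

private const val REQUEST_CODE_PERMISSIONS = 10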
private val REQUIRED_PERMISSIONS = arrayOf(Manifest.permission.CAMERA)

class MainActivity : AppCompatActivity(), LifecycleOwner {

    private var imgData: Bitmap? = null   //Holds the most recently captured image for inference

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        viewFinder = findViewById(R.id.view_finder)

        //Camera activation
        activateCamera.setOnClickListener {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                ActivityCompat.requestPermissions(
                    this, REQUIRED_PERMISSIONS,REQUEST_CODE_PERMISSIONS
                )
            }
        }

        viewFinder.addOnLayoutChangeListener { _, _, _, _, _, _, _, _, _ ->
            updateTransform()
        }
        /**Add code to classify images later here**/
    }

    private val executor = Executors.newSingleThreadExecutor()
    private lateinit var viewFinder: TextureView

    private fun startCamera() {
        //Create preview use case
        val previewConfig = PreviewConfig.Builder().apply {
            setTargetResolution(Size(viewFinder.width, viewFinder.height)) // 680, 480
        }.build()

        val preview = Preview(previewConfig)

        preview.setOnPreviewOutputUpdateListener {
            val parent = viewFinder.parent as ViewGroup
            parent.removeView(viewFinder)
            parent.addView(viewFinder, 0)

            viewFinder.surfaceTexture = it.surfaceTexture
            updateTransform()
        }

        //Capture use case creation
        val imageCaptureConfig = ImageCaptureConfig.Builder()
            .apply {
                setCaptureMode(ImageCapture.CaptureMode.MIN_LATENCY)
            }.build()

        val imageCapture = ImageCapture(imageCaptureConfig)

        //photography
        capture_button.setOnClickListener {
            val file = File(
                externalMediaDirs.first(),
                "${System.currentTimeMillis()}.jpg"
            )

            imageCapture.takePicture(file, executor,
                object : ImageCapture.OnImageSavedListener {
                    override fun onError(
                        imageCaptureError: ImageCapture.ImageCaptureError,
                        message: String,
                        exc: Throwable?
                    ) {
                        val msg = "Photo capture failed: $message"
                        Log.e("CameraXApp", msg, exc)
                        viewFinder.post {
                            Toast.makeText(baseContext, msg, Toast.LENGTH_SHORT).show()
                        }
                    }

                    override fun onImageSaved(file: File) {
                        //Load the saved file as a bitmap
                        //(it is rotated 90 degrees with a Matrix below so that it displays upright)
                        val bitmap = FileInputStream(file).use { BitmapFactory.decodeStream(it) }
                        val bitmapWidth = bitmap.width
                        val bitmapHeight = bitmap.height
                        val matrix = Matrix()
                        matrix.setRotate(90F, bitmapWidth / 2F, bitmapHeight / 2F)
                        val rotatedBitmap = Bitmap.createBitmap(
                            bitmap,
                            0,
                            0,
                            bitmapWidth,
                            bitmapHeight,
                            matrix,
                            true
                        )

                        imgData = rotatedBitmap  //Keep the image for inference
                        //Display the captured photo.
                        //This callback runs off the main thread, so post the view update
                        viewFinder.post {
                            capturedImg.setImageBitmap(rotatedBitmap)
                        }
                        val msg = "Photo capture succeeded: ${file.absolutePath}"
                        viewFinder.post {
                            Toast.makeText(baseContext, msg, Toast.LENGTH_SHORT).show()
                        }

                    }
                })
        }
        //Preview and capture use case
        CameraX.bindToLifecycle(this, preview, imageCapture)
    }

    private fun updateTransform() {
        val matrix = Matrix()

        val centerX = viewFinder.width / 2f
        val centerY = viewFinder.height / 2f

        val rotationDegrees = when (viewFinder.display.rotation) {
            Surface.ROTATION_0 -> 0
            Surface.ROTATION_90 -> 90
            Surface.ROTATION_180 -> 180
            Surface.ROTATION_270 -> 270
            else -> return
        }
        matrix.postRotate(-rotationDegrees.toFloat(), centerX, centerY)

        viewFinder.setTransform(matrix)
    }
    
    override fun onRequestPermissionsResult(
        requestCode: Int, permissions: Array<String>, grantResults: IntArray
    ) {
        if (requestCode == REQUEST_CODE_PERMISSIONS) {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                Toast.makeText(
                    this,
                    "Permissions not granted by the user.",
                    Toast.LENGTH_SHORT
                ).show()
                finish()
            }
        }
    }
    
    private fun allPermissionsGranted() = REQUIRED_PERMISSIONS.all {
        ContextCompat.checkSelfPermission(
            baseContext, it
        ) == PackageManager.PERMISSION_GRANTED
    }
}

With this in place, you should be able to take a picture and display it on the screen. (I don't know whether it's my environment or my code, but there is a considerable lag between taking a picture and the picture being displayed.)
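One possible contributor to that lag (an assumption, not something measured in this article) is that the saved JPEG is decoded at full camera resolution even though it is shown in a 500 px view. Android's BitmapFactory.Options has an inSampleSize field that decodes a subsampled image. The sketch below shows how a power-of-two value could be chosen; calculateInSampleSize is a hypothetical helper name and the 4032 x 3024 photo size is made up. It is plain Kotlin, so it can be tried outside Android.

```kotlin
// Hypothetical helper: picks the largest power-of-two sample size that keeps
// both decoded dimensions at or above the requested size.
fun calculateInSampleSize(srcWidth: Int, srcHeight: Int, reqWidth: Int, reqHeight: Int): Int {
    require(srcWidth > 0 && srcHeight > 0 && reqWidth > 0 && reqHeight > 0) {
        "all dimensions must be positive"
    }
    var sampleSize = 1
    while (srcWidth / (sampleSize * 2) >= reqWidth &&
        srcHeight / (sampleSize * 2) >= reqHeight
    ) {
        sampleSize *= 2
    }
    return sampleSize
}

fun main() {
    // Made-up example: a 4032 x 3024 photo shown in a 500 x 500 view.
    val sampleSize = calculateInSampleSize(4032, 3024, 500, 500)
    println("inSampleSize = $sampleSize")
}
```

On Android, the result would be assigned to BitmapFactory.Options.inSampleSize before decoding the file. Whether this actually removes the lag in this app would need to be measured.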

Officially, CameraX provides three use cases: preview, capture, and image analysis. This time we use the combination of preview and capture. For reference, the supported combinations of use cases are as follows (from the official documentation).

[Figure: table of supported CameraX use case combinations, from the official documentation]

Implementation of image recognition

Download model

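This time we run inference with a pretrained model. The following Python script downloads ResNet18 with pretrained weights, traces it, and saves it as a TorchScript file.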

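import torch
import torchvision

# Load ResNet18 with pretrained ImageNet weights
model = torchvision.models.resnet18(pretrained=True)
# Switch to inference mode
model.eval()
# Dummy input with the shape the model expects (batch, channels, height, width)
example = torch.rand(1, 3, 224, 224)
# Trace the model into TorchScript and save it for PyTorch Mobile
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("resnet.pt")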

If the script runs successfully, a file called resnet.pt is generated in the same directory. (We will put it in the Android Studio project later.) Image recognition is performed with this pretrained ResNet18.

Infer using a model

assets folder

First, put the model you exported earlier into the Android Studio project. It goes in the assets folder. (This folder does not exist by default; you can create it by right-clicking the res folder -> New -> Folder -> Assets Folder.)

Next, create a function that returns a file path for a file in the assets folder. Add the following at the bottom of MainActivity.kt.

MainActivity.kt


    //Copy the asset to internal storage (if not already there) and return its path
    private fun getAssetFilePath(context: Context, assetName: String): String {
        val file = File(context.filesDir, assetName)
        if (file.exists() && file.length() > 0) {
            return file.absolutePath
        }
        context.assets.open(assetName).use { inputStream ->
            FileOutputStream(file).use { outputStream ->
                val buffer = ByteArray(4 * 1024)
                var read: Int
                while (inputStream.read(buffer).also { read = it } != -1) {
                    outputStream.write(buffer, 0, read)
                }
                outputStream.flush()
            }
            return file.absolutePath
        }
    }
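The copy is needed because Module.load takes a file path, while assets are only readable as streams. The copy-once pattern can be tried outside Android with the minimal sketch below. It assumes a hypothetical helper copyToFileIfMissing, where the openSource lambda stands in for context.assets.open(assetName) and the byte array is a fake model file.

```kotlin
import java.io.ByteArrayInputStream
import java.io.File
import java.io.InputStream
import java.nio.file.Files

// Copies the stream to targetFile unless a non-empty copy already exists,
// then returns the absolute path of the file.
fun copyToFileIfMissing(targetFile: File, openSource: () -> InputStream): String {
    if (targetFile.exists() && targetFile.length() > 0) {
        return targetFile.absolutePath
    }
    openSource().use { input ->
        targetFile.outputStream().use { output ->
            input.copyTo(output)
        }
    }
    return targetFile.absolutePath
}

fun main() {
    val dir = Files.createTempDirectory("asset-demo").toFile()
    val target = File(dir, "resnet.pt")
    val fakeModelBytes = byteArrayOf(1, 2, 3, 4) // stand-in for the real model file
    var opened = 0
    val open: () -> InputStream = { opened++; ByteArrayInputStream(fakeModelBytes) }

    val first = copyToFileIfMissing(target, open)
    val second = copyToFileIfMissing(target, open) // reuses the existing copy
    println("same path: ${first == second}, source opened $opened time(s)")
    dir.deleteRecursively()
}
```

Note that the existence check means a changed model under the same file name is not copied again; with this design, reinstalling the app or renaming the asset would be needed to refresh it.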

Inference

To turn the classification result into a class name, we need to be able to look up the 1000 ImageNet classes. Create a new file, ImageNetCategory.kt, and write the class names there. (The list is too long to show here, so copy it from GitHub.)

ImageNetCategory.kt


class ImageNetCategory {
    var IMAGENET_CLASSES = arrayOf(
        "tench, Tinca tinca",
        "goldfish, Carassius auratus",

          //... (omitted; please copy the full list from GitHub)

        "ear, spike, capitulum",
        "toilet tissue, toilet paper, bathroom tissue"
    )
}
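As a design note, the class above has to be instantiated just to read the array. A minimal alternative sketch, assuming a hypothetical singleton named ImageNetLabels with a hypothetical nameOf helper, is shown below. Only two labels are listed for brevity; the real list has 1000 entries.

```kotlin
// Sketch: a singleton holder, so callers do not need to construct an instance.
object ImageNetLabels {
    // Only the first two labels are listed here; the real list has 1000 entries.
    val classes: List<String> = listOf(
        "tench, Tinca tinca",
        "goldfish, Carassius auratus"
    )

    // Returns the label for an index, or a fallback when the index is out of range.
    fun nameOf(index: Int): String = classes.getOrElse(index) { "unknown ($index)" }
}

fun main() {
    println(ImageNetLabels.nameOf(1))
    println(ImageNetLabels.nameOf(999)) // out of range in this shortened list
}
```

The rest of this article keeps using ImageNetCategory as defined above.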

Next, implement the main inference part. Add the following at the end of onCreate in MainActivity.kt.

MainActivity.kt



        //Load the network model
        val resnet = Module.load(getAssetFilePath(this, "resnet.pt"))

        /**Inference**/
        inferBtn.setOnClickListener {
            //Inference needs a captured photo, so bail out if none has been taken yet
            val captured = imgData
            if (captured == null) {
                Toast.makeText(this, "Take a picture first.", Toast.LENGTH_SHORT).show()
                return@setOnClickListener
            }
            //Resize the captured photo to 224 x 224
            val imgDataResized = Bitmap.createScaledBitmap(captured, 224, 224, true)
            //Convert the bitmap to a normalized float tensor
            val inputTensor = TensorImageUtils.bitmapToFloat32Tensor(
                imgDataResized,
                TensorImageUtils.TORCHVISION_NORM_MEAN_RGB,
                TensorImageUtils.TORCHVISION_NORM_STD_RGB
            )

            //Forward pass
            val outputTensor = resnet.forward(IValue.from(inputTensor)).toTensor()
            val scores = outputTensor.dataAsFloatArray

            //Find the index with the highest score.
            //Scores are raw logits and can be negative, so start from the lowest possible value
            var maxScore = -Float.MAX_VALUE
            var maxScoreIdx = 0
            for (i in scores.indices) {
                if (scores[i] > maxScore) {
                    maxScore = scores[i]
                    maxScoreIdx = i
                }
            }

            //Convert the index of the inference result to a category name
            val inferCategory = ImageNetCategory().IMAGENET_CLASSES[maxScoreIdx]
            resultText.text = "Inference result: ${inferCategory}"
        }
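To make the tensor conversion step above less opaque, here is a plain Kotlin sketch of what that kind of preprocessing amounts to: each color channel is scaled to the range 0 to 1, normalized with the ImageNet mean and standard deviation, and written out channel by channel. The function name argbToNormalizedChw is hypothetical, and this is an illustration of the idea rather than the library's actual implementation.

```kotlin
// Mean and standard deviation used for torchvision's ImageNet models (RGB order).
val NORM_MEAN_RGB = floatArrayOf(0.485f, 0.456f, 0.406f)
val NORM_STD_RGB = floatArrayOf(0.229f, 0.224f, 0.225f)

// Converts packed ARGB pixels (row-major, width * height entries) into a
// normalized float array laid out as [all R][all G][all B] (CHW order).
fun argbToNormalizedChw(pixels: IntArray, width: Int, height: Int): FloatArray {
    require(pixels.size == width * height) { "pixel count must equal width * height" }
    val planeSize = width * height
    val out = FloatArray(3 * planeSize)
    for (i in 0 until planeSize) {
        val color = pixels[i]
        // Extract each 8-bit channel and scale it to 0..1
        val r = ((color shr 16) and 0xFF) / 255.0f
        val g = ((color shr 8) and 0xFF) / 255.0f
        val b = (color and 0xFF) / 255.0f
        out[i] = (r - NORM_MEAN_RGB[0]) / NORM_STD_RGB[0]
        out[planeSize + i] = (g - NORM_MEAN_RGB[1]) / NORM_STD_RGB[1]
        out[2 * planeSize + i] = (b - NORM_MEAN_RGB[2]) / NORM_STD_RGB[2]
    }
    return out
}

fun main() {
    // A 2 x 1 image: one pure red pixel and one black pixel.
    val pixels = intArrayOf(0xFFFF0000.toInt(), 0xFF000000.toInt())
    println(argbToNormalizedChw(pixels, 2, 1).joinToString())
}
```

For a 224 x 224 image the resulting array has 3 * 224 * 224 values, which matches the (1, 3, 224, 224) example input used when the model was traced.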

That is all it takes to perform image recognition. Try taking various pictures and swapping in other models.
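When comparing pictures or models, it can help to see more than the single best class. The scores the model returns are raw logits, so a softmax turns them into values that sum to 1 and are easier to read as confidences. The following is a minimal sketch in plain Kotlin; softmax and topK are hypothetical helper names and the logits are made up.

```kotlin
import kotlin.math.exp

// Converts raw scores (logits) to probabilities that sum to 1.
// Subtracting the maximum first keeps exp() from overflowing.
fun softmax(scores: FloatArray): FloatArray {
    require(scores.isNotEmpty()) { "scores must not be empty" }
    var max = scores[0]
    for (s in scores) if (s > max) max = s
    val exps = DoubleArray(scores.size) { exp((scores[it] - max).toDouble()) }
    val sum = exps.sum()
    return FloatArray(scores.size) { (exps[it] / sum).toFloat() }
}

// Returns the indices of the k highest scores, best first.
fun topK(scores: FloatArray, k: Int): List<Int> =
    scores.indices.sortedByDescending { scores[it] }.take(k)

fun main() {
    val scores = floatArrayOf(-1.2f, 3.4f, 0.5f, 2.9f) // made-up logits
    val probs = softmax(scores)
    for (i in topK(scores, 3)) {
        println("class $i: %.1f%%".format(probs[i] * 100))
    }
}
```

In the app, each index returned by topK could be mapped to a name through IMAGENET_CLASSES in the same way as the single best index.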

In closing

The code is on GitHub, so please refer to it as needed. I also tried loading VGG-16, but it ran out of memory and looked troublesome to fix, so I gave up. It would be interesting to load models produced by transfer learning. I also found it convenient how easily CameraX lets you use the camera.
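A rough calculation suggests why VGG-16 is so much heavier than ResNet18. The sketch below estimates the size of the float32 weights alone from the parameter counts commonly reported for the torchvision implementations (about 11.7 million for ResNet18 and about 138 million for VGG-16). It ignores activations and runtime overhead, and it does not establish what actually caused the out-of-memory error in my case. float32WeightsMiB is a hypothetical helper name.

```kotlin
// Approximate size in MiB of a model's float32 weights (4 bytes per parameter).
fun float32WeightsMiB(parameterCount: Long): Double =
    parameterCount * 4.0 / (1024.0 * 1024.0)

fun main() {
    // Parameter counts as commonly reported for the torchvision implementations.
    val models = mapOf(
        "resnet18" to 11_689_512L,
        "vgg16" to 138_357_544L
    )
    for ((name, params) in models) {
        println("$name: about ${float32WeightsMiB(params).toInt()} MiB of weights")
    }
}
```

By this estimate the VGG-16 weights are more than ten times larger than those of ResNet18, before any memory needed for intermediate results.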
